The molecular role of GTF2IRD1: a protein involved in the neurodevelopmental abnormalities of Williams-Beuren syndrome

Paulina Carmona-Mora

A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Medical Sciences, Faculty of Medicine UNSW Australia Sydney, Australia

May 2015


ACKNOWLEDGMENTS

I would like to express my gratitude to the people that have been with me in this important stage of my life. First, infinite thanks to my supervisors, Prof. Edna Hardeman and Dr Stephen Palmer, for your continual support throughout these years and for mentoring me. My PhD has been an intense and challenging journey, both at the professional and personal level, but I am thankful I went through this road with you. Both were always there for understanding and encouraging me, but also to enjoy the rewarding results of this experience. I will definitely take with me what I learned from you, Edna; I admire you as a strong, experienced and successful woman in science. Steve, your attitude so sensible and patient has shaped the way I want to do science.

The Cellular and Genetic Medicine Unit is an enriching and multicultural environment that made my time here so enjoyable. Thanks to Prof. Peter Gunning, your lively interest in science is pure inspiration. I also want to thank the CGMU supervisors Dr Galina Schevzov and Dr Annemiek Beverdam for their advice and delightful conversations. I am grateful for having great people around me in the lab, first of all Florence Tomasetig, who besides being so important in the project, helped me with the experiments and made our days in the lab so vibrant and pleasant, your friendship is a real gem. I also want to thank you April, Wei, Cecilia, Jeff and Annamaria for being so helpful and cheerful. To my fellow PhD students, Bin, Melissa, Iman, Bassem, Veronica and Nadia, we shared the bitter, the sweet and the celebrations of this experience.

I was fortunate to collaborate with great scientists from whom I learned so much. Thanks to Prof. Mark Wilkins and Natalie Twine, from the NSW Systems Biology Initiative, for their help with the microarray experiments. Thanks to Iveta Slapetova and Dr Renee Whan, from the Biomedical Imaging Facility, for their excellent training and assistance with the confocal microscopy. And thanks to Dr Mirella Dottori, from the University of Melbourne, for sharing her expertise on embryonic stem cells and providing us with neurospheres.

I dedicate my achievements to Cesar, my husband and lab mate; indeed, this PhD candidature has been a shared effort, both in the lab and at home, and without your understanding and love I would not have been able to finish this work. To my family in Chile, my mom, dad and sisters Emma and Valeria, for their endless support and source of strength. To my son, Diego, who without knowing it is my constant motivation for organising and balancing my life and who is my inspiration to go through everything life brings.


PUBLICATIONS ARISING FROM WORK PRESENTED IN THIS THESIS

P. Carmona-Mora, J. Widagdo, F. Tomasetig, CP. Canales, Y. Cha, WS. Lee, A. Alshawaf, M. Dottori, EC. Hardeman, SJ. Palmer (2015) The nuclear localization pattern and interaction partners of GTF2IRD1 demonstrate a role in chromatin regulation. (Submitted to Human Genetics, in production of revised manuscript).

In preparation:

P. Carmona-Mora, F. Tomasetig, CP. Canales, A. Alshawaf, M. Dottori, JI. Young, L. Hesson, R. Barres, EC. Hardeman, SJ. Palmer (2015) The epigenetic role of GTF2IRD1 as a means to understand features of Williams-Beuren syndrome.

Selected conference abstracts

P. Carmona-Mora, J. Widagdo, KM. Taylor, R. Tsz-Wai Pang, PW. Gunning, EC. Hardeman, SJ. Palmer (2012) The molecular role of GTF2IRD1, a protein implicated in the neurodevelopmental features of Williams-Beuren syndrome. 62nd Annual Meeting, The American Society of Human Genetics, San Francisco, CA, USA.

P. Carmona-Mora, J. Widagdo, KM. Taylor, R. Tsz-Wai Pang, F. Tomasetig, NA. Twine, MR. Wilkins, PW. Gunning, EC. Hardeman, SJ. Palmer (2013) The molecular role of the transcriptional regulator GTF2IRD1 in the pathogenesis of Williams-Beuren syndrome. 34th Lorne Genome Conference, Lorne, Vic, Australia.

P. Carmona-Mora, J. Widagdo, F. Tomasetig, KM. Taylor, R. Tsz-Wai Pang, NA. Twine, MR. Wilkins, PW. Gunning, EC. Hardeman, SJ. Palmer (2013) A gene implicated in the neurobehavioural abnormalities of Williams-Beuren syndrome, GTF2IRD1, is a novel epigenetic regulator. Genetics Society of Australasia, Sydney, NSW, Australia.

P. Carmona-Mora, J. Widagdo, F. Tomasetig, KM. Taylor, Y. Cha, R. Tsz-Wai Pang, NA. Twine, MR. Wilkins, PW. Gunning, EC. Hardeman, SJ. Palmer (2013) A gene implicated in the neurobehavioural abnormalities of Williams-Beuren syndrome, GTF2IRD1, encodes a novel epigenetic regulator. ComBio, Perth, WA, Australia. (Oral presentation)

P. Carmona-Mora, J. Widagdo, F. Tomasetig, KM. Taylor, Y. Cha, R. Tsz-Wai Pang, NA. Twine, MR. Wilkins, PW. Gunning, EC. Hardeman, SJ. Palmer (2013) A gene implicated in the neurobehavioural abnormalities of Williams-Beuren syndrome, GTF2IRD1, encodes a novel epigenetic regulator. 63rd Annual Meeting, The American Society of Human Genetics, Boston, MA, USA. (Oral presentation)

P. Carmona-Mora, F. Tomasetig, CP. Canales, A. Alshawaf, M. Dottori, EC. Hardeman, SJ. Palmer (2014) Unravelling epigenetic complexes associated with GTF2IRD1 to understand the cognitive features of Williams-Beuren syndrome. ComBio, Canberra, ACT, Australia. (Oral presentation)

P. Carmona-Mora, F. Tomasetig, CP. Canales, A. Alshawaf, M. Dottori, EC. Hardeman, SJ. Palmer (2014) Defining the presence of GTF2IRD1 in epigenetic complexes as a means to understand features of Williams-Beuren syndrome. 64th Annual Meeting, The American Society of Human Genetics, San Diego, CA, USA.


ABSTRACT

Background: GTF2IRD1 is a member of the GTF2I gene family, located on chromosome 7 in a region prone to duplications and deletions in humans. Hemizygous deletions cause Williams-Beuren syndrome (WBS) and duplications cause WBS duplication syndrome. Human mapping data and analyses of mouse knockouts implicate GTF2IRD1 as the prime candidate for the craniofacial abnormalities, mental retardation, visuospatial construction deficits and hypersociability of WBS.

Aims: The aim of this work was to study the cellular and molecular role of GTF2IRD1 by investigating: i) the cellular localisation of GTF2IRD1; ii) its protein interacting partners; iii) the gene dysregulation caused by GTF2IRD1 loss; and iv) the presence of GTF2IRD1 in epigenetic complexes regulating gene expression.

Results: i) Immunofluorescence analyses in mammalian cell lines and in human ES cell-derived neurons showed endogenous GTF2IRD1 as a nuclear speckle protein. The comparison of this punctate pattern with markers of nuclear sub-compartments and chromatin marks supports an association with developmentally regulated silent chromatin. ii) To define functional relationships, yeast two-hybrid screenings were used to isolate novel interaction partners. Most of the nuclear-localised interactions were validated in mammalian cells, and the partners are predominantly involved in chromatin modification and transcriptional regulation. The sites of interaction in GTF2IRD1 were mapped to specific domains. iii) To identify transcriptional changes arising from GTF2IRD1 loss, microarray studies were conducted in siRNA-treated HeLa cells and tissue from Gtf2ird1 knockout mice. In the corpus striatum, qPCR validation indicated up-regulation of genes involved in neuronal development and immediate-early response genes that may explain some of the observed neurobehavioural phenotypes. iv) GTF2IRD1 was found to be involved in chromatin modifying complexes through direct associations with histone deacetylases and can affect their enzymatic activity.

Conclusions: The results of this thesis indicate that GTF2IRD1 forms complexes with DNA-binding and chromatin modifying proteins to regulate gene expression through epigenetic mechanisms that are controlled in a tissue specific manner. The sites of protein interactions indicate key features regarding the evolution of GTF2IRD1 and integration with tight post-translational regulation, fitting well with the concept of human disease states caused by copy number variation.


ABBREVIATIONS

Ade        adenine
ADHD       attention deficit hyperactivity disorder
ASD        autism spectrum disorder
BCA        bicinchoninic acid assay
BSA        bovine serum albumin
ChIP       chromatin immunoprecipitation
Co-IP      coimmunoprecipitation
CNV        copy number variation
DDO        double dropout
DMEM       Dulbecco’s modified Eagle’s medium
DNA        deoxyribonucleic acid
DNAse      deoxyribonuclease
ECL        enhanced chemiluminescence
GFP        green fluorescent protein
GUR        GTF2IRD1 upstream region
HDAC       histone deacetylase
His        histidine
IgG        immunoglobulin G
IP         immunoprecipitation
LCR        low copy repeat
Leu        leucine
Lys        lysine
LZ         leucine zipper
NAHR       non-allelic homologous recombination
NLS        nuclear localisation signal
OMIM       Online Mendelian Inheritance in Man
ORF        open reading frame
PBS        phosphate-buffered saline
PCR        polymerase chain reaction
PFA        paraformaldehyde
PLA        proximity ligation assay
QDO        quadruple dropout
qRT-PCR    quantitative real time PCR
RD         repeat domain
RNA        ribonucleic acid
SDS-PAGE   sodium dodecyl sulphate polyacrylamide gel electrophoresis
SEM        standard error of the mean
SUMO       small ubiquitin-like modifier
SVAS       supravalvular aortic stenosis
TAE        tris-acetate-EDTA buffer
Trp        tryptophan
UV         ultra violet
WBS        Williams-Beuren syndrome
WSCP       Williams syndrome cognitive profile
WBSCR      Williams-Beuren syndrome critical region
Y2H        yeast two-hybrid


Table of Contents

ACKNOWLEDGMENTS
PUBLICATIONS
ABSTRACT
ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 - INTRODUCTION
1.1 Williams-Beuren syndrome
  1.1.1 Genetic basis of Williams-Beuren syndrome
  1.1.2 Clinical features
  1.1.3 Other chromosomal rearrangements in 7q11.23
  1.1.4 Genotype-phenotype correlations
  1.1.5 CNVs in the 7q11.23 region as a means for studying the genetic basis of behaviour, cognition and facial morphology
1.2 Mouse models for Williams-Beuren syndrome
  1.2.1 Common WBS deletion mice
  1.2.2 Partial deletions mice
  1.2.3 Monogenic mutations in mice
1.3 The TFII-I protein family
  1.3.1 TFII-I
  1.3.2 GTF2IRD2
  1.3.2 GTF2IRD1
Aims of this thesis

CHAPTER 2 - MATERIALS AND METHODS
2.1 Materials
2.2 Methods
  2.2.1 Molecular Biology
2.3 Cell Biology
  2.3.1 Animals
  2.3.2 Tissue preparation
  2.3.3 Cell culture
  2.3.4 Transient transfection
  2.3.5 Immunofluorescence
  2.3.6 Proximity ligation assay (PLA)
  2.3.7 Confocal microscopy and image analyses
2.4 Protein Biochemistry
  2.4.1 Total protein extraction from cells
  2.4.2 Protein quantitation
  2.4.3 SDS-polyacrylamide gel electrophoresis (SDS-PAGE)
  2.4.4 Western blotting
  2.4.5 Immunoprecipitation
  2.4.6 Histone deacetylase activity
2.5 Yeast two-hybrid assays
  2.5.1 Small-scale yeast transformation
  2.5.2 Yeast library screening
  2.5.3 Plasmid rescue from yeast
  2.5.4 Identification of isolated yeast library clones
2.6 Online resources
2.7 Statistical analyses

CHAPTER 3 - CELLULAR CHARACTERISATION OF GTF2IRD1
3.1 Introduction
3.2 Results
  3.2.1 Detection of human GTF2IRD1 and antibody validation
  3.2.2 Endogenous GTF2IRD1 exists in a punctate pattern in the nucleus
  3.2.3 Endogenous GTF2IRD1 shows overlap with markers of silenced chromatin
3.3 Discussion

CHAPTER 4 - IDENTIFICATION OF PROTEINS INTERACTING WITH GTF2IRD1
4.1 Introduction
4.2 Results
  4.2.1 Yeast two-hybrid library screening for novel GTF2IRD1 interacting partners
  4.2.2 Domain characterization for the novel protein interactions of GTF2IRD1
  4.2.3 Subcellular localisation of GTF2IRD1 and its novel protein partners
  4.2.4 GTF2IRD1 interactions with chromatin modifiers and transcriptional regulators can be demonstrated in mammalian cells
  4.2.5 Coimmunofluorescence analysis of endogenous partners in mammalian cells
  4.2.6 Overview of the novel GTF2IRD1 interactional network
4.3 Discussion

CHAPTER 5 - GENE REGULATION ASSOCIATED WITH GTF2IRD1
5.1 Introduction
5.2 Results
  5.2.1 Microarray analysis of GTF2IRD1 knock down in a mammalian cell line
  5.2.2 Microarray analysis in corpus striatum tissue of GTF2IRD1 knock out mice
  5.2.3 Analysis of the differentially expressed genes by qRT-PCR
5.3 Discussion
  5.3.1 Transcriptional profiling of siRNA-treated HeLa cells
  5.3.2 Transcript profiling of Gtf2ird1 knock out mice striatum samples
  5.3.3 Context dependence of the in vitro and in vivo gene expression profiles

CHAPTER 6 - UNRAVELLING EPIGENETIC COMPLEXES ASSOCIATED WITH GTF2IRD1
6.1 Introduction
6.2 Results
  6.2.1 Expansion of GTF2IRD1 protein network with MBD and HDAC proteins
  6.2.2 Visualisation of direct protein interactions using the proximity ligation assay
  6.2.3 Functional consequences of GTF2IRD1 binding to HDACs
6.3 Discussion

CHAPTER 7 - GENERAL DISCUSSION
7.1 Overview
  7.1.1 GTF2IRD1 is a nuclear speckling protein
  7.1.2 Effect of GTF2IRD1 on gene regulation
  7.1.3 Novel interacting partners of GTF2IRD1
7.2 Future directions
7.3 Concluding remarks

LIST OF FIGURES

Figure 1.1 The Williams-Beuren syndrome rearranged region
Figure 1.2 Typical and atypical deletions of the 7q11.23 region and the phenotype correlation
Figure 1.3 Key structural features of the TFII-I protein family
Figure 1.4 Model for DNA binding of GTF2IRD1 to its own promoter region
Figure 2.1 Key steps of the in situ proximity ligation assay (PLA)
Figure 2.2 Yeast two-hybrid library screening workflow
Figure 3.1 Detection of endogenous human GTF2IRD1
Figure 3.2 Endogenous GTF2IRD1 is expressed in a speckled pattern within the nucleus
Figure 3.3 Endogenous GTF2IRD1 adopts a speckled nuclear pattern in hESC-derived neuronal cell cultures
Figure 3.4 Confocal co-immunofluorescence analysis of endogenous GTF2IRD1 with markers of nuclear sub-compartments and chromatin sub-domains in HeLa cells
Figure 3.5 Comparison of GTF2IRD1 localization with markers of nuclear sub-compartments
Figure 4.1 Confirmation of Y2H interactions by retransformation of bait and prey plasmids
Figure 4.2 Mapping of interaction domains in GTF2IRD1 with the proteins identified in the Y2H screens
Figure 4.3 Subcellular localisation of constitutively expressed GTF2IRD1 and the novel putative protein partners
Figure 4.4 Novel GTF2IRD1 interactions with nuclear proteins revealed by coimmunoprecipitation in mammalian cells
Figure 4.5 Endogenous expression of several GTF2IRD1 partners in mammalian cells
Figure 4.6 Comparison of endogenous colocalisation of GTF2IRD1 and TFII-I with ZMYM2 and ZMYM3
Figure 4.7 A GTF2IRD1 interactional network
Figure 5.1 Gene ontology (GO) enrichment analysis of the dysregulated genes found in HeLa GTF2IRD1 knock down samples
Figure 5.2 Gene ontology (GO) enrichment analysis of the dysregulated genes found in the Gtf2ird1-/- striatum samples
Figure 5.3 qRT-PCR validation of selected genes from siRNA treated HeLa cells
Figure 5.4 qRT-PCR validation of selected genes in the striatum of Gtf2ird1-/- mice
Figure 5.5 Comparative expression levels of genes in the corpus striatum tissue of Gtf2ird1 knock out mice
Figure 5.6 Venn diagram of genes with a minimal fold change of ±1.2 in both microarray analyses
Figure 6.1 GTF2IRD1 interacts with MBD and HDAC proteins
Figure 6.2 Visualisation of interactions between GTF2IRD1 and HDAC1/2 in HeLa cells
Figure 6.3 Interactions between GTF2IRD1 and HDAC1/2 in hESCs-derived neurons
Figure 6.4 Quantitation of the PLA interactions between GTF2IRD1 and HDACs
Figure 6.5 Negative GTF2IRD1 PLA interactions
Figure 6.6 Comparison of the levels of HDAC interaction by PLA
Figure 6.7 GTF2IRD1 over-expression increases histone deacetylase activity
Figure 6.8 Effect of GTF2IRD1 knock down on the assembly of HDACs complexes
Figure 7.1 Proposed model for the molecular role of GTF2IRD1

LIST OF TABLES

Table 1.1 Comparison of the clinical features of Williams-Beuren syndrome and Williams-Beuren region duplication syndrome
Table 2.1 Commercially available reagents and kits
Table 2.2 Primary antibodies
Table 2.3 Plasmids
Table 2.4 Oligonucleotide primers used for cloning open reading frames (ORFs)
Table 2.5 Oligonucleotide primers used for quantitative PCR
Table 4.1 Summary of hits obtained in both Y2H screens
Table 4.2 Notes on the functional properties of novel GTF2IRD1 protein partners validated in mammalian cells
Table 5.1 Overview of the differentially expressed genes chosen for qRT-PCR validation in the HeLa siRNA treated cells
Table 5.2 Overview of the dysregulated genes in the striatum of Gtf2ird1 knock out mice

CHAPTER 1 - INTRODUCTION


1.1 Williams-Beuren syndrome

Williams-Beuren syndrome (WBS, OMIM 194050) is a genomic disorder caused by the microdeletion of 1.55 to 1.84 Mb at the region 7q11.23 which includes up to 28 genes (Bayes et al., 2003; Ewart et al., 1993; Pober, 2010). Its estimated prevalence is 1:7,500 live births (Stromme et al., 2002).

1.1.1 Genetic basis of Williams-Beuren syndrome

1.1.1.1 Genomic disorders and contiguous gene syndromes

Genomic disorders are DNA rearrangements causing copy number variations (CNVs) in the human genome that result in pathological presentations (Lupski, 1998).

When the genomic disorder is produced by the rearrangement of a chromosomal interval that spans multiple adjacent genes, it can be classified as a contiguous gene syndrome (CGS) (Schmickel, 1986).

The phenotypes observed in patients with genomic disorders are generally due to the presence of one or more dosage sensitive genes within the rearranged interval. In addition, the change in the genomic architecture of the rearranged region not only affects the genes with altered copy number, but also produces expression changes in the genes flanking the rearrangement (Merla et al., 2006; Molina et al., 2008; Reymond et al., 2007), suggesting that these genes can also contribute to or modify the phenotypic outcome of a specific CNV region.


1.1.1.2 Molecular mechanism

The WBS region is flanked by low copy repeat sequences (LCRs), which are located in blocks (named A, B and C) on the centromeric, medial and telomeric parts of the WBS locus (Bayes et al., 2003). LCRs confer genomic instability by providing a substrate for non-allelic homologous recombination (NAHR) during meiosis, which leads to the rearrangements responsible for genomic disorders (Schubert, 2009; Stankiewicz and Lupski, 2002). The LCR sequences in 7q11.23 are approximately 320 kb long and are organised in complexes composed of A, B and C blocks with very high homology, block B having the highest similarity. In 95 % of WBS patients, the common ~1.5 Mb deletion is present, with centromeric and medial LCR breakpoints in block B (Bayes et al., 2003; Schubert, 2009). Block B in the medial LCR contains three genes (GTF2I, NCF1 and GTF2IRD2), while block B in the centromeric and telomeric LCRs contains their corresponding putative pseudogenes. As shown in figure 1.1, a single-copy gene region of ~1.2 Mb is located between blocks C and B in the centromeric and medial LCRs (Bayes et al., 2003).

A larger deletion, only seen in 5 % of the cases, is mediated by recombination between the A blocks of the centromeric and medial LCRs (Bayes et al., 2003).

In general, WBS deletions occur sporadically, but a few inherited cases of the autosomal dominant type have been reported (Morris et al., 1993; Ounap et al., 1998; Pankau et al., 2001; Sadler et al., 1993). In some cases, the diagnosis was only based on clinical features and no genetic or molecular analysis was performed, and some of the transmitting parents had a very mild clinical presentation. All the cases showed considerable clinical variation.


This figure has been removed due to copyright restrictions.

Figure 1.1 The Williams-Beuren syndrome rearranged region.

Its location in chromosome 7q11.23 is shown. The single copy gene region is shown in black font. LCR blocks A, B and C are represented by green, blue and red arrows respectively, and the lowercase letters represent their location: c, centromeric; m, medial; and t, telomeric. The figures below the WBS region indicate the structure of the 1.5 and 1.8 Mb deletions (adapted from Merla et al., 2010).


1.1.2 Clinical features

WBS is a multisystem disorder with clinical features that include a distinctive facial gestalt, growth, endocrine and neurological problems, cardiovascular abnormalities and a unique cognitive and behavioural profile (Merla et al., 2010). A summary of the main characteristic features of WBS is presented in table 1.1.

1.1.2.1 Multisystem Features of WBS

1.1.2.1.1 Organ abnormalities

One of the most prominent features of WBS is the stenosis of arteries, with supravalvular aortic stenosis (SVAS) showing a particularly high incidence (70 %) with a broad severity range (Pober, 2010). Hypertension is also frequent, affecting ~50 % of the patients (as reviewed by Pober, 2010). Electrocardiogram (ECG) abnormalities such as prolongation of the QT interval contribute to a greater risk of sudden death. Together, these features make cardiovascular complications the major cause of death amongst WBS patients (Collins, 2013; Pober, 2010; Wessel et al., 2004).

Other symptoms commonly associated with WBS include diverticulitis, abdominal pain, structural renal anomalies, mild premature aging of the skin and joint laxity, amongst many others affecting multiple organ systems (Pober, 2010).


1.1.2.1.2 Endocrine and metabolic abnormalities

Endocrine and metabolic features include hypothyroidism, hypercalcaemia and an increased prevalence of diabetes (Palacios-Verdu et al., 2015; Pober, 2010).

Hypothyroidism is generally subclinical, but focused studies have shown increased levels of thyroid stimulating hormone (TSH) accompanied by mild thyroid hypoplasia. Hypercalcaemia is found in 15 % of children and is transient and mild. The most prevalent endocrine feature is the development of diabetes or impaired glucose tolerance, which is found in 75 % of adult WBS patients tested. Other features frequently found in children or adults with WBS include hypotriglyceridemia and increased blood levels of bilirubin, total protein and albumin (Palacios-Verdu et al., 2015).

1.1.2.1.3 Craniofacial abnormalities

An important factor for WBS diagnosis is the identification of the characteristic craniofacial abnormalities, which are composed of skeletal and soft tissue alterations (Mass and Belostoky, 1993; Pober, 2010). The skull features have been defined as a short anterior cranial base, steep mandibular plane angle, deficient chin button, shorter upper facial height and a longer lower facial height line. The soft tissue abnormalities are thought to constitute the major part of the overall profile and include a long philtrum, periorbital fullness, full lips and chin and a wide smile (Mass and Belostoky, 1993). This profile has been well characterised with the help of photoanthropometric tools and three-dimensional (3D) imaging (Hammond et al., 2005; Hovis and Butler, 1997), which supports the clinical diagnosis and facilitates animal model assessment.


1.1.2.2 WBS behavioural profile

The WBS behavioural profile is distinctive and represents one of the most interesting characteristics of the disease that has the potential to reveal much regarding the connection between genetics and human behaviour. One of the classic features is hypersociability, which is a consistent characteristic in the patients analysed, according to a review of behavioural studies performed in WBS cases (Martens et al., 2008).

Individuals with WBS show empathy and social disinhibition, and a high interest in seeking social interaction (Doyle et al., 2004; Jarvinen-Pasley et al., 2008; Morris, 2010). Evidence of hypersociability becomes apparent in the early months of life. Studies in infants and toddlers with WBS showed that they gazed at strangers more than the control group of children (Mervis et al., 2003) and there is a detectable preference for the observation of faces rather than objects. However, despite being described as overfriendly, patients have been reported to be unable to sustain social relationships (Gosch and Pankau, 1994).

Another aspect of the neurobehavioural features of WBS patients is the high incidence of psychopathologies, as observed in a study in 4-16 year old children (Leyfer et al., 2006). The most frequent was attention deficit hyperactivity disorder (ADHD), with a 64.7 % prevalence rate, contrasting with the 3-7 % of the general population (Leyfer et al., 2006). Specific phobias had the second highest prevalence rate, the most characteristic and prevalent phobia being the fear of loud noises. Generalized anxiety disorder (GAD) was also observed in about 12 % of WBS children, although the anxiety seen is more of an anticipatory type, which is manifested as anticipatory worry towards a specific event (Leyfer et al., 2006). Therefore, the hypersociability, which seems to constitute a lack of social anxiety and social inhibition, contrasts sharply with the increased susceptibility to non-social cues.

Enjoyment of music and strong musical skills have traditionally been associated with WBS in an anecdotal fashion. Studies on musical abilities in WBS were reviewed by Martens et al. (2008) and, so far, results have been contradictory. Some studies showed that WBS subjects have absolute pitch and positive emotional responses to music associated with decreased anxiety, whilst other reports showed a performance similar to or below that of the control groups, and increased anxiety in response to musical activities.

Recent analysis of a few WBS patients has shown that the syndrome and autism can co-occur, despite this seeming to be opposite to the hypersociability phenotype. In one study, two WBS patients analysed showed positive standard markers for autism: polymorphism of the promoter of the serotonin transporter gene (5-HTTLPR) and hyperserotonemia (Tordjman et al., 2013). These factors might suggest that the HTT gene and serotonin levels could act as modifiers of the WBS behavioural profile. Another study reported 9 new patients with autistic features, including autistic stereotyped behaviours, although social withdrawal was absent (Tordjman et al., 2013). Therefore, more studies are required to establish the mechanism by which autism can coexist in a small number of WBS cases. One hypothesis is that these two clinical entities may just represent coincidental separate genetic insults (Meyer-Lindenberg et al., 2006).

1.1.2.2.1 WBS cognitive profile

Besides the craniofacial abnormalities, the most distinctive feature of WBS is the characteristic cognitive and behavioural profile, which is marked by an uneven set of severe impairments and relative strengths (Bellugi et al., 1999). The pattern of the WBS cognitive profile (WSCP) is composed of distinctive features in the following areas.

General cognition: mild to moderate intellectual disability is typical, although the range extends from severe impairment to within the normal range. When assessed in children and adults, IQ ranges from 40-100, with a mean IQ of about 55 (as reviewed by Pober, 2010).

Language: development of language usually involves speech delay in most patients, but once language is acquired, strong verbal skills are typical. WBS patients show strengths in expressive language, verbal memory, word fluency and grammar, which is striking considering their cognitive and spatial memory impairments. However, when language requires a spatial description, children with WBS make more errors in the use of spatial prepositions (reviewed by Bellugi et al., 1999; Martens et al., 2008).

Visuospatial skills: the performance of WBS patients in tests to assess visuospatial skills is poor when compared to age and IQ matched subjects with Down syndrome or right hemisphere lesions, reflecting a lack of global spatial organisation. In contrast, WBS patients have remarkable strength in facial recognition (Bellugi et al., 1999), suggesting that this process involves separate mechanisms.

1.1.2.3 Neurological features

Neurological abnormalities observed in WBS children include hypotonia, strabismus and hyperreflexia (Morris, 2010). Motor dysfunctions are also present, in the form of gait, coordination problems or fine motor skill abnormalities (Chapman et al., 1996; Hocking et al., 2008).


Increased sensitivity to sound is highly common in WBS patients. It is commonly referred to as hyperacusis, but is more aptly described as auditory allodynia, because it involves strong reactions to sudden sounds that are not normally aversive to the general population. Hyperacusis generally means increased auditory sensitivity, whereas in fact the opposite is true: WBS patients are known to have progressive sensorineural hearing loss (Gothelf et al., 2006; Marler et al., 2010).

1.1.2.4 Neuroanatomical abnormalities

MRI analyses have identified several neuroanatomical anomalies in WBS patients; a review of collected studies (Jackowski et al., 2009) indicates decreased total brain volume, increased cerebellum reduction, and reduction of the corpus callosum, basal ganglia and the amygdala. Abnormalities in the frontostriatal circuit and an enlarged amygdala have also been reported, which may be associated with the increased social interactions (Haas et al., 2014; Mobbs et al., 2007).

In addition, several areas of reduced grey matter have been reported: in the orbitofrontal cortex, near the third ventricle and in the intraparietal sulcus, the latter being identifiable in MRI studies in WBS children (reviewed by Osborne and Mervis, 2007). It has been suggested that the alterations in the intraparietal sulcus are related to the visuospatial construction deficits, that the reduced grey matter in the orbitofrontal cortex is linked with the hypersociability, and that the region around the third ventricle could be associated with the hormonal disturbances (Meyer-Lindenberg et al., 2004; Meyer-Lindenberg et al., 2005).


1.1.3 Other chromosomal rearrangements in 7q11.23

The mechanism that creates the WBS deletion (NAHR) should, theoretically, be capable of generating duplications at an equal frequency. However, for many years these CNVs escaped clinical detection. Recently, it has emerged that such duplications do exist, as do triplication and inversion cases (as reviewed by Merla et al., 2010; Schubert, 2009). The duplications predispose to a range of clinical features. Genome-wide association studies (GWAS) have identified 7q11.23 duplications as a new risk factor for schizophrenia (Kirov et al., 2012; Mulle et al., 2014), and similar studies in patients diagnosed with autism spectrum disorders (ASDs) have found an increased incidence of 7q11.23 recurrent de novo duplications amongst the sample group (Luo et al., 2012; Sanders et al., 2011). Integration with transcriptional data showed differential expression of several genes in the WBS region, including BCL7B, EIF4H and LAT2, and also of 85 dysregulated genes outside the region.

1.1.3.1 Williams-Beuren region duplication syndrome

Case studies of patients carrying the reciprocal duplication of WBS (OMIM#609757) have a more recent history than WBS, with the first report published by Somerville et al. (2005). To date, only a few dozen patients have been identified, despite its frequency in the population being estimated at 1:13,000-20,000 (Van der Aa et al., 2009). The cases are either familial or de novo, and share similar phenotypic characteristics (Dixit et al., 2013b; Merla et al., 2010), which reveals further insights into the role of several dosage sensitive genes in Williams-Beuren syndrome and how increased dosage impacts on neurodevelopmental and behavioural mechanisms.


1.1.3.1.1 Clinical features

The most prominent feature of the 7q11.23 duplication is a severe expressive speech delay (Osborne and Mervis, 2007), combined with developmental, craniofacial and neurocognitive abnormalities (Dixit et al., 2013a; Merla et al., 2010).

A summary of the features associated with the duplication and a comparison with the clinical presentation of WBS is presented in table 1.1. As expected in syndromes caused by reciprocal chromosomal rearrangements, most of the features of the duplication syndrome are opposite to those of WBS, suggesting that the genes that have an impact at the reduced dose also have an impact in the opposite direction when they are increased. However, there are a number of symptoms that are conserved in both disorders.

The distinctive craniofacial abnormalities include thin lips, a short philtrum and a high broad nose, which contrasts with the thick lips and long philtrum of WBS. The key neurobehavioural and cognitive features, apart from speech delay, that are present at a high frequency are mental retardation and autism spectrum behaviours, amongst others (table 1.1) (Berg et al., 2007; Dixit et al., 2013a; Merla et al., 2010; Van der Aa et al., 2009).

Despite the presence of a recognisable set of phenotypes for the 7q11.23 duplication, current observations suggest a high variability of these features and incomplete penetrance. Moreover, the duplication has been found in healthy carriers and in parents of 7q11.23 duplication syndrome patients, showing a much milder set of phenotypes (Berg et al., 2007; Dixit et al., 2013b; Van der Aa et al., 2009). This may explain why the occurrence of 7q11.23 duplication parental transmission has been observed more frequently than for WBS. One representative example includes a patient with speech delay, autistic features and behavioural problems, whose transmitting parent presented learning difficulties but normal speech (Dixit et al., 2013b).


Table 1.1 Comparison of the clinical features of Williams-Beuren syndrome and Williams-Beuren region duplication syndrome (Reprinted from Merla et al. (2010)).

Finding | 7q11.23 deletion | 7q11.23 duplication
Facial characteristics | Broad forehead | Broad forehead
 | Low nasal root | High, broad nose
 | Long philtrum | Short philtrum
 | Full lips | Thin lips
Growth and endocrine problems | Growth retardation | Normal growth (a)
 | Hypercalcemia | Normocalcemia
Cardiovascular abnormalities | SVAS | Congenital heart defects
 | Hypertension |
Connective tissue abnormalities | Joint laxity | Joint laxity
 | Hypotonia | Hypotonia
Neurological problems | | Seizures
 | Brain MRI abnormalities (non-specific) | Brain MRI abnormalities (non-specific)
Cognitive abnormalities | Developmental delay | Developmental delay
 | Mental retardation | Mental retardation (b)
 | Relative strength in expressive language | Speech and language delay
Behavioral problems | Deficit of visuospatial skills | Visuospatial skills spared (c)
 | Excessively social behaviour | Deficits of social interaction/aggressive behaviour
 | Autism spectrum behaviours | Autism spectrum behaviours
 | ADHD | ADHD

(a) Few patients with growth retardation have been reported. (b) Transmitting parents with normal cognition have been reported. (c) Poor visuospatial skills reported in two patients with 7q11.23 duplication.


1.1.3.2 WBS triplication syndrome

To date, only one patient has been reported to carry a triplication of the 7q11.23 region (Beunders et al., 2010). This 1.25 Mb rearrangement was found to be de novo and shared the common distal WBS breakpoint; however the proximal breakpoint was located between the genes FZD9 and BAZ1B. The clinical features are similar to those seen in patients with 7q11.23 duplication, but features such as the mental retardation, behavioural problems and facial dysmorphism were more severe than the typical duplication presentation. The most prominent feature observed was the severe expressive language delay.

1.1.3.3 Inversions

7q11.23 inversions have been reported in 27 % of patients with atypical WBS (Osborne et al., 2001). Interestingly, 33 % of these cases were of parental transmission, which, compared with an estimated 5 % of the general population that carry the inversion, further suggests that this polymorphism can predispose to other rearrangements in the region (Hobart et al., 2010; Osborne et al., 2001; Scherer et al., 2005). A study that assessed families with the inversion showed no WBS symptoms in the inversion carriers, nor gene expression changes in the WBS common region or the surrounding genes (Tam et al., 2008).

1.1.4 Genotype-phenotype correlations

Out of the single copy genes present in the 7q11.23 rearranged region, so far only ELN has been unequivocally identified as responsible for a specific clinical feature in WBS patients. However, the discovery of patients that carry atypical deletions (smaller or larger) of the region and either have the usual range of WBS features or a reduced set of WBS features has permitted a relatively sophisticated level of phenotype mapping to specific parts of the deletion region. The association of some of these phenotypes with specific genes has also been strongly supported by work in genetically modified mouse models, as detailed in the next section.

Mutations in the ELN locus have been associated with SVAS, both in WBS patients and as a non-syndromic condition (Merla et al., 2012; Tassabehji and Urban, 2006).

Additionally, due to its role in the vascular stenosis, ELN has been recognised as a contributor to the development of hypertension in WBS patients. Interestingly, hemizygous deletion of NCF1, one of the genes at the distal side of the rearranged region, has been reported to have a protective role against such hypertension. This result emerged from association studies in patients carrying atypical deletions, who are mostly classified in the normal blood pressure group (Del Campo et al., 2006). Examination of several 7q11.23 duplication patients has revealed evidence of aortic dilation, a feature that is opposite to the SVAS seen in WBS. These findings suggest that ELN could also be involved in the cardiovascular phenotype of the duplication syndrome through increased elastin abundance (Del Campo et al., 2006; Parrott et al., 2015).

Most of the recent phenotype-genotype correlation analyses have pointed towards an important role for the genes in the telomeric region of the deletion (reviewed in Antonell et al., 2010a). However, the desire to refine this map even further has been hampered by the small number of cases with breakpoints in informative locations. Therefore, these analyses are based on a small pool of atypical deletion patients, often single case reports. In this situation, variability in phenotypic penetrance, modifier effects and variability in clinical diagnosis create inconsistencies that limit the power of this type of genetic mapping.

One such recent case report described a patient carrying an 81.8 Kb deletion containing just the genes ELN and LIMK1, who showed vascular abnormalities, mild facial dysmorphism and a delay in fine motor skill development (Euteneuer et al., 2014). This case was compared against other similar cases and other partial deletions, which showed that the cardiovascular abnormalities were more severe than those in larger deletions, suggesting the contribution of other factors that modify the outcome. A previous study compared one case with a small deletion encompassing the genes from FKBP6 to RFC2, where the patient presented cardiovascular problems but no WSCP, against a typical deletion case with WSCP, which further suggested that heterozygosity of LIMK1 alone is not enough to cause the WSCP, although it may be a contributor (Tassabehji et al., 1999).

Regarding the metabolic and endocrine abnormalities seen in WBS, correlations in patients have shown the gene coding for MLX-interacting protein-like (MLXIPL) as the main candidate for the glucose intolerance (Antonell et al., 2010a), which is supported by the impaired glucose metabolism observed in a knock out mouse model for this gene (Iizuka et al., 2004).

In relation to the visuospatial deficits, an initial report had positioned LIMK1 as responsible for this feature (Frangiskakis et al., 1996). However, later studies in different patients dismissed this role (Gray et al., 2006; Tassabehji et al., 1999).


CLIP2, GTF2I and GTF2IRD1 have been previously suggested as the genes involved in the cognitive and behavioural phenotype of WBS (Hoogenraad et al., 2002; Tassabehji, 2003; van Hagen et al., 2007), but the recent discovery of two siblings with a pure deletion of CLIP2 and no characteristics associated with WBS ruled out the possibility of an exclusive contribution of this gene to the phenotype, highlighting the importance of GTF2I and GTF2IRD1 in the neurobehavioural features of WBS (Vandeweyer et al., 2012).

A number of atypical deletion cases further support the role of GTF2I and GTF2IRD1 in the WSCP. A study with three other cases that harboured smaller deletions suggested that these two genes contribute to the visual spatial processing defect in patients (Hirota et al., 2003). Additionally, analyses in five families that presented visuospatial deficits but no mental retardation, with deletions extending up to GTF2IRD1, attributed a role in intellectual disability to GTF2I and further implicated GTF2IRD1 in the visuospatial defects (Morris et al., 2003).

A larger deletion of 2.4-3.1 Mb that extended towards the telomere from the first intron of GTF2IRD1 implicated the deletion of the genes GTF2IRD1, GTF2I and GTF2IRD2 in the WBS cognitive and behavioural profile (Edelmann et al., 2007).

More evidence that the genes GTF2IRD1 and GTF2I are critical for the WSCP came from the report of two partial deletion cases. One case, whose deletion includes GTF2IRD1, presented mild mental retardation and craniofacial features, with hypersociability and no visuospatial defects (Antonell et al., 2010a). The other case had haploinsufficiency of GTF2IRD1 and GTF2I, and only presented mild craniofacial features, not showing the WSCP. For the craniofacial features observed in WBS, the additive dosage effect of GTF2IRD1 and GTF2I has again been pointed to as responsible (Tassabehji et al., 1999).

Figure 1.2 illustrates how the genetic contribution of these genes to the craniofacial abnormalities and the WSCP was refined by analysing the clinical presentation of patients with deletions of different sizes.


This figure has been removed due to copyright restrictions.

Figure 1.2 Typical and atypical deletions of the 7q11.23 region and the phenotype correlations.

Most of the atypical deletions reported to date in WBS cases are shown along with the transcript map of the WBS critical region (genes depicted by yellow boxes). The LCR blocks A, B and C are depicted by orange, red and green arrows respectively, and their positions are indicated as central (c) or medial (m). The continuous black lines below the transcript map represent the length of the deletions, and dotted lines represent unclear breakpoints. (Symbols for the clinical information table, +: present; −: not present; +/−: ambiguous or borderline; NA: not available). (Reproduced from Antonell et al. (2010a)).


1.1.5 CNVs in the 7q11.23 region as a means for studying the genetic basis of behaviour, cognition and facial morphology

The comparison of some of the phenotypes that are found consistently in patients that are affected by the deletion or duplication of the 7q11.23 region provides not only an opportunity to understand the pathogenesis of these disorders but also a means to understand the genetic basis of the traits involved. As mentioned earlier, WBS patients present with deficits in visuospatial construction, relative strengths in expressive language, excessive social interaction and a distinctive set of facial characteristics that includes full lips and a wide smile. All of these features are contrasted in patients that carry the reciprocal 7q11.23 duplication, who display spared visuospatial skills, expressive language delay, deficits in social interaction and thin lips (Merla et al., 2010). The deficits in social interaction and expressive language delay are also reflected in the fact that duplications in this region are found at a higher rate in patients diagnosed with autism (Luo et al., 2012; Sanders et al., 2011). These contrasting features suggest the existence of one or more dosage sensitive genes within 7q11.23 that are responsible for the formation of such characteristic traits. This model predicts that more extreme CNVs would exaggerate the features even further, which has been confirmed in a 7q11.23 triplication case, in which all of the typical duplication characteristics are more severe, including expressive language delay, mental retardation, behavioural problems and facial dysmorphism (Beunders et al., 2010).

The existence of relatively unaffected healthy transmitting parents that also carry the duplication suggests there may be complex mechanisms that are able to compensate for the detrimental gene dosage effects. These complex secondary effects suggest the existence of synergistic and epistatic relationships that act on the combinations of genes that are deleted or duplicated in the 7q11.23 region. However, in order to understand the genetic basis of these diseases and to understand the role of such dosage sensitive genes in human social behaviour, visuospatial construction, language development and facial dysmorphism, it is important to pinpoint the causative genes and understand their function at the cellular and molecular level.

1.2 Mouse models for Williams-Beuren syndrome

The human region 7q11.23 shows complete conservation of synteny in Mus musculus (DeSilva et al., 2002; Valero et al., 2000), although the deletion region is inverted with respect to the position of the centromere and the LCRs are not present as they are unique to the human species. Nevertheless, the contiguous order of the deleted gene series is conserved, thus allowing the generation of mouse models to study WBS.

Animal models are valuable tools for analysing genotype-phenotype relationships, for identifying the genes responsible for specific traits, for understanding the causative mechanisms and sometimes for developing clinical treatment strategies. An array of mouse models have been developed for WBS, using different technologies: some mimic the common WBS deletion, while others involve partial deletions or target single genes.

1.2.1 Common WBS deletion mice

A mouse model that mimics the complete WBS deletion (CD, from and including Gtf2i through to Fkbp6) was generated using Cre-Lox recombination of LoxP sites inserted at the location of these breakpoints. Hemizygous mice were viable while the homozygous deletion was lethal at the embryonic stage (Segura-Puimedon et al., 2014). The heterozygous mice presented with growth delay (reduced body weight), reduced brain weight, craniofacial abnormalities, and mild cardiovascular defects (such as mild hypertension and an increase of the arterial wall thickness). The neurobehavioural phenotypes included motor coordination abnormalities, hypersociability and an increased startle response. Correlating neuroanatomical findings included a decreased cell density in the basolateral amygdala and increased density of immature neurons in the dentate gyrus. In this best-possible model of WBS, it is clear that, although severity may be reduced (as in the mild cardiovascular phenotype), CD mice are able to recapitulate the majority of the features observed in WBS.

1.2.2 Partial deletion mice

Several models of partial WBS deletions have been reported. Two were constructed using Cre-Lox recombination with the LoxP sites targeted to Gtf2i, Limk1 and Fkbp6. Proximal deletion (PD) mice were generated by recombination of LoxP sites targeted to Gtf2i and Limk1 and distal deletion (DD) mice were generated by recombination of LoxP sites targeted to Limk1 and Fkbp6 (Li et al., 2009). A third line of mice (D/P) was generated by crossing the PD and DD lines, thus recapitulating the complete human deletion, although it is important to take into account that the two deletions are therefore not contiguous but exist in trans, and since Limk1 is targeted in both deletions, these mice are homozygous null for Limk1. Phenotypic characterisation of these mice showed a recapitulation of a number of features observed in WBS patients, and also allowed a degree of genetic mapping of the features to either the PD or DD region (Goergen et al., 2011; Li et al., 2009). For example, an assessment of cranial morphology showed that DD mice have foreshortened skulls, with a similar trend in the D/P mice, thus suggesting that genes in the DD region (Limk1-Fkbp6) are primary contributors to the hard tissue craniofacial defects of WBS (Li et al., 2009).

Analyses of PD mice suggested that the genes in this region (Gtf2i to Limk1) are associated with reduced brain lateral ventricle volume, increased neuronal density in the somatosensory cortex and abnormal social behaviours (Li et al., 2009). Behavioural tests showed that PD mice have altered social interaction, increased anxiety, motor coordination deficits and increased sensitivity to sound.

NCF1 is a gene encoding a subunit of NADPH-oxidase (NOX), and altered dosage of NCF1 has been postulated as a modulator of the cardiovascular phenotype in WBS (Del Campo et al., 2006). Partial knock out mouse models that have NCF1 deleted present with hypertension, while mice with other deletion sizes resulting in normal dosage of this gene are normotensive (Goergen et al., 2011). Moreover, the hypertension in these models can be reversed by pharmacological inhibition of NOX (Campuzano et al., 2012). This example illustrates the power of such models to pinpoint specific aspects of disease and provide the means to explore mechanisms and treatments.

1.2.3 Monogenic mutations in mice

As described earlier, the WBS cardiovascular phenotype has been unequivocally attributed to haploinsufficiency of the ELN gene product, as heterozygous point mutations in the human ELN locus also lead to SVAS (Li et al., 1997). Cardiovascular abnormalities in mouse models lacking the orthologous Eln gene further corroborate the conclusion from the human correlation and illustrate the importance of ELN in arterial pathology (Li et al., 1998a; Li et al., 1998b). Homozygous Eln mutations cause embryonic lethality, showing severe obstructive arterial disease, which supports a role for ELN in later arterial development (Li et al., 1998a). Heterozygous mice present with hypertension, decreased aortic compliance and mild cardiac hypertrophy but no SVAS, suggesting that humans are more sensitive to ELN haploinsufficiency (Le et al., 2011; Li et al., 1998b).

Cardiac defects and cardiovascular abnormalities had also been previously associated with Baz1b, through the analysis of a Baz1b null mouse; however, this study was later retracted (Yoshimura et al., 2014). Targeted disruption of Baz1b in another mouse line has shown that it is essential for spermatogenesis, with the mutant mice showing abnormalities in gene regulation. No other phenotypes were reported in this line (Dowdle et al., 2013). However, an ENU-induced point mutation in Baz1b that causes decreased protein levels was found to result in a mouse line with skull shape abnormalities that are reminiscent of the WBS phenotype (Ashe et al., 2008). Baz1b is known to be involved in chromatin remodelling, and the ENU mutagenesis was part of an unbiased screen for novel epigenetic modulators (Ashe et al., 2008), further strengthening the notion that 7q11.23 CNV disorders involve genes that are strongly implicated in epigenetic control mechanisms.

Regarding the endocrine abnormalities seen in WBS patients, mouse lines with either a targeted knock out of Mlxipl or overexpression of the syntaxin-1A (Stx1a) gene have been reported to show evidence of diabetes or alterations in glucose metabolism (Iizuka and Horikawa, 2008; Lam et al., 2005).

Other animal models, such as those targeting the genes Fzd9, Stx1a, Clip2 and Limk1, have been reported to produce neurobehavioural phenotypes in a dosage dependent manner. However, in general, hemizygous mutations either generate no difference from wild type or a much milder phenotype than observed in homozygous null mice (as reviewed by Osborne (2010)). Fzd9 null mice present with structural and functional changes of the hippocampus, as revealed by deficits in tests assessing visuospatial learning and memory (Zhao et al., 2005). Mice with a homozygous Stx1a mutation that leads to truncated protein expression show alterations in synaptic plasticity (Fujiwara et al., 2006). Hippocampal dysfunction is also seen in mice that are haploinsufficient for Clip2; mutant mice show mild brain abnormalities (reduced size of corpus callosum), deficits in motor coordination and mild growth retardation (Hoogenraad et al., 2002; van Hagen et al., 2007). Limk1 homozygous null mice present with hippocampal dysfunction and mild deficits in spatial learning (Meng et al., 2002; Todorovski et al., 2015). Furthermore, it was observed that the null mice have impaired long-term memory but no detectable defects of short-term memory. However, the effect of Limk1 haploinsufficiency has not yet been reported.

As is the case for the other genes deleted in the WBSCR, efforts to understand the function of the TFII-I family have focused on analysing single gene mutations in various mouse lines. Over the last decade, several mutant mouse lines have emerged, including three different Gtf2i mutant lines generated by gene-trap mutagenesis or targeted deletion (Enkhmandakh et al., 2009; Lucena et al., 2010; Sakurai et al., 2011).

Mice that are homozygous for one of these gene-trap insertions (Gtf2iGt(XE029)Byg), generated through the insertion of a LacZ-neo cassette into the intron immediately after exon 3, were found to die during embryogenesis (Enkhmandakh et al., 2009). Analysis of these embryos indicated that the homozygous null mice die from multiple developmental defects, suggesting that TFII-I is critical for embryonic development. No data were reported for the heterozygous mutant mice of this line, but in an effort to identify possible molecular changes caused by the lack of TFII-I protein, a developmental microarray analysis performed at E9.5 revealed down-regulation of many genes involved in core biological processes such as oxidative metabolism, cell division, , translation, the ubiquitin cycle, cytoskeleton regulation and cell motility (Enkhmandakh et al., 2009).

Homozygous embryos from the Gtf2iGtBux line showed similar but less pronounced embryonic lethality, with neural tube defects being evident at E10.5 (Sakurai et al., 2011). Heterozygous mice from both lines were viable, with 10% of the mice from the Gtf2iGt(XE029)Byg line being smaller than the wild type control mice, whereas heterozygous mice from the Gtf2iGtBux line were indistinguishable from their wild type siblings. Behavioural analyses of these mice demonstrated measurable alterations in social interaction compared to wild type controls, showing increased interactions with stranger mice and a lack of habituation to familiar partners. No difference was observed in levels of anxiety in response to non-social cues. The increased sociability evident in the Gtf2iGtBux heterozygous mice suggests that GTF2I could be the gene at the telomeric end of the WBS region that is responsible for the hypersociability in WBS individuals.

The third mutant mouse line, Gtf2itm(Δex2), is a mouse model generated using a gene targeting strategy, in which exon 2 of the gene was replaced by a PGK-neo cassette (Lucena et al., 2010). This gene mutation resulted in the production of a truncated protein lacking the first 140 amino acids, due to the use of an alternative in-frame start codon in exon 5. Nuclear localization of the truncated TFII-I was not affected and, contrary to the impact of the mutations in the previous two models, the modification of Gtf2i in these mice did not result in complete homozygous loss, although a reduction in the Mendelian ratio of homozygous Gtf2itm(Δex2) mice was reported, indicating a higher probability of embryonic failure. 8 % of the homozygous Gtf2itm(Δex2) mice survived to birth and were virtually indistinguishable from their wild type or heterozygous littermates, although a craniofacial dysmorphism was apparent in homozygous and heterozygous mice, showing a short symmetrical snout and midface hypoplasia.

More recently, separation anxiety has also been attributed to Gtf2i dosage using mice that carry different copy numbers of the gene (Mervis et al., 2012). Maternal separation induced anxiety was assessed in mice that had 1, 2, 3 or 4 copies of the Gtf2ird1 and Gtf2i genes. Separation induced anxiety was found to be significantly increased in mice overexpressing Gtf2i. These findings correlate with findings in humans, as separation anxiety is found at higher rates in patients with 7q11.23 duplication syndrome (Mervis et al., 2012).

As with Gtf2i, there are several knock out mouse lines that carry different genetic mutations of Gtf2ird1, and these show some overlapping and some distinct phenotypic traits. The first of these to be reported resulted from a fortuitous c-myc transgene insertion, which induced a ~40 Kb genomic deletion that removed all of the upstream regulatory sequences and the first exon of Gtf2ird1 (Durkin et al., 2001; Tassabehji et al., 2005). One potential criticism of this model is that the 40 Kb deletion may have removed regulatory sequences that affect more than just Gtf2ird1 expression.

These mice (Gtf2ird1Tg(Alb1-Myc)166.8Sst) were viable in the homozygous state and, together with their heterozygous littermates, showed several phenotypes relevant to WBS. The homozygous Gtf2ird1Tg(Alb1-Myc)166.8Sst mice showed growth retardation and craniofacial abnormalities (i.e. periorbital fullness and a short snout), with about 20 % having more severe craniofacial abnormalities, a variation that is also found in human patients. However, the growth reduction and craniofacial abnormalities were not present in the Gtf2ird1Tg(Alb1-Myc)166.8Sst heterozygous mice, leading to the conclusion that humans may be more sensitive to the craniofacial effects of GTF2IRD1 haploinsufficiency (Tassabehji et al., 2005). Homozygous null mice were also shown to have increased brain ventricle volume (van Hagen et al., 2007). Behavioural analysis demonstrated that the homozygous Gtf2ird1Tg(Alb1-Myc)166.8Sst mice showed decreased spontaneous and circadian locomotor activity, diminished motor coordination, strength and gait abnormalities, increased anxiety and an elevated endocrinological response to stress. The heterozygous mice had lower weight and circadian activity, minor motor coordination defects and anxiety related behaviour (Schneider et al., 2012).

Two targeted Gtf2ird1 knock out mouse lines have been reported: CD1x[ICRx129]-Gtf2ird1tm1Lro (Young et al., 2008) and C57BL/6-Gtf2ird1tm1Hrd (Palmer et al., 2010). Both mutations created deletions that removed exon 2, which contains the start codon of Gtf2ird1, producing viable homozygous null mice that showed no major physical abnormalities. One criticism of these models is that the gene targeting may allow a low level of truncated GTF2IRD1 protein production from the mutant allele. An abnormal Gtf2ird1 transcript is generated and the first ATG start codon is out of frame. However, translation beginning at a secondary start site could result in a truncated in-frame product, which has been detected in an in vitro cell culture model at an efficiency of 3 % of normal levels (Palmer et al., 2010).

Homozygous and heterozygous CD1x[ICRx129]-Gtf2ird1tm1Lro mice (Young et al., 2008) showed mild growth retardation (reduced body weight), decreased anxiety, reduced aggression and increased social interaction. The mice also showed normal spatial learning and no craniofacial abnormalities, which was attributed to low phenotypic penetrance due to the random bred CD1 genetic background of the animals. Neither model displayed the craniofacial abnormalities described in the homozygous Gtf2ird1Tg(Alb1-Myc)166.8Sst mice (Durkin et al., 2001; Tassabehji et al., 2005) and in both models the gross brain morphology appeared normal (Palmer et al., 2010; Young et al., 2008).

Gtf2ird1tm1Hrd mice displayed some similar characteristics that reflect features of WBS patients, including increased soft tissue thickness in the nose and lip areas, reduced body weight, abnormal vocalizations and increased serum corticosterone levels in response to specific non-social stress-inducing cues. Behavioural phenotyping revealed a deficit in motor coordination in the rotarod and hanging wire tests and alterations in motor activity in relation to circadian rhythms with a gender component. During the dark cycle, which is the most active phase for these nocturnal animals, homozygous Gtf2ird1tm1Hrd females were found to be significantly hypoactive, whereas males were significantly hyperactive (Howard et al., 2012). These findings could correlate with the ADHD and sleep disturbances seen in patients, although no gender association has been reported in humans with WBS (Davies et al., 1998).

Gtf2ird1 was also found to be expressed in a number of cell types within the cochlea, and analysis of the hearing capacity of knockout mice showed higher auditory thresholds in the auditory brainstem response and the distortion product of otoacoustic emissions in a broad range of frequencies (Canales et al., 2014). This observation implicates GTF2IRD1 in the cause of the sensitivity to certain auditory stimuli and the sensorineural hearing loss frequently found in WBS patients (Barozzi et al., 2012; Cherniske et al., 2004; Marler et al., 2005).

A fourth mouse model isolated from a gene-trap mutagenesis screen, Gtf2ird1Gt(XE465)Byg, was characterised alongside the Gtf2iGt(XE029)Byg mouse line (Enkhmandakh et al., 2009). This mutant mouse line carries an insertion of the LacZ-neo cassette in intron 22 of Gtf2ird1, which results in the generation of a GTF2IRD1-β-galactosidase-neo fusion protein product that contains the majority of the N-terminal region of GTF2IRD1. As with the Gtf2i gene-trap model, this mouse line displayed more severe phenotypes than the other three Gtf2ird1 mouse models described above. Homozygous gene-trap mice were embryonic lethal with haemorrhages and severe delayed development, which was apparent from E9.5. The heterozygous mice displayed growth retardation and a subset of animals demonstrated craniofacial and severe skeletal abnormalities, such as kyphosis, which is a pronounced curvature of the spine. It is possible that the increased severity of the gene-trap lines results from a dominant-negative effect of the GTF2IRD1-β-galactosidase-neo fusion peptide. This argument was countered by the assertion that normal nuclear translocation should be inhibited by the truncation of the C-terminal region of GTF2IRD1, which removes the NLS (Enkhmandakh et al., 2009).

To date, only one mouse model has been generated to study the role of GTF2IRD2, the third member of the TFII-I protein family. A transgenic mouse was created with a construct that contained the mouse Gtf2ird2 ORF, driven by the human skeletal actin (HSA) promoter/enhancer from the ACTA1 gene, allowing exclusive and high-level expression in skeletal muscle tissue (Palmer et al., 2012). Gtf2ird2 overexpression led to postnatal fibre-type conversion, characterised by an increase in MyHC type I fibres and a reduction in the numbers of fast type IIa fibres in the soleus and surrounding muscles. Other features observed included reduced body weight, reduced muscle fibre size and increased fibre number. Remarkably, these changes are directly opposite to what is observed in a mouse model generated in the same manner, but overexpressing human GTF2IRD1 (Issa et al., 2006), in which the reverse shift towards fast fibre conversion is observed. These data suggest that GTF2IRD2 acts in a manner that is directly antagonistic to GTF2IRD1 (Palmer et al., 2012).

1.3 The TFII-I protein family

As mentioned previously, GTF2IRD1 and GTF2I have been implicated as the genes that are responsible for most of the phenotypes in WBS (Antonell et al., 2010a; Hirota et al., 2003). These genes are part of the TFII-I family of transcription factors, which also includes GTF2IRD2, and they cluster at the telomeric end of the WBS typical deletion. Two characteristic motifs are found in this family of proteins: the leucine zipper, essential for hetero/homodimerisation, and the I-repeat domains that have been predicted to adopt a helix-loop-helix structure. The I-repeat domains are known to mediate DNA binding but may also be involved in protein interaction and dimerisation (Roy et al., 1997; Vullhorst and Buonanno, 2003b). A schematic representation of these key structural features is shown in figure 1.3.

GTF2I was the first member identified and described, while GTF2IRD2 is the most recent addition. Only the N-terminal region of GTF2IRD2 is like the other members of the family; the C-terminal region contains a transposon-like element (CHARLIE8) that has been inserted in-frame (Tipney et al., 2004). Both GTF2I and GTF2IRD2 are also present as pseudogenes within the LCRs of region 7q11.23; named GTF2IP and GTF2IRD2P, they lie in centromeric block B (Perez Jurado et al., 1998; Tipney et al., 2004).

Besides the shared protein domains, the TFII-I family shows evidence of common regulatory mechanisms at the protein level, sharing motifs for protein degradation and stability such as PEST (rich in proline [P], glutamic acid [E], serine [S] and threonine [T]), SUMO (small ubiquitin-like modifier) and SCM (synergy control motif), highlighting the requirement for tight control of the amount of protein present in the cell (Hinsley et al., 2004).

This figure has been removed due to copyright restrictions.

Figure 1.3 Key structural features of the TFII-I protein family.

Schematic representation of the protein structure of the human TFII-I protein family, including the characteristic I-repeat domains (R1-5) and the well conserved N-terminal leucine zipper (LZ). For TFII-I, a basic region (BR) is indicated adjacent to R2, and for GTF2IRD1, the position of a polyserine tract is depicted (S-S). The nuclear localisation signal (NLS) is shown for both TFII-I and GTF2IRD1. The distinctive Charlie8 transposon-like domain is shown at the C-terminal of GTF2IRD2 (Reprinted from Roy (2012)).

1.3.1 TFII-I

First identified as a transcription initiation factor that binds to Inr and E-box elements (Roy et al., 1991; Roy et al., 1997), TFII-I exists in four isoforms (named α, β, γ, Δ) which are distributed between cytoplasm and nucleus (Hakre et al., 2006). All the isoforms are able to bind to DNA (Cheriyath and Roy, 2000) and a wide range of gene targets have been reported. One of the most prominent is the c-fos promoter (Grueneberg et al., 1997) and this interaction has been reported to be essential for cell cycle entry (Roy, 2007). The dimerisation of different isoforms leads to diverse heterodimers that differentially regulate their gene targets (Cheriyath and Roy, 2000).

Besides its established function inside the nucleus, TFII-I is also thought to act as a cytoplasmic regulator, involved in calcium entry into the cell (Caraveo et al., 2006; Roy, 2006). Evidence indicates that cytoplasmic TFII-I is able to regulate the activity of the surface calcium channel TRPC3, competing for the binding of phospholipase C (PLC), thus inhibiting calcium entry into the cell (Caraveo et al., 2006). Isoforms of TFII-I have also been shown to be tethered in the cytoplasm by Bruton’s tyrosine kinase, a key enzyme in calcium signalling and phospholipid metabolism (Yang and Desiderio, 1997). Phosphorylation of TFII-I has been shown to lead to nuclear translocation.

TFII-I has been implicated in other signalling pathways, including the activation and repression of VEGFR2 and cyclin D1 in angiogenesis and cell proliferation. The molecular action of TFII-I is thought to be exerted by sequence-specific binding to the promoters of VEGFR2 and cyclin D1 through Inr elements, in response to serum stimulation (Desgranges and Roy, 2006; Wu and Patterson, 1999).

TFII-I has also been associated with regulation of genes involved in the TGFβ signalling pathway, including Smad5 and Bmp2, and factors that play important roles in epigenetic regulation, such as components of the PRC2 complex (Polycomb Repressive Complex 2), Ezh2 and Eed (reviewed by Bayarsaihan et al. (2012)).

TFII-I is a ubiquitous protein that is widely expressed in embryos and uniformly found in the brain, while in the adult mouse brain the protein is found exclusively in neurons and is highly expressed in the cortex, hippocampal neurons and cerebellar Purkinje cells (Danoff et al., 2004). A large number of TFII-I protein interactions have been characterised so far, including factors involved in transcriptional regulation, chromatin remodelling, histone modification and cell signalling (as reviewed by Roy (2012)). These interactions constitute complex connections between interactions at the protein level and target gene regulation through DNA binding, and illustrate the diversity of TFII-I’s molecular role. The protein partners, which play important roles in cell regulation, include Elk-1, STAT1, STAT3 and SRF (Kim et al., 1998); PIASxβ (Tussie-Luna et al., 2002b); HDAC3 (Tussie-Luna et al., 2002a) and a HDAC1/2/BHC110-containing chromatin remodelling complex that includes the Co-REST, Sin3, RBAP46,48 components and the zinc finger proteins, ZMYM2 and ZMYM3 (Hakimi et al., 2003). Furthermore, studies in induced pluripotent stem cells (IPSCs) derived from WBS and 7q11.23 duplication patients showed a role for TFII-I in chromatin regulation through interactions with the histone demethylase LSD1 and the histone deacetylase HDAC2, forming a repressive complex that mediated the regulation of a subset of genes (Adamo et al., 2015).

Recently, several studies have shown different phenotypic associations with polymorphisms in the GTF2I gene. A missense mutation was found to occur at an elevated frequency in patients with thymomas, but correlated with improved survival (Petrini et al., 2014). Familial genetic association studies found two SNPs (single nucleotide polymorphisms) in GTF2I associated with autism (Malenfant et al., 2012). Furthermore, these same SNPs were also found in healthy individuals, although assessment indicated an association with low social anxiety and communication skills (Crespi and Hurd, 2014).

Our current understanding of TFII-I is the most comprehensive for this protein family and encompasses a wide range of cellular processes and pathways. The embryonic lethality of the Gtf2i knockouts suggests that this gene has non-redundant functions. However, the sequence similarity of the other family members demonstrates that they have evolved from a common ancestor and conserve some of the same functional elements. Therefore, it is likely that GTF2IRD1 and GTF2IRD2 have overlapping roles, and intertwined mechanistic pathways may be expected.

1.3.2 GTF2IRD2

GTF2IRD2, the third member of the TFII-I family, is the most recently described and also the most recent from an evolutionary point of view. It is structurally quite different as it only contains two I-repeats in the N-terminal half of the protein, while the remaining C-terminal half is taken up by an in-frame CHARLIE8 transposon-like element (Tipney et al., 2004). Sequence analysis shows that the preserved elements of the N-terminal region are more closely related to GTF2I than to GTF2IRD1 (Makeyev et al., 2004). The gene is located in the duplicated blocks of the LCRs and, therefore, two loci for GTF2IRD2 have been identified at the telomeric end of the 7q11.23 region, which produce the variants GTF2IRD2α and GTF2IRD2β (Tipney et al., 2004).

RT-PCR analyses have indicated that human GTF2IRD2 is ubiquitously expressed, being detectable in developing embryonic organs as well as adult brain, bone, muscle, testis, and a variety of other tissues (Tipney et al., 2004). At the cellular level, recombinant GTF2IRD2 localises to the microtubules and also inside the nucleus, in a distinct punctate pattern that overlaps with the localisation of GTF2IRD1 and TFII-Iβ generated from transfected expression constructs (Palmer et al., 2012). Functional studies also illustrated the potential for direct interaction between GTF2IRD2 and the other family members, although homodimerisation was the most favoured structure, followed by interactions with TFII-Iβ (Palmer et al., 2012). Based on the antagonistic behaviour of GTF2IRD2 to TFII-I and GTF2IRD1 in transgenic mouse systems, it was proposed that GTF2IRD2 may inhibit the function of its other family members by such direct interactions (Palmer et al., 2012).

1.3.2 GTF2IRD1

1.3.2.1 Molecular aspects of GTF2IRD1

1.3.2.1.1 DNA binding properties

GTF2IRD1 was first reported as a protein that binds to sequences within a DNA enhancer element of the Troponin I slow (TNNI1) gene using a yeast one-hybrid screen (O'Mahoney et al., 1998b) under the original name of MusTRD1. Using the same experimental approach, other studies also isolated GTF2IRD1 binding to the Xenopus goosecoid (GSC) promoter and regulatory sequences of the Hoxc8 gene, although at the time, these groups chose to refer to the protein as XWBSCR11 and BEN respectively (Bayarsaihan and Ruddle, 2000a; Ring et al., 2002). Transactivation studies have indicated that GTF2IRD1 acts mainly as a transcriptional repressor (Masuda et al., 2014; O'Mahoney et al., 1998b; Polly et al., 2003a), although activation of goosecoid expression has been reported for the Xenopus orthologue of GTF2IRD1 (Ring et al., 2002), as has both positive and negative regulation of an immunoglobulin heavy-chain promoter (Tantin et al., 2004).

Several other gene targets have been identified for GTF2IRD1 (Chimge et al., 2012; Makeyev and Bayarsaihan, 2009; Masuda et al., 2014), but definition of a universal binding site has proved elusive, probably because binding may occur in a context-specific manner. ChIP analyses have indicated that GTF2IRD1 binds to the fibroblast growth factor gene Fgf15 in C2C12 mouse myoblasts and an 8 bp core consensus sequence was proposed for GTF2IRD1 based on this study (Lazebnik et al., 2008). The DNA binding properties of GTF2IRD1 have also been studied in detail through the knowledge that this protein is capable of negatively autoregulating its own transcription both in vitro and in vivo (Palmer et al., 2010). GTF2IRD1 was shown to bind to the GTF2IRD1 upstream region (GUR) in a sequence-specific manner through three highly conserved GGATTA motifs within the GUR. Using the electrophoretic mobility shift assay (EMSA), it was demonstrated that GTF2IRD1 requires at least two of the canonical GGATTA sequences in order to bind to DNA, and they must be no more than 57 bp apart, otherwise binding becomes undetectable (Palmer et al., 2010) (figure 1.4). This is further supported by luciferase transactivation experiments where GTF2IRD1 binds to the GUR and represses reporter expression (Palmer et al., 2010). This negative autoregulatory mechanism leads to elevated production of the mutant transcript in knockout mice and an increased level of GTF2IRD1 transcript in lymphoblasts obtained from WBS patients, such that it exceeds the expected level of 50 % of normal. However, measurements of GTF2IRD1 transcript in fibroblasts or IPSCs from WBS patients indicate that levels are much closer to the expected 50 % than is found in lymphoblastoid cell lines (Adamo et al., 2015; Merla et al., 2006), suggesting that the degree of autoregulation varies in a tissue-specific manner.
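The spacing rule reported for the GUR (at least two GGATTA motifs, no more than 57 bp apart) can be illustrated with a minimal sketch. This is not the analysis used by Palmer et al. (2010); the function names are hypothetical, and two assumptions are made for illustration only: motifs are counted on either strand, and the distance is measured from the end of one motif to the start of the next.

```python
import re

MOTIF = "GGATTA"
MAX_SPACING_BP = 57  # upper limit on motif separation cited in the text


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_starts(seq, motif=MOTIF):
    """0-based start positions of the motif on either strand (assumption)."""
    seq = seq.upper()
    hits = set()
    for pattern in (motif, reverse_complement(motif)):
        # A lookahead is used so that overlapping occurrences are also found.
        hits.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return sorted(hits)


def meets_spacing_rule(seq, max_spacing=MAX_SPACING_BP):
    """True if two neighbouring motifs are separated by at most max_spacing bp.

    The gap is counted from the end of one motif to the start of the next,
    which is an assumption about how "apart" is defined.
    """
    starts = motif_starts(seq)
    return any(b - (a + len(MOTIF)) <= max_spacing
               for a, b in zip(starts, starts[1:]))
```

Under these assumptions, a sequence with a single motif, or with two motifs separated by more than 57 bp, would be reported as not meeting the rule.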

1.3.2.1.2 Protein structure and function

The protein structure of human GTF2IRD1 includes a leucine zipper, essential for dimerisation, and five I-repeats (RD1-5) (as shown in figure 1.3). At the N-terminus, a transcriptional activation domain has been described and four RDs were found to have the potential to bind to DNA, while RD1 does not show any DNA binding properties (Polly et al., 2003a; Yan et al., 2000). Data from EMSA studies imply that more than one RD binds to each of the GGATTA motifs simultaneously and that presence of the leucine zipper is required in order to achieve dimeric binding to the GUR (Palmer et al., 2010).

Evidence indicates that GTF2IRD1 acts as a repressor of its family member TFII-I (Tussie-Luna et al., 2001) and is able to interact with it directly. However, despite both proteins being imported into the nucleus when they are overexpressed in cell lines, forced coexpression results in TFII-I being excluded from the nucleus, causing repression of the c-fos gene. A polyserine tract near the C-terminus of GTF2IRD1 has been shown to be responsible for the change in TFII-I localisation, since the effect is lost when this region is deleted.

Amino acid sequence analysis indicates that GTF2IRD1 contains putative PEST and SUMO motifs (Hinsley et al., 2004), which are related to reduced protein half-life, transcriptional regulation and protein interactions. However, little more can be learned from predictive sequence analysis as most of the conserved domains are unique to the TFII-I family.

Experimental evidence of GTF2IRD1 function is also limited. The E3 SUMO ligase, PIASxβ (PIAS2), and the histone deacetylase, HDAC3, have been found to be interacting partners of GTF2IRD1, hence linking GTF2IRD1 with histone modification and the SUMO pathway (Tussie-Luna et al., 2002a; Tussie-Luna et al., 2002b). Given the presence of a putative SUMO motif in GTF2IRD1, it is possible that the interaction with PIASxβ results in GTF2IRD1 SUMOylation. This possibility was confirmed experimentally in assays that showed GTF2IRD1 can be SUMOylated by the E2 SUMO ligase, UBC9, and its SUMOylation level is enhanced by the presence of PIASxβ (Widagdo et al., 2012). GTF2IRD1 SUMOylation was shown to affect GTF2IRD1 interactions with proteins that have SUMO interacting motifs (SIMs), as SUMOylation increases the affinity of the interaction between GTF2IRD1 and PIASxβ or ZMYM5 (Widagdo et al., 2012). Beyond these data, little is known regarding the protein partners of GTF2IRD1, apart from evidence indicating that it can also bind to the retinoblastoma 1 (RB1) protein (Yan et al., 2000).

1.3.2.1.3 Expression analyses

Western blot analyses of endogenous GTF2IRD1 (designated CREAM1 in this publication) showed low expression levels in a normal monkey kidney cell line (CV1) (Yan et al., 2000). Transfected constructs containing cDNA encoding GTF2IRD1 showed localisation of the recombinant protein to the nucleus (Yan et al., 2000). In the mouse, Gtf2ird1 expression was found to be particularly high during development in a wide variety of tissues, including cartilage, skeletal muscle, the central nervous system, teeth and hair follicles (Palmer et al., 2007). In adult mice, the expression pattern becomes much more restricted to brown adipose tissue, peripheral ganglia and regions of the brain, such as the olfactory bulb, the Purkinje neurons of the cerebellum, layer II of the piriform cortex, the striatum and layer V of the cortex (Howard et al., 2012).

Figure 1.4 Model for DNA binding of GTF2IRD1 to its own promoter region.

Three conserved consensus sequences exist at the GTF2IRD1 upstream region (GUR), of which at least two are required to achieve any level of GTF2IRD1 binding. EMSA evidence indicates that two of the five RDs bind to the GUR sites in monomeric interactions, while the formation of a dimer on the GUR is dependent on the presence of the leucine zipper domain (adapted from Palmer et al. (2010)).

Aims of this thesis

Although there are some clues concerning the function of GTF2IRD1 from several lines of evidence, a comprehensive analysis of the interactional network of GTF2IRD1 is needed to determine the specific molecular function and pathways in which this protein is involved, which will provide a molecular basis for the mechanisms associated with the features of WBS. Characterising GTF2IRD1’s partners, and uncovering the complex networks upstream and downstream of this protein, will provide a deeper understanding of the pathogenesis of WBS. It will also generate testable hypotheses explaining the properties of the TFII-I family and the contribution of GTF2IRD1 to the development and function of the brain and the control of mood and behaviour in humans.

Therefore, this study aims to find answers related to:

I. The endogenous localisation of human GTF2IRD1 (chapter 3)

II. The GTF2IRD1 interactional network (chapter 4)

III. Effects of GTF2IRD1 on gene regulation (chapter 5)

IV. The functional consequences of GTF2IRD1 interactions in the context of a protein complex (chapter 6)

CHAPTER 2 - MATERIALS AND METHODS

2.1 Materials

A list of key reagents is presented in table 2.1. All the primary antibodies utilised in this work are catalogued in table 2.2. The plasmid constructs are described in table 2.3. A list of all the oligonucleotides used for ORF cloning is shown in table 2.4, and the primers used for quantitative PCR (qPCR) are listed in table 2.5.

Table 2.1 Commercially available reagents and kits.

| Reagent | Manufacturer |
|---|---|
| Acid-washed glass beads 425-600 μm | Sigma |
| 30 % Acrylamide/bis-acrylamide 29:1 solution | Bio-Rad Laboratories |
| Antarctic phosphatase 5000 units/mL | New England Biolabs, Inc |
| Anti-goat secondary antibody-horse radish peroxidase (HRP) linked Veryblot | Abcam, PLC |
| Anti-mouse secondary antibody HRP linked Veryblot | Abcam, PLC |
| Bicinchoninic acid (BCA) protein assay kit | Thermo Scientific Pierce |
| Clarity | Bio-Rad Laboratories |
| Drop-out (DO) amino acid supplements for yeast selection media; single (SDO), -Leu or -Trp, double (DDO) -Leu/-Trp or quadruple (QDO), –Ade/–His/–Leu/–Trp | Clontech Laboratories |
| Donkey Anti Goat Alexa Fluor 488 2ry antibody | Molecular Probes |
| Donkey Anti Goat Alexa Fluor 594 2ry antibody | Molecular Probes |
| Donkey anti-goat IgG-HRP 2ry antibody | Santa Cruz Biotech |
| Donkey Anti Rabbit Alexa Fluor 488 2ry antibody | Molecular Probes |
| Donkey Anti Rabbit IgG HRP Linked 2ry antibody | Jackson Immunoresearch Laboratories |
| Duolink In Situ Proximity Ligation Assay (PLA) kit | Olink AB |

Table 2.1 Continued. Commercially available reagents and kits.

| Reagent | Manufacturer |
|---|---|
| Fab fragment affinity-purified goat anti-rabbit antibody | Jackson Immunoresearch Laboratories |
| Goat Anti Mouse Alexa Fluor 568 2ry antibody | Molecular Probes |
| Goat Anti Mouse IgG HRP 2ry antibody | DAKO |
| Goat Anti Rabbit IgG Alexa Fluor 555 2ry antibody | Molecular Probes |
| Histone deacetylase (HDAC) fluorometric activity assay kit | Abcam, PLC |
| Immun-Blot PVDF Membrane | Bio-Rad Laboratories |
| Lipofectamine LTX | Life Technologies |
| Lipofectamine 2000 | Life Technologies |
| LongAmp Taq DNA Polymerase | New England Biolabs, Inc |
| Luminata Forte, Western HRP chemiluminescence substrate | Merck Millipore |
| Mate and Plate Universal Mouse (Normalized) Yeast two-hybrid library | Clontech Laboratories |
| Mate and Plate Human Brain (Normalized) cDNA Yeast two-hybrid library | Clontech Laboratories |
| Minimal media synthetic defined (SD) Base Yeast | Clontech Laboratories |
| M-MLV Reverse Transcriptase, RNase H Minus, Point Mutant | Promega Corporation |
| NEB 5-alpha Competent E. coli (Subcloning Efficiency and High Efficiency) | New England Biolabs, Inc |
| NEB 5-alpha Electrocompetent E. coli | New England Biolabs, Inc |
| Recombinant human Noggin protein | Peprotech |
| Oligo(dT)15 Primer | Promega Corporation |
| Gibco Opti-MEM Media | Life Technologies |
| Phusion High-Fidelity DNA Polymerase | New England Biolabs, Inc |

Table 2.1 Continued. Commercially available reagents and kits.

| Reagent | Manufacturer |
|---|---|
| Pure Proteome Protein A/G magnetic beads | Merck Millipore |
| PureYield Plasmid Maxiprep System | Promega Corporation |
| ProLong Gold Antifade reagent with DAPI | Molecular Probes |
| Protease inhibitor cocktail (for use with mammalian cell and tissue extracts, DMSO solution) | Sigma |
| QIAprep Spin Miniprep Kit | Qiagen |
| RNeasy Mini Kit | Qiagen |
| SB431542 Activin receptor-like kinase inhibitor | Tocris |
| SsoFast EvaGreen real-time PCR Supermix | Bio-Rad Laboratories, Inc |
| T4 DNA ligase | New England Biolabs, Inc |
| Taq DNA Polymerase with ThermoPol Buffer | New England Biolabs, Inc |
| Tri-reagent | Sigma |
| Wizard SV gel and PCR Clean-Up system | Promega Corporation |
| Yeast extract peptone dextrose (YPD) Medium | Clontech Laboratories |

Table 2.2 Primary antibodies.

All antibodies are listed according to the target protein (Antibody). Dilutions are shown according to the type of experiments carried out, including immunofluorescence and/or proximity ligation assay (IF/PLA), Western blot (WB) and coimmunoprecipitation (co-IP).

| Antibody | Source | Product # | Host | Dilution (IF/PLA) | Dilution (WB) | Dilution (Co-IP) |
|---|---|---|---|---|---|---|
| β-tubulin III | Merck Millipore | MAB1637 | mouse | 1:200 | - | - |
| Coilin | Abcam | ab11822-50 | mouse | 1:800 | - | - |
| DCAF6 | Bethyl | A302-435A | rabbit | 1:200 | - | - |
| EZH2 | Active Motif | 39876 | mouse | 1:500 | - | - |
| FLAG | Sigma | F7425 | rabbit | 1:400 | - | - |
| GFP | Abcam | ab290 | rabbit | - | 1:3000 | 5 µg/10 cm plate |
| GTF2IRD1 | Santa Cruz | sc-14714 | goat | - | 1:500 | - |
| GTF2IRD1 | Bethyl | A301-333A-1 | rabbit | 1:1000 | - | 3 µg/10 cm plate |
| GTF2IRD2 | Abnova | H00084163-B01P | mouse | - | 1:200 | - |
| HA | Covance | MMS-101R-500 | mouse | 1:1000 | - | - |
| HDAC1 | Cell Signaling Technology | 5356 | mouse | 1:500 | - | - |
| HDAC2 | Cell Signaling Technology | 5113 | mouse | 1:1000 | - | - |
| HP1α | Active Motif | 39978 | mouse | 1:500 | - | - |

Table 2.2 Continued. List of primary antibodies used in this study.

| Antibody | Source | Product # | Host | Dilution (IF/PLA) | Dilution (WB) | Dilution (Co-IP) |
|---|---|---|---|---|---|---|
| HP1β | Active Motif | 39980 | mouse | 1:500 | - | - |
| HP1γ | Active Motif | 39982 | mouse | 1:500 | - | - |
| H3 tri-methyl K4 | Abcam | ab12209 | mouse | 1:200 | - | - |
| H3 tri-methyl K9 | Active Motif | 39286 | mouse | 1:100 | - | - |
| H3 di/tri-methyl K27 | Active Motif | 39538 | mouse | 1:500 | - | - |
| LAP2 | Abcam | ab11823 | mouse | 1:500 | - | - |
| LSD1 | Novus Biologicals | NB100-1762 | mouse | 1:500 | - | - |
| MAP2ab | Thermo Scientific | MA5-12823 | mouse | 1:200 | - | - |
| MBD1 | Novus Biologicals | 100B272.1 | mouse | 1:100 | - | - |
| Myc clone 9E10 | Sigma | M4439 | mouse | 1:500 | 1:500 | - |
| NONO | Custom made. Gift from Dr Archa Fox, The University of Western Australia | - | mouse | 1:500 | - | - |
| NPC | Abcam | ab24609 | mouse | 1:500 | - | - |

Table 2.2 Continued. List of primary antibodies used in this study.

| Antibody | Source | Product # | Host | Dilution (IF/PLA) | Dilution (WB) | Dilution (Co-IP) |
|---|---|---|---|---|---|---|
| PML | Santa Cruz | sc-9862 | goat | 1:500 | - | - |
| RBAP46 | Cell Signaling Technology | 6882 | rabbit | 1:600 | - | - |
| RNA polymerase II CTD domain phospho serine 5 | Abcam | ab5408 | mouse | 1:1000 | - | - |
| SC-35 | Abcam | ab11826 | mouse | 1:200 | - | - |
| SP1 | Abnova | H00006667-M02 | mouse | 1:500 | - | - |
| TFII-I | Cell Signaling Technology | 4562 | rabbit | - | 1:500 | - |
| TFII-I | Santa Cruz | sc-9943 | goat | 1:500 | - | - |
| ZC4H2 | Abcam | ab88814 | mouse | 1:100 | - | - |
| ZMYM2 | Bethyl | A301-710A | rabbit | 1:500 | - | - |
| ZMYM3 | Bethyl | A300-200A | rabbit | 1:500 | - | - |

Table 2.3 Plasmids.

Constructs containing the vectors pGADT7, pGBKT7 and pACT2 were used for yeast two-hybrid experiments. For experiments involving transfection of HeLa cells, plasmids encoding fusion proteins with the epitope tags Myc, FLAG or HA or the fluorescent protein EGFP were used. Some constructs were generated by other laboratory members at UNSW Australia or were gifts from the researchers specified in the column Insert origin. Species origin of the protein encoded by the insert is indicated as mouse (m) or human (h).

| Protein | Vector | Used tag | Insert origin |
|---|---|---|---|
| mAKIRIN2 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| hALMS1 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| hALMS1 | pCMV-HA | HA | Gift from Dr Thomas Hearn, Human Genetics Division, University of Southampton, UK. |
| mARMCX5 | pGADT7 | GAL4-AD | Constructed by Widagdo (2011) |
| mARMCX5 | pRK5 | Myc | Constructed by Widagdo (2011) |
| mBBS4 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mDCAF6 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| mDCAF6 | pEGFP-C2 | EGFP | Subcloned from pGADT7-mDcaf6 |
| mELF2 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mFAM47E | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mFHAD1 | pGADT7 | GAL4-AD | Constructed by Widagdo (2011) |
| mFBWX10 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| hGTF2IRD1 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |

Table 2.3 Continued. List of plasmids used in this study.

| Protein | Vector | Used tag | Insert origin |
|---|---|---|---|
| hGTF2IRD1 | pGADT7 | GAL4-AD | Subcloned from pCDNA3.1-hGTF2IRD1 |
| hGTF2IRD1 | pCDNA3.1 | Myc | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1 | pEGFP-C3 | EGFP | Subcloned from pCDNA-hGTF2IRD1 (Widagdo et al., 2012) |
| hGTF2IRD1-RD1 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-RD2 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-RD3 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-RD4 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-RD5 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-SUMO1 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-SUMO2 | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| hGTF2IRD1-LZ | pGBKT7 | GAL4-BD | Amplified by PCR from human cDNA (Widagdo et al., 2012) |
| mHDAC1 | pCMV-Myc | Myc | Amplified by PCR from mouse cDNA |
| mHDAC2 | pCMV-Myc | Myc | Amplified by PCR from mouse cDNA |
| mHOMEZ | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| mHOMEZ | pEGFP-C2 | EGFP | Amplified by PCR from mouse cDNA |
| hINTS12 | pGADT7 | GAL4-AD | Amplified by PCR from human cDNA |
| hINTS12 | pEGFP-C2 | EGFP | Amplified by PCR from human cDNA |
| mKPNA1 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |

Table 2.3 Continued. List of plasmids used in this study.

| Protein | Vector | Used tag | Insert origin |
|---|---|---|---|
| mKPNA2 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| hKPNA3 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mKPNA4 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| hMBD1 | pGBKT7 | GAL4-BD | Gift from Dr Irina Stancheva, Wellcome Trust Centre for Cell Biology, University of Edinburgh, UK. |
| hMBD2 | pACT2 | GAL4-AD | Gift from Prof. Gerd Pfeifer, Department of Cancer Biology, City of Hope, USA. |
| hMBD3L1 | pGADT7 | GAL4-AD | Subcloned from pEGFP-N2-hMBD3L1 |
| hMBD3L1 | pEGFP-N2 | EGFP | Gift from Prof. Gerd Pfeifer, Department of Cancer Biology, City of Hope, USA. |
| hMCAF1/ATF7IP | pGADT7 | GAL4-AD | Subcloned from pEGFP-C3-hMCAF |
| hMCAF1/ATF7IP | pEGFP-C3 | EGFP | Gift from Prof. Mitsuyoshi Nakao, Institute of Molecular Embryology and Genetics, Kumamoto University, Japan |
| mNAP1L2 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| mNAP1L2 | pEGFP-C3 | EGFP | Amplified by PCR from mouse cDNA |
| mOPHN1 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA (Widagdo, 2011) |
| mPARPBP | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mPIAS1 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mPIAS1 | pCMV-FLAG | FLAG | Gift from Prof. Ke Shuai, UCLA, USA. (Addgene #15206) |

Table 2.3 Continued. List of plasmids used in this study.

| Protein | Vector | Used tag | Insert origin |
|---|---|---|---|
| hPKP1 | pGADT7 | GAL4-AD | Gift from Prof. Frans Van Roy, Department for Molecular Biomedical Research, VIB-Ghent University, Belgium |
| hPKP1 | pEGFP-C2 | EGFP | Gift from Prof. Frans Van Roy, Department for Molecular Biomedical Research, VIB-Ghent University, Belgium |
| hPKP2a | pGADT7 | GAL4-AD | Subcloned from pFLAG-CMV-5-hPKP2a |
| hPKP2a | pFLAG-CMV-5 | FLAG | Gift from Prof. Kathleen J. Green, Department of Pathology, Northwestern University Feinberg School of Medicine, USA. (Addgene #32230) |
| mSCNM1 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mSETD6 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| mSETD6 | pEGFP-C1, EGFP excised, Myc inserted | Myc | Subcloned from pGADT7-mSetd6 (Widagdo, 2011) |
| mTAF1B | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA (Widagdo, 2011) |
| mTAF1B | pEGFP-C1, EGFP excised, Myc inserted | Myc | Subcloned from pGADT7-mTaf1b (Widagdo, 2011) |
| mTRIP11 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mUSP20 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |

Table 2.3 Continued. List of plasmids used in this study.

| Protein | Vector | Used tag | Insert origin |
|---|---|---|---|
| hUSP20 | pDEST-LTR-N-FLAG-HA | FLAG | Gift from Prof. Wade Harper, Department of Cell Biology, Harvard Medical School, USA. (Addgene #22573) |
| mUSP33 | pGADT7 | GAL4-AD | Truncated clone rescued from yeast DNA |
| mZC4H2 | pEGFP-C2 | EGFP | Subcloned from pGADT7-mZc4h2 |
| hZC4H2 | pEGFP-C2 | EGFP | Gift from Dr Vera Kalscheuer, Department Human Molecular Genetics, Max Planck Institute for Molecular Genetics, Germany |
| hZMYM2 | pGADT7 | GAL4-AD | Subcloned from pCS2-MT-FA-hZMYM2 |
| hZMYM2 | pCS2-MT-FA | Myc | Gift from Prof. Hongtao Yu, Department of Pharmacology, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, USA. |
| mZMYM3 | pGADT7 | GAL4-AD | Amplified by PCR from mouse cDNA |
| mZMYM3 | pEGFP-C2 | EGFP | Subcloned from pGADT7-mZmym3 |
| Empty vector | pEGFP-C1 | EGFP | Clontech Laboratories |
| Empty vector | pEGFP-C2 | EGFP | Clontech Laboratories |
| Empty vector | pGADT7 | GAL4-AD | Clontech Laboratories |
| Empty vector | pGBKT7 | GAL4-BD | Clontech Laboratories |

Table 2.4 Oligonucleotide primers used for cloning open reading frames (ORFs).

Official gene nomenclature indicates the name of the gene (uppercase for human genes, lowercase for mouse symbols). Each primer incorporates a restriction enzyme recognition sequence (underlined) to facilitate insertion into the vector in-frame.

| PCR target | Forward Primer (5’-3’) | Reverse Primer (5’-3’) |
|---|---|---|
| Akirin2 | AATGAATTCATGGCGTGCGGAGCC | GCGGGATCCTCATGAAACATAACTAGCAG |
| Dcaf6 | TTGAATTCATGGCTCGGAGTGGCTCCTG | TTGGATCCTTATTCTTCATCCTCATTTTC |
| Homez | AATGAATTCATGAGTCCTAATAAAGATGCCA | CGGGATCCTCAGTCCCATATGATGA |
| INTS12 | TTGAATTCATGGCTGCTACTGTGAACTT | TTGGATCCTTACTTCTTGAGTTTCTTTTG |
| Nap1l2 (pGADT7) | AATGAATTCATGGCTGAATCAGTCG | AATCTCGAGTTAACGATCAATATCTTCT |
| Nap1l2 (pEGFP-C3) | AATGAATTCAAATGGCTGAATCAGTCG | AACCGCGGTTAACGATCAATATCTTCT |
| Zc4h2 | AATGAATTCATGGCAGATGAACAAGAAATCA | TAAGGATCCTCATTCATCTTGCTTCCGT |
| Zmym3 | TTGAATTCATGGACCCCAGTGATTTCCC | TTATCGATTCAGTCTAGGTCTTCTTCCC |
| Hdac1 | TTGAATTCATGGCGCAGACTCAGGGCACC | TTGGATCCTCAGGCCAACTTGACCTCTTC |
| Hdac2 | TTGAATTCATGGCGTACAGTCAAGGAGGC | TTGGATCCTCAAGGGTTGCTGAGTTGTTC |
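Because the underlining that marked the restriction sites is not preserved in this text, the sites can be located again with a short script. The following is a minimal sketch only: the function name is hypothetical, and the enzyme names are an assumption based on standard recognition sequences, since the enzymes actually used for each construct are not stated in table 2.4 (other enzymes may share the same site).

```python
# Standard recognition sequences; which enzyme was actually used for each
# primer is an assumption, as other enzymes can recognise the same site.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "SacII": "CCGCGG",
    "ClaI": "ATCGAT",
}


def find_sites(primer, sites=RESTRICTION_SITES):
    """Return {enzyme: 0-based position} for every site found in the primer."""
    primer = primer.upper().replace(" ", "")
    return {name: primer.find(seq) for name, seq in sites.items() if seq in primer}


# Akirin2 forward and reverse primers from table 2.4
print(find_sites("AATGAATTCATGGCGTGCGGAGCC"))
print(find_sites("GCGGGATCCTCATGAAACATAACTAGCAG"))
```

All five sites listed are palindromic, so scanning the primer as written is sufficient and no reverse-complement search is needed.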

Table 2.5 Oligonucleotide primers used for quantitative PCR.

Mouse gene targets are indicated by lowercase symbols and human genes by uppercase symbols.

| PCR target | Forward Primer (5’-3’) | Reverse Primer (5’-3’) |
|---|---|---|
| FOS | CCGGGGATAGCCTCTCTTAC | TGGTCGAGATGGCAGTGAC |
| Fos | ACTTTTCGCCAGATCTGTCC | GTTCCCTTCGGATTCTCCGT |
| Fosl2 | CCAGCAGAAGTTCCGGGTAG | GTAGGGATGTGAGCGTGGATA |
| Egr2 | TTGACCAGATGAACGGAGT | CGCACTCACAATATTGATGATAC |
| ELK3 | GTCCTCCTAGAAATTCCCCC | AGGTCCAGCAGATCAAATGC |
| Eya1 | AGGAAAGCTGTTTTGAGAGGA | ACAGGTACTCTAATTCCAAGGC |
| HDAC1 | TAAATTCTTGCGCTCCATCC | AACAGGCCATCGAATACTGG |
| HDAC2 | ATGGCGTACAGTCAAGGAGG | ATGAGGCTTCATGGGATGAC |
| Hdac1 | GGGCACCAAGAGGAAAGTCT | AGCAAATTGTGAGTCATGCG |
| Hdac2 | CATGGCGTACAGTCAAGGAG | TCATCCGGATTCTATGAGGC |
| HPRT | GGTGGAGATGATCTCTCAAC | CTTTTCACCAGCAAGCTTGC |
| Hprt | AGCTTGCTGGTGAAAAGGAC | TCAACTTGCGCTCATCTTAG |
| KAT2B | CCAGCAAAAGAAAGGCAAAC | AGTGAAGACCGAGCGAAGCA |
| NDRG1 | CTGAGGTGAAGCCTTTGGTG | ACAGCGTGACGTGAACAGAG |
| Nr4a1 | GCTCATCTTCTGCTCAGG | AATGCGATTCTGCAGCTC |
| Nr4a2 | TCCGGTGAGTCTGATCAGT | CTGATGATCTCCATAGAGCC |
| Nr4a3 | CTTGCAGAGCCTGAACCTTGA | CAGGACCTTAGGCTCCGA |
| Pkp2 | GACACAGTCCCAAGTACTG | GTGATGCAGCTCTGTATGTG |
| Ubash3b | CTGATTGTGGCCCACGCAT | TCTGTCAGTTGCCATATTCCA |
| Zbtb16 | GTGACCACCCATATGAGTGTG | CTTGATCATGGTCGAGTAGTC |


2.2 Methods

2.2.1 Molecular Biology

2.2.1.2 Polymerase chain reaction amplification (PCR)

Most routine PCR reactions were performed using Taq DNA Polymerase with ThermoPol Buffer (New England Biolabs, Inc) following the product manual. For cDNA cloning, Phusion High-Fidelity DNA Polymerase (New England Biolabs, Inc) was utilised according to the manufacturer’s protocol, using cDNA templates (see 2.2.1.11 Reverse transcription) from different mouse tissues, or from HeLa cells for human sequences. For long amplicons, LongAmp Taq DNA Polymerase (New England Biolabs, Inc) was chosen.

2.2.1.3 Restriction endonuclease digestion

For digestions of PCR products or plasmid DNA, reactions were set up in a final volume of 30 μL. The amount of restriction enzymes (New England Biolabs, Inc) and substrate, as well as the incubation time at 37° C, were adjusted according to the purpose of the digestion.

2.2.1.4 DNA electrophoresis

DNA was analysed by agarose gel electrophoresis, where the agarose concentration (0.8-1.5%) was chosen according to the size of the DNA fragments to be separated. Typically, 1% gels were prepared by dissolving 1 g of agarose in 100 mL of TAE buffer with 0.1 μg/mL ethidium bromide. Electrophoresis was performed in 1x TAE at a constant voltage of 100 V for 40 min, then DNA was visualised using a UV transilluminator and images were captured using a Bio-Rad Gel Doc EZ system.

2.2.1.5 Purification of DNA from gel or solution

DNA fragments or PCR products were purified from agarose gels, or directly from PCR reactions, using the Wizard SV gel and PCR Clean-Up system (Promega Corporation). Excised gel pieces or PCR products were processed and eluted according to the product manual.

2.2.1.6 Ligation of DNA fragments

Ligations were performed using T4 DNA ligase (New England Biolabs, Inc) with a 1:3 molar ratio of vector to insert, according to the product instructions, for 2 hours at room temperature or overnight at 4° C. Room temperature ligation was generally used for simple ligations or blunt-end ligations. For blunt-end fragments, a dephosphorylation step was sometimes included to prevent self-ligation, using 1 μL of Antarctic phosphatase (New England Biolabs, Inc) for up to 5 μg of DNA. The reaction was incubated at 37° C for 15 min and the enzyme was inactivated at 70° C for 5 min.
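The 1:3 vector-to-insert molar ratio translates into a mass of insert that depends on the relative lengths of the two fragments. The following sketch shows the usual calculation; the vector mass and fragment sizes in the example are hypothetical values chosen for illustration, not figures from this work.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so the insert mass scales with the length ratio insert_bp / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_to_vector


# Hypothetical example: 50 ng of an 8,000 bp vector and a 1,000 bp insert.
needed = insert_mass_ng(vector_ng=50, vector_bp=8000, insert_bp=1000)
print(f"Insert required: {needed:.2f} ng")
```

For these example values the calculation gives 18.75 ng of insert, which illustrates why short inserts require far less mass than the vector even at a threefold molar excess.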

2.2.1.7 Transformation of E. coli with plasmid DNA

Heat shock transformation of E. coli was performed using NEB 5-alpha Competent E. coli (subcloning efficiency or high efficiency, sourced from New England Biolabs, Inc) according to the manufacturer’s protocol. Transformed cells were plated on LB-agar plates with the appropriate antibiotic selection. Resistant colonies were picked and isolated plasmid clones were screened for the presence of the insert of interest by specific restriction endonuclease digestions or by a targeted PCR assay using a small sample from the bacterial colony. Positive clones were further tested for fidelity of the desired sequence, orientation and correct alignment with reading frames in the plasmids by Sanger sequencing, which was carried out at the Ramaciotti Centre for Genomics, UNSW Australia, Sydney, Australia.

2.2.1.8 Plasmid purification

Plasmid DNA for general analysis was purified using the QIAprep Spin Miniprep Kit (Qiagen) according to the manufacturer’s protocols. DNA for mammalian cell transfection purposes was purified using the PureYield Plasmid Maxiprep System (Promega Corporation). The concentration and purity of the DNA were measured with a NanoDrop UV spectrophotometer (Thermo Fisher Scientific Inc) at 260/280 nm.

2.2.1.9 Total RNA extraction

RNA was extracted from dissected mouse tissues (detailed in 2.3.2) using TRI-reagent (Sigma), following the manufacturer’s instructions, which included a chloroform phase separation and ethanol precipitation. RNA was resuspended in 40 μL of RNase-free water.

For microarray purposes, RNA was extracted from HeLa cells that were grown to confluence in 6-well plates. The medium was aspirated and the cells were washed once with cold sterile PBS. Lysis was performed according to the RNeasy Mini Kit (Qiagen) protocol. Lysis solution was applied to the cells, which were removed from the plate using a cell scraper and transferred into an RNase-free microfuge tube. The cells were homogenised by passing them at least 5 times through a syringe with a 20-gauge needle attached, while the tube was kept on ice. All samples were then transferred to the QIAcube workstation (Qiagen) for automated RNA extraction. RNA samples were assessed for quantity and quality using a NanoDrop UV spectrophotometer (Thermo Fisher Scientific Inc).

2.2.1.10 Affymetrix microarray analyses

RNA microarray analyses were carried out by the Ramaciotti Centre for Genomics, UNSW Australia. Mouse striatum tissue and HeLa cell samples were assessed for quality using Bioanalyzer (Agilent Technologies) analysis. 1 μg of total RNA per striatum sample was used for hybridisation with a Mouse Gene 1.0 ST Array (Affymetrix, Inc), which includes over 26,000 RefSeq transcripts. For HeLa cells, 500 ng total RNA samples were used for hybridisation with an Affymetrix Human 2.0 ST chip, which covers approximately 30,000 annotated transcripts plus 11,000 long non-coding RNAs (lncRNAs). Data from both arrays were analysed using the Partek Genomics Suite (Partek Inc).

2.2.1.11 Reverse transcription

First-strand cDNA synthesis was carried out using the RNase H Minus, Point Mutant M-MLV Reverse Transcriptase enzyme system, with Oligo dT primers (both sourced from Promega Corporation), using 1 μg of total RNA as the template, according to the manufacturer’s instructions.


2.2.1.12 Quantitative real-time PCR

5-10 ng of cDNA was used as a template for quantitative PCR (qPCR) using the EvaGreen dye-based system SsoFast EvaGreen Supermix (Bio-Rad Laboratories, Inc). Reactions were set up to a total volume of 20 μL according to the product protocol and performed on the Stratagene MX3005P qPCR system (Agilent Technologies). Each reaction was set up in triplicate for the target gene under test. Duplicate reactions were also set up with an identical amount of template using primers designed against human or mouse HPRT (hypoxanthine phosphoribosyltransferase 1) as a housekeeping gene reference standard (see Table 2.5). For all RT-qPCR assays, the efficiency of the different primer sets was tested by establishing a standard curve using serial dilutions of a cDNA pool made by combining samples of all the templates used in each experiment. MxPro QPCR Software was used to analyse the dissociation and amplification curves of every experiment and to obtain the threshold cycle values (Ct). Data were then analysed in Microsoft Excel for quantitation of the target gene relative to the reference standard using the delta delta Ct (ΔΔCt) method (Livak and Schmittgen, 2001), and t-test analyses were performed on the ΔΔCt values for the different groups of samples.
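The relative quantitation step can be illustrated with a short calculation. This is a minimal sketch of the standard 2^-ΔΔCt arithmetic rather than the spreadsheet used in this work; the Ct values and function names are hypothetical, and the fold-change formula assumes that target and reference primer sets amplify with approximately 100% efficiency.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt: mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_dct, calibrator_dct):
    """Return (ΔΔCt, fold change), with fold change = 2 ** -ΔΔCt."""
    ddct = sample_dct - calibrator_dct
    return ddct, 2 ** (-ddct)


# Hypothetical Ct values: triplicate target wells, duplicate HPRT wells.
control_dct = delta_ct([24.0, 24.2, 23.8], [20.0, 20.0])
knockdown_dct = delta_ct([25.0, 25.1, 24.9], [20.0, 20.0])

ddct, fold = relative_expression(knockdown_dct, control_dct)
print(f"ddCt = {ddct:.2f}, fold change = {fold:.2f}")
```

In this example the target Ct rises by one cycle relative to HPRT, which corresponds to roughly half the transcript level of the control sample.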

2.3 Cell Biology

2.3.1 Animals

Gtf2ird1-/- mice (sometimes referred to as knockout mice) were generated previously (Palmer et al., 2010). The mutation has been maintained on a C57BL/6J background for more than 20 generations and all experiments used mice on this background. All experimental procedures were approved by the Animal Care and Ethics Committee at UNSW Australia.

2.3.2 Tissue preparation

After mouse euthanasia by cervical dislocation, brains were dissected out and laid ventral side up in a Model PA 002 Mouse Brain Blocker (Kopf). Two cuts were made at the 7th and 9th slots, counted from the olfactory bulb end, resulting in a 2 mm coronal brain slice. The slice was transferred to a sterile petri dish containing sterile PBS mounted under a dissecting microscope (Olympus). The striatum region was identified and cut out using microsurgical scissors, taking care to remove any contaminating tissue from the surrounding regions. Striatum tissue from both hemispheres was placed into sterile microcentrifuge tubes containing TRI-reagent (Sigma) and disaggregated by trituration with a 1 mL pipette fitted with a sterile aerosol-resistant tip for subsequent RNA extraction (see 2.2.1.9).

2.3.3 Cell culture

2.3.3.1 Immortalised cell lines

The cell lines used in this work included HeLa, HEK-293 and SH-SY5Y cells. The first two were grown in Dulbecco’s modified Eagle’s medium (DMEM), supplemented with 10% fetal bovine serum (FBS) and penicillin (100 U/mL)/streptomycin (100 µg/mL), at 37° C in 5% CO2. SH-SY5Y cells were grown under identical conditions except for the use of 1:1 DMEM/F12 medium. Passaging was performed when the cells approached confluence; medium was aspirated, cells were rinsed using pre-warmed PBS and 1 mL of 0.25% trypsin-EDTA solution was added. The cells were incubated at 37° C until detached from the dish. Trypsin activity was blocked by adding culture medium (9 mL) containing FBS. An appropriate volume was then transferred into a new culture vessel containing the complete culture medium. For immunofluorescence and proximity ligation assays, cells were plated into Millicell EZ slides (0.7 cm², Merck Millipore).

2.3.3.2 ESC-derived neurosphere culture and neuronal differentiation

The H9 (WA-09, WiCell, WI, USA) human ES cell line was cultured feeder-free on vitronectin-coated plates using MTeSR-1 defined media according to the manufacturer’s instructions (Stem Cell Technologies) and maintained at 37° C with 5% CO2. Colonies were mechanically dissected every 7 days and transferred to freshly prepared coated plates. Cell culture media was changed every day. Neural inductions of human ES cells were set up as described by Denham et al. (2012) with some slight modifications. Briefly, human ES cells were mechanically dissected into pieces approximately 0.5 mm in diameter and transferred to laminin-coated organ culture plates in N2/B27 medium containing a 1:1 mix of Neurobasal medium (NBM) with DMEM/F12 medium. Neurobasal media contained Neurobasal A medium supplemented with 1% N2, 2% B27, 2 mM L-glutamine and 0.5% Penicillin/Streptomycin (all sourced from Gibco). Cells were cultured in N2/B27 media for 14 days without passaging. SB431542 (10 μM, Tocris) and noggin (500 ng/mL, Peprotech) were supplemented in the N2/B27 media for the first 7 days, followed by basic fibroblast growth factor (bFGF; 20 ng/mL, Peprotech) supplementation only for the remaining 7 days. Supplemented media was replaced with fresh media every second day. After 14 days, colonies were dissected into pieces and cultured in suspension in NBM supplemented with epidermal growth factor (EGF) and bFGF at 20 ng/mL each (Peprotech) for one week to generate neurospheres. Neuronal differentiation was performed by mechanically disaggregating neurospheres and plating the cells onto poly-D-lysine/laminin dishes in unsupplemented NBM for 1-2 weeks, as previously described (Denham and Dottori, 2011).

2.3.4 Transient transfection

For siRNA transfection, the ON-TARGETplus GTF2IRD1 siRNA (L-013262-00-0005) SMART pool (Dharmacon, Inc) was transfected into HeLa cells (100 pmol/well for a 6-well plate) using Lipofectamine 2000 (Life Technologies) following the manufacturer’s protocol. As a negative control, the siRNA ON-TARGETplus Non-targeting pool was used (D-001810-10-05, Dharmacon, Inc). Transient transfections of mammalian expression vectors were performed in HeLa cells using Lipofectamine LTX (Life Technologies) according to the product manual. For siRNA-treated cells, post-transfection incubation was 48 hours and for over-expression studies using plasmid DNA, transfections were stopped at 24 hours.

2.3.5 Immunofluorescence

For immunofluorescence detection of endogenous proteins, cells were washed twice with PBS and then fixed and permeabilised for 15 min in 4% PFA/0.25% Triton X-100. For analyses of overexpressed proteins, 24 hours after transfection, HeLa cells were washed with PBS and fixed in ice-cold methanol for 10 min. After fixation, all cells were incubated with blocking buffer (10% BSA in PBS) for 1 hour at room temperature, followed by primary antibody incubation in 1% BSA in PBS, for either 2 hours at room temperature or overnight at 4° C. Detection was carried out using secondary antibodies conjugated to Alexa Fluor Dyes (Molecular Probes). ProLong Gold Antifade reagent with DAPI (Molecular Probes) was used as the mounting media for all preparations, except for the stimulated emission depletion (STED) imaging, where DAPI was excluded.

Coimmunofluorescence analysis with antibodies that were both derived from a rabbit source (including the detection of endogenous GTF2IRD1 and DCAF6, ZMYM2 or ZMYM3) required the use of a special protocol involving sequential antibody application. HeLa cells were washed with PBS followed by fixation/permeabilization for 15 min in 4% PFA/0.25% Triton X-100. After blocking, as detailed above, cells were incubated with rabbit anti-GTF2IRD1 and then blocked with a goat anti-rabbit Fab fragment (1:30, Jackson Immunoresearch Laboratories, Inc), so that the rabbit IgG (H+L) of the primary antibody is presented as a goat antigen. This was followed by incubation with a secondary anti-goat antibody conjugated to Alexa 488. Then, the primary rabbit antibody against the second protein was added and detected with a secondary anti-rabbit antibody conjugated to Alexa 555. Negative controls were performed to ensure the complete blocking of rabbit IgG from the first primary antibody.

2.3.6 Proximity ligation assay (PLA)

The proximity ligation assay was performed using the Duolink kit (Olink AB) following the manufacturer’s protocol and using all the proprietary solutions from the kit. In brief, cells were grown in Millicell EZ 8-well chamber slides (Merck Millipore) and fixed for endogenous protein detection as described for the standard immunofluorescence method above (2.3.5). Duolink blocking solution was added and incubated for 30 min at 37° C in a humidified chamber. This solution was tapped off and the two primary antibodies targeting the putative binding partners were added in the antibody diluent solution from the kit, and incubated overnight at 4° C. The next morning, wells were washed twice with Wash buffer A and the oligonucleotide PLA probes were added and incubated for 1 hour at 37° C. Following this, probes were ligated using the provided ligase and ligation buffer, which was incubated on the slides at 37° C for 30 min. Finally, incorporation of the red fluorophore and signal amplification were carried out at 37° C for 100 min using the kit detection reagents, which incorporate DNA polymerase for rolling circle amplification. A schematic representation of the key steps for this procedure is presented in Figure 2.1.


This figure has been removed due to copyright restrictions.

Figure 2.1 Key steps of the in situ proximity ligation assay (PLA).

The target proteins bind the two proximity probes (conjugated to oligonucleotides), which when in proximity can hybridize and ligate, creating a circular DNA molecule. This DNA is then amplified by rolling-circle amplification, incorporating fluorescently labelled oligonucleotides present in the mix. Consequently, a fluorescent in situ PLA signal is generated in the place where the two target proteins are in proximity. Adapted from Soderberg et al. (2008).


2.3.7 Confocal microscopy and image analyses

Immunofluorescence analysis was conducted at the Biomedical Imaging Facility (BMIF), part of the Mark Wainwright Analytical Centre, UNSW Australia. Images were acquired by confocal microscopy using an Olympus Fluoview FV1000 microscope (60x, 100x objectives) or a Leica TCS SP5 microscope under x63 magnification. For the super resolution technique of stimulated emission depletion (STED), the Leica TCS SP5 imaging system was connected to a 592 nm continuous wave depletion laser. Images were analysed using ImageJ software (National Institutes of Health, USA). PLA experiments were measured using the Particle Analysis plugin by setting the thresholds based on the signal of the negative controls in each experiment.

2.4 Protein Biochemistry

2.4.1 Total protein extraction from cells

Cell lysis was performed on ice with solutions that were pre-chilled to 4° C before use. Cells were lysed 24 hours after plasmid transfection or 48 hours after siRNA transfection in lysis buffer (20 mM Tris-HCl, pH 7.4, 420 mM NaCl, 10 mM MgCl2, 2 mM EDTA, 10% Glycerol, 1% Triton X-100, 2.5 mM β-Glycerophosphate, 1 mM NaF) supplemented with protease inhibitor cocktail (Sigma P8340) and incubated for 30 min on ice before being sonicated twice for 7 seconds on ice. Cell lysates were centrifuged at 20,000 x g for 10 min at 4° C to remove cell debris.


2.4.2 Protein quantitation

To determine the protein concentration of the cell lysates, a BCA protein assay was performed (Pierce, Thermo Fisher Scientific Inc.) following the standard manufacturer’s instructions for a 96-well format assay. A standard curve was prepared using different dilutions from a 2 mg/mL bovine serum albumin stock. Absorbance was measured at 562 nm in a SpectraMax microplate reader (Molecular Devices, LLC). The protein concentration of each sample was calculated by interpolating the sample A562 against the absorbance readings of the known standards.
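The interpolation of a sample A562 against the standards can be sketched as follows. This is an illustrative piecewise-linear interpolation, assuming that readings between adjacent standards are approximately linear; the standard concentrations and absorbance readings are hypothetical and the function name is not taken from any plate-reader software.

```python
from bisect import bisect_left


def interpolate_concentration(a562, standards):
    """Interpolate protein concentration (mg/mL) from an A562 reading.

    standards: list of (concentration_mg_per_ml, absorbance) pairs.
    Uses straight-line interpolation between the two flanking standards.
    """
    pts = sorted(standards, key=lambda p: p[1])
    absorbances = [a for _, a in pts]
    if not absorbances[0] <= a562 <= absorbances[-1]:
        raise ValueError("sample absorbance lies outside the standard curve")
    i = bisect_left(absorbances, a562)
    if absorbances[i] == a562:
        return pts[i][0]
    (c0, a0), (c1, a1) = pts[i - 1], pts[i]
    return c0 + (c1 - c0) * (a562 - a0) / (a1 - a0)


# Hypothetical BSA standards diluted from a 2 mg/mL stock.
STANDARDS = [(0.0, 0.05), (0.25, 0.30), (0.5, 0.55), (1.0, 1.00), (2.0, 1.80)]

sample = interpolate_concentration(0.775, STANDARDS)
print(f"Sample concentration: {sample:.2f} mg/mL")
```

Samples reading above the highest standard raise an error rather than being extrapolated, which mirrors the usual practice of diluting such lysates and repeating the measurement.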

2.4.3 SDS-polyacrylamide gel electrophoresis (SDS-PAGE)

Proteins were separated and analysed by SDS-PAGE using a Bio-Rad Mini-Protean 3 system (Bio-Rad Laboratories). 8% or 10% acrylamide/bisacrylamide gels were used according to the size of the proteins of interest. 0.1% SDS and 0.375 M Tris-HCl pH 8.8 were incorporated into the solution; polymerisation of the gel was initiated by adding ammonium persulfate (APS) to a final concentration of 0.1% together with 0.1% TEMED. The gel was poured into the pre-washed and prepared Mini-Protean plates mounted in the apparatus and covered with isopropanol to allow polymerisation. Afterwards, the isopropanol was removed and the stacking gel solution (4.4% acrylamide/bisacrylamide, 0.125 M Tris-HCl pH 6.8 and 0.1% SDS, 0.1% APS and 0.1% TEMED) was poured over the separating gel. Protein samples containing 1x Laemmli sample buffer and 0.1 M DTT were heated at 95° C for 5 min and then loaded into the wells of the gel. Electrophoresis was carried out in the Mini-Protean electrophoresis tank containing running buffer (0.025 M Tris-base, 0.19 M glycine and 0.1% SDS), using the advancing dye front of the Laemmli sample buffer to judge when to stop.

2.4.4 Western blotting

Proteins were transferred to Immun-Blot PVDF Membrane (Bio-Rad Laboratories) for western blot analysis using standard methods. After transfer, membranes were blocked for 1 hour in blocking solution (TBS/Tween 20 and 5% non-fat milk powder), incubated with the primary antibody for 2 hours at room temperature in the same solution and washed three times for 10 min in TBS/Tween 20. The HRP-conjugated secondary antibody incubation was conducted for 45 min at room temperature in blocking solution. Membranes were then washed as before and the signal was detected using the ECL substrates Clarity (Bio-Rad Laboratories) or Luminata Forte (Merck Millipore), and exposed to x-ray film.

2.4.5 Immunoprecipitation

Cells were lysed as described (2.4.1) and cell extracts were pre-cleared by incubation with Pure Proteome Protein A/G magnetic beads (Millipore, #LSKMAGAG02) for 30 min at 4° C. The antibody of interest (as described in Table 2.2) was coupled to protein A/G magnetic beads for 30 min at room temperature with constant gentle rotation, and the beads were then washed three times in PBS/Tween 20 (0.2%). Pre-cleared lysates were incubated overnight with the antibody-bound beads at 4° C. Beads were washed in PBS/Tween 20 four times and proteins were eluted by boiling the samples in 1x Laemmli sample buffer containing 0.1 M DTT.

2.4.6 Histone deacetylase activity

The enzymatic activity of HDACs in HeLa cell extracts was measured 24 hours post transfection. Cells were lysed without any protease inhibitors and nuclear extracts were prepared following the manufacturer’s instructions for the HDAC Activity Fluorometric Assay kit (Abcam, PLC). All manipulations were performed at low temperature (4° C) and all the materials and reagents utilised were pre-chilled during the nuclear extract preparation. Nuclear proteins were quantitated by BCA protein assay as described previously (section 2.4.2). The enzymatic reactions were set up in a black-bottomed, clear, 96-well plate according to the product manual and included a crude HDAC extract as positive control, a non-enzyme control (the buffer that contained the samples) and a negative control containing a random sample combined with the HDAC inhibitor trichostatin A (TSA). 5 µg of every nuclear extract sample was loaded into duplicate wells for each measurement. HDAC activity was measured by allowing the reaction to occur for a set period of time (two-step method, 20 min), then the stop and developer solutions from the kit were added and the fluorescence intensity was measured in a SpectraMax microplate reader (Molecular Devices, LLC) using an excitation wavelength of 350 nm and an emission detection wavelength of 440 nm. HDAC activity was calculated as the amount of fluorescence per sample, minus the fluorescence of the non-enzyme control, and expressed in terms of activity/100 µg protein.
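The activity calculation described above amounts to a background subtraction followed by normalisation to protein input. The sketch below assumes that duplicate wells are averaged before subtraction; the fluorescence readings are hypothetical relative fluorescence units, and only the 5 µg protein load is taken from the text.

```python
from statistics import mean


def hdac_activity_per_100ug(sample_wells, blank_wells, protein_ug=5.0):
    """Background-corrected fluorescence scaled to 100 µg of protein.

    sample_wells: fluorescence readings of the duplicate sample wells.
    blank_wells:  readings of the non-enzyme control wells.
    protein_ug:   nuclear extract loaded per well (5 µg in this protocol).
    """
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    net = mean(sample_wells) - mean(blank_wells)
    return net / protein_ug * 100.0


# Hypothetical readings in relative fluorescence units (RFU).
activity = hdac_activity_per_100ug([1250, 1270], [240, 260])
print(f"HDAC activity: {activity:.0f} RFU per 100 µg protein")
```

Expressing the result per 100 µg allows samples to be compared directly even if a different amount of extract had to be loaded in a given experiment.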

2.5 Yeast two-hybrid assays

Most of the procedures for yeast culture, handling and media preparation were obtained from the Yeast Protocol Handbook (Clontech Laboratories, Inc.).

2.5.1 Small-scale yeast transformation

Saccharomyces cerevisiae strain AH109 was transformed with prey and bait plasmids using the standard lithium acetate/polyethylene glycol protocol (Gietz and Woods, 2002). For 10 transformations, a fresh colony of 2 mm in diameter was inoculated into 5 mL of YPD medium (50 g/L YPD broth) in a 15 mL round-bottomed tube and incubated overnight at 30° C at 200 rpm. The next morning, the number of yeast cells was counted using a haemocytometer and the culture was diluted with pre-warmed YPD to a final concentration of 5 × 10⁶ cells/mL. Then, the culture was grown to at least 2 × 10⁷ cells/mL (3-5 extra hours). Yeast cells were centrifuged at 1,000 x g for 10 min, the medium was discarded and cells were washed with 25 mL sterile water, which was followed by a centrifugation at 1,000 x g for 5 min. The supernatant was discarded and the yeast pellet was resuspended with 1 mL sterile water. 100 μL of this suspension was used for every transformation (~1 × 10⁸ cells). A transformation mix was added to the cells which, for every transformation, was composed of: 240 μL of 50% PEG-3350 (w/v), 36 μL of 1 M LiAc, 10 μL of pre-boiled salmon sperm DNA, plasmid DNA (100 ng per plasmid) and sterile water to a final volume of 360 μL. The mixture was vortexed and incubated for 30 min at 30° C and then at 42° C for 20 min. The reaction was centrifuged at 8,000 rpm for 30 sec, the supernatant discarded and the pellet was resuspended with 500 μL of sterile water. Yeast cells were then plated on double dropout (DDO) selective agar plates (6.7 g/L SD base, pH 5.8, containing 1X DDO supplement (Clontech Laboratories), 2% glucose and 2% w/v agar). Plates were incubated at 30° C for 4-5 days until colonies of resistant clones emerged. To test for protein interactions that might activate the His and galactosidase reporter genes, selected colonies were picked and re-streaked on quadruple dropout (QDO supplement, Clontech Laboratories) medium agar plates with x-α-gal (40 µg/mL). Plates were then incubated at 30° C for 3-5 days and observed for growth of blue colonies.
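Two pieces of arithmetic in this protocol lend themselves to a short sketch: diluting the overnight culture to the starting density, and scaling the transformation mix for several reactions. The per-transformation volumes and the target density are taken from the text; the 50 mL culture volume, the example cell count and the one-reaction overage are assumptions made for illustration.

```python
# Shared components per transformation, in µL (from the protocol text).
# Plasmid DNA and water are added per tube to reach 360 µL in total.
PER_TRANSFORMATION_UL = {
    "50% PEG-3350": 240,
    "1 M LiAc": 36,
    "salmon sperm DNA": 10,
}
FINAL_VOLUME_UL = 360


def overnight_volume_ml(count_per_ml, target_per_ml=5e6, final_volume_ml=50.0):
    """Volume of overnight culture needed to seed the diluted culture.

    final_volume_ml is an assumed culture volume, not stated in the text.
    """
    if count_per_ml < target_per_ml:
        raise ValueError("overnight culture is below the target density")
    return target_per_ml * final_volume_ml / count_per_ml


def master_mix_ul(n_transformations, overage=1):
    """Scale the shared components, with spare reactions for pipetting loss."""
    n = n_transformations + overage
    return {name: vol * n for name, vol in PER_TRANSFORMATION_UL.items()}


# Volume left in each tube for plasmid DNA plus sterile water.
dna_plus_water_ul = FINAL_VOLUME_UL - sum(PER_TRANSFORMATION_UL.values())

# Hypothetical haemocytometer count of 2 x 10^8 cells/mL.
print(overnight_volume_ml(2e8), "mL of overnight culture")
print(master_mix_ul(10))
print(dna_plus_water_ul, "µL per tube for plasmid DNA and water")
```

With these example values, 1.25 mL of overnight culture seeds a 50 mL culture, and each tube has 74 µL available for plasmid DNA and water.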

2.5.2 Yeast library screening

The following procedures were carried out following the instructions from the Matchmaker Mate & Plate System (Clontech Laboratories). The library screening steps are summarised in Figure 2.2 in a workflow that ultimately leads to the confirmation of the protein-protein interactions in yeast. For library screening, the plasmid pGBKT7-GTF2IRD1 was first transformed into the AH109 yeast strain using the small-scale transformation protocol described above (2.5.1). This bait has been previously tested as negative for autoactivation of the yeast two-hybrid selection markers (Widagdo, 2011; Widagdo et al., 2012).

A fresh 2-3 mm diameter yeast colony containing pGBKT7-GTF2IRD1 was inoculated into 50 mL of SD medium lacking tryptophan (-Trp), which selects for the presence of the pGBKT7 plasmid. The culture was grown overnight at 30° C, shaking at 250 rpm. The next day, when the OD600 had reached 0.8, the cells were pelleted by centrifugation at 1,000 x g for 5 min and then resuspended with 5 mL of SD-Trp medium (~1 × 10⁸ cells/mL). 1 mL of the yeast strain Y187 (which is capable of mating with AH109), pre-transformed with library plasmids, was thawed and combined with the pGBKT7-GTF2IRD1 culture in a sterile 2 L flask. The Mate and Plate (Clontech Laboratories) libraries used included the Universal Mouse (Normalized) library (#630482) and the Human Brain (Normalized) cDNA library (#630486). The yeast mating was carried out in 45 mL of 2X YPDA medium (100 g/L YPD broth, 60 mg/L adenine hemisulfate and 50 μg/mL kanamycin), incubated at 30-50 rpm at 30° C for 20-24 hours, until the presence of diploid cells was observed under phase-contrast brightfield microscopy by mounting droplets of the solution onto a glass slide and covering with a coverslip. The cells were centrifuged for 10 min at 1,000 x g and washed twice with 50 mL of 0.5X YPDA supplemented with 50 μg/mL kanamycin, which was rinsed through the flask in order to collect any remaining cells. The yeast was resuspended in 10 mL of 0.5X YPDA (50 μg/mL kanamycin) medium. A 100 μL aliquot was used for making serial dilutions (1:10, 1:100, 1:1000 and 1:10000) to plate on different selective media (SD-Trp, SD-Leu, DDO and QDO) in order to later calculate the diploid survival and mating efficiency of the library screen. The volume of the total remaining cell suspension was measured and plated equally over approximately 45 quadruple dropout (QDO) agar plates of 150 mm diameter. The plates were incubated for 4-7 days at 30° C. Clones appearing on QDO medium were re-streaked onto new QDO/x-α-gal plates (supplemented with X-α-gal, 2 mL of 20 mg/mL stock per 1 L). Clones that showed survival and blue colour, indicating the activation of 2 independent genetic markers of protein interaction, were selected for plasmid rescue to identify the prey protein.
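The serial dilution plates allow the mating efficiency and the number of clones screened to be estimated. The sketch below follows the calculation commonly used for mate-and-plate screens (diploid viability divided by the viability of the limiting partner); the exact formulae are not given in the text, so this is an assumption, as are the colony counts and the 100 µL plating volume. Only the 10 mL resuspension volume is taken from the protocol.

```python
def cfu_per_ml(colonies, dilution_factor, plated_ul=100):
    """Viable cells per mL of the undiluted suspension.

    dilution_factor: e.g. 1000 for a 1:1000 dilution.
    plated_ul: volume spread on the plate (assumed to be 100 µL).
    """
    return colonies * dilution_factor * 1000 / plated_ul


def mating_efficiency_percent(diploid_cfu, bait_cfu, prey_cfu):
    """Diploids as a percentage of the limiting (less viable) partner."""
    limiting = min(bait_cfu, prey_cfu)
    return diploid_cfu / limiting * 100.0


def clones_screened(diploid_cfu, resuspension_ml=10.0):
    """Total diploids in the resuspended mating culture."""
    return diploid_cfu * resuspension_ml


# Hypothetical colony counts from the dilution plates.
bait = cfu_per_ml(200, 10000)     # SD-Trp: bait strain viability
prey = cfu_per_ml(60, 10000)      # SD-Leu: library strain viability
diploid = cfu_per_ml(150, 1000)   # DDO: diploids carrying both plasmids

print(f"Mating efficiency: {mating_efficiency_percent(diploid, bait, prey):.1f}%")
print(f"Clones screened: {clones_screened(diploid):.2e}")
```

In this example the library strain is the limiting partner, so the efficiency is reported relative to the SD-Leu count.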


Figure 2.2 Yeast two-hybrid library screening workflow.

The steps carried out from the yeast mating to the final confirmation of the protein interactions are summarised.


2.5.3 Plasmid rescue from yeast

Prey plasmids were extracted from selected yeast colonies using a procedure based on acid-washed bead disruption (Hoffman and Winston, 1987). Yeast colonies were inoculated into 2 mL of liquid QDO medium and grown overnight to saturation at 30° C at 250 rpm. Cells were collected by centrifugation at 1,000 x g for 10 min, the supernatant was discarded, and the pellet was resuspended with 0.2 mL of Buffer A (2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris-HCl pH 8.0, 1 mM EDTA). The cell suspension was combined with approximately 150 μL of acid-washed glass beads (425-600 μm, Sigma). 0.2 mL of phenol:chloroform:isoamyl alcohol (25:24:1) was added and the tubes were sealed and vortexed for 2 min in order to break up the yeast. Cell suspensions were centrifuged at 14,000 rpm for 5 min to separate the phases. The aqueous layer containing released DNA was collected carefully into a new tube and 240 μL of 100% ethanol was added to precipitate the DNA for 30 min on ice. The solution was centrifuged for 15 min at 14,000 rpm at 4° C, the supernatant was discarded and the DNA pellet was washed twice with ice-cold 70% ethanol, with centrifugation at 14,000 rpm for 5 min at room temperature to re-pellet. After the final centrifugation, the supernatant was discarded and the DNA pellet was air-dried and resuspended in 40 μL of sterile water.
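Some centrifugation steps in this chapter are given in rpm and others in x g, and the two can be related once the rotor radius is known. The sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm; the 8 cm radius is a hypothetical value for a benchtop microcentrifuge rotor and is not specified in the text.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radius of 8 cm.
print(f"14,000 rpm is about {rcf_from_rpm(14000, 8.0):.0f} x g")
print(f"1,000 x g needs about {rpm_from_rcf(1000, 8.0):.0f} rpm")
```

Because the force scales with the square of the speed, the same rpm setting on rotors of different radius can give noticeably different forces, which is why x g values are the more portable way to report a protocol.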

The isolated prey plasmids were introduced into E. coli (electrocompetent DH5α) by high efficiency electroporation, as the concentration of plasmid DNA was generally very low. 1 μL of the rescued DNA solution was added to 50 μL of bacteria and the mixture was transferred into a pre-chilled electroporation cuvette (1 mm gap, Bio-Rad). Electroporation was performed in the Gene Pulser Xcell electroporation system (Bio-Rad) using the following parameters: 1.8 kV, 200 ohms and 25 μF. After electroporation, the cuvette was immediately chilled on ice and 200 μL of SOC medium was added, followed by incubation at 37° C with shaking at 250 rpm for 1 hour. The culture volume was then plated onto LB-ampicillin agar plates and incubated at 37° C overnight.
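The electroporation settings can be translated into the two quantities that are usually compared between protocols: the electric field strength across the cuvette and the nominal pulse time constant. The sketch below uses the settings given in the text (1.8 kV, 1 mm gap, 200 ohms, 25 μF); it treats the time constant as the simple product of the set resistance and capacitance, which is an idealisation, since the measured value also depends on the conductivity of the sample.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Electric field across the cuvette gap, in kV/cm."""
    if gap_cm <= 0:
        raise ValueError("gap must be positive")
    return voltage_kv / gap_cm


def nominal_time_constant_ms(resistance_ohm, capacitance_uf):
    """Nominal RC time constant in milliseconds (ideal circuit)."""
    seconds = resistance_ohm * capacitance_uf * 1e-6
    return seconds * 1e3


# Settings from the protocol: 1.8 kV, 1 mm (0.1 cm) gap, 200 ohms, 25 µF.
field = field_strength_kv_per_cm(1.8, 0.1)
tau = nominal_time_constant_ms(200, 25)
print(f"Field strength: {field:.1f} kV/cm")
print(f"Nominal time constant: {tau:.1f} ms")
```

These settings correspond to a field of 18 kV/cm and a nominal time constant of 5 ms; a markedly shorter measured time constant would typically suggest excess salt in the DNA sample.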

2.5.4 Identification of isolated yeast library clones

Ampicillin-resistant E. coli clones, transformed by the yeast prey plasmids, were tested for the presence of an insert by picking a small quantity of the bacterial colony into a PCR tube containing a PCR mixture made with primers that flank the multiple cloning site of the pGADT7 vector (T7 seq and 3’AD seq primers, as listed in Table 2.3). The rest of the colony was inoculated into LB-ampicillin medium and allowed to grow overnight at 37° C. After confirmation of the presence of an insert by PCR, the bacterial cultures were used for mini-prep plasmid extraction. Plasmid DNA was prepared for Sanger sequencing at the Ramaciotti Centre for Genomics, UNSW Australia, Sydney, Australia, using T7 seq and/or 3’AD seq sequencing primers (Matchmaker Yeast Two-Hybrid System, Clontech), which flank the prey library inserts.

The sequences obtained were analysed using the Basic Local Alignment Search Tool BLASTn from the National Center for Biotechnology Information, USA (NCBI). Protein information was retrieved from the UniProt Consortium database.


2.6 Online resources

Yeast library clone identification and searches were performed using the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information, USA (NCBI), http://blast.ncbi.nlm.nih.gov/Blast.cgi. Protein symbols, characteristics, sequences and protein accession numbers were retrieved from the Universal Protein Resource (UniProtKB) database, from the UniProt Consortium (http://www.uniprot.org). Gene symbols and sequences were retrieved from NCBI. Gene ontology (GO) information was obtained from the Gene Ontology Consortium database (www.geneontology.org). The interaction network of GTF2IRD1 was generated using Cytoscape 3.1.1 (The Cytoscape Consortium, http://www.cytoscape.org/) (Shannon et al., 2003), retrieving protein-protein interactions from the IntAct database (EMBL-EBI, http://www.ebi.ac.uk/intact/).

2.7 Statistical analyses

Results presented in graphs show means ± SEM as indicated in the figure legends. Statistical significance was determined using a two-tailed Student’s t-test and p values are depicted in results figures as follows: * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001 or **** p ≤ 0.0001.
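The mapping from p values to the asterisk notation can be written as a small helper. The thresholds are those listed above; the function name and the "ns" label for non-significant results are illustrative assumptions, since no label for p > 0.05 is defined in the text.

```python
# Thresholds from the text, checked from most to least stringent.
THRESHOLDS = [
    (0.0001, "****"),
    (0.001, "***"),
    (0.01, "**"),
    (0.05, "*"),
]


def significance_stars(p_value):
    """Return the asterisk label for a p value ('ns' if p > 0.05)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie between 0 and 1")
    for threshold, stars in THRESHOLDS:
        if p_value <= threshold:
            return stars
    return "ns"


for p in (0.2, 0.05, 0.003, 0.00005):
    print(p, significance_stars(p))
```

Checking the most stringent threshold first ensures that each p value receives the highest number of asterisks it qualifies for.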


CHAPTER 3 - CELLULAR CHARACTERISATION OF GTF2IRD1


3.1 Introduction

To date, most of the studies regarding GTF2IRD1 have been focused on its DNA-binding or gene regulatory capabilities. GTF2IRD1 was originally isolated based on its DNA binding capability and subsequent work has refined that understanding (Bayarsaihan and Ruddle, 2000a; O'Mahoney et al., 1998b; Ring et al., 2002). Evidence indicates that it also has intrinsic repressive capabilities (Polly et al., 2003a; Vullhorst and Buonanno, 2003b; Vullhorst and Buonanno, 2005). When tested in mammalian cell lines using reporter assays, the repressive effect of transfected GTF2IRD1 was maintained even when all of the DNA binding regions were removed (Polly et al., 2003a). This suggests that GTF2IRD1 may act, not only as a conventional transcription factor by binding to target sequences, but also by repressing gene expression in a DNA-independent manner. These repressive capabilities were also observed in a transgenic mouse line in which human GTF2IRD1 is constitutively expressed in skeletal muscle (Issa et al., 2006).

DNA binding studies have shown that GTF2IRD1 binds to its own promoter through an array of sequence-specific binding sites in the GTF2IRD1 upstream region, leading to negative autoregulation of its own transcription (Palmer et al., 2010).

However, analyses of direct gene target sets for this transcription factor have been limited by the experimental approaches employed, including the use of transfected cells with tagged cDNA (Ku et al., 2005) or the use of anti-GTF2IRD1 antibodies that are known to have non-specific binding properties (Chimge et al., 2008). In addition, studies that aim to gather information about the endogenous expression of GTF2IRD1 and its molecular role have been hampered by a lack of specific and high-affinity antibodies and the low abundance of the protein (Calvo et al., 2001; O'Mahoney et al., 1998b).

Studies of transfected mammalian cells have shown that over-expressed GTF2IRD1 is located inside the nucleus (Palmer et al., 2010; Tussie-Luna et al., 2002a). Localisation studies of the other GTF2I family members (TFII-I and GTF2IRD2) have shown that both proteins also localise to the nucleus but can also be present in the cytoplasm (TFII-I) (Novina et al., 1999) or associated with microtubules (GTF2IRD2) (Palmer et al., 2012). Analysis of TFII-I indicates that some of this variation is dependent on the splice isoform under consideration (Hakre et al., 2006). Considering this complexity and the possibility of artefacts caused by over-expression of GTF2IRD1, it is important to undertake a cellular localisation analysis of the endogenous GTF2IRD1 protein.

Sub-cellular localisation of endogenous GTF2IRD1 has never been achieved by immunofluorescence. Here, for the first time, evidence is presented that the endogenous human GTF2IRD1 protein is localized exclusively to the nucleus in immortalized human cell lines and in neurons and neuronal progenitors differentiated from human embryonic stem cells. The protein shows a specific speckled pattern throughout the nucleus. GTF2IRD1 speckles were compared with an array of nuclear markers for histone methylation, nuclear bodies and other nuclear proteins and substructures with similar patterns of distribution within the nucleus. GTF2IRD1 showed partial overlap with a subset of these markers, including histone H3 bi/trimethylated on lysine 27 (H3K27me2/3), heterochromatin protein 1 γ (HP1γ) and PML. These markers support an association with developmentally regulated silent chromatin and transcriptional regulation.

3.2 Results

3.2.1 Detection of human GTF2IRD1 and antibody validation

As mentioned previously, the study of endogenous GTF2IRD1 has been hampered by a lack of good quality antibodies. However, two commercial antibodies were identified. The first (hereafter referred to as 333A; further details are provided in chapter 2, table 2.2) is usable in immunofluorescence and immunoprecipitation but is specific for only the human protein and is a very poor detection reagent for denatured GTF2IRD1 on western blots. The second antibody (named M19; see chapter 2, table 2.2) is highly sensitive as a detection reagent on western blots but has relatively poor specificity and does not work in immunofluorescence applications.

SDS-PAGE analysis of human GTF2IRD1 in whole cell extracts from HeLa cells showed that the M19 antibody detects two major bands in the 110-130 kDa range (figure 3.1A). After immunoprecipitation of GTF2IRD1 using the 333A antibody, only the upper band (running at approximately 130 kDa) is detected by M19, indicating that this corresponds to endogenous GTF2IRD1 (predicted molecular weight: 106 kDa).

This result was confirmed by the targeted knock-down of GTF2IRD1 using a pool of four anti-GTF2IRD1 siRNAs. The upper band at 130 kDa was lost when cells were transfected with the anti-GTF2IRD1 siRNA pool (figure 3.1A, fourth lane). No change in the band pattern was observed in whole cell extracts from HeLa cells transfected with the non-targeting control siRNA pool (figure 3.1A).

The second protein detected by M19 running in the 110-130 kDa range could not be identified. One possibility that was considered was that it constituted TFII-I or GTF2IRD2 and was detected because of similarities between these family members. However, this hypothesis was dismissed by comparison with similar immunoblots probed with anti-TFII-I and anti-GTF2IRD2 antibodies. These antibodies both identified bands that run at a very similar molecular weight to endogenous GTF2IRD1 and not at the lower level of approximately 110 kDa. The possibility that the anti-GTF2IRD1 siRNAs affected the levels of TFII-I and GTF2IRD2 or that M19 was cross-reacting with these related proteins was also excluded by probing extracts treated using the siRNA knockdown oligonucleotides (figure 3.1A). These data validated the antibodies 333A and M19 as tools for immunoprecipitation and immunoblot detection of endogenous human GTF2IRD1 in HeLa cells and characterised methods for their use in cell lysates.

3.2.2 Endogenous GTF2IRD1 exists in a punctate pattern in the nucleus

3.2.2.1 Super-resolution microscopy of GTF2IRD1 in human cell lines

Since the antibody 333A was found to be specific for human GTF2IRD1, its ability to detect the endogenous human protein by immunofluorescence was tested in HeLa cells. Cells were transiently transfected with the siRNA pools (non-targeting control and GTF2IRD1-targeting), followed by indirect immunofluorescence analysis with the 333A antibody. A punctate signal was detected in all the nuclei of the cells treated with the non-targeting control siRNAs, but this signal was completely abrogated in the majority of cells treated with the anti-GTF2IRD1 siRNAs (figure 3.1B). Cell counting in three independent experiments indicated that approximately 98% of cells had no signal, while the remaining 2% showed the normal pattern, suggesting that they had failed to be transfected by the anti-GTF2IRD1 siRNA oligonucleotides.

To further assess the localisation pattern of GTF2IRD1 inside the nucleus, immunofluorescence analysis was performed with the 333A antibody in different human immortalised cell lines, including HeLa, HEK-293 and SH-SY5Y cells. The super-resolution confocal microscopy technique, STED (Stimulated Emission Depletion), was employed, which uses a combination of lasers (see chapter 2, Materials and Methods) to achieve better resolution in fluorescence microscopy (Hell and Wichmann, 1994). The average resolution achieved by this technique in the cell lines utilised was ~60 nm. Therefore, it was possible to obtain a sharp visualisation of the GTF2IRD1 speckles, which were distributed evenly throughout the nuclei of all the cell lines analysed. The speckles were found in large numbers and the intensity of fluorescence was similar in the three cell lines tested (figure 3.2).

Figure 3.1 Detection of endogenous human GTF2IRD1.

(A) Western blot analysis of endogenous GTF2IRD1, GTF2IRD2 and TFII-I in HeLa cell extracts. The anti-GTF2IRD1 M19 antibody detects two bands above 100 kDa in the whole cell extract (Input) but after immunoprecipitation with the anti-GTF2IRD1 333A antibody (IP 333A), only the upper band is detected. In extracts from cells transfected with a negative control siRNA (Con siRNA), both bands are detected but in cells transfected with the anti-GTF2IRD1 siRNA pool (IRD1 siRNA), the upper band disappears. Immunoblotting (IB) in identical conditions using anti-GTF2IRD2 (IRD2) and anti-TFII-I antibodies shows that the lower band does not correspond to these proteins and the upper band is exclusively a product of the GTF2IRD1 gene.

(B) Immunofluorescence analysis of endogenous GTF2IRD1 protein using the 333A antibody on HeLa cells treated with control or targeting siRNA.

Figure 3.2 Endogenous GTF2IRD1 is expressed in a speckled pattern within the nucleus.

Immunofluorescence utilising the antibody 333A for analysing endogenous GTF2IRD1 distribution in the nucleus of HeLa, HEK-293 and SH-SY5Y cells using stimulated emission depletion (STED) super resolution confocal microscopy.

3.2.2.2 Expression of GTF2IRD1 in human neurons derived from ES cells and neural progenitors.

The possibility that the pattern of GTF2IRD1 localization was specific to immortalized cell lines was considered. Since GTF2IRD1 function has been associated with neurobehavioural abnormalities in mouse studies (Howard et al., 2012; Young et al., 2008) and the protein is expressed in different regions of the mouse brain (Palmer et al., 2007), an understanding of localisation in neurons was sought. Therefore, neurons differentiated from human embryonic stem (ES) cells were analysed using the 333A antibody. Co-immunofluorescence with anti-β-tubulin III and anti-MAP2ab antibodies (markers for early and late neuronal differentiation, respectively; Dinsmore and Solomon, 1991; Katsetos et al., 2003) showed punctate nuclear expression of GTF2IRD1 in neuronal cells (figure 3.3), consistent with its expression pattern in the immortalized cell lines (figure 3.2). Of note, GTF2IRD1 nuclear expression was also observed in β-tubulin III negative and MAP2ab negative cells, which correspond to subpopulations of early neural progenitors (Denham et al., 2012; Dottori and Pera, 2008). The intensity of GTF2IRD1 immunofluorescence was similar in both MAP2ab positive and negative cells, whereas the intensity was stronger in β-tubulin III positive cells, compared to the β-tubulin III negative cells in the same field of view (figure 3.3).

Figure 3.3 Endogenous GTF2IRD1 adopts a speckled nuclear pattern in hESC-derived neuronal cell cultures.

Immunofluorescence analyses of hESC-derived cells, driven into the cortical neuronal pathway of development, show that GTF2IRD1 has the same nuclear pattern found in immortalised cell lines. GTF2IRD1 is expressed in all cells including differentiating neurons, as marked by β-tubulin III and MAP2ab antibodies. Scale bars represent 20 µm.

3.2.3 Endogenous GTF2IRD1 shows overlap with markers of silenced chromatin

The finding of a defined pattern for the subcellular localisation of endogenous GTF2IRD1 prompted an exploration of potential co-localization with other nuclear proteins, as a means to understand GTF2IRD1 function. As described in Chapter 1, previous studies have shown that GTF2IRD1 is able to repress gene expression, both in in vitro reporter assays (Polly et al., 2003a) and in mouse lines that express transgenic GTF2IRD1 in muscle tissue, in which a set of fibre type-specific genes are repressed (Issa et al., 2006). Therefore, comparisons of subcellular localisation were sought with established markers of gene expression regulation, as well as markers of established nuclear subcompartments. Immunofluorescence analysis was performed in HeLa cells using the 333A anti-GTF2IRD1 antibody, together with antibodies raised against proteins related to gene regulation, such as heterochromatin protein 1 alpha (HP1α), HP1β, HP1γ, RNA Polymerase II C-terminal domain phosphorylated on serine 5 (CTD phospho-SER5), and histone H3 multi-methylated on lysine 4 (H3K4me3), lysine 9 (H3K9me3) or lysine 27 (H3K27me2/3).

For comparison with established nuclear subcompartments, the antibodies used included anti-PML (PML bodies), Coilin (Cajal bodies), NPC (nuclear pore complex), LAP2 (lamina-associated polypeptide 2), SC-35 and SP1.

While none of the markers tested showed complete co-localization, qualitative assessment indicated that GTF2IRD1 overlaps to the greatest extent with H3K27me2/3, HP1γ and PML bodies (figure 3.4). A lesser degree of overlap was found for HP1α, HP1β and H3K9me3, comparing the intensity peaks of fluorescent signal. Co-localization analysis between GTF2IRD1 and the nuclear speckle proteins coilin, LAP2, NPC, RNA Polymerase II CTD phospho-SER5, SC-35, SP1 and H3K4me3 was minimal (figure 3.5).
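The overlap assessment described here was qualitative. As an illustration only, and not the method used in this work, one common way to put a number on the overlap between two fluorescence channels is the Pearson correlation of their pixel intensities. The sketch below assumes that the two channels have already been flattened into equally sized lists of intensities for the same pixels; the function name is hypothetical.

```python
import math


def pearson_colocalisation(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities.

    Values near +1 indicate that bright pixels coincide in both channels,
    values near 0 indicate no linear relationship, and negative values
    indicate mutual exclusion.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least two pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return covariance / math.sqrt(var_a * var_b)
```

A coefficient of this kind would complement, rather than replace, the comparison of intensity peaks described above.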

Figure 3.4 Confocal co-immunofluorescence analysis of endogenous GTF2IRD1 with markers of nuclear sub-compartments and chromatin sub-domains in HeLa cells.

The nuclear markers include heterochromatin proteins (HP1) α, β, γ, methylated histone H3K9me3 (K9), H3K27me2/3 (K27) and PML (in red). Observations indicated that the strongest correspondence with GTF2IRD1 localization included H3K27me2/3, HP1γ and PML.

Figure 3.5 Comparison of GTF2IRD1 localization with markers of nuclear sub-compartments.

The images show co-immunofluorescence detection of a series of endogenous nuclear markers and their relative nuclear localization with endogenous GTF2IRD1 (in green) in HeLa cells. The nuclear proteins include coilin, trimethylated histone H3K4 (K4), lamin-associated polypeptides (LAP2), nuclear pore complex (NPC), RNA polymerase II CTD phosphorylated Ser5 (R-pol II), splicing factor SC-35 (SC-35) and the zinc finger transcription factor SP1 (SP1) (all in red). No obvious co-localization was observed between these proteins and GTF2IRD1.

3.3 Discussion

In this chapter, we have shown the unequivocal detection of endogenous human GTF2IRD1. Immunoprecipitation with the 333A anti-GTF2IRD1 antibody in HeLa whole cell extracts, followed by immunodetection with the M19 anti-GTF2IRD1 antibody, showed a band of ~130 kDa, which was confirmed to be specific for GTF2IRD1 by comparing with extracts from cells transfected with siRNA targeted for GTF2IRD1 knock-down in which this band disappears. The 130 kDa band was not abolished in samples derived from cells treated with the non-targeting siRNA control (figure 3.1A).

Here it is shown for the first time that endogenous human GTF2IRD1 protein is localized exclusively to the nucleus in both cultured cell lines and in cultured neurons differentiated from embryonic stem cells. GTF2IRD1 was detected using the 333A antibody, the specificity of which was confirmed by the lack of fluorescence in HeLa cells treated with the targeting siRNA. GTF2IRD1 protein was found to be distributed in a distinctive speckled pattern within the nucleus of HeLa, SH-SY5Y and HEK-293 cells, which was visualised in detail using STED microscopy. This super-resolution technique permits the visualisation of cellular structures beyond the diffraction limit of resolution, which in conventional microscopy is approximately 200 nm. By using this technique, it was possible to achieve a resolution of ~60 nm, detecting isolated small and sharp speckles of a relatively uniform size distributed throughout the nucleus in the cell lines analysed.
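To show where figures such as 200 nm and ~60 nm come from, the sketch below applies the standard Abbe diffraction limit, d = λ / (2 NA), and the commonly quoted STED scaling, d = λ / (2 NA √(1 + I/I_sat)). The wavelength, numerical aperture and saturation factor used in the example are illustrative assumptions and are not the instrument settings used in this work.

```python
import math


def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Diffraction-limited lateral resolution, d = wavelength / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def sted_resolution_nm(wavelength_nm, numerical_aperture, saturation_factor):
    """Approximate STED resolution.

    saturation_factor is I / I_sat, the depletion intensity relative to the
    saturation intensity of the fluorophore; a factor of 0 gives the Abbe limit.
    """
    if saturation_factor < 0:
        raise ValueError("saturation_factor must be non-negative")
    return abbe_limit_nm(wavelength_nm, numerical_aperture) / math.sqrt(1.0 + saturation_factor)


# Illustrative values only (assumptions): 560 nm light and an NA 1.4 objective.
conventional = abbe_limit_nm(560, 1.4)
# Saturation factor that would be needed to sharpen that spot to 60 nm.
needed_factor = (conventional / 60.0) ** 2 - 1.0
```

Under these assumed values the conventional limit is 200 nm, and a depletion intensity roughly ten times the saturation intensity would be needed to reach 60 nm.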

The diameter of large nuclear speckles like Cajal or PML bodies ranges between 0.3 and 0.5 µm (Ochs et al., 1995; Weis et al., 1994), a diameter that is within the resolution limit of confocal microscopy. The resolution achieved for GTF2IRD1 speckles by STED is sufficient to improve their visualisation, as they appeared smaller and sharper than the examples mentioned above (as shown in figures 3.4 and 3.5), but the level of resolution is not powerful enough to determine whether the speckles represent monomeric or multimeric complexes. From a theoretical perspective, the molecular weight of the GTF2IRD1 protein predicts an estimated size of 6.8 nm (http://www.calctool.org/CALC/prof/bio/protein_size). Whilst monomeric forms are within the range of detection and the speckles could represent monomeric GTF2IRD1, additional experiments would be required to establish whether this is the case.
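The order of magnitude of this estimate can be reproduced with a rough sketch that treats the protein as a compact sphere. The partial specific volume of 0.73 cm³/g is a typical value for globular proteins and is an assumption here. The online calculator cited above may use different parameters, so the result is not expected to match 6.8 nm exactly.

```python
import math

AVOGADRO = 6.02214076e23        # molecules per mole
PARTIAL_SPECIFIC_VOLUME = 0.73  # cm^3 per gram; typical globular protein value (assumption)
NM3_PER_CM3 = 1e21              # cubic nanometres in one cubic centimetre


def min_sphere_diameter_nm(mass_da):
    """Diameter (nm) of the smallest sphere that could hold a protein of this mass."""
    if mass_da <= 0:
        raise ValueError("mass must be positive")
    # Volume occupied by a single molecule, in nm^3.
    volume_nm3 = mass_da * PARTIAL_SPECIFIC_VOLUME * NM3_PER_CM3 / AVOGADRO
    radius_nm = (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius_nm


# Predicted molecular weight of GTF2IRD1 given in the text: 106 kDa.
diameter = min_sphere_diameter_nm(106_000)
```

With these assumptions the diameter comes out at a little over 6 nm, the same order as the figure quoted above and roughly one tenth of the ~60 nm resolution achieved.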

It is not yet possible to perform similar studies in mouse-derived cells or tissues because the 333A antibody, which is the only antibody we have found to date that can detect the endogenous protein in immunofluorescence experiments, does not detect the mouse GTF2IRD1 protein (data not shown). This could be due to the fact that the epitope recognised by the 333A antibody lies within the last 50 amino acids of human GTF2IRD1, which do not share complete sequence identity with the mouse protein.

This limitation prevented an analysis of GTF2IRD1 protein distribution in cells and tissues taken from live animals and attempts to detect GTF2IRD1 in human post-mortem samples were unsuccessful (data not shown). Therefore, GTF2IRD1 distribution was assessed in hESCs differentiating into cortical neurons in culture (see Materials and Methods), using β-tubulin III and MAP2ab antibodies to detect neuronal differentiation. β-tubulin III expression is associated with early neuronal differentiation (Katsetos et al., 2003), whereas MAP2ab expression is required for neurite extension and cessation of cell division in order to produce mature neurons (Dinsmore and Solomon, 1991).

The nuclear speckled pattern of GTF2IRD1 distribution was maintained in both neural progenitor and neuronal populations derived from human ES cells, thus suggesting a function for this protein starting from the early stages of human neuronal development. In studies of Gtf2ird1 mRNA expression in mice using a Gtf2ird1 LacZ knock-in line (Palmer et al., 2007), expression was also found in the developing central nervous system, which was maintained into adulthood, specifically located in various neuronal cell types. Expression in the telencephalon (prospective cortex) was not detected in mouse embryos by this method but the difference could be explained by a lack of sensitivity of detection, variations between the behaviour of cultured versus in vivo cells, species differences or a consequence of observing GTF2IRD1 protein versus observing Gtf2ird1 mRNA production.

The speckled pattern of GTF2IRD1 distribution within the nucleus prompted us to explore possible colocalisations with other nuclear proteins that are also known to have a punctate pattern of expression. Observations in HeLa cells suggested that the localization pattern was most similar to H3K27me2/3 marks in chromatin and the distribution of the HP1γ protein. A lower degree of overlap was found for HP1α, HP1β and H3K9me3. What these chromatin marks and proteins have in common is a functional role in gene silencing (Eissenberg and Elgin, 2014; Martin and Zhang, 2005; Nishibuchi and Nakayama, 2014).

Unlike H3K9me3 chromatin marks, which are generally found in constitutive heterochromatin, H3K27 methylation, mediated via PRC2, is a key repressive modification for the regulation of developmental genes (Golbabapour et al., 2013). HP1 proteins have a chromodomain that recognizes the H3K9me2/3 mark and were originally associated with heterochromatin but are now recognized as having multiple roles in transcriptional activation, sister chromatid cohesion, chromosome segregation, telomere maintenance, DNA repair and RNA splicing (Canzio et al., 2014). While HP1α and HP1β are generally localized to heterochromatin, HP1γ is found in heterochromatin but also in euchromatin at transcription start sites (Minc et al., 2000; Sridharan et al., 2013). HP1 isoforms bind to H3K9me1, 2, or 3 with different affinities, HP1γ being the isoform with the lowest affinity for this modification (Nishibuchi and Nakayama, 2014). Thus, one may hypothesise that the regions of the genome to which GTF2IRD1 localises with HP1γ are not those associated with H3K9 methylation.

These observations are consistent with the recruitment of GTF2IRD1 to regions of the genome involved in dynamic gene repression and not to regions that are being actively transcribed, as indicated by markers such as H3K4me3 (Santos-Rosa et al., 2002) and RNA Polymerase II CTD phospho-SER5. RNA polymerase II is predominantly found in an unphosphorylated state when not bound to promoters but its C-terminal domain (CTD) is phosphorylated on serine 5 at the initiation of transcription (Phatnani and Greenleaf, 2006). SP1 is a transcription factor that, like GTF2IRD1, negatively auto-regulates its own transcription (Deniaud et al., 2009) and can positively or negatively regulate target gene transcription by binding to DNA recognition sequences and recruiting protein complexes (Doetzlhofer et al., 1999).

Other nuclear speckle proteins that showed no overlap with GTF2IRD1 included SC-35, which is a spliceosome protein involved in a variety of processes, amongst them regulation of transcription, translation and RNA stability (Zhong et al., 2009), and Coilin, a component of the Cajal bodies, involved in the biogenesis of small nuclear ribonucleoprotein particles (snRNPs) (Matera, 1999). Additionally, GTF2IRD1 is excluded from the structural elements at the nuclear periphery including the nuclear lamina (as marked by LAP2) and the nuclear pore complexes (NPC).

Based on these associations alone, one may conclude that GTF2IRD1 plays a role in transcriptional regulation and developmental gene silencing. These ideas fit well with previous observations regarding the repression of multiple tissue-specific genes in a transgenic system (Issa et al., 2006), the direct negative autoregulation of the GTF2IRD1 promoter/enhancer by its own protein product (Palmer et al., 2010) and the fact that Gtf2ird1 is widely and robustly expressed during development but restricted to specific cell types such as neurons and brown adipose tissue during adulthood (Palmer et al., 2007).

In addition, a small proportion of the GTF2IRD1 speckles also localised to PML nuclear bodies (PML-NBs), which have extensive contacts with chromatin and are functionally associated with DNA repair, transcriptional regulation and cellular senescence (Carracedo et al., 2011). PML-NBs are thought to function as sites of DNA repair and as sites of assembly for SUMO-dependent co-repressor complexes (Chang et al., 2011a). PML forms the scaffold of this nuclear sub-compartment and the entry of other proteins is regulated via SUMOylation (Shen et al., 2006). We have previously shown that GTF2IRD1 can be SUMOylated, regulating interactions with SUMO-interacting motif (SIM)-containing proteins (Widagdo et al., 2012). Thus it seems plausible that a sub-population of SUMOylated GTF2IRD1 is to be found in the PML-NBs.

A similar cellular analysis of over-expressed recombinant GTF2IRD2 localisation showed that this family member does not overlap with many of the markers analysed here (Coilin, SC-35, H3K9me3 or H3K4me3, NPC, LAP2 and PML) (Palmer et al., 2012).

The observations on the subcellular localisation of GTF2IRD1 presented in this chapter suggest an association with silent chromatin and gene regulation, but the lack of stronger evidence for such interactions prevents firmer functional conclusions. Therefore, there is a strong need for further studies that elucidate the molecular function of GTF2IRD1. Analyses of the protein networks in which this protein is involved will provide a better understanding of its molecular role and its role in the pathology of WBS.

CHAPTER 4 - IDENTIFICATION OF PROTEINS INTERACTING WITH GTF2IRD1

4.1 Introduction

GTF2IRD1 was first isolated due to its DNA binding capability in a yeast one-hybrid study (O'Mahoney et al., 1998a), and its repressive effects on transcription have been well established in both cell lines and transgenic mice, where Gtf2ird1 produces a clear phenotype (Issa et al., 2006; Polly et al., 2003b). To date, several investigations have focused on the study of GTF2IRD1 DNA targets (Bayarsaihan and Ruddle, 2000b; Chimge et al., 2012; Makeyev and Bayarsaihan, 2011; Ring et al., 2002), with the assumption that it behaves as a conventional transcription factor that has a consistent set of gene targets. However, these reports, and also transcriptomic profile studies in Gtf2ird1 knockout mice (O'Leary and Osborne, 2011), have not yielded a consistent set of gene targets from which mechanisms of GTF2IRD1-mediated gene regulation could be hypothesised. Therefore, important information is missing about how GTF2IRD1 gene targets are selected for silencing and which processes could be involved in this repression.

A key feature of the GTF2IRD1 protein is the presence of five I-repeat domains (RDs), which have been reported to have sequence-specific DNA recognition properties for GGATTA-containing sequences (Polly et al., 2003b; Thompson et al., 2007; Vullhorst and Buonanno, 2003a; Vullhorst and Buonanno, 2005). The GTF2IRD1 upstream region (GUR) contains three highly-conserved GGATTA binding sites and EMSA studies have shown that all three are required to achieve high-affinity GTF2IRD1 binding in a dimeric form (Palmer et al., 2010). This interaction is thought to be achieved by the simultaneous binding of at least two different RDs to two of the GGATTA recognition motifs. This idea is supported by experiments that show RDs 2, 4 and 5 of the human GTF2IRD1 protein all have sequence-specific DNA binding properties (Polly et al., 2003b; Vullhorst and Buonanno, 2005). While these properties have been analysed experimentally in the autoregulatory system, it is unclear what evolutionary advantage was bestowed by the multiple duplication of this DNA binding domain and how the RDs work in DNA binding site selection of target genes. However, one potential explanation is to hypothesize that the RDs also have protein-protein interaction functions and duplication of this domain provided a multiple surface for mediating GTF2IRD1 macromolecular interactions. This hypothesis predicts that in addition to their DNA binding properties, the RDs are also sites of protein partner interactions that can be identified experimentally.
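As an illustration of what locating GGATTA recognition motifs in a DNA sequence involves, the following minimal sketch scans both strands of a sequence for the core motif. The function names and the example sequence are hypothetical; the sequence is not taken from the GUR and the sketch says nothing about binding affinity.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def find_motif_sites(sequence, motif="GGATTA"):
    """Return sorted (0-based position, strand) pairs for each motif occurrence.

    A site on the "-" strand is reported at the position where the reverse
    complement of the motif starts on the given (forward) sequence.
    """
    sequence = sequence.upper()
    sites = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            sites.append((start, strand))
            start = sequence.find(pattern, start + 1)  # allow overlapping hits
    return sorted(sites)


# Hypothetical 19-nt example with one site on each strand.
example_sites = find_motif_sites("ACGGATTACGTTAATCCGA")
```

Here `example_sites` contains a forward-strand site at position 2 and a reverse-strand site (TAATCC) at position 11.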

Apart from the RDs, the human GTF2IRD1 protein contains a short leucine zipper near the N-terminus implicated in dimerization (Vullhorst and Buonanno, 2003a), a nuclear localisation signal (NLS) near the C-terminus, two SUMOylation motifs of which one is clearly highly conserved and functional (Widagdo et al., 2012), a highly-conserved C-terminal domain that may be important for SUMOylation due to the binding of the E3 SUMO-ligase, PIASxβ (Widagdo et al., 2012) and a polyserine tract near the C-terminus.

To define functional relationships that can help to explain how GTF2IRD1 mediates gene repression, and to gain more information about the domain structure of GTF2IRD1 and the molecular pathways in which this protein is engaged, an exhaustive analysis of its protein-protein interactions is required. To date, the only known partners are HDAC3 (Tussie-Luna et al., 2002a), PIASxβ (Tussie-Luna et al., 2002b), and ZMYM5 (Widagdo et al., 2012).

In order to identify novel interacting partners of GTF2IRD1, yeast two-hybrid (Y2H) screenings were performed. This technique represents an unbiased approach with the strong advantage that no a priori assumptions are made about GTF2IRD1 function, due to the nature of the Y2H as purely a protein binding assay. The potential binding partners are pulled out from a random selection of proteins and the libraries screened represent very large collections of cDNAs; therefore, hits that can be validated are likely to reflect GTF2IRD1 function. Another critical advantage of the Y2H is that it allows the identification of both transient and stable interactions, since the only context required for a given protein-protein interaction to occur is that the proteins coexist in the sample.

On the other hand, other proteomic techniques for studying protein interactions in mammalian cells can have reduced feasibility due to the low abundance of the endogenous protein (Calvo et al., 2001; O'Mahoney et al., 1998a), which poses a technical challenge for attempts to pull down GTF2IRD1 complexes from human samples.

Therefore, to identify the range of protein interactions engaged in by GTF2IRD1, yeast two-hybrid (Y2H) library screenings were used to generate an unbiased comprehensive list of protein partners. This allowed the identification of 36 potential novel interaction partners, most of them confirmed by Y2H using direct co-transformation of the proteins. The sites of interaction were defined for GTF2IRD1 and mapped to specific domains, including the RDs.

These interactions were also analysed using transient expression in mammalian cells, including the analysis of the subcellular localisation of the protein partners, followed by coimmunoprecipitation (co-IP) of the nuclear-localised protein partners.

Most of the proteins isolated support a role in the regulation of chromatin. Interactions with other DNA-binding proteins and transcriptional co-factors suggest that GTF2IRD1 binds to chromatin targets using cooperative mechanisms. In addition, interactions with several components of the primary cilium and ARM-repeat proteins offer intriguing new directions in GTF2IRD1 research.

4.2 Results

4.2.1 Yeast two-hybrid library screening for novel GTF2IRD1 interacting partners

In order to provide as complete a list of binding partners as possible, two Y2H screens were performed using the human full-length protein as bait and either a universal normalised mouse cDNA library (derived from a collection of different mouse tissues) or a human brain normalised cDNA library. Normalisation of these libraries attempts to correct for differences in mRNA abundance and should increase the likelihood of detecting interactions with proteins encoded by low-abundance transcripts as well as interactions with more abundant proteins (Widagdo et al., 2012).

The bait plasmid used was negative in an autoactivation test (described by Widagdo et al. (2012)). This test is done by co-transforming yeast with the bait plasmid containing the protein of interest (a GAL4 DNA binding domain fusion) together with the empty prey plasmid (GAL4 DNA activation domain). It constitutes an important negative control to exclude the possibility that the protein of interest fused to the GAL4 DNA binding domain may activate the reporter genes in the presence of the empty vector.

In summary, 11 million mated clones were screened from the human brain library and 22 million from the mouse library, resulting in a combined total of 191 clones that passed the survival stringency test in the selective drop-out media (see Materials and Methods for details). The prey plasmids were isolated from all 191 clones and the plasmid inserts were sequenced and analysed against genomic and transcript DNA databases (see Materials and Methods).

From these clones, only those that contained unique transcripts in frame with the GAL4 DNA activation domain (those that can produce a fusion protein and therefore activate the transcription of the reporter genes for yeast survival) are listed in table 4.1 as potential novel interacting partners for GTF2IRD1. Clones with prey sequences duplicating other clones or out of frame with the GAL4 DNA activation domain were discarded. Most of the prey clones containing the binding partners for GTF2IRD1 were re-transformed into haploid AH109 yeast with either the GTF2IRD1 bait plasmid or the empty bait vector (pGBKT7), in order to confirm the specificity of these interactions (figure 4.1). This direct yeast co-transformation was carried out with the prey plasmids rescued from the yeast clones, which usually contained a truncated ORF, or with plasmids that contain full-length ORFs of the rescued prey sequences that were constructed or obtained (a detailed list of the plasmids used is provided in table 2.3, chapter 2 Materials and Methods). Prey clones that were resistant to the quadruple dropout (QDO) media in the presence of the empty control plasmid were presumed to encode proteins that bind directly to the GAL4 DNA binding domain; these were discarded as false positive hits and excluded from table 4.1.
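The refinement logic described above can be sketched as a simple filter. This is an illustrative simplification that applies the three criteria in a single pass, whereas in practice the empty-vector test was a separate retransformation experiment. The `PreyClone` record, its field names and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreyClone:
    clone_id: str
    gene: str                    # transcript identified by sequence search
    in_frame: bool               # insert in frame with the GAL4 activation domain
    grows_with_empty_bait: bool  # survives on QDO when paired with empty pGBKT7


def refine_candidates(clones):
    """Keep one in-frame, non-false-positive clone per unique transcript."""
    seen_genes = set()
    kept = []
    for clone in clones:
        if not clone.in_frame:
            continue  # cannot produce a fusion protein
        if clone.grows_with_empty_bait:
            continue  # presumed false positive, independent of GTF2IRD1
        if clone.gene in seen_genes:
            continue  # duplicate of a transcript already listed
        seen_genes.add(clone.gene)
        kept.append(clone)
    return kept
```

Given a list of sequenced clones, `refine_candidates` returns the subset that would be carried forward as candidate partners, in their original order.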

This refinement process led to the identification of 38 independent GTF2IRD1 binding protein candidates, including 2 that were reported previously (table 4.1). Four of these clones were not pursued beyond sequence identification as they were known to be solely cytoplasmic, extracellular or cell membrane localised and were less likely to be of biological relevance. Two of the proteins have been described previously as interacting nuclear partners (Tussie-Luna et al., 2002b; Widagdo et al., 2012). Of the remaining 32 proteins, 26 either shuttle into the nucleus or are primarily located in the nucleus according to known functions or predictions summarized in the subcellular localisation database, COMPARTMENTS (Binder et al., 2014). The KPNA proteins were predicted to have been isolated due to their binding to the GTF2IRD1 nuclear localisation signal (NLS). Preliminary Y2H studies in our laboratory (Widagdo, 2011) mapped this interaction to the C-terminal domain of GTF2IRD1, where the NLS is located, suggesting that this was indeed the case and these proteins were not pursued beyond this point. It was striking that five of the remaining non-nuclear proteins contribute to or have links with centrosome and primary cilia function (table 4.1), an association that has never been previously noted in connection with GTF2IRD1.

Interestingly, most of the proteins identified in the yeast two-hybrid screenings fit into several functional groups, such as proteins associated with signalling pathways (PKP2, Akirin2), proteins involved in ubiquitination and SUMOylation (PIAS1, USP20, USP33, FBXW10), proteins involved in genetic disorders (ALMS1, BBS4) and a large number of DNA-binding proteins and/or transcriptional regulators: ELF2, Homez, MBD3L1, TRIP11 and ZC4H2, amongst others. It is also remarkable that some partners are known as or associated with histone modifying enzymes, such as SETD6 (N-lysine methyltransferase SETD6), ATF7IP (Activating transcription factor 7-interacting protein 1), DCAF6 (DDB1 and CUL4 associated factor 6) and three MYM-type zinc finger proteins: ZMYM5 (Zinc finger MYM-type 5; previously reported, Widagdo et al., 2012), ZMYM2 and ZMYM3.

PKP2 is a member of the plakophilin family that plays dual roles in the nucleus and in desmosomes (Chen et al., 2002). In HeLa cells, the protein predominantly localises to the cell surface. However, its closest family member PKP1 is known to localise more readily to the nucleus (Hatzfeld et al., 2000). Therefore, a possible interaction between GTF2IRD1 and this PKP family member was also considered. To address this question, the prey vector pGADT7 containing the PKP1 open reading frame was co-transformed with GTF2IRD1 and the interaction in yeast was verified (figure 4.1).


Figure 4.1 Confirmation of Y2H interactions by retransformation of bait and prey plasmids.

Haploid AH109 yeast were co-transformed with the pGBKT7 empty vector or the pGBKT7-GTF2IRD1 bait plasmid and each of the prey plasmids that had either been rescued directly from the primary screen or reconstructed with full-length ORFs. Successful transformation of each plasmid pair was confirmed by survival on double dropout (DDO) medium (data not shown). The panels show transformed yeast replated from the DDO plates on quadruple dropout (QDO) medium containing x-α-gal, which provides an additional marker of protein-protein interaction through the production of a blue colour. Of the 36 putative partner proteins originally identified (table 4.1), 7 were not screened using this assay. 4 were excluded from further study due to their cellular localisation and 2 have been previously reported (PIAS2 and ZMYM5). PKP1 was included as an additional potential partner due to its strong homology to PKP2 and its predominant nuclear localisation properties.


Table 4.1 Summary of hits obtained in both Y2H screens.

Official HGNC gene symbols are used and hits are sorted alphabetically. Location data are based on reported subcellular localisations or predicted/inferred information using COMPARTMENTS (Binder et al., 2014). Several clones occurred in both screens (a) and 2 genes (b) have been described previously (Tussie-Luna et al., 2002b; Widagdo et al., 2012). Some interactions were not pursued beyond sequence analysis (not done – n.d.). Other clones were either validated solely by retransformation in yeast (yeast), or by yeast retransformation and subsequent transfection into HeLa cell lines and coimmunoprecipitation (co-IP).

| Gene Symbol | Name | Location | Validation |
|---|---|---|---|
| AKIRIN2 | Akirin 2 | nuclear | yeast |
| ALMS1 | Alstrom syndrome 1 | primary cilia/centrosome | yeast |
| ARMCX5 | Armadillo repeat containing, X-linked 5 | nuclear/cytoplasmic | yeast |
| ATF7IP | Activating transcription factor 7 interacting protein | nuclear | co-IP |
| ATP2C1 | ATPase, Ca++ transporting, type 2C, member 1 | cytoplasmic | n.d. |
| BBS4 | Bardet-Biedl syndrome 4 | primary cilia/centrosome | yeast |
| DCAF6 (a) | DDB1 and CUL4 associated factor 6 | nucleus | co-IP |
| ELF2 | E74-like factor 2 (ets domain transcription factor) | nucleus | yeast |
| FAM47E | Family with sequence similarity 47, member E | nucleus | yeast |
| FBXW10 | F-box and WD repeat domain containing 10 | nuclear/cytoplasmic | yeast |
| FHAD1 | Forkhead-associated (FHA) phosphopeptide binding domain 1 | unknown | yeast |
| HOMEZ | Homeobox and leucine zipper encoding | nucleus | co-IP |
| HTRA4 | HtrA serine peptidase 4 | extracellular | n.d. |
| INTS12 | Integrator complex subunit 12 | nucleus | co-IP |
| KPNA1 | Karyopherin alpha 1 (importin alpha 5) | nucleus | yeast |
| KPNA2 (a) | Karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | nucleus | yeast |
| KPNA3 | Karyopherin alpha 3 (importin alpha 4) | nucleus | yeast |
| KPNA4 | Karyopherin alpha 4 (importin alpha 3) | nucleus | yeast |
| MBD3L1 | Methyl-CpG-binding domain protein 3-like 1 | nucleus | co-IP |
| NAP1L2 | Nucleosome assembly protein 1-like 2 | nucleus | co-IP |
| OPHN1 | Oligophrenin 1 | cytoplasm | yeast |
| PARPBP | PARP1 binding protein | nucleus | yeast |
| PIAS1 (a) | Protein inhibitor of activated STAT-1 | nucleus | yeast |
| PIAS2 (b) | Protein inhibitor of activated STAT, 2 | nucleus | reported |
| PKP2 | Plakophilin 2 | desmosome/nucleus | yeast |
| SCNM1 | Sodium channel modifier 1 | nucleus | yeast |
| SETD6 | SET domain containing 6 | nucleus | co-IP |
| SPTLC1 | Serine palmitoyltransferase, long chain base subunit 1 | endoplasmic reticulum | n.d. |
| TAF1B | TATA box binding protein (TBP)-associated factor | nucleus | yeast |
| TMEM55A | Transmembrane protein 55A | membrane | n.d. |
| TRIP11 | Thyroid hormone receptor interacting protein 11 | golgi/primary cilia | yeast |
| USP20 | Ubiquitin specific peptidase 20 | cytoplasm/centrosome | yeast |
| USP33 | Ubiquitin specific peptidase 33 | cytoplasm/centrosome | yeast |
| VIMP | VCP-interacting membrane protein | endoplasmic reticulum | n.d. |
| ZC4H2 (a) | Zinc finger, C4H2 domain containing | nucleus | co-IP |
| ZMYM2 | Zinc finger, MYM-type 2 | nucleus | co-IP |
| ZMYM3 | Zinc finger, MYM-type 3 | nucleus | co-IP |
| ZMYM5 (a, b) | Zinc finger, MYM-type 5 | nucleus | reported |


4.2.2 Domain characterization for the novel protein interactions of GTF2IRD1

To map the binding domains of these proteins in GTF2IRD1, a range of Y2H bait plasmids were constructed containing 8 separate 88 amino acid regions of the GTF2IRD1 protein (figure 4.2A) that contain known functional units or sequences that are strongly conserved between species, as described previously (Widagdo et al., 2012). These peptide domains include the leucine zipper at the N-terminus (LZ), repeat domains 1 to 5 (RD1, RD2, RD3, RD4 and RD5), and the regions containing the first and second conserved SUMOylation motifs (SUMO1 and SUMO2). A selected set of prey proteins were co-transformed with each of the 8 domain-specific plasmids and plated on QDO/x-α-gal media. An example of these experiments is shown in figure 4.2B, in which a prey plasmid containing the Nap1l2 ORF was co-transformed in yeast with all the domain-specific bait plasmids. The results for all the proteins tested are summarized in figure 4.2C, including interactions mapped previously in our laboratory (Widagdo, 2011), which have been added to compare the binding of other putative partners (ARMCX5, FHAD1, OPHN1, SETD6 and TAF1B) with those presented in this dataset.

Interactions were mapped to various locations and sometimes involved multiple potential interaction domains. All of the repeat domains were found to mediate protein-protein interaction, except for RD5. RD1, which has previously been shown to lack DNA binding ability (Polly et al., 2003b), scored a high number of interactions with other proteins. The SUMO1 and SUMO2 regions, which contain the sites of SUMOylation, also appeared to be of high importance for protein interactions, scoring the highest number of interactions (figure 4.2C).


Figure 4.2 Mapping of interaction domains in GTF2IRD1 with the proteins identified in the Y2H screens.

(A) Diagram of human GTF2IRD1 and the corresponding sub-domains used for mapping (black bars above). The domains include the leucine zipper region (LZ), five repeat domains (RDs), two regions containing SUMO motifs and a nuclear localisation signal (NLS). (B) Representative example images of yeast colonies plated on double dropout agar (DDO) as a control, or quadruple dropout (QDO) agar containing x-α-gal. Each colony represents yeast co-transformed with the empty vector control (V), domain-specific or full-length (FL) bait plasmids together with the prey plasmid identified in the Y2H screen. The example shown is the NAP1L2 interaction. Slight background activity in some surviving yeast is typical of Y2H assays and is ignored. (C) Summary of the domain mapping results.


4.2.3 Subcellular localisation of GTF2IRD1 and its novel protein partners

To gather further evidence for interactions in a mammalian cell context and to check the subcellular localisation characteristics of the putative protein partners, plasmids encoding epitope-tagged versions or EGFP fusion proteins were either obtained or constructed (see table 2.3, chapter 2 Materials and Methods). These plasmids were transiently cotransfected into HeLa cells with plasmids encoding either GTF2IRD1-EGFP or Myc-tagged GTF2IRD1 and colocalisation was analysed by fluorescence microscopy (figure 4.3).

The majority of the candidate proteins localised to the nucleus, as expected, with varying degrees of nuclear speckling and varying overlap with the tagged GTF2IRD1 protein (figure 4.3A). Candidate proteins that localised predominantly outside of the nucleus in these assays (figure 4.3B) may shuttle into the nucleus under normal circumstances to interact with GTF2IRD1 and thus, these findings should not prejudice the likelihood of a genuine biological interaction. However, these proteins were not included in the next stage of verification. Localisation of PKP1-EGFP to the nuclei of HeLa cells was confirmed (figure 4.3B) and, therefore, PKP1 was included in the next stage of validation, while PKP2, which is primarily located outside the nucleus, was excluded.


Figure 4.3 Subcellular localisation of constitutively expressed GTF2IRD1 and the novel putative protein partners.

(A) Confocal coimmunofluorescence analysis of partner proteins exclusively located in the nucleus. HeLa cells were transfected with plasmids encoding GTF2IRD1-Myc (red) or GTF2IRD1-EGFP (green) together with plasmids encoding the partner, also tagged with Myc, EGFP or FLAG (PIAS1 only). (B) Confocal coimmunofluorescence analysis of partner proteins found to show cytoplasmic, cytoplasmic/nuclear or peripheral localisation. PKP1 was included as it is homologous to PKP2 (which is located at the cell periphery) but localises to the nucleus. Scale bars represent 20 µm and are provided when proteins localise outside of the nucleus.

4.2.4 GTF2IRD1 interactions with chromatin modifiers and transcriptional regulators can be demonstrated in mammalian cells

The majority of the candidate proteins that showed significant distribution in the nuclear compartment were selected for a further level of validation using co-IP analysis of the recombinant proteins. Plasmids encoding the proteins GTF2IRD1-Myc and each candidate protein as EGFP fusion were cotransfected into HeLa cells and protein complexes were immunoprecipitated using the anti-GFP antibody. Co-IP proteins were analysed on western blots using the anti-Myc antibody. Negative controls were performed by cotransfection with the empty pEGFP vector.

As shown in figure 4.4A, GFP fusion proteins containing ATF7IP, DCAF6, Homez, INTS12, MBD3L1, NAP1L2, PKP1, ZC4H2 and ZMYM3 were immunoprecipitated using the anti-GFP antibody and GTF2IRD1-Myc was coimmunoprecipitated from these samples, therefore confirming these interactions in mammalian cells.

As SETD6 and ZMYM2 were only available in Myc-tagged constructs, these plasmids were cotransfected with plasmids encoding GTF2IRD1-EGFP, in order to perform coimmunoprecipitation in the reciprocal configuration. Coimmunoprecipitation of these proteins with GTF2IRD1-EGFP was confirmed by probing with the anti-Myc antibody (figure 4.4B). Negative controls for these interactions involved cotransfection of both protein partners with the empty pEGFP vector.

All of the candidate proteins tested were found to coimmunoprecipitate with GTF2IRD1 from the HeLa cell extracts with varying levels of recovery and all the control interactions with GFP were negative, as expected. A brief description of the proteins validated can be found in table 4.2.


Figure 4.4 Novel GTF2IRD1 interactions with nuclear proteins revealed by coimmunoprecipitation in mammalian cells.

Panels show western blot analyses of HeLa cells transiently transfected with the indicated constructs. (A) Protein partners were immunoprecipitated (IP) with anti-GFP antibody and immunoblotted (IB) with anti-GFP to show successful immunoprecipitation. In one case (INTS12-GFP), the loading of the input was too low to be detected but sufficient protein was recovered in the IP. Immunoblotting with anti-Myc to reveal coimmunoprecipitation (Co-IP) of GTF2IRD1-Myc showed that GTF2IRD1 was recovered in all experiments, except for the pEGFP vector control (CON GFP). (B) Due to limited plasmid clone availability, some partners were assayed in the reverse configuration. HeLa cells were transfected with plasmids encoding GTF2IRD1-EGFP or EGFP alone and SETD6-Myc or ZMYM2-Myc. Proteins were immunoprecipitated using anti-GFP antibody and the interactions detected by immunoblotting with anti-Myc antibody. Numbers below the construct names represent the molecular weight in kDa, which was assessed as correct against molecular weight markers on the original image.


Table 4.2 Notes on the functional properties of novel GTF2IRD1 protein partners validated in mammalian cells.

Protein symbols, accession numbers and gene ontology (GO) terms were retrieved from the UniProtKB database. (* Interaction previously reported by Widagdo et al. (2012)).

| Symbol | UniProt ID | UniProt Function (GO) | Comments |
|---|---|---|---|
| DCAF6 | Q58WW2 | Protein ubiquitination, Ligand-dependent nuclear receptor transcription coactivator activity | Co-activator. Interacts with calmodulin, androgen and glucocorticoid receptors (Chang et al., 2011b; Tsai et al., 2005). |
| HOMEZ | Q8IX15 | Transcription factor activity, DNA binding | Binds to Hoxc8 regulatory region (Bayarsaihan and Ruddle, 2000a). Has a role in late neurogenesis (Bayarsaihan et al., 2003; Ghimouz et al., 2011). |
| INTS12 | Q96CB8 | Protein binding, Zinc ion binding | Associated with RNA polymerase II. Involved in the processing of small nuclear RNAs (Baillat et al., 2005; Chen et al., 2013). |
| MBD3L1 | Q8WWY6 | Regulation of transcription DNA-templated | Transcriptional repressor (Jiang et al., 2002). Component of the NuRD complex (Jiang et al., 2004). |
| ATF7IP (MCAF1) | Q6VMQ6 | Protein binding, regulation of transcription DNA-templated | Interacts with RNA polymerase II (De Graeve et al., 2000). Transcriptional repressor and activator involved in heterochromatin formation (Fujita et al., 2003a; Ichimura et al., 2005; Uchimura et al., 2006; Wang et al., 2003). PML bodies (Sasai et al., 2013). |
| PKP1 | Q13835 | Protein binding, Lamin binding | Causative of ectodermal dysplasia/skin fragility syndrome (McGrath et al., 1997). Interacts with ssDNA (Sobolik-Delmaire et al., 2010). Role in translation regulation and cell proliferation (Wolf et al., 2010). |
| NAP1L2 | Q9ULW6 | Protein binding, Histone binding, Regulation of histone acetylation | Regulation of neuronal differentiation and histone acetylation. Interacts with histones 3 and 4 (Attia et al., 2007; Rogner et al., 2000). |
| SETD6 | Q8TBK2 | Protein-lysine N-methyltransferase activity, NF-kappaB binding, Transcription factor activity | Lysine methyltransferase, inhibits NF-κB signalling (Binda et al., 2013; Levy et al., 2011). Can act as transcriptional repressor or co-activator. Interacts with members of the NuRD complex (O'Neill et al., 2014). |
| ZC4H2 | Q9NQZ6 | Metal ion binding | Candidate gene for X-linked mental retardation (Lombard et al., 2011). Associated with arthrogryposis multiplex congenita and intellectual disability. Has a role in the nervous systems development (Hirata et al., 2013). |
| ZMYM2 | Q9UBW7 | Zinc ion binding, Regulation of transcription DNA-templated | PML bodies (Kunapuli et al., 2006). Part of HDAC1, 2 co-repressor complexes (Gocke and Yu, 2008; Hakimi et al., 2003). Rearranged in Myeloproliferative disorder (Xiao et al., 1998). |
| ZMYM3 | Q14202 | Zinc ion binding, DNA binding | Candidate gene for X-linked mental retardation (van der Maarel et al., 1996). Part of a HDAC1, 2 co-repressor complex (Hakimi et al., 2003). |
| ZMYM5* | Q9UJ78 | Zinc ion binding | Binding and repression of Presenilin-1 promoter (Pastorcic and Das, 2007). Possible association with neurological and craniofacial abnormalities (Bartnik et al., 2014). |


4.2.5 Coimmunofluorescence analysis of endogenous partners in mammalian cells

The colocalisation of endogenous GTF2IRD1 was analysed with some relevant protein partners, such as the zinc finger protein ZC4H2, chromatin modifiers ZMYM2 and ZMYM3, and the transcriptional regulator DCAF6. The endogenous expression of these proteins had not been demonstrated before and because of their functional classification they were considered of interest to assess their colocalisation with GTF2IRD1. Coimmunofluorescence was performed in HeLa cells using antibodies that recognise GTF2IRD1 and the proteins mentioned above. Confocal microscopy revealed that a fraction of the GTF2IRD1 protein colocalises at the endogenous level with all the proteins analysed (figure 4.5). These data indicate the extent of the constitutive endogenous interaction between GTF2IRD1 and some of the nuclear protein partners that have particular functional significance.

The proteins ZMYM2 and ZMYM3 were arguably the most interesting because they have been previously isolated using immunoaffinity purification from endogenous HeLa cell extracts as part of a complex that contained TFII-I, BHC110, BHC80, CoREST, HDAC1 and HDAC2 (Hakimi et al., 2003). Therefore, while direct binding of TFII-I to ZMYM2 and ZMYM3 had not been demonstrated, it seemed plausible that direct interactions between ZMYM proteins and members of the TF2I family, including GTF2IRD1, are a conserved feature that confers the ability to integrate into HDAC-containing silencing complexes.

To examine this hypothesis, a comparison of the colocalisation of endogenous GTF2IRD1 and TFII-I together with ZMYM2 and ZMYM3 was conducted in HeLa cells. All four proteins were distributed in similar punctate patterns and GTF2IRD1 and TFII-I both showed some colocalisation with each of the ZMYM proteins tested (figure 4.6). However, the overlap was not complete as a proportion of the red and green signal was still obvious in both cases.


Figure 4.5 Endogenous expression of several GTF2IRD1 partners in mammalian cells.

Coimmunofluorescence of HeLa cells detecting GTF2IRD1 (first column) and several relevant partners (second column) at the endogenous level. The subcellular localisation of GTF2IRD1 just partially overlaps with the partners tested (DCAF6, ZC4H2, ZMYM2 and ZMYM3), which are expressed in a punctate manner within the nucleus, while ZC4H2 is also located in the cytoplasm in most of the cells.


Figure 4.6 Comparison of endogenous colocalisation of GTF2IRD1 and TFII-I with ZMYM2 and ZMYM3.

Confocal immunofluorescence analysis of HeLa cells using antibodies against the protein indicated. Partial colocalisation is observed in both cases, indicating that interaction of TFII-I and GTF2IRD1 with the ZMYM proteins involves a subset of the protein population and is probably dynamic.


4.2.6 Overview of the novel GTF2IRD1 interactional network

To summarize the findings of these experiments, a GTF2IRD1 interactional network (figure 4.7) was created from those proteins that were shown to be capable of interaction in mammalian cells, together with secondary interactions for these protein partners retrieved from online databases (see Materials and Methods for references). Some proteins constitute major hubs of interaction, such as ATF7IP, PKP1 and ZMYM2. Evidence of cross-regulation within this network is also indicated; for example, ATF7IP interacts with SP1, a transcription factor that can bind to the promoter of DCAF6 (a direct GTF2IRD1 partner) in order to regulate DCAF6 expression (Chen et al., 2008).

Many of the protein partners are associated with epigenetic complexes and are capable of interaction with histone deacetylases, such as MBD3L1, ZMYM2 and ZMYM3 (table 4.2). The novel GTF2IRD1 partner MBD3L1 has been found in the NuRD repressor complex in testis (Jiang et al., 2002; Jiang et al., 2004) in association with the proteins MTA1, MTA2, RBAP46, RBAP48 and MBD2. As mentioned earlier, ZMYM2 and ZMYM3 are associated with a CoREST-like complex, involving LSD1 (Hakimi et al., 2003). The interactional network demonstrates indirect interactions with the methylated DNA binding proteins, MBD1 and MBD2, and direct interactions between GTF2IRD1 and the histone binding proteins NAP1L2 (histone H3 and H4) and SETD6 (H2B and H3), contributing further evidence for involvement of GTF2IRD1 in the mediation of gene repression through epigenetic mechanisms.


Interestingly, a number of the novel interacting partners presented in this work have also been shown to belong to histone H3 K9me3 reader complexes; this is the case for HOMEZ, ZMYM2, ZMYM3, ZMYM5, KPNA3 and KPNA4. By contrast, MBD3L1, as a member of the NuRD complex, was reported to be repelled by K4me3 (Eberl et al., 2013).


Figure 4.7 A GTF2IRD1 interactional network.

The protein partners that were validated by coimmunoprecipitation are depicted in light green, proximal to GTF2IRD1. Their corresponding protein-protein interactions (distal proteins) were retrieved from the IntAct database and published literature (blue). We have previously reported ZMYM5 as a GTF2IRD1 protein partner (Widagdo et al., 2012).


4.3 Discussion

These yeast two-hybrid screens, using two different normalised libraries (a universal mouse library and a human brain library), have led to the isolation of a large number of novel interaction partners for GTF2IRD1. These screens make no a priori assumptions regarding the function of GTF2IRD1 since they are reliant on protein binding capabilities alone and potentially test for interactions with proteins encoded by the entire genome. This exhaustive analysis therefore provides an unbiased account of potential GTF2IRD1 functions. In practice, not all of the proteins would be represented since the diversity of the library is limited by the cDNA samples from which each library was made. However, both libraries were normalized, which should overcome some of the abundance imbalances inherent in cDNA libraries. Use of the universal mouse normalized library also maximized the chances of discovering tissue-specific GTF2IRD1 interactions and selected for interactions that are conserved across species, since a human GTF2IRD1 bait protein was used. The human brain library was used in order to maximize the chances of discovering protein interactions that are relevant to the role of GTF2IRD1 in the cognitive abnormalities of WBS. Therefore, these studies provide a powerful insight into GTF2IRD1 function through an understanding of the proteins with which it interacts and provide an invaluable resource that allows the development of testable functional hypotheses.

As anticipated, a large number of GTF2IRD1 novel partners are nuclear-localised or have the capability to shuttle into the nucleus, while some are cytoplasmic or extracellular and are therefore more likely to be artefacts of the screening system, although there is no additional evidence to support that conclusion. Putting the proteins of primary interest into functional groups, several broad categories emerge, such as nuclear import functions (KPNA1-4), post-translational modifications of ubiquitination (e.g. USP20, USP33 and FBXW10) and SUMOylation (PIAS1 and PIAS2), DNA-binding proteins and transcriptional coregulators (e.g. ELF2, HOMEZ, TRIP11, ZC4H2) and the largest group, which is primarily associated with chromatin regulation (e.g. SETD6, ATF7IP, DCAF6, ZMYM2, ZMYM3, ZMYM5, MBD3L1 and NAP1L2).

Several proteins fall into a category that could indicate a signalling role. This group includes 2 proteins that localise to the primary cilium/centrosome complex (ALMS1 and BBS4) as well as 3 other proteins that are linked with primary cilium function (TRIP11, USP20 and USP33). No previous reports have indicated a role for GTF2IRD1 in this structure and there is no evidence as yet to suggest that GTF2IRD1 shuttles to this site, but the isolation of 5 proteins belonging to this grouping in an unbiased screen, potentially containing all proteins encoded by the genome, seems beyond coincidence and could initiate a valuable new line of future investigation. Of these proteins, 2 are associated with ciliopathies: mutations in BBS4 cause Bardet-Biedl syndrome 4 (OMIM #615982) and mutations in ALMS1 cause Alstrom syndrome (OMIM #203800). Primary cilia in specialized sensory cells are well known but it is now clear that these structures are virtually universal in all cell types, playing critical roles in the sonic hedgehog and Wnt signalling pathways, and are particularly important in the developing brain (Guemez-Gamboa et al., 2014; Han et al., 2009).

GTF2IRD1 was shown to interact with 7 members of the ARM repeat-containing family: PKP1, PKP2, ARMCX5 and the importins KPNA1-4. The plakophilins localise to the cytoplasmic surface of desmosomes but also localise to the nucleoplasm in a wide range of cells. They are widely viewed as signalling proteins that shuttle between these locations, acting as a structural scaffold at the desmosome and in transcriptional regulation in the nucleus (Bass-Zubek et al., 2009), being capable of potentiating β-catenin/TCF-mediated transcriptional regulation (Chen et al., 2002). ARMCX5 function is poorly understood but evidence suggests that the Armcx genes arose as a cluster on the X chromosome as a result of retrotransposition from Armc10. These genes encode proteins that are highly expressed in the developing and adult nervous system, localise both to the nucleus and to mitochondria and play a role in the distribution and dynamics of the mitochondria (Lopez-Domenech et al., 2012). Interactions with the importins KPNA1-4 are predicted to be related to nuclear import of GTF2IRD1 as per the main function described for these proteins (Chook and Blobel, 2001).

Mapping of the interaction domains by yeast co-transformation of GTF2IRD1 peptides with the various protein partners (some of which was conducted previously; Widagdo, 2011) shows that binding can occur throughout the protein. All of the domains tested (repeat domains RD1-RD4, SUMO1, SUMO2 and the leucine zipper), except for RD5, participate in protein-protein interactions. RD1 and the SUMOylation domains are the regions that engaged in the highest number of interactions. Previous work in our laboratory has shown the importance of the repeat domains for DNA binding (Palmer et al., 2010) and how SUMOylation regulates the interaction between GTF2IRD1 and ZMYM5 (Widagdo et al., 2012). Therefore, it can be speculated that the formation of complexes involving GTF2IRD1 could be regulated by post-translational modifications and may also include the possibility of GTF2IRD1 binding to DNA through one RD while tethering partners through another RD.

It is difficult to understand what evolutionary advantage was bestowed by the internal duplication of the repeat domains, leading to their expansion to 5 copies in humans and 6 copies in mice. It seems unlikely that this was driven by a need to refine direct DNA binding properties, although this does restrict high affinity binding to sites that contain at least 2 GGATTA recognition sequences, as shown for the autoregulation of the GTF2IRD1 promoter/enhancer (Palmer et al., 2010). Work described here demonstrates that the RDs also form an important protein interaction surface as well as a DNA binding domain. Therefore, an evolutionary expansion of this domain would initially amplify the number of proteins with which GTF2IRD1 could interact and subsequent divergence of the repeat domain sequence could diversify the range of simultaneous protein-protein interactions. This scheme could explain the origin of the multiple RDs. Alternatively, adding repeat domains could provide a secondary interaction surface for the same partner protein, thus allowing greater control over the binding reaction. The multiple binding sites of several of the partner proteins in the Y2H mapping experiments indicate that the latter scenario is possible.

Proteins from the same family were found to share the same domains of interaction, such as ZMYM2, ZMYM3 and ZMYM5 binding to the SUMO1 region and PKP1 and PKP2 binding to RD1, SUMO1 and SUMO2, suggesting that the interaction domains are conserved in homologous proteins. However, no common theme was found for the binding of transcriptional regulators or chromatin modifier partners to different domains of GTF2IRD1, indicating that these functions are not clearly subdivided into different portions of the protein. Questions relating to the conformational complexity of GTF2IRD1 in vivo could not be examined by the Y2H peptide mapping assay, as binding with each single domain is examined individually. Nevertheless, it provides a simple and efficient means to gain functional insights into the GTF2IRD1 sub-domains and how these regions may have arisen during evolution.


The identification of 3 members of the ZMYM family as GTF2IRD1 partners is consistent with the isolation of ZMYM2 and ZMYM3 in association with TFII-I in the same HDAC-containing complex using immunoaffinity purification from endogenous HeLa cell extracts (Hakimi et al., 2003). This would suggest that the interaction between ZMYM proteins and members of the GTF2I family is an evolutionarily conserved feature. Endogenous coimmunofluorescence analysis of the ZMYM proteins suggested some overlap with GTF2IRD1 and TFII-I, while some of the signal was localised separately. These findings suggest that while these proteins can be copurified in association, interactions may be unstable and dynamic. This is reinforced by the mapping of the ZMYM proteins to the SUMO domains of GTF2IRD1, which may contribute to the dynamic formation of such complexes.

The association of GTF2IRD1 with gene silencing functions is consistent with the identification of multiple binding partners that play a role in transcriptional regulation and chromatin modification (table 4.2). One might predict on this basis that a major functional role for GTF2IRD1 is to nucleate complexes of proteins that are capable of changing histone marks and direct them to specific locations in the genome, either through the direct DNA binding properties of GTF2IRD1 or by association with other transcription factors. This function would also be predicted to involve proteins indirectly identified in the expanded interactional network (figure 4.7).


CHAPTER 5- GENE REGULATION ASSOCIATED WITH GTF2IRD1


5.1 Introduction

Based on the well-established DNA binding properties of GTF2IRD1 (Issa et al., 2006; Palmer et al., 2010; Polly et al., 2003a) and the impact on transcriptional regulation of reporter constructs when over-expressed in vitro (Polly et al., 2003a; Vullhorst and Buonanno, 2003b) and in vivo (Issa et al., 2006), it has been assumed that GTF2IRD1 is a conventional transcription factor that has a consistent set of gene targets that will be dysregulated in its absence. Transgenic mice overexpressing human GTF2IRD1 in skeletal muscle (Issa et al., 2006) showed specific repression of slow fibre-specific genes in the soleus muscle, and evidence was presented that for one of them, TnIslow, the repression occurs through the binding of GTF2IRD1 to the upstream enhancer region (USE) of the gene.

Further evidence for the involvement of GTF2IRD1 in tissue-specific direct gene regulation has been described in the retina: Gtf2ird1 null mice show repression of a range of cone or rod-specific photoreceptor-enriched genes (Masuda et al., 2014). Furthermore, in this tissue, GTF2IRD1 functions in cooperation with other proteins to activate the expression of M-opsin in M cones whilst repressing S-opsin at the same time. In rods, it is proposed that it can bind to the rhodopsin promoter in order to suppress its expression (Masuda et al., 2014).

GTF2IRD1 over-expression in mouse embryonic fibroblast (MEF) cells has been used as an in vitro approach to try to identify direct or indirect targets of GTF2IRD1 regulation by transcriptome profiling. A large list of genes was found to be dysregulated, but only a few were validated (Chimge et al., 2007). However, transcriptional profiling of Gtf2ird1 siRNA knock down in mouse neuroblastoma cell lines did not lead to the validated identification of any dysregulated genes (O'Leary and Osborne, 2011).

The expression of Gtf2ird1 in the mouse brain is detectable in diverse areas of the developing brain (Palmer et al., 2007), and in the adult, it is abundant in the striatum, olfactory bulb, cerebellum and piriform cortex, amongst other areas. Gtf2ird1 knock out mice present a range of demonstrated alterations in behaviour and motor function including hyperactivity and ataxia (Howard et al., 2012; Young et al., 2008) and electrophysiological changes in CNS neurons (Proulx et al., 2010).

Despite these data, evidence for a discrete gene set regulated by GTF2IRD1 from transcriptional analysis of knock out whole brain tissue has so far proved elusive (O'Leary and Osborne, 2011). One hypothesis that may explain this apparent discrepancy is that GTF2IRD1 function is highly context dependent and the consequences of gene loss vary in different cell types. In a highly heterogeneous tissue like the brain, differential expression profiles may be masked and diluted by the variety of different cell contexts and the varying levels of Gtf2ird1 expression. If this hypothesis is correct, one way to overcome this difficulty in transcriptional profiling experiments would be to choose a brain region that is more homogeneous and has consistent Gtf2ird1 expression levels. An alternative strategy would be to look for the immediate consequences of GTF2IRD1 repression in homogeneous cultured cell lines.

Since Gtf2ird1 expression has been shown to be relatively consistent throughout the corpus striatum (Howard et al., 2012), of which 96% of the neuronal composition is made up of medium spiny neurons, this region was chosen for microarray analysis. Secondly, an in vitro analysis approach was chosen, using GTF2IRD1 siRNA knock down in HeLa cells.

Here it is shown that siRNA knock down of GTF2IRD1 in HeLa cells results in very modest changes in the transcript profile and the differentially expressed gene set does not overlap with that identified in the striatum analysis. In the striatum of Gtf2ird1 knock out mice, increased expression was found for genes involved in neuronal development and a cluster of immediate-early response genes that correlate with hyperactivity, ADHD and the response to psychostimulants.

Integration of the two approaches from this chapter, combined with the published literature, strongly suggests there is no common theme for the gene expression regulation mediated by GTF2IRD1, and this protein may be acting in a highly variable temporal and spatial manner and its specificity of gene regulation may differ in different tissues and cellular subpopulations.

5.2 Results

5.2.1 Microarray analysis of GTF2IRD1 knock down in a mammalian cell line

The first approach chosen to identify gene sets whose expression is altered by the levels of GTF2IRD1 was a microarray gene expression profile in HeLa cells as an unbiased genome-wide screen. Since it was previously shown that it was possible to efficiently knock down the expression of GTF2IRD1 in HeLa cells and its expression pattern was similar to other cell lines analysed (see chapter 3), HeLa cells were chosen as a very basic model system to test by microarray analysis. This simple model was thought to potentially form a basis for identifying gene targets that could be used in further investigations on the molecular pathways altered under dosage differences of GTF2IRD1. In addition, the knock down of GTF2IRD1 was occurring only in the mature cell line and not in the lineage history of the cells that gave rise to this cell line. Therefore, it was reasoned that differentially expressed genes that could be identified would be more likely to be current direct targets of GTF2IRD1 regulation.

GTF2IRD1 protein levels were knocked down by transient transfection of a pool of siRNAs targeting the GTF2IRD1 gene in triplicate samples. Control transfections of scrambled siRNA were also performed in triplicate at the same time. Forty-eight hours post-transfection, total RNA was collected from the three samples of each condition (from here on GTF2IRD1 KD and control). The efficiency of the knock down was assessed by immunofluorescence using the anti-GTF2IRD1 333A antibody in cell samples plated on glass coverslips, transfected under the same experimental conditions. 97% of cells were estimated to have sufficient GTF2IRD1 knock down for the protein to be undetectable by immunofluorescence. All six RNA samples isolated were used for a 3 versus 3 microarray analysis using an Affymetrix Human 2.0 ST chip, which encompasses over 30,000 annotated transcripts plus 11,000 long non-coding RNAs (lncRNAs).

The results were analysed using Partek Genomics Suite software, which generated a list of dysregulated genes that showed significance (unadjusted p value ≤0.05) with a change of expression of at least ±1.2 fold. This threshold was set based on other reports that have shown very subtle changes in gene expression due to GTF2IRD1 loss in other study models (O'Leary and Osborne, 2011; Widagdo, 2011). 627 annotated genes satisfied this criterion, including 5 long non-coding RNAs (lncRNAs). Although this array is not designed specifically for microRNA analysis, the gene list also included 40 dysregulated microRNAs. The microarray analysis confirmed that the GTF2IRD1 mRNA abundance in the knock down samples was changed by -2.59 fold.
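The selection criterion described above can be illustrated with a minimal sketch. It assumes a signed fold-change convention (a negative value means lower expression in the knock down) and uses hypothetical gene names and p values; only the GTF2IRD1 fold change of -2.59 comes from the text. It is not the Partek Genomics Suite implementation.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    gene: str
    fold_change: float  # signed fold change; -2.59 means 2.59-fold lower in knock down
    p_value: float      # unadjusted p value


def passes_threshold(result, min_fold=1.2, max_p=0.05):
    """True when the unadjusted p value is <= max_p and |fold change| >= min_fold."""
    return result.p_value <= max_p and abs(result.fold_change) >= min_fold


def select_dysregulated(results, min_fold=1.2, max_p=0.05):
    """Keep only the results that satisfy both thresholds."""
    return [r for r in results if passes_threshold(r, min_fold, max_p)]


# Hypothetical input: gene names GENE_A..C and all p values are invented for illustration.
example = [
    ProbeResult("GTF2IRD1", -2.59, 0.001),
    ProbeResult("GENE_A", 1.25, 0.03),   # passes both thresholds
    ProbeResult("GENE_B", 1.10, 0.01),   # fold change too small
    ProbeResult("GENE_C", -1.40, 0.20),  # p value too large
]

hits = select_dysregulated(example)
print([r.gene for r in hits])  # ['GTF2IRD1', 'GENE_A']
```

Because the p value is unadjusted, a list produced this way is expected to contain false positives, which is why a subset of genes was taken forward for qRT-PCR validation.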

These dysregulated gene identities were used for a gene ontology enrichment analysis (using Partek Genomics Suite software) to define whether there is a common trend in the dysregulation of specific groups of genes. This software analysis works by assigning an enrichment score that is higher when there is an overrepresentation of genes belonging to the same ontology group. The gene ontology annotations for these groupings are defined by the Gene Ontology Consortium (www.geneontology.org).
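The idea of an enrichment score can be sketched as follows. The exact statistic used by Partek Genomics Suite is not stated in the text; this example assumes one common choice, the negative natural logarithm of a one-sided hypergeometric p value, and all counts in the usage lines are hypothetical.

```python
import math


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_score(k, K, n, N):
    """Assumed score: -ln(p). Higher values mean stronger over-representation."""
    p = hypergeom_upper_tail(k, K, n, N)
    if p == 0.0:
        raise ValueError("k exceeds the number of genes that can carry the term")
    return -math.log(p)


# Hypothetical counts: 20,000 genes on the array, 100 annotated with a term,
# 627 genes in the dysregulated list, 10 of which carry the term.
score_strong = enrichment_score(10, 100, 627, 20000)
score_weak = enrichment_score(4, 100, 627, 20000)
print(score_strong > score_weak)  # True
```

With these hypothetical counts about three annotated genes would be expected by chance, so observing ten gives a higher score than observing four.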

Figure 5.1 shows the molecular function ontology groups that were assigned the highest enrichment scores, were statistically significant (p value ≤0.05) and had more than two genes in each group. The most overrepresented ontology groups were the molecular functions of neuropeptide Y receptor activity, actin binding, and signalling, enzymatic and receptor activities (figure 5.1).


Figure 5.1 Gene ontology (GO) enrichment analysis of the dysregulated genes found in HeLa GTF2IRD1 knock down samples.

The GO groups with the highest enrichment scores based on molecular function classification that contain more than two genes in the group are presented in the chart. The category identities are official GO terms and the numerical values represent the enrichment score obtained in the analysis (p value ≤0.05).


5.2.2 Microarray analysis in corpus striatum tissue of GTF2IRD1 knock out mice

Previous work has shown a clear set of neurobehavioural phenotypes in the Gtf2ird1-/- mice (Howard et al., 2012; Young et al., 2008) but no gene sets have been validated in the mouse brain that correlate with these phenotypes (O'Leary and Osborne, 2011). Therefore, we aimed to readdress this issue by focussing attention on the corpus striatum, which contains minimal cellular diversity and is composed mainly of medium spiny neurons that all express Gtf2ird1 at appreciable levels (Howard et al., 2012). Thus, any confounding effects that are due to dilution effects and cell context should be minimized.

Striatum tissue was dissected from five 8-9 month old C57BL/6 wild type (WT) mice and five Gtf2ird1-/- (KO) mice. RNA was extracted and a 5 versus 5 microarray analysis was performed using a Mouse Gene 1.0 ST Affymetrix array, which covered over 26,000 RefSeq transcripts. Partek Genomics Suite software was used to generate a list of dysregulated genes, starting at ±1.2 fold change (unadjusted p value ≤0.05). After applying a multiple testing correction of 10% (indicating the expected percentage of false positive hits), only one gene emerged as being significantly different. When these parameters were adjusted by raising the multiple testing correction to 40% and excluding two outlier samples detected in a principal component analysis (PCA), the differentially expressed gene list increased to 177 hits.
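The effect of relaxing the multiple testing threshold from 10% to 40% can be sketched with a step-up false discovery rate procedure. The text does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, and the p values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests declared significant at the given FDR.

    Assumed method: Benjamini-Hochberg step-up procedure. The largest rank r
    with p_(r) <= fdr * r / m is found, and all tests up to that rank pass.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical unadjusted p values for five genes.
p_values = [0.01, 0.05, 0.09, 0.2, 0.7]

print(benjamini_hochberg(p_values, fdr=0.10))  # [0]
print(benjamini_hochberg(p_values, fdr=0.40))  # [0, 1, 2, 3]
```

Raising the threshold enlarges the candidate list at the cost of a larger expected share of false positives, which is why the genes selected from the relaxed list were validated independently by qRT-PCR.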

Using this gene list, the same gene ontology enrichment analysis as with the HeLa cell samples was performed using Partek Genomics Suite software. Figure 5.2 shows the molecular function ontology groups that were overrepresented in this analysis, which contained at least two genes that are significant (p value ≤0.05). The highest enrichment scores were assigned for the molecular function of olfactory receptors and receptor activity, followed by specific receptor activities such as steroid hormone, G-protein coupled and ligand-dependent. Signalling, DNA binding, RNA polymerase II regulation and tropomyosin binding functions were also in the top categories overrepresented. Most notable was the identification of genes that showed increased expression in the knock out brain samples, which are involved in neuronal development and a cluster of immediate-early response genes that have previously been linked with hyperactivity.


Figure 5.2 Gene ontology (GO) enrichment analysis of the dysregulated genes found in the Gtf2ird1-/- striatum samples.

The overrepresented GO groups are shown based on their molecular function annotations. Groups containing at least two genes are presented. The category identities are official GO terms and the numerical values represent the enrichment score obtained in the analysis (p value ≤0.05).


5.2.3 Analysis of the differentially expressed genes by qRT-PCR

5.2.3.1 qRT-PCR in siRNA-treated HeLa cells

Several genes were chosen for further qRT-PCR validation, as the criteria used to discover differentially expressed genes were not of the highest stringency. These genes were selected as being of primary interest due to their potential relevance to GTF2IRD1 function and hypotheses of downstream functional consequences. A summary of these genes, including their main function and fold expression, is shown in table 5.1. cDNA was prepared from the RNA samples utilised previously in the microarray analyses and these were used as template for qRT-PCR analysis. Levels of expression for the genes ELK3, KAT2B and NDRG1 were quantitated in comparison with the housekeeping gene HPRT (hypoxanthine phosphoribosyltransferase 1) as an internal reference standard, using the delta Ct (ΔCt) method (Livak and Schmittgen, 2001). FOS was also included as it was found to be upregulated in the striatum tissue microarray analysis, although it was not one of the 177 hits discovered in the HeLa sample gene list. In agreement with the microarray results, the qRT-PCR quantitation of FOS expression showed no significant difference between siRNA-treated HeLa samples and the control samples (figure 5.3A). The same was true for the other genes analysed; the ΔCt values were not significantly different from controls (figure 5.3A) and the mean fold expression change observed was relatively minor compared to the control mean expression value, which was adjusted to 1 in all cases to allow all values to be plotted on the same graph (figure 5.3B).
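The ΔCt calculation and the conversion to fold expression relative to the control can be sketched as below. This is a minimal illustration assuming 100% amplification efficiency (a doubling per cycle) and hypothetical Ct values; the function names are illustrative and do not describe the software used for the analysis.

```python
import math
from statistics import mean, stdev


def delta_ct(ct_target, ct_reference):
    """ΔCt per sample: target gene Ct minus reference gene (e.g. HPRT) Ct."""
    return [t - r for t, r in zip(ct_target, ct_reference)]


def relative_expression(dct_test, dct_control):
    """Fold expression of each test sample relative to the control mean ΔCt.

    Assumes perfect doubling per cycle, so fold = 2 ** -(ΔCt - mean control ΔCt).
    """
    calibrator = mean(dct_control)
    return [2 ** -(d - calibrator) for d in dct_test]


def sem(values):
    """Standard error of the mean."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical Ct values for three control and three knock down samples.
control_dct = delta_ct([25, 25, 25], [20, 20, 20])    # ΔCt = 5 in every sample
knockdown_dct = delta_ct([24, 24, 24], [20, 20, 20])  # ΔCt = 4 in every sample

print(relative_expression(knockdown_dct, control_dct))  # [2.0, 2.0, 2.0]
```

A ΔCt that is one cycle lower than the control corresponds to a two-fold higher transcript level, so small, non-significant ΔCt differences translate into fold values close to 1.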

Two other genes were also included for the purposes of integration with other results that will be presented in the following chapter. Due to a potential connection between GTF2IRD1 and histone deacetylases, a quantitative analysis of HDAC1 and HDAC2 transcripts was also performed in the siRNA treated HeLa cells (figure 5.3).


Table 5.1 Overview of the differentially expressed genes chosen for qRT-PCR validation in the HeLa siRNA treated cells.

Gene symbols are listed together with the main cellular process of the protein. Process information was obtained from GO evidence codes according to the Gene Ontology Consortium database (Ashburner et al., 2000). (CEL) comparative expression levels of mRNA obtained by qRT-PCR. Error values represent standard error of the mean (SEM).

Gene symbol | Name | Process | CEL
ELK3 | ELK3, member of ETS oncogene family | Regulation of transcription; cell differentiation | 1.38±0.34
KAT2B | K(lysine) acetyltransferase 2B | Chromatin remodelling; histone H3 acetylation | 0.97±0.30
NDRG1 | N-myc downstream regulated 1 | Peripheral nervous system myelin maintenance; positive regulation of spindle checkpoint | 0.86±0.13


Figure 5.3 qRT-PCR validation of selected genes from siRNA treated HeLa cells.

(A) cDNAs from HeLa cells treated with an siRNA pool targeting GTF2IRD1 (knock down) and a scrambled siRNA (control) (n=3, each condition) were used for qRT-PCR analyses using HPRT expression as a reference. The scatter plots represent values for each sample relative to the group mean (long horizontal bar) in the cycle threshold values (ΔCt). (B) The same results are expressed as fold expression levels relative to the control, which is set at 1 in all cases. Error bars represent the standard error of the mean (SEM).

5.2.3.2 qRT-PCR of dysregulated genes in striatum tissue

From the RNA samples utilised for the microarray analyses (five Gtf2ird1-/- and five wild type mice), cDNA was prepared and used as template for qRT-PCR for relative quantitation using the mouse housekeeping gene Hprt as an internal standard. A range of dysregulated genes was selected according to their possible relevance to the phenotypes observed in the Gtf2ird1 knock out mice. The comparative expression levels of Fos, Fosl2, Egr2, Eya1, Nr4a1-3, Ubash3b and Zbtb16 were analysed. As shown (figure 5.4), the difference in expression between knock out and wild type mice was significant for Fos, Egr2, Eya1, Nr4a1 and Hdac1 (p value ≤0.05). The number of mice ranged from 4 to 5, as outliers in the expression of certain genes were excluded.

Comparative expression level was calculated and the mean expression level of each gene in the wild type samples was set at a value of 1 in order to plot all genes on the same graph (figure 5.5). All the genes analysed were found to be upregulated, except for Ubash3b, which showed no change, and Zbtb16, which was found to be slightly downregulated. The largest change in fold expression was for Fos, which showed a 5.48±0.83 fold upregulation compared to wild type mice. A summary of fold change as well as main cellular function is compiled in table 5.2. Similar to the HeLa cell experiment, quantitation of expression for the histone deacetylases Hdac1 and Hdac2 was also included. It is also noteworthy that one of the protein partners identified for GTF2IRD1, Plakophilin 2 (Pkp2), also appeared to be downregulated in the microarray expression profile; therefore it was also included for analysis by qRT-PCR. Pkp2 shows a trend towards a decrease in fold expression level compared with wild type mice, but the ΔCt values did not indicate that these differences were significant (figures 5.4 and 5.5).

Figure 5.4 qRT-PCR validation of selected genes in the striatum of Gtf2ird1-/- mice.

The scatter plots represent the variation of every sample (n=4-5, Gtf2ird1-/- or wild type mice) in the cycle threshold values (ΔCt) of the qRT-PCR for the genes analysed compared with the expression of the housekeeping gene Hprt. Error bars represent SEM; asterisks denote a p value ≤0.05.


Figure 5.5 Comparative expression levels of genes in the corpus striatum tissue of Gtf2ird1 knock out mice.

Fold expression values, normalised against Hprt, are shown for the genes Egr2, Eya1, Fos, Fosl2, Hdac1, Hdac2, Nr4a1-3, Ubash3b and Zbtb16, in Gtf2ird1-/- mice. Expression levels for the wild type mice were set at 1 in all cases (n=4-5). Error bars represent SEM.


Table 5.2 Overview of the dysregulated genes in the striatum of Gtf2ird1 knock out mice.

The gene symbols are listed together with the main cellular process associated with the proteins they encode. Process information was obtained from GO evidence codes according to the Gene Ontology Consortium database (Ashburner et al., 2000). CEL represents the comparative expression levels of mRNA obtained by qRT-PCR. Error values represent SEM.

Gene symbol | Name | Process | CEL
Egr2 | early growth response 2 | Regulation of transcription; brain development | 2.98±0.35
Eya1 | eyes absent 1 homolog (Drosophila) | Regulation of neuron differentiation; positive regulation of transcription | 2.47±0.78
Fos | FBJ osteosarcoma oncogene | Regulation of transcription; nervous system development | 5.48±0.83
Fosl2 | FOS-like antigen 2 | Regulation of transcription; regulation of cell proliferation | 1.43±0.26
Nr4a1 | nuclear receptor subfamily 4, group A, member 1 | Regulation of transcription; regulation of cell proliferation | 2.73±0.18
Nr4a2 | nuclear receptor subfamily 4, group A, member 2 | Transcription; adult locomotor behaviour | 4.36±2.16
Nr4a3 | nuclear receptor subfamily 4, group A, member 3 | Regulation of transcription; adult behaviour | 2.09±0.64
Ubash3b | ubiquitin associated and SH3 domain containing B | Regulation of protein kinase activity | 1.22±0.15
Zbtb16 | zinc finger and BTB domain containing 16 | Regulation of transcription | 0.70±0.08


5.3 Discussion

5.3.1 Transcriptional profiling of siRNA-treated HeLa cells

The in vitro model for analysing the effect of GTF2IRD1 loss on gene expression used HeLa cells treated with a pool of siRNA targeting GTF2IRD1 mRNA. This was achieved by transient transfection that was stopped after 48 hours, and the knock down efficiency was estimated by immunofluorescence, which indicated that 97% of cells contained no detectable GTF2IRD1 protein. The microarray profile indicated a change of -2.5 fold in GTF2IRD1 mRNA level. GTF2IRD1 was previously shown to be subject to a negative autoregulatory loop (Palmer et al., 2010), but the siRNA degradation would continually reduce the levels of newly synthesised mRNA. Therefore it is difficult to know what levels of mRNA might be expected under such conditions. The immunofluorescence analysis indicates that the block to GTF2IRD1 protein synthesis is profound in those cells that are successfully transfected, whereas the protein remains at normal levels in those that are not.

Overall, the vast majority of the differentially expressed genes showed a fold change close to ±1.2 and the qRT-PCR validation, which was performed on a subset of the genes of primary interest, failed to detect any significant differences in the levels of expression of ELK3, FOS, HDAC1, HDAC2, KAT2B and NDRG1.

A number of previous studies have reported the differential expression of genes in other in vitro models, which as described below, were compared to the results obtained in this work with the siRNA treated HeLa cells. The published gene lists were contrasted to our results and little or no overlap was found; in the case of shared dysregulated genes with the reported studies, the GO categories probably do not show a straightforward correlation with GTF2IRD1 function.

A search for many of the genes that have been found to be dysregulated or claimed as GTF2IRD1 DNA targets (Lazebnik et al., 2008; O'Leary and Osborne, 2011) revealed no representation amongst the hits identified in the siRNA treated HeLa dataset, except for BMP4, which showed a subtle 1.22 fold change in the siRNA treated HeLa cells. This lack of overlap between dysregulated gene sets also occurred in other analyses, such as the expression profile induced by overexpression of GTF2IRD1 in MEFs (Chimge et al., 2007), where a large list of dysregulated genes did not show overlap with other datasets. Similarly, a study using siRNA knock down of Gtf2ird1 in a neuroblastoma cell line showed no correspondence with the dysregulated genes identified in brain tissue microarray analyses from Gtf2ird1 null mice (O'Leary and Osborne, 2011). In the context of WBS, a study of the transcriptome profile carried out in lymphoblastoid cell lines from WBS patients (Antonell et al., 2010b) showed diverse categories of dysregulated genes, including those involved in glycolysis and neuronal migration. Comparison with the HeLa dataset described here showed overlap with only one gene, PGAM1 (Phosphoglycerate mutase 1), an enzyme involved in the glycolytic pathway.

Another comparison of the dysregulated genes found in siRNA treated HeLa cells was performed against the transcriptome profile obtained from skin fibroblasts of WBS patients (Henrichsen et al., 2011), and revealed 17 common genes, most of which were categorised by GO in the ubiquitin proteasome and gonadotropin releasing hormone receptor pathways.


Therefore, the comparison made between the differentially expressed genes found in the microarray analysis from siRNA treated HeLa cells with other published reports from different cell lines, suggests that GTF2IRD1 gene regulation may be context/tissue dependent.

5.3.2 Transcript profiling of Gtf2ird1 knock out mice striatum samples

The dysregulated gene list obtained from the microarray profile of the striatum samples isolated from Gtf2ird1 knock out mice was generated by increasing the threshold of the false discovery rate, in order to obtain a greater number of candidate genes to analyse. This was done because some of the neurological phenotypes are relatively subtle (Canales et al., 2014; Howard et al., 2012) and this could be reflected in the changes to the transcriptional profile. A number of genes were selected for qRT-PCR validation based on their relevance in the neurobehavioural phenotypes and those that showed differential expression after this validation are summarised in table 5.2.

The list of dysregulated genes was compared to data from published reports of genes predicted to be targets, or found to be differentially expressed in Gtf2ird1 knock out mice, but no overlap was found from studies in the brain and in other tissues (Lazebnik et al., 2008; Makeyev et al., 2012; O'Leary and Osborne, 2011).

In order to place the results obtained from the striatum samples into the context of copy number variation in the 7q11.23 genomic region, the validated dysregulated gene set was compared to the 7qGB-WikiWilliams database, which curates genome-wide experimental data from induced pluripotent stem cell (iPSC)-derived samples obtained from both Williams-Beuren syndrome and Williams-Beuren region duplication syndrome patients (Adamo et al., 2015). The comparison was performed against microarray data obtained from neural progenitor cells (NPCs) that express forebrain markers, as they constitute a relevant comparison for our striatal tissue samples. The genes that are significantly upregulated in the Gtf2ird1-/- mice are also overexpressed in NPCs from WBS, although there is no corresponding expression difference in the duplication syndrome.

Another comparison was performed with two other transcriptome profile WBS datasets derived from lymphoblastoid cells (Antonell et al., 2010a) and skin fibroblasts (Henrichsen et al., 2011). In the lymphoblastoid cell study, there was an overlap of two genes: OXT, which encodes the precursor of oxytocin and neurophysin, involved in maternal and social behaviour (Figueira et al., 2008; Neumann, 2008), and HIST1H2BM (histone cluster 1, H2bm), which is found dysregulated in a form of early-onset dementia (Martins-de-Souza et al., 2012). The comparison with differentially expressed genes in fibroblasts from WBS patients showed overlap with the genes ARHGAP26 (Rho GTPase activating protein 26), CHST2 (carbohydrate [N-acetylglucosamine-6-O] sulfotransferase 2) and FOSL2 (FOS-like antigen 2), which is interesting because it is another member of the FOS family.

The most striking finding of the microarray analysis of the striatum tissue is the validated upregulation of several immediate early response genes, including Fos, Egr2 and Nr4a1, due to their importance to the dynamic control of gene transcription following diverse stimuli, as summarised at the end of this discussion. The upregulation of Fos in the striatum of Gtf2ird1-/- mice also fits well with previous evidence that shows the positive transcriptional impact of TFII-I, upon binding to FOS upstream elements (Kim et al., 1998). A counter-regulation system between GTF2IRD1 and TFII-I has been proposed based on in vitro reporter studies (Tussie-Luna et al., 2001), which showed that GTF2IRD1 can repress FOS promoter activation mediated by TFII-I. The upregulation of Fos in the Gtf2ird1 knock out mice striatum shows, for the first time, in vivo evidence of this process, in a tissue relevant to neurobehavioural phenotypes.

C-Fos immunostaining was used as a marker of recent neuronal activation following a forced swim test (Howard et al., 2012), evaluating different brain regions, including amygdala, lateral septum, nucleus accumbens and cingulate cortex amongst others. The striatum was not analysed in that study and the difference was sexually dimorphic, but Fos cell counts were clearly elevated in several regions (Howard et al., 2012).

Previous observations in Gtf2ird1-/- mice demonstrated an increase in exploratory behaviour during the dark cycle and impaired control of motor coordination (Howard et al., 2012). The striatum is a brain region that is associated with motor control, cognition and learning and emotional-motivational functions, and its abnormalities have been associated with human disorders, such as Parkinson’s and Huntington’s diseases, drug addiction and dystonia (Crittenden and Graybiel, 2011; Groenewegen, 2003). The genes found upregulated present a possible connection with the neurobehavioural phenotypes of the Gtf2ird1 knock out mice, but may also indicate a link between GTF2IRD1 and other traits related to striatal function.


5.3.2.1 Overview of the genes upregulated in the striatum of Gtf2ird1 knock out mice

EGR2, Early growth response 2: In humans, EGR2 is related to learning and memory processes (Poirier et al., 2008) and it has been associated with schizophrenia and bipolar disorder in some gender/ethnicity conditions (Balan et al., 2013; Cheng et al., 2012; Kim et al., 2012).

Mutations in EGR2 are responsible for some forms of Charcot-Marie-Tooth disease, Dejerine-Sottas syndrome and congenital hypomyelinating neuropathy (Shiga et al., 2012; Warner et al., 1998; Warner et al., 1999) and studies in knock out mice support a role in the myelination of the peripheral nervous system. Increased levels of EGR2 have been reported to be induced by seizure activity or certain cognitive processes (reviewed by Poirier et al., 2008). Interestingly, Egr2 levels are increased in the mouse striatum during the first hour of administration of addictive drugs, such as cocaine and methamphetamine (GEO Profiles database, accession number GDS3703). Moreover, upregulation of Egr2 mRNA is induced in several brain regions by dopamine D1 agonists or morphine withdrawal and, in the striatum, by the excitotoxic compound kainic acid (reviewed by Beckmann and Wilce, 1997).

Similarly elevated levels of Egr2 have been found in the striatum tissue of mice with genetic or pharmacological inactivation of the adenosine A2A receptor, thus demonstrating that Egr2 is a downstream effector of signalling by this receptor (Yu et al., 2005). The adenosine A2A receptor of the striatum is noteworthy because it is thought to be an important player in neuropsychological disorders such as Parkinson’s disease, as adenosine A2A antagonists in the striatum produce an antiparkinsonian effect in terms of motor function (Jenner et al., 2009; Pereira et al., 2005; Richardson et al., 1997). However, pharmacological studies show that inactivation of the adenosine A2A receptor leads to decreased expression of Fos (Dassesse et al., 2001), which is opposite to the result shown here. Therefore, the pathway activated by the adenosine A2A receptor may not be the same mechanism that causes an upregulation of Egr2 in the Gtf2ird1 knock out mice.

EYA1, Eyes absent homolog 1 (Drosophila): Eya1 is a transcriptional co-activator of high importance in the development of the kidney, the eye, the ear, neurogenesis of the ear, and the patterning of cranial sensory nerves, amongst others (Bonini et al., 1993; Gong et al., 2007; Xu et al., 1999; Xu et al., 2002; Zou et al., 2004). Furthermore, mutations in the human EYA1 gene cause several genetic disorders such as branchio-oto (BO) and branchio-oto-renal (BOR) syndrome, and congenital cataracts and ocular anterior segment anomalies (Castiglione et al., 2014; Musharraf et al., 2014). Another important aspect of EYA1 is its ability to interact with members of the epigenetic complex SWI/SNF in order to facilitate neuronal development. In this model, EYA1 can promote neurogenesis by recruiting a chromatin remodelling complex that enables promoter activation in neuron specific genes (Ahmed et al., 2012).

FOS, FBJ murine osteosarcoma viral oncogene homolog: Also known as c-FOS, this protein is encoded by a well-studied immediate early response gene with multiple roles in cell differentiation, development and proliferation as part of the AP-1 transcription complex (together with c-JUN and ATF) (Jochum et al., 2001; Mitchell et al., 1986; Wagner, 2002). FOS has a critical role in bone development, by participating in osteoclast differentiation, and the AP-1 complex is widely studied because of its involvement in tumorigenesis and its role as a tumour suppressor (Jochum et al., 2001).

In the brain, c-FOS is upregulated in various regions in response to a broad range of stimuli, including neurotropic substances, learning and memory processes, behaviour, stress, seizures and the light/dark cycle, which leads to a cascade of activation of downstream genes (as reviewed by Herrera and Robertson, 1996; Kovacs, 1998).

With specific relevance to the results presented here, increased striatal FOS expression has been observed under induced stress (Clark et al., 2014) and following the administration of psychostimulant drugs. In mice, cocaine induces Fos striatal immunoreactivity (Young et al., 1991) and expression profile studies show that, like Egr2, Fos is induced during the first hour of administration of cocaine, methamphetamine (GEO Profiles database, accession number GDS3703) and morphine (Ziolkowska et al., 2012).

NR4A1, 2 and 3, Nuclear receptor subfamily 4, group A, member 1, 2 and 3: NR4A1, 2 and 3 are orphan nuclear receptors that belong to the nuclear hormone receptor (NR) 4A subgroup and function as early response genes in a range of cellular processes such as proliferation, apoptosis, differentiation, adipogenesis and inflammation (Staels, 2010). The members of this subgroup are able to interact with each other in order to activate their own expression, and they can be repressed or activated by a number of key pathways, mediated by the factors c-JUN, PML and β-catenin (as reviewed by Kurakula et al., 2014). Part of their mechanism of regulation also involves interactions with members of histone deacetylase complexes such as HDAC1, SMRT, Co-REST, N-CoRs and SIN3A (Kurakula et al., 2014).


In a similar manner to the immediate early genes upregulated in the Gtf2ird1-/- mice, Nr4a2 and Nr4a3 also show increased expression in rats after administration of drugs such as methamphetamine and nicotine; Nr4a1 was the only family member that showed the opposite effect (Saint-Preux et al., 2013). In mice, administration of antipsychotic drugs produces an elevated expression of Nr4a1, 2 and 3 in the area of the striatum associated with motor activity (Maheux et al., 2005). Interestingly, since none of the three receptors have any endogenous ligands identified to date, it is hypothesised that they are regulated at the transcriptional level (Pearen and Muscat, 2010). NR4A proteins are also present in tissues such as liver, skeletal muscle, brown and white adipose tissue, and the heart, where a wide range of stimuli are responsible for their increased expression, which results in diverse effects on metabolism (Pearen and Muscat, 2010). The NR4A subgroup of receptors is currently being explored as a potential target for cancer therapies and memory improvement (Deutsch et al., 2012; Hawk et al., 2012) and for cell therapy in Parkinson’s disease, where NR4A2 is required for dopaminergic differentiation in the neurons to be transplanted (Kim et al., 2003).

5.3.3 Context dependence of the in vitro and in vivo gene expression profiles

The transcriptome profiles of two different samples, siRNA-treated HeLa cells (GTF2IRD1 knock down) and striatum tissue from Gtf2ird1-/- mice, were reported in this chapter. In order to integrate the results obtained in the in vitro and in vivo models, the differentially expressed gene lists obtained from the two microarray analyses were compared. The gene ontology enrichment analyses based on molecular function classification showed no overlap between specific ontology categories in the in vitro and in vivo gene expression profiles, although there are similar general groups such as cytoskeleton binding or receptor activities.

A further analysis of the dysregulated gene lists obtained from the siRNA-treated HeLa cells and striatum tissue from Gtf2ird1-/- mice showed that the only genes that overlapped between the two microarray analyses are HSPB8, NAP1L5 and UBASH3B (figure 5.6).
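The comparison between the two dysregulated gene lists can be illustrated with a minimal sketch. It assumes that each microarray result is available as a mapping from gene symbol to signed fold change, that the ±1.2 cut-off of figure 5.6 is applied to both lists, and that human and mouse symbols can be matched by simple upper-casing (a simplification; a curated orthologue table would be needed in practice). The function name, the placeholder genes and all fold change values are hypothetical and are not the thesis data.

```python
def dysregulated(fold_changes, threshold=1.2):
    """Return upper-cased gene symbols whose signed fold change meets the cut-off.

    Signed convention as in the text: +1.3 is 1.3-fold up, -1.3 is 1.3-fold down.
    """
    return {gene.upper() for gene, fc in fold_changes.items() if abs(fc) >= threshold}


# Hypothetical values for illustration only (not the thesis data).
hela_sirna = {"HSPB8": -1.3, "NAP1L5": -1.2, "UBASH3B": -1.2,
              "GENE_A": 1.8, "GENE_B": 1.1}
striatum_ko = {"Hspb8": -1.2, "Nap1l5": -1.3, "Ubash3b": -1.2,
               "Gene_c": 1.6, "Gene_d": 1.5}

# Genes passing the cut-off in both datasets (the centre of the Venn diagram).
shared = dysregulated(hela_sirna) & dysregulated(striatum_ko)
print(sorted(shared))
```

With these placeholder inputs the intersection contains only the three symbols present in both lists; GENE_B is excluded because it falls below the cut-off.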

HSPB8 (heat shock 22kDa protein 8) is a chaperone that prevents aggregation of misfolded proteins in neurodegenerative diseases and it is associated with Charcot-Marie-Tooth disease (Capponi et al., 2011; Crippa et al., 2010; Vicario et al., 2014).

NAP1L5 (nucleosome assembly protein 1-like 5) is an imprinted gene with low levels of expression in the striatum (Davies et al., 2004) that is able to interact with its family member NAP1L2 (Attia et al., 2011), a protein identified as an interacting partner of GTF2IRD1 in chapter 4.

UBASH3B (ubiquitin associated and SH3 domain containing B) is a ubiquitously expressed transcript that encodes a protein phosphatase that can target tyrosine-phosphorylated proteins (Tsygankov, 2009). It is critical in the regulation of T-cell activation and it has been genetically associated with Behcet’s disease, a chronic systemic inflammatory disorder (Carpino et al., 2004; Fei et al., 2009).

As shown in figure 5.6, all three genes that appeared commonly dysregulated in both microarray analyses show a very subtle downregulation (-1.2 to -1.3 fold).

Quantitation of the Ubash3b transcript by qRT-PCR in the Gtf2ird1-/- striatum showed no change in fold expression, and the difference in ΔCT values between knock out and wild type mice was not significant. Since GTF2IRD1 is a protein that functions as a transcriptional repressor, it is possible that the downregulated genes are not direct targets of GTF2IRD1-mediated regulation, but rather a secondary consequence of the alteration of GTF2IRD1 gene dosage.
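The relationship between ΔCT values and fold change in a qRT-PCR comparison can be made explicit with a short sketch. It assumes the widely used 2^-ΔΔCT calculation, which this section does not state to be the method applied here, and a single reference gene; the function names and CT values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """ΔCT: target gene CT normalised to a reference (housekeeping) gene CT."""
    return target_ct - reference_ct


def fold_change(knockout_dcts, wildtype_dcts):
    """Relative expression (knock out vs wild type) as 2 ** -(ΔΔCT)."""
    ddct = mean(knockout_dcts) - mean(wildtype_dcts)
    return 2 ** (-ddct)


# Hypothetical CT pairs (target, reference) for three animals per genotype.
wildtype = [delta_ct(25.0, 20.0), delta_ct(25.5, 20.0), delta_ct(24.5, 20.0)]
knockout = [delta_ct(25.0, 20.0), delta_ct(24.5, 20.0), delta_ct(25.5, 20.0)]

# Equal mean ΔCT in both groups corresponds to a fold change of 1 (no change).
print(fold_change(knockout, wildtype))
```

Under this convention, a mean ΔCT that is one cycle higher in the knock out would correspond to a halving of transcript abundance.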

Regarding GTF2IRD1 targets in the brain that could help to establish a connection between genotype and phenotype, previously published studies have not yet provided consistent information that could be used to identify candidate genes for GTF2IRD1 regulation. One study attempted to measure the expression of previously identified gene targets in the developing and adult brain and in MEFs from Gtf2ird1 knock out mice (O'Leary and Osborne, 2011), but no changes were detectable.

Other genome-wide expression profile studies have not shown any overlap of differentially expressed genes across tissues in mouse models of Gtf2ird1 haploinsufficiency (Masuda et al., 2014; O'Leary and Osborne, 2011; Widagdo, 2011).

These studies also found only a small number of dysregulated genes. For example, in the adult mouse retina, where Gtf2ird1 has a function in the regulation of cones and rods, only 21 annotated genes are differentially expressed (Masuda et al., 2014).

Our data, taken together with the comparisons made above with the published literature, confirm that GTF2IRD1 regulates gene expression mainly in a tissue-specific manner. However, they also suggest that developmental stage can condition its function. A study in mice overexpressing GTF2IRD1 in skeletal muscle (Issa et al., 2006) showed specific repression of slow fibre genes in the soleus tissue, such as TnIslow, TnTslow, TnCslow/cardiac, MLC1slow and α-tropomyosin. The phenotypic consequence occurs at the tissue patterning level, leading to a loss of slow fibres. It is remarkable that this is not found in the early stages of life, as the slow fibres still develop normally in the transgenic mice but undergo gradual postnatal fibre conversion. Thus, these data represent an example of a phenotype for which GTF2IRD1 is the responsible factor, but in which its regulation occurs within a tight window.

It can therefore be predicted that the tissue, stage or cell subtype restriction of GTF2IRD1-mediated gene regulation also depends on the presence of other cofactors, and that the main mechanism of gene regulation may involve larger regulatory complexes that are recruited in a context-specific manner.

It is known that GTF2IRD1 needs at least two consensus binding sequences to be able to bind to DNA (Palmer et al., 2010). However, several candidate targets or dysregulated genes found in other studies (Calvo et al., 2001; O'Leary and Osborne, 2011; O'Mahoney et al., 1998b) contain just one DNA consensus sequence or lack it altogether, which appears contradictory. This observation adds to the evidence favouring the involvement of GTF2IRD1 in highly dynamic regulatory protein complexes and/or with chromatin remodelling factors that enable GTF2IRD1 to reach its targets, despite not satisfying all the DNA consensus requirements.

Figure 5.6 Venn diagram of genes with a minimal fold change of ±1.2 in both microarray analyses.

The dysregulated gene lists obtained from the transcript analyses in HeLa cells treated with siRNA (GTF2IRD1 knock down) and striatum tissue from Gtf2ird1 knock out mice were compared. From a total of 627 dysregulated genes in the HeLa cell samples and 177 genes from the striatum samples, only three genes were found to be commonly dysregulated in both datasets, the identities of which are shown in the box together with the fold change value.

CHAPTER 6 - UNRAVELLING EPIGENETIC COMPLEXES ASSOCIATED WITH GTF2IRD1

6.1 Introduction

No previous studies involving the purification and characterisation of epigenetic multi-protein complexes have identified GTF2IRD1 as a member. However, several reasons lead to the hypothesis that GTF2IRD1 may interact with such complexes in a transient and dynamic manner. Firstly, the majority of the protein partners identified in the yeast 2-hybrid screen belonged to chromatin remodelling complexes (see chapter 4), but many of these interactions mapped to the SUMO-ligation region of the GTF2IRD1 protein, suggesting that these interactions are regulated by GTF2IRD1 SUMOylation. Furthermore, the lack of proteomics evidence from classical purification protocols to support a role in these complexes indicates that these interactions may be difficult to capture because they are transient and dynamic.

In general, proteome-wide analyses for the identification and characterisation of the members of a complex (usually by mass spectrometry) involve, as a first step, the immunoprecipitation of one protein of interest; coimmunoprecipitation of the partner proteins then leads to the identification of peptides corresponding to these different subunits (Chandramouli and Qian, 2009). The validation of the antibody against human GTF2IRD1, 333A, presented earlier (see chapter 3, section 3.2.1), indicated that such studies might be feasible starting with GTF2IRD1. However, attempts in our laboratory to coimmunoprecipitate partner proteins using this antibody have failed to identify such endogenous interactions, despite the fact that endogenous GTF2IRD1 can be immunoprecipitated effectively (figure 3.1 and data not shown). The relatively low abundance of the GTF2IRD1 protein in the human cell types analysed may provide some of the explanation for this result, but it may also indicate that GTF2IRD1 interactions in the nucleus are extremely transient and unstable.

Therefore, another approach was chosen to test the interactional network of GTF2IRD1, in order to position this protein into distinct functional complexes. The proximity ligation assay (PLA) is a novel technique that allows the in situ study of endogenous interactions in a visual and quantitative manner. In this chapter, GTF2IRD1 interactions with members of histone deacetylase (HDAC) complexes are studied in both HeLa cells and hESC-derived neurons using PLA.

HDACs are enzymes responsible for the removal of acetyl groups from the lysine residues of the histone tails that project beyond the histone core; a process that is strongly associated with gene repression (de Ruijter et al., 2003). HDACs are divided into three different classes, with the class I (HDACs 1, 2, 3 and 8) being the most relevant for the studies presented herein.

Class I HDAC-containing complexes have multiple subunits that are exchangeable but usually comprise histone binding proteins (e.g. RBAP46), recruiters that bind to DNA (like MBD3 or HP1), nuclear hormone receptor binding proteins, enzymes with remodelling activity, intermediators (e.g. MeCP1) and several other proteins that are variably present. Depending on the cofactors recruited, the complexes formed include Sin3, Co-REST, NuRD, N-CoR, SMRT and MeCP1, amongst others (de Ruijter et al., 2003; Feng and Zhang, 2001; Gallinari et al., 2007; Verdone et al., 2006). The first three listed are the major HDAC1/2 complexes and they are well characterised. The cores of the Sin3 and NuRD complexes are composed of HDAC1/2 and RBAP46/48, while the Co-REST complex has an essential requirement for HDAC1/2, the Co-REST protein and LSD1. HDAC3 is generally associated with the SMRT and N-CoR complexes (de Ruijter et al., 2003; You et al., 2001).

Previous reports support a role for GTF2IRD1 in the N-CoR complex, as it can interact directly with N-CoR protein (Polly et al., 2003a) and HDAC3 has been found to bind both TFII-I and GTF2IRD1 (Tussie-Luna et al., 2002a).

One of the most interesting aspects of these structures is that their formation is highly dynamic and several subunits can be part of several different complexes. Histone deacetylase complexes can also act together with histone methylation regulators. For example, some proteins from the NuRD complex (RBAP46/48) can also be recruited to the Polycomb Repressive Complex 2 (PRC2), which mediates H3K9 and H3K27 methylation, with one of its key components being the protein EZH2 (enhancer of zeste 2 polycomb repressive complex 2 subunit) (Kuzmichev et al., 2002).

Work described in this chapter shows that endogenous interactions between GTF2IRD1 and HDAC1/2 can be demonstrated in both HeLa cells and hESC-derived neurons using PLA. The occurrence of these complexes was also assessed by PLA and they appeared to be less abundant than the HDAC core complexes. Furthermore, overexpression of GTF2IRD1 in HeLa cells was found to decrease the core complex assembly and affected the detectable level of HDAC enzymatic activity. Hence, these studies provide a visual, biochemical and functional account of the involvement of GTF2IRD1 in chromatin remodelling complexes.

6.2 Results

6.2.1 Expansion of GTF2IRD1 protein network with MBD and HDAC proteins

The results presented in chapter 4 provide a novel range of protein partners for GTF2IRD1. The new GTF2IRD1 interactional network that has resulted (chapter 4, figure 4.7) has expanded the number of proposed binding partners to an extraordinary degree and provided a solid grounding for the pursuit of GTF2IRD1 function in the context of these interactions. One of the strongest messages to emerge from that study was the fact that most of these novel proteins are known to be part of chromatin modifying complexes or are related to epigenetic regulation, such as ZMYM2, ZMYM3, MBD3L1, SETD6 and ATF7IP amongst others. It was also reassuring to see that some of these interactions support previously observed associations with conserved family members, such as the co-purification of TFII-I and ZMYM2 and 3 in complexes that contain HDAC 1 and 2 (Hakimi et al., 2003).

Since protein interactions in protein complexes often occur at many points between multiple members of its subunits, it seemed reasonable to suppose that GTF2IRD1 may also engage in direct interactions with other common members of chromatin modifying complexes that may have been missed in the Y2H screens. Since the tools had already been developed to interrogate GTF2IRD1 binding using the Y2H and mammalian expression systems, it was an easy matter to address this hypothesis for selected candidate proteins by cloning or obtaining yeast and mammalian expression vectors containing their coding sequences.

ATF7IP is a transcriptional regulator involved in heterochromatin formation that functions with MBD1 (Methyl-CpG-binding domain protein 1) (Ichimura et al., 2005), forms part of the MeCP1 complex and is able to interact with class I HDACs, PIAS1 (which also binds to GTF2IRD1; see chapter 4, table 4.1) and HP1α (Cross et al., 1997; Fujita et al., 2003b; Lyst et al., 2006). MBD3L1 (methyl-CpG binding domain protein 3-like 1), a GTF2IRD1 partner identified in the Y2H screen (chapter 4), is a testis-specific member of the MeCP1 or NuRD complexes, which also contain its family member MBD2 (methyl-CpG binding domain protein 2) (Jiang et al., 2004; Ng et al., 1999; Zhang et al., 1999). On the candidate basis described above, and given the possibility of conserved interactions between protein family members, MBD1 and MBD2 were selected for a candidate Y2H test with GTF2IRD1 (see chapter 2, table 2.3 for details of MBD1, MBD2 and GTF2IRD1 constructs). Yeast cotransformed with MBD1 and GTF2IRD1, or MBD2 and GTF2IRD1 containing constructs, were both able to grow on high stringency QDO plates, thus confirming a positive direct interaction between these proteins in this system (figure 6.1A). Empty vector cotransformations were included as negative controls, which showed no growth on the QDO plates.

These results, together with the prior report of a direct interaction in vitro between HDAC3 and GTF2IRD1 (Tussie-Luna et al., 2002a), prompted us to ask whether GTF2IRD1 could also directly interact with HDAC 1 and 2, which form the basis of the complexes in which ZMYM2, ZMYM3 and TFII-I were found (Hakimi et al., 2003). These HDACs form the core of the well-characterised MeCP1, NuRD and CoREST complexes and a positive result would provide strong additional clues to GTF2IRD1 function. Therefore, plasmids encoding Myc-tagged human HDAC1 and HDAC2 were constructed and cotransfected into HeLa cells with GTF2IRD1-EGFP. Coimmunoprecipitation studies from lysates of these cells, using an anti-Myc antibody, confirmed that GTF2IRD1 is also capable of binding directly to HDAC1 and 2 in this system (figure 6.1B).

Figure 6.1 GTF2IRD1 interacts with MBD and HDAC proteins.

(A) Image of yeast colonies grown from cells cotransformed with the human bait plasmid for GTF2IRD1 and the prey plasmids containing MBD1 and MBD2. Cotransformed yeast were grown on QDO plates with alpha-galactosidase (blue colour). The negative control involves cotransformation with the prey plasmids and the empty pGBKT7 plasmid. (B) Western blot of coimmunoprecipitation experiments showing interactions between GTF2IRD1 and HDACs in vitro. HeLa cells were transfected with GTF2IRD1-EGFP and HDAC1 or HDAC2-Myc plasmids. GTF2IRD1 was immunoprecipitated (IP) using the anti-GFP antibody and coimmunoprecipitated (CoIP) proteins were immunoblotted (IB) with the anti-Myc antibody.

6.2.2 Visualisation of direct protein interactions using the proximity ligation assay

6.2.2.1 Defining interactions between GTF2IRD1 and members of the HDAC complexes

The proximity ligation assay (PLA) is a technique that can precisely detect close proximity between two proteins. Two primary antibodies, one for each protein under test are applied, followed by the addition of oligonucleotide-conjugated secondary antibodies. Under circumstances where both epitopes are in close proximity (~10 nm), the next reaction can take place, which involves ligation of the probes and rolling circle amplification with incorporation of a fluorophore (Soderberg et al., 2006; Soderberg et al., 2008), thus forming a fluorescent dot at the site where the proximity occurs.

Therefore, PLA permits an in situ quantitative visual account of close protein proximity at a snapshot in time that can be used for the study of endogenous protein-protein interactions. The positive fluorescent dots can be counted automatically, and their total number represents the specific detection of a certain protein-protein interaction inside the cell.

Given the accuracy of the anti-GTF2IRD1 333A antibody (see chapter 3) and the punctate distribution of the endogenous protein in the nucleus, PLA was chosen as a good potential method to observe the nature and the extent of GTF2IRD1 interactions with selected proteins involved in HDAC complexes. PLA analyses were carried out, in parallel, in HeLa cells and in cultured differentiating neurons derived from human embryonic stem cells (hESCs) (obtained as detailed in Materials and Methods) using the 333A anti-GTF2IRD1 antibody, along with antibodies against HDAC1, HDAC2, EZH2, LSD1 and MBD1, which were selected for their suitability for these experiments (in terms of specificity, validation level or availability) from amongst a range of antibodies for several subunits of HDAC remodelling complexes.

Antibodies against HDAC1 and HDAC2 were first tested in conventional co-immunofluorescence with GTF2IRD1 to analyse their relative overlap in HeLa cells (figure 6.2A). Then PLA reactions were performed using anti-GTF2IRD1 with anti-HDAC1 or anti-HDAC2 antibodies. Positive PLA signals were found for both GTF2IRD1/HDAC1 and GTF2IRD1/HDAC2 (figure 6.2B). The abundant nuclear distribution of both GTF2IRD1 and the HDACs can be contrasted with the far fewer foci of positive PLA reaction, represented by scattered fluorescent speckles (figure 6.2A and B). PLA negative controls were conducted by incubation with only one of the primary antibodies, followed by both secondary antibodies and subsequent PLA reactions. Very low levels of signal were found in all cases (figure 6.2B).

This experiment was then repeated in differentiating neurons derived from hESCs, which led to a similar set of findings (figure 6.3). Positive PLA foci were counted automatically using the Analyze Particles function of the ImageJ software in at least 30 nuclei from different fields, sampled by selection on the DAPI wavelength, in duplicate experiments for both cell types analysed. These data were compared with the negative controls for each antibody used and the results are expressed in terms of dots per nucleus (figure 6.4).
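The principle behind counting dots per nucleus can be sketched as follows. This is an illustrative stand-in for the ImageJ particle analysis and not the procedure used in this work: it assumes that the DAPI and PLA channels have already been thresholded into binary masks, it uses a simple 4-connected component labelling, and all function names, the min_size parameter and the toy masks are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected components of a binary grid as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def dots_per_nucleus(dapi_mask, pla_mask, min_size=1):
    """Count PLA dots (components of pla_mask) overlapping each DAPI nucleus."""
    nuclei = label_components(dapi_mask)
    dots = [d for d in label_components(pla_mask) if len(d) >= min_size]
    counts = []
    for nucleus in nuclei:
        inside = set(nucleus)
        counts.append(sum(1 for dot in dots if any(p in inside for p in dot)))
    return counts


# Toy masks: two nuclei, three dots inside nuclei and one dot outside.
dapi = [[1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1, 1],
        [1, 1, 1, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1]]
pla = [[1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 0]]

print(dots_per_nucleus(dapi, pla))  # [2, 1]
```

The dot lying outside both nuclei is ignored, which mirrors the restriction of the counts to DAPI-selected nuclei described above.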

Due to their relationship with confirmed protein partners of GTF2IRD1 or their involvement in HDAC complexes, the proteins MBD1, EZH2 and LSD1 were selected in order to assess a putative interaction with GTF2IRD1. EZH2, a component of the PRC2 complex that mediates H3K9 and H3K27 methylation (Kuzmichev et al., 2002), was selected as a means to relate these histone modifications to GTF2IRD1 protein interactions. On the other hand, LSD1 was selected as it is an important demethylase that relieves H3K4me and H3K9me marks and it is part of several HDAC complexes (Hakimi et al., 2002; Hakimi et al., 2003; Metzger et al., 2005; Rudolph et al., 2013). Antibodies against these proteins were first tested for immunofluorescence applications in both HeLa cells (figure 6.5A) and hESC-derived neurons (not shown) and then the PLA was performed in the two cell types. No positive PLA signal was seen for these three proteins (figure 6.5B). These data indicate that GTF2IRD1 interacts with HDAC1 and HDAC2 at the endogenous level, but any interactions with the proteins EZH2, LSD1 and MBD1 are below detectable levels.

Figure 6.2 Visualisation of interactions between GTF2IRD1 and HDAC1/2 in HeLa cells.

(A) Coimmunofluorescence of endogenous proteins using antibodies against GTF2IRD1 and HDACs 1 and 2, showing their nuclear pattern of distribution. (B) Proximity ligation assays (PLAs) in HeLa cells showing the interactions between GTF2IRD1 and HDAC1 or HDAC2 using the same antibodies used in the immunofluorescence analysis. The red fluorescent dots represent a positive PLA reaction. The negative controls use one primary antibody only (as indicated) with the same two secondary antibody probes used in the first two rows.

Figure 6.3 Interactions between GTF2IRD1 and HDAC1/2 in hESCs-derived neurons.

(A) Endogenous immunofluorescence analysis using anti-HDAC1 and anti-HDAC2 antibodies. (B) Proximity ligation assays (PLAs) in neurons showing the interactions between GTF2IRD1 and HDAC1 or HDAC2. The red fluorescent dots represent a positive PLA reaction. The negative controls use one primary antibody only (as indicated) with the same two secondary antibody probes used in the first two rows.

Figure 6.4 Quantitation of the PLA interactions between GTF2IRD1 and HDACs.

The positive PLA dots were counted automatically using ImageJ software for the positive interactions between GTF2IRD1 and HDAC1 or 2, and also for the controls of each primary antibody used in these assays. The results correspond to at least 30 DAPI-positive cells counted in two independent experiments, and are presented in dots per cell in the samples from (A) HeLa cells and (B) hESCs-derived neurons (** P ≤ 0.01, *** P ≤ 0.001, **** P ≤ 0.0001).

Figure 6.5 Negative GTF2IRD1 PLA interactions.

(A) Immunofluorescence analysis in HeLa cells showing the distribution of the proteins EZH2, LSD1 and MBD1, as detected by the antibodies used in PLA experiments. (B) Representative images of PLA analysis, performed for studying potential interactions between GTF2IRD1 and EZH2, LSD1 and MBD1. Negative control experiments included just one primary antibody with both secondary antibodies (Control). None of the conditions showed the formation of positive PLA dots for the three proteins analysed.

6.2.2.2 Comparison of the occurrence of HDACs core complexes and GTF2IRD1 interactions.

A valid question regarding the interactions discovered between GTF2IRD1 and the HDAC1 and 2 proteins is: why has GTF2IRD1 never been isolated before in proteomic approaches, such as mass spectrometry, in these well characterised complexes? One might speculate, firstly, that such interactions may be much less frequent because of the very low abundance of GTF2IRD1 and, secondly, that the interactions may be much more dynamic and unstable, and therefore less resistant to dissociation during purification procedures. In order to visualise the level of GTF2IRD1 interactions with HDAC1 or HDAC2 alongside some of the stable protein interactions of the core subunits of HDAC complexes, PLA was used again to compare them with the extent of the interactions between HDAC1 or 2 and RBAP46, and between HDAC1 or 2 and LSD1. The distribution of the HDAC-associated proteins LSD1 and RBAP46 was compared by immunofluorescence with that of GTF2IRD1 (figure 6.6A). Analysis by PLA showed that the occurrence of the core complex protein pairs is much higher than that of the interactions of HDACs 1 and 2 with GTF2IRD1 (figure 6.6B). Qualitatively, the HDAC2-LSD1 interaction appears to be the most abundant, followed by HDAC2-RBAP46. By comparison, interactions between GTF2IRD1 and the HDACs are just a fraction of these levels.

Figure 6.6 Comparison of the levels of HDAC interaction by PLA.

(A) Immunofluorescence in HeLa cells using antibodies against LSD1, RBAP46 and GTF2IRD1, showing their distribution pattern in the nucleus. (B) Representative images of PLA experiments comparing the number of occurrences of the interactions between HDAC1/2 and RBAP46, LSD1 and GTF2IRD1.

6.2.3 Functional consequences of GTF2IRD1 binding to HDACs

6.2.3.1 Effect of GTF2IRD1 on HDAC enzymatic activity.

The new interactions found between GTF2IRD1 and HDAC1 and 2 in both HeLa cells and hESC-derived neurons led to the question: does GTF2IRD1 have a functional effect on HDAC activity through its direct binding? Therefore, experiments were designed to test whether GTF2IRD1 can alter HDAC activity in HeLa cells.

HDACs remove acetyl groups from lysine residues of core histones in order to regulate chromatin remodelling. In order to quantitate the effect of GTF2IRD1 on HDAC activity, HeLa cells were transiently transfected with a plasmid encoding GTF2IRD1-EGFP. The negative control experiment utilised an empty pEGFP vector instead. These tagged plasmid constructs were chosen as they had the benefit of allowing an assessment of transfection efficiency through direct visualisation of EGFP in the cells when they are in culture.

24 hours post-transfection, cells were lysed, nuclear extracts were prepared and then HDAC activity was measured using a fluorometric assay (see chapter 2, section 2.4.6). This assay determines, within a set time, an end-point measurement for total HDAC activity using a fluorophore-labelled acetylated peptide, which serves as a substrate for all the HDAC enzymes present in the lysate sample. HDACs deacetylate the fluorophore-labelled substrate peptide, which is then cleaved by a developer solution, so that the final fluorescence can be measured. Control reactions included a positive crude HDAC extract, a negative control sample treated with TSA (an HDAC inhibitor) and blank reactions that contain only the buffer in which the nuclear extracts were prepared. Fluorescence intensity was measured and HDAC activity was expressed as the difference between the fluorescence measured in the sample and the blank (no enzyme control) per 100 µg of protein (figure 6.7A).

Constitutive expression of GTF2IRD1-EGFP in HeLa cells led to an increase of 37.5% in total HDAC activity over the expression of EGFP only (figure 6.7B), indicating that GTF2IRD1 can not only bind to the HDACs but can also affect total HDAC enzymatic activity.
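The normalisation and the relative comparison described above can be written out as a brief sketch. It assumes a single end-point fluorescence reading per sample and a known amount of nuclear protein in each reaction; the function names and all readings are hypothetical, chosen only so that the arithmetic is easy to follow, and are not measured values.

```python
def hdac_activity(sample_fluorescence, blank_fluorescence, protein_ug):
    """Blank-subtracted fluorescence expressed per 100 µg of protein."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return (sample_fluorescence - blank_fluorescence) * 100.0 / protein_ug


def percent_change(treated, control):
    """Change of the treated activity relative to the control, in percent."""
    return 100.0 * (treated - control) / control


# Hypothetical readings (arbitrary fluorescence units, 50 µg protein each).
egfp_control = hdac_activity(900.0, 100.0, 50.0)      # pEGFP vector only
gtf2ird1_egfp = hdac_activity(1200.0, 100.0, 50.0)    # GTF2IRD1-EGFP

print(egfp_control, gtf2ird1_egfp)
print(percent_change(gtf2ird1_egfp, egfp_control))
```

With these placeholder readings the two activities are 1600 and 2200 units per 100 µg, a difference of 37.5% relative to the control, which is the form in which the result is expressed in figure 6.7B.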

Figure 6.7 GTF2IRD1 over-expression increases histone deacetylase activity.

HeLa cells were transiently transfected with pGTF2IRD1-EGFP or pEGFP vector control. 24 hours post-transfection, total histone deacetylase (HDAC) activity was measured in a fluorometric assay from both cell lysates and expressed as (A) HDAC activity per 100 µg of protein or (B) the percentage increase found in GTF2IRD1 cell lysates when the GFP control cell lysates are set at 100% basal HDAC activity (n=6, * indicates a p value of ≤0.05).

6.2.3.2 Effect of GTF2IRD1 on complex assembly

To investigate whether this effect of GTF2IRD1 on total histone deacetylase enzymatic activity could be caused by an impact on HDAC complex assembly, PLA experiments were performed to assess the level of interaction between the core subunits under GTF2IRD1 knock down conditions.

HeLa cells were transiently transfected, in duplicate, with siRNAs targeted against GTF2IRD1 or scrambled controls. 48 hours post-transfection, cells were fixed and PLAs were performed to assess the levels of HDAC core complex assembly. The antibody pairs used were: HDAC1/RBAP46, HDAC2/RBAP46, HDAC1/LSD1 and HDAC2/LSD1 (figure 6.8 A and B). The dots were counted automatically for every condition and PLA pair, in at least 30 cells from different fields in duplicate experiments. A quantitative comparison between the PLA signals obtained for GTF2IRD1 knock down versus scrambled siRNA showed a significant decrease in the interaction between HDAC1 and RBAP46, equivalent to 30% fewer interactions than in the control condition (figure 6.8C). The other protein interactions analysed (HDAC2/RBAP46, HDAC1/LSD1 and HDAC2/LSD1) did not show significant changes in the total number of PLA dots when transfected either with the GTF2IRD1 siRNA or the control.
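A comparison of dots per cell between the knock down and control conditions can be sketched as below. The statistical test used for figure 6.8C is not stated in this section, so a two-sided permutation test on the difference of means is used here purely as an assumption for illustration; the function names, the seed and the per-cell counts are hypothetical.

```python
import random
from statistics import mean


def percent_change(control, treated):
    """Change in mean dots per cell relative to the control, in percent."""
    return 100.0 * (mean(treated) - mean(control)) / mean(control)


def permutation_p_value(control, treated, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical PLA dots per cell (a real analysis would use at least 30 cells).
control_sirna = [10, 12, 9, 11, 10, 8]
gtf2ird1_sirna = [7, 8, 6, 7, 8, 6]

print(percent_change(control_sirna, gtf2ird1_sirna))  # -30.0
print(permutation_p_value(control_sirna, gtf2ird1_sirna))
```

A permutation test makes no distributional assumption about the counts, which is convenient for small samples of discrete data, but any other appropriate test could be substituted.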

Figure 6.8 Effect of GTF2IRD1 knock down on the assembly of HDACs complexes.

HeLa cells were transfected with siRNA to knock down GTF2IRD1 expression (siRNA GTF2IRD1) or with scrambled siRNA (Control) and PLA analyses were performed for the protein pairs: (A) HDAC1/RBAP46 and HDAC2/RBAP46; (B) HDAC1/LSD1 and HDAC2/LSD1. (C) Positive PLA dots were counted in at least 30 cells for each condition (GTF2IRD1 siRNA or control siRNA) for each PLA pair (n=2, ** p value ≤0.01).

6.3 Discussion

The evidence presented in the previous chapters contributed to a model in which GTF2IRD1 functions to regulate gene transcription through complex formation with partner proteins, many of which were found to be involved in chromatin remodelling. The work described in this chapter aimed to consolidate these findings by exploring potential candidate interactions with other key components of the chromatin remodelling machinery, and by beginning to explore the functional impact of GTF2IRD1 on their activity.

Using the Y2H system, interactions between GTF2IRD1 and two methyl-CpG binding domain proteins (MBD1 and MBD2) were found. This analysis was based on the discovery of an interaction between GTF2IRD1 and another member of the methyl-CpG binding domain family, MBD3L1. This serves to illustrate, once again, that GTF2IRD1 interactions may be conserved across paralogous protein families, such as those found for the ZMYM family members (see chapter 4). However, time did not allow the interactions with MBD1 and MBD2 to be validated in mammalian cells, which would be required in future experiments.

To integrate the interactions discovered in this chapter with those presented in chapter 4, the sequencing results of the clones isolated in the Y2H screens can be used to identify the protein region that contains the domain required for interaction with GTF2IRD1. The MBD3L1 clone identified (interaction presented in chapter 4) contains amino acids 1-126, a region that includes the transcription repressor domain of the protein and is well conserved across species. An amino acid sequence comparison of the MBD3L1 interacting domain showed around 40% homology with the methyl-CpG-binding domain proteins MBD2 and MBD3, although this does not map to a known domain in these proteins. Given the homology levels shared between the methyl-CpG-binding domain proteins, it can be hypothesised that GTF2IRD1 could be able to interact with other family members that are also involved in similar regulatory complexes, such as MBD3 or MeCP2. For the ATF7IP clone isolated in chapter 4, the sequenced region that interacts with GTF2IRD1 contains amino acids 779-1063, which also include the interaction domain with SETDB1 and a SUMOylation motif. A comparison was also made for the sequenced domain of the ATF7IP protein, but no homology was found between this region and other proteins (domain information and alignment analyses retrieved from The Universal Protein Resource, UniProt database).

In this chapter, new interactions between GTF2IRD1 and HDAC1 and HDAC2 were established by coimmunoprecipitation in mammalian cells, which adds to the reported interaction between GTF2IRD1 and HDAC3 (Tussie-Luna et al., 2002a). This means that three out of the four class I HDACs can interact with GTF2IRD1. Currently, there is no information on GTF2IRD1 binding to HDACs belonging to the other classes.

It is unclear why these proteins did not appear in the initial Y2H screens (see chapter 4). Both the human brain and universal mouse libraries used were normalised, which is intended to compensate for biases in transcript abundance and should increase the relative amount of clones derived from genes that are expressed at low levels. Nonetheless, it is known that these libraries contain cDNA inserts that are highly biased towards the 3’ ends of the mRNAs, due to construction methods based on cDNA synthesis from oligo-dT primers, which bind to the polyA tail. The efficiency of the cDNA synthesis can be quite variable and if the mRNA has a long 3’ untranslated region, it is possible that peptides representing the amino terminal region of the protein, or sometimes the entire protein, may be missing.

GTF2IRD1 interactions were assessed in an endogenous context in both HeLa cells and hESC-derived neurons using PLA for HDACs 1 and 2, MBD1, EZH2 and LSD1. Interactions were confirmed for HDAC1 and 2 in both the in vitro models used, but no evidence was found for a direct interaction between GTF2IRD1 and some other common members of HDAC complexes: MBD1, EZH2 and LSD1. This was in spite of the fact that a candidate Y2H assay had previously established the potential for GTF2IRD1 to bind directly to MBD1.

This discrepancy could be explained in several different ways. Firstly, MBD1 may be present in lower abundance than the HDACs, making the interaction more challenging to detect by PLA. Alternatively, complex assembly may be stimulated by signalling that activates the gene silencing complex, which is present at only low levels during basal conditions. It is also possible that GTF2IRD1 does not generally form a major part of HDAC complexes that involve MBD1, like MeCP1, HDAC3-MBD1 or Suv39h1-HP1α (Cross et al., 1997; Fujita et al., 2003b; Villa et al., 2006). A final possibility is that GTF2IRD1 only associates with HDAC proteins when they are dissociated from other complex subunits. However, this possibility does not seem likely, as HDACs are reported to be inactive when they are in an isolated state (de Ruijter et al., 2003) and the addition of GTF2IRD1 had a positive effect on HDAC enzymatic activity. This result implies the opposite: that GTF2IRD1 enhances HDAC complex assembly and the activity of the complex.

Given the intense attention that HDACs have received, it is perhaps surprising that the number of new HDAC-associated proteins continues to grow. This has led to the old concept of a fixed, stable HDAC complex being replaced by a model in which a dynamic assembly of proteins comes together in a much more fluid and variable way (Meier and Brehm, 2014). This could explain why GTF2IRD1 has not previously been isolated in HDAC complexes using classic proteomics tools. If GTF2IRD1 is also only recruited to a small proportion of these complexes due to its low abundance and tissue specificity, that would also make it difficult to identify in standard proteomics systems. This idea was supported by the PLA comparison of interactions between HDACs and GTF2IRD1 relative to those between HDACs and the core protein RBAP46 or the Co-REST subunit LSD1. As anticipated, the PLA dots for HDAC1/GTF2IRD1 and HDAC2/GTF2IRD1 were less abundant than the dots formed by the complexes of HDACs with RBAP46 or LSD1.

After establishing the association with HDACs 1 and 2, a functional effect of GTF2IRD1 on these complexes was sought. Over-expression of GTF2IRD1 in HeLa cells produced a 37.5% increase in the total HDAC activity compared to the effect of the empty pEGFP vector, which controlled for any transfection-related effects that can alter the enzymatic activity of HDACs in the samples. It is unclear whether these results are due to the effect of GTF2IRD1 on a specific HDAC, since the assay is able to quantitate all class I HDACs (1, 2, 3 and 8). It is assumed that the cell types used will have different relative amounts of each of the four class I HDACs, and it is currently known that GTF2IRD1 interacts with at least three of those (HDACs 1-3). Therefore, it is possible that the 37.5% increase in HDAC activity could be the sum of the activities from all three of these proteins. However, observations from the PLA indicated that the interaction between GTF2IRD1 and HDAC2 was more abundant than with HDAC1. In fact, HDAC2 is the most abundant of the class I HDACs in most tissues, including brain (reviewed by de Ruijter et al. (2003)). Therefore, it is reasonable to suggest that a large part of the increase in activity could be accounted for by an increase in the activity of HDAC2, and this would be the logical first choice of protein to pursue in further functional investigations.
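As a minimal sketch of the arithmetic behind a reported change in activity, the percent change of a treated sample relative to its empty-vector control can be computed as shown below. The function name and the activity readouts are hypothetical, chosen only so that the example reproduces a 37.5% increase; they are not measurements from this work.

```python
def percent_change(treated: float, control: float) -> float:
    """Return the percent change of a treated readout relative to its control."""
    if control == 0:
        raise ValueError("control readout must be non-zero")
    return (treated - control) / control * 100.0


# Hypothetical HDAC activity readouts in arbitrary units (illustrative only).
control_activity = 8.0    # cells transfected with the empty vector
treated_activity = 11.0   # cells over-expressing the protein of interest

print(percent_change(treated_activity, control_activity))  # prints 37.5
```

Because the assay reports total class I HDAC activity, a figure obtained this way describes the pooled change and does not by itself indicate which HDAC is responsible.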

TFII-I has also been shown to increase HDAC activity in vitro and it is able to interact with HDAC3, but also with HDAC1 and HDAC2 in a weak manner (Tussie-Luna et al., 2002a). It was shown that the interaction of TFII-I with HDAC3 represses the activation effect of TFII-I on the c-fos promoter (Tussie-Luna et al., 2002a). In this report, it was also mentioned that GTF2IRD1 protein has a greater affinity for HDAC3 than TFII-I. This was not corroborated in this work but would also form an interesting line to pursue in future investigations.

The impact of GTF2IRD1 on HDAC activity led to the question of whether this effect could be due to a change in the association of the other complex subunits, which could be easily assessed by PLA. The interactions between HDAC1/2 and RBAP46 (which represent the core of the NuRD and Sin3 complexes) and between HDAC1/2 and LSD1 (part of the Co-REST complex) were assessed by PLA in samples transfected with siRNA against GTF2IRD1. No difference was observed for three of the protein pairs tested (HDAC2/RBAP46, HDAC1/LSD1 and HDAC2/LSD1). However, in the case of the interaction between HDAC1 and RBAP46, there was a reduction of 30% in the number of PLA dots in the GTF2IRD1 siRNA samples. This result suggests that the core complex formed by the subunits HDAC1 and RBAP46 is impaired by GTF2IRD1 deficiency. On the other hand, the recruitment of LSD1 to this structure, which is an important component of the Co-REST complex, was not influenced by GTF2IRD1 and neither were complexes involving HDAC2 and RBAP46, which might be taken to argue against HDAC2 being of primary importance in GTF2IRD1-influenced HDAC activity.

This effect on HDAC complex assembly could have additional impacts on functional associations with other secondary members of the complexes, impairing or enhancing their assembly or influencing their recruitment to sites in the genome. It thus supports a model in which GTF2IRD1 functions as a transcriptional regulator in a “secondary” role, by organising the recruitment of specific complexes in order to alter gene regulation via chromatin regulation.

The interactions between GTF2IRD1 and the HDACs, and the MBD and ZMYM family members (see chapter 4), may implicate GTF2IRD1 in several different chromatin-regulating complexes, including the Co-REST, NuRD, N-CoR and MeCP1 complexes. These types of connections fit with the current understanding of the way similar proteins tend to behave, with respect to their multiple binding capabilities (Khan et al., 2001), and the knowledge that the different HDAC complexes act together (Narlikar et al., 2002). These ideas reinforce the concept of a highly dynamic and adaptable system for chromatin remodelling that has displaced the more rigid concept of highly stable HDAC machines that traverse the nucleus performing their task in an isolated fashion (Kelly and Cowley, 2013; Meier and Brehm, 2014).

The data presented in this chapter support an epigenetic role for GTF2IRD1 and pinpoint its involvement in histone deacetylation. However, further studies will be required in order to define the additional subunits that cooperate with GTF2IRD1 in this function and to understand how these assemblies are brought to bear on genomic function.


CHAPTER 7 - GENERAL DISCUSSION


7.1 Overview

Genotype-phenotype correlations in patients with atypical deletions of the WBS critical region have implicated GTF2IRD1, and its family member TFII-I, in the neuropathology of WBS. Current molecular data and analyses of the phenotypes observed in mouse lines with monogenic mutations of these genes suggest that the functions of these two related proteins are not redundant. While there is much evidence to suggest that TFII-I and GTF2IRD1 share many attributes, which would be expected given the level of sequence homology, the molecular role of GTF2IRD1 remains poorly understood. The aim of the work described in this thesis was to interrogate GTF2IRD1 function using methods that make very few a priori assumptions, in order to provide novel insights into its cellular role that will ultimately lead to a better understanding of WBS pathology and of aspects of the genetic control of normal human development and behaviour. The key findings from this work are discussed below.

7.1.1 GTF2IRD1 is a nuclear speckling protein

The study of GTF2IRD1 has been hampered by the lack of specific antibodies that can unequivocally detect the protein in different contexts and for different experimental purposes. As shown in chapter 3, we validated an antibody that recognises human GTF2IRD1 and works effectively to immunoprecipitate the protein without cross-reacting with other TFII-I family members. This antibody was used to detect endogenous GTF2IRD1 protein in several human immortalised cell lines and also in hESC-derived neurons, demonstrating that it is localised exclusively in the nucleus with a specific speckled pattern of distribution. Analysis of the neuronal cultures, which were driven down the cortical neuron pathway of differentiation, also indicated that GTF2IRD1 protein is detectable not only in the maturing differentiated neurons but also in the undifferentiated neural progenitors.

7.1.2 Effect of GTF2IRD1 on gene regulation

Chapter 5 describes microarray analyses that were conducted in HeLa cells, in which GTF2IRD1 was knocked down using siRNA, and in corpus striatum samples from Gtf2ird1-/- knock out mice. The knock down of GTF2IRD1 in HeLa cells resulted in subtle changes in the transcriptional profile, and the genes that were dysregulated did not overlap with those that were differentially expressed in the striatum analysis. The transcriptional profile of the striatum of Gtf2ird1-/- mice demonstrated increased expression of genes involved in neuronal development and increased levels of a cluster of immediate-early response genes that are associated with hyperactivity, ADHD and the response to psychostimulants. These data hold potential clues to the molecular basis of the phenotypes observed in Gtf2ird1 knock out mice, including the altered exploratory motor behaviour (Howard et al., 2012), and to the basis of several aspects of WBS neuropathology.

A previous study on the transcript profile in brain tissue extracted from Gtf2ird1 knock out mice failed to identify any genes that were differentially expressed to significant levels (O'Leary and Osborne, 2011). Put together with the microarray analyses reported here, these data seem difficult to assemble into a coherent picture. The general view is that GTF2IRD1 is a conventional transcription factor, but in experiments where GTF2IRD1 is removed from cells, the transcript profile either does not change to significant levels or the sets of genes that are changed are not consistent between studies. One possible explanation for this is that GTF2IRD1 action is more complex than originally conceived and is influenced by a cell context dependency. If GTF2IRD1 action is dependent upon, as an example, the activity of cell signalling pathways or coactivity with other nuclear proteins, then this would influence the outcome of knock down or knock out experiments and cause the consequences of GTF2IRD1 loss to vary in different cell types. In these circumstances, it becomes much more important to understand the nature of such potential interactions, in order to comprehend the molecular mechanisms that could underpin context dependencies. This hypothesis formed a major driver to establish a set of protein interaction partners for GTF2IRD1.

7.1.3 Novel interacting partners of GTF2IRD1

In the work described in chapter 4, yeast two-hybrid screenings were used as an unbiased approach to identify novel protein partners of GTF2IRD1. An important advantage of this experimental approach is that yeast two-hybrid methodology is sensitive and can detect both transient and stable protein interactions. In addition, the two cDNA libraries that were used were normalised for transcript abundance, which should increase the chance of identifying interactions with low abundance proteins, as well as the more common interactions.

Thirty-six novel putative binding partners were identified, most of them being reconfirmed in yeast by transformation into haploid AH109 cells, and many of the most promising leads were validated by coimmunoprecipitation in mammalian cells. Most of the proteins are involved in chromatin modification and transcriptional regulation. Groupings of several proteins also indicate integration of signalling via components of the primary cilium/centrosome complex and armadillo repeat-containing factors. The interactions were mapped to different recognisable domains in the GTF2IRD1 structure, with RD1 and the SUMO motifs scoring most highly. The sites of protein interaction indicate key features regarding the evolution of GTF2IRD1 and integration with tight post-translational regulation, fitting well with the concept of human disease states caused by copy number variation. The fact that RD1 shows no ability to interact with DNA (Vullhorst and Buonanno, 2003b) and yet is one of the commonest sites for protein interaction suggests that the internal duplication of the RDs is more likely to have been driven by an evolutionary advantage in increasing the number of protein interaction surfaces, rather than anything to do with the constraints of DNA binding. It is difficult to understand why a protein would evolve five independent DNA binding domains, but if the reason was related to increasing the capacity for protein-protein interactions, then this seems easier to comprehend.

The other aspect that emerges from this part of the study is the fact that many proteins interact at the SUMO motif, which suggests that such interactions could be coregulated by SUMOylation. Several of the protein hits that emerged are related to post-translational regulation via SUMOylation or ubiquitination, indicating that the levels and the activity of GTF2IRD1 are kept under tight control by the cell. If this is added to the knowledge that GTF2IRD1 transcription is also kept under tight autoregulatory control (Palmer et al., 2010), it leads to the conclusion that maintaining GTF2IRD1 at the right level and status has important consequences for cell function and this is exactly what would be predicted for a protein that is implicated in diseases associated with copy number variation.

The interaction network of GTF2IRD1 was expanded with the findings described in chapter 6, where the protein interaction studies were followed up by direct assessment of specific interactions with epigenetic factors. Interactions with the methyl binding proteins MBD1 and MBD2 were revealed by yeast two-hybrid analysis, and interactions with HDAC1 and HDAC2 were validated by coimmunoprecipitation and proximity ligation assays (PLA). These proteins bind to some of the protein partners discovered in chapter 4, and they all bind to DNA methylation or histone deacetylation complexes (Cross et al., 1997; de Ruijter et al., 2003; Ng et al., 1999; Sarraf and Stancheva, 2004).

The interactions between GTF2IRD1 and HDAC1-2 were visualised using PLA, a relatively new technique to identify in situ protein interactions. This experimental approach provides a powerful tool to detect and quantitate protein interactions, even for proteins of low abundance, as the interaction signal is amplified in situ. The observation that GTF2IRD1 is frequently found in close proximity to HDACs 1 and 2 opens exciting new hypotheses to explore, and our evidence also clearly indicated that GTF2IRD1 is capable of having a functional effect on HDACs. Overexpression of GTF2IRD1 in HeLa cells increased HDAC enzymatic activity, while the knock down of GTF2IRD1 decreased the interaction between the core complex protein partners, HDAC1 and RBAP46, thus illustrating a possible mechanism by which GTF2IRD1 may regulate gene expression.

The results from the subcellular localisation, gene expression analyses and protein interaction studies all suggest a role for GTF2IRD1 in gene repression. GTF2IRD1 colocalises with markers of silent chromatin; absence of GTF2IRD1 from cells of the striatum in mice leads to the upregulation of several genes, including transcriptional regulators and immediate early genes; and, finally, GTF2IRD1 has been shown to be capable of binding to multiple novel factors, many of which are involved in chromatin modification, and to have a functional impact on HDAC proteins that mediate gene silencing via histone deacetylation.


7.2 Future directions

The identification of an antibody (333A) that can unequivocally detect endogenous human GTF2IRD1 will enable the design of a whole array of new experimental approaches for studying the function of GTF2IRD1 in immortalised cell lines and human samples. In this regard, cellular imaging techniques should be harnessed to continue exploring the correlation between GTF2IRD1 speckles and states of active/inactive chromatin and other nuclear speckle-associated proteins, which would position GTF2IRD1 in specific cellular functions. GTF2IRD1 was also found to be expressed in hESC-derived neurons, differentiating in culture from neural progenitors. It is known that GTF2IRD1 is expressed broadly and strongly during development, and this system could provide a means to model events in vitro that might demonstrate the role of GTF2IRD1 in mature neuron formation. More expression analyses at different stages of differentiation, together with immunofluorescence analysis, chromatin immunoprecipitation (ChIP) studies and further assessment of the roles of the new GTF2IRD1 protein partners identified in this work, would provide a powerful means to address the role of GTF2IRD1 in the developing nervous system.

The findings reported in this work, combined with pre-existing reports, contribute to a working model in which GTF2IRD1 mediates its transcriptional regulation function through the formation of different dynamic and transient protein complexes that coexist in the nucleus. Work described here indicates that GTF2IRD1 is engaged in a large number of protein interactions, and it was previously reported that SUMOylation of Lys495 can modulate affinity for its partners (Widagdo et al., 2012). Different DNA binding and gene regulation studies have failed to identify a consistent set of gene targets and highlight the context/tissue specificity of GTF2IRD1 gene regulation. These issues present a unique set of technical challenges that must take into account the transient nature of interactions and the low abundance of the proteins involved. Imaging techniques such as single molecule and super-resolution confocal microscopy can provide a visual account of GTF2IRD1 heteromer formation and deliver a quantitative analysis of the different protein interactions at snapshots in time. The proximity ligation assay was shown to be a particularly effective tool for studying GTF2IRD1 behaviour using the antibody 333A. These experiments could be further followed up in a wide range of human samples, derived from patients with WBS or 7q11.23 duplications, and even fixed tissues. Moreover, the effect of post-translational modifications (such as SUMOylation) on these protein interactions, as well as DNA or RNA binding, could also be pursued with this technique (Jung et al., 2013; Weibrecht et al., 2012).

One of the most interesting findings to emerge from the microarray analysis in striatum tissue from Gtf2ird1 knock out mice is the upregulation of several immediate early genes, including Fos. The specific involvement of GTF2IRD1 in the regulation of FOS, and other early response genes, should be assessed as a means to connect the molecular function of GTF2IRD1 in the striatum and other brain regions with the neurobehavioural features seen in WBS.

Additional DNA binding studies can now be pursued using the 333A antibody for ChIP in human samples. However, the possible context/tissue specificity of the target genes that emerge should be kept in mind. Furthermore, due to the identification of novel GTF2IRD1 interacting partners involved in chromatin modification, it will be of particular interest to assess global epigenetic changes arising from GTF2IRD1 dosage levels; i.e. through the analysis of DNA methylation, histone acetylation and histone methylation in mouse lines carrying Gtf2ird1 mutations.


Indeed, one of the most exciting results discovered for GTF2IRD1 in this work was its association with epigenetic factors such as HDACs, methyl binding proteins and other members of chromatin modifying complexes. Moreover, GTF2IRD1 was able to modify the enzymatic activity of HDACs and to alter the interaction between two core members of an HDAC complex: HDAC1 and RBAP46. Further studies on these interactions will have broader significance for GTF2IRD1 molecular function by positioning GTF2IRD1 within specific complexes and providing potential means to identify the locations in the genome where these complexes are being recruited. The other TFII-I family members should be included in this characterisation, as it is expected that some of these interactions are conserved with TFII-I and GTF2IRD2 and may involve crosstalk between these related proteins.

7.3 Concluding remarks

The findings of this work, in the context of current knowledge available for GTF2IRD1, suggest that GTF2IRD1 is not a classical transcription factor whose main function is to be recruited to a specific set of DNA targets in order to regulate their expression. At this stage, we propose a working model in which GTF2IRD1 is involved in gene regulation, which may involve an element of direct GTF2IRD1 DNA binding, but is mainly mediated through interaction with a range of other chromatin-binding proteins or transcriptional cofactors (figure 7.1). These interactions would be regulated by post-translational modification of GTF2IRD1 through SUMOylation or ubiquitination. GTF2IRD1 would then recruit other proteins involved in adding or removing chromatin marks such as DNA methylation or histone acetylation/methylation. These activities could also integrate signals that are communicated via armadillo proteins or originating from the primary cilium/centrosome complex. The recruitment of each of these interacting partners and the genomic locations of the genes regulated would vary, depending on the cellular context and the status of the chromatin within such cells. This model reminds us that when studying a transcriptional regulator, all facets of its activity should be taken into account, rather than focusing only on DNA binding properties, since the integration with protein interaction studies will also provide valuable information on the molecular properties of the proteins in combination.

This model also provides a basis for considering WBS and 7q11.23 duplication syndrome as epigenetic diseases, in which the phenotypes observed may not be just the additive consequences of a collection of individual effects caused by copy number variations of dosage sensitive genes, but rather a product of abnormal epigenetic regulation due to disturbances in various elements of the chromatin modifying machinery. Exploring the molecular role of GTF2IRD1 should ultimately provide a means to understand the cellular mechanisms causing WBS, but it also provides a unique opportunity to study the genetic and epigenetic mechanisms that contribute to many aspects of human mood and behaviour.


Figure 7.1 Proposed model for the molecular role of GTF2IRD1.

GTF2IRD1 can modulate gene regulation through its own DNA binding properties, but mainly through interactions with a range of other proteins that assemble according to cellular context, cell signalling stimuli and post-translational modification. GTF2IRD1 binds to its own GTF2IRD1 upstream region (GUR) (Palmer et al., 2010) and it can also bind other transcription factors (TF), and methyl binding domain (MBD) proteins that can bind to methylated CpG DNA residues. GTF2IRD1 interacts with primary cilia and centrosome proteins (PC/Ct), which may shuttle to the nucleus or, alternatively, GTF2IRD1 may shuttle out to the primary cilium. An array of interactions may occur with histone modifying proteins involved in methylation and deacetylation. These may provide secondary interactions with other proteins belonging to epigenetic regulation complexes, and these interactions will provide the means to change the epigenetic marks that control gene expression.


REFERENCES

Adamo, A., et al., 2015. 7q11.23 dosage-dependent dysregulation in human pluripotent stem cells affects transcriptional programs in disease-relevant lineages. Nat Genet. 47, 132-41. Ahmed, M., Xu, J., Xu, P.X., 2012. EYA1 and SIX1 drive the neuronal developmental program in cooperation with the SWI/SNF chromatin-remodeling complex and SOX2 in the mammalian inner ear. Development. 139, 1965-77. Antonell, A., et al., 2010a. Partial 7q11.23 deletions further implicate GTF2I and GTF2IRD1 as the main genes responsible for the Williams-Beuren syndrome neurocognitive profile. J Med Genet. 47, 312-20. Antonell, A., Vilardell, M., Perez Jurado, L.A., 2010b. Transcriptome profile in Williams-Beuren syndrome lymphoblast cells reveals gene pathways implicated in glucose intolerance and visuospatial construction deficits. Hum Genet. 128, 27-37. Ashburner, M., et al., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 25, 25-9. Ashe, A., et al., 2008. A genome-wide screen for modifiers of transgene variegation identifies genes with critical roles in development. Genome Biol. 9, R182. Attia, M., et al., 2007. Nap1l2 promotes histone acetylation activity during neuronal differentiation. Mol Cell Biol. 27, 6093-102. Attia, M., et al., 2011. Interaction between nucleosome assembly protein 1-like family members. J Mol Biol. 407, 647-60. Baillat, D., et al., 2005. Integrator, a multiprotein mediator of small nuclear RNA processing, associates with the C-terminal repeat of RNA polymerase II. Cell. 123, 265-76. Balan, S., et al., 2013. Lack of association of EGR2 variants with bipolar disorder in Japanese population. Gene. 526, 246-50. Barozzi, S., et al., 2012. Audiological findings in Williams syndrome: a study of 69 patients. Am J Med Genet A. 158A, 759-71. Bartnik, M., et al., 2014. Application of array comparative genomic hybridization in 256 patients with developmental delay or intellectual disability. J Appl Genet. 55, 125-44. 
Bass-Zubek, A.E., et al., 2009. Plakophilins: multifunctional scaffolds for adhesion and signaling. Curr Opin Cell Biol. 21, 708-16. Bayarsaihan, D., Ruddle, F.H., 2000a. Isolation and characterization of BEN, a member of the TFII-I family of DNA-binding proteins containing distinct helix-loop- helix domains. Proc Natl Acad Sci U S A. 97, 7342-7. Bayarsaihan, D., Ruddle, F.H., 2000b. Isolation and characterization of BEN, a member of the TFII-I family of DNA-binding proteins containing distinct helix-loop- helix domains. Proc Natl Acad Sci U S A. 97, 7342-7347. Bayarsaihan, D., et al., 2003. Homez, a homeobox leucine zipper gene specific to the vertebrate lineage. Proc Natl Acad Sci U S A. 100, 10358-63. Bayarsaihan, D., Makeyev, A.V., Enkhmandakh, B., 2012. Epigenetic modulation by TFII-I during embryonic stem cell differentiation. J Cell Biochem. 113, 3056- 60. Bayes, M., et al., 2003. Mutational mechanisms of Williams-Beuren syndrome deletions. Am J Hum Genet. 73, 131-51. 205

Beckmann, A.M., Wilce, P.A., 1997. Egr transcription factors in the nervous system. Neurochem Int. 31, 477-510; discussion 517-6. Bellugi, U., et al., 1999. Bridging cognition, the brain and molecular genetics: evidence from Williams syndrome. Trends Neurosci. 22, 197-207. Berg, J.S., et al., 2007. Speech delay and autism spectrum behaviors are frequently associated with duplication of the 7q11.23 Williams-Beuren syndrome region. Genet Med. 9, 427-41. Beunders, G., et al., 2010. A triplication of the Williams-Beuren syndrome region in a patient with mental retardation, a severe expressive language delay, behavioural problems and dysmorphisms. J Med Genet. 47, 271-5. Binda, O., et al., 2013. SETD6 monomethylates H2AZ on lysine 7 and is required for the maintenance of embryonic stem cell self-renewal. Epigenetics. 8, 177-83. Binder, J.X., et al., 2014. COMPARTMENTS: unification and visualization of protein subcellular localization evidence. Database (Oxford). 2014, bau012. Bonini, N.M., Leiserson, W.M., Benzer, S., 1993. The eyes absent gene: genetic control of cell survival and differentiation in the developing Drosophila eye. Cell. 72, 379-95. Calvo, S., et al., 2001. Molecular dissection of DNA sequences and factors involved in slow muscle-specific transcription. Mol Cell Biol. 21, 8490-503. Campuzano, V., et al., 2012. Reduction of NADPH-oxidase activity ameliorates the cardiovascular phenotype in a mouse model of Williams-Beuren Syndrome. PLoS Genet. 8, e1002458. Canales, C.P., et al., 2014. The role of GTF2IRD1 in the auditory pathology of Williams-Beuren Syndrome. Eur J Hum Genet. Canzio, D., Larson, A., Narlikar, G.J., 2014. Mechanisms of functional promiscuity by HP1 proteins. Trends Cell Biol. 24, 377-86. Capponi, S., et al., 2011. HSPB1 and HSPB8 in inherited neuropathies: study of an Italian cohort of dHMN and CMT2 patients. J Peripher Nerv Syst. 16, 287-94. Caraveo, G., et al., 2006. 
Action of TFII-I outside the nucleus as an inhibitor of agonist- induced calcium entry. Science. 314, 122-5. Carpino, N., et al., 2004. Regulation of ZAP-70 activation and TCR signaling by two related proteins, Sts-1 and Sts-2. Immunity. 20, 37-46. Carracedo, A., Ito, K., Pandolfi, P.P., 2011. The nuclear bodies inside out: PML conquers the cytoplasm. Curr Opin Cell Biol. 23, 360-6. Castiglione, A., et al., 2014. EYA1-related disorders: two clinical cases and a literature review. Int J Pediatr Otorhinolaryngol. 78, 1201-10. Chandramouli, K., Qian, P.Y., 2009. Proteomics: challenges, techniques and possibilities to overcome biological sample complexity. Hum Genomics Proteomics. 2009. Chang, C.C., et al., 2011a. Structural and functional roles of Daxx SIM phosphorylation in SUMO paralog-selective binding and apoptosis modulation. Mol Cell. 42, 62- 74. Chang, S.W., et al., 2011b. NRIP, a novel calmodulin binding protein, activates calcineurin to dephosphorylate human papillomavirus E2 protein. J Virol. 85, 6750-63. Chapman, C.A., du Plessis, A., Pober, B.R., 1996. Neurologic findings in children and adults with Williams syndrome. J Child Neurol. 11, 63-5. Chen, J., et al., 2013. Functional analysis of the integrator subunit 12 identifies a microdomain that mediates activation of the Drosophila integrator complex. J Biol Chem. 288, 4867-77. 206

Chen, P.H., et al., 2008. Nuclear receptor interaction protein, a coactivator of androgen receptors (AR), is regulated by AR and Sp1 to feed forward and activate its own gene expression through AR protein stability. Nucleic Acids Res. 36, 51-66. Chen, X., et al., 2002. Protein binding and functional characterization of plakophilin 2. Evidence for its diverse roles in desmosomes and beta -catenin signaling. J Biol Chem. 277, 10512-22. Cheng, M.C., et al., 2012. Genetic and functional analyses of early growth response (EGR) family genes in schizophrenia. Prog Neuropsychopharmacol Biol Psychiatry. 39, 149-55. Cheriyath, V., Roy, A.L., 2000. Alternatively spliced isoforms of TFII-I. Complex formation, nuclear translocation, and differential gene regulation. J Biol Chem. 275, 26300-8. Cherniske, E.M., et al., 2004. Multisystem study of 20 older adults with Williams syndrome. Am J Med Genet A. 131, 255-64. Chimge, N.O., et al., 2007. Expression profiling of BEN regulated genes in mouse embryonic fibroblasts. J Exp Zool B Mol Dev Evol. 308, 209-24. Chimge, N.O., et al., 2008. Identification of the TFII-I family target genes in the vertebrate genome. Proc Natl Acad Sci U S A. 105, 9006-10. Chimge, N.O., et al., 2012. PI3K/Akt-dependent functions of TFII-I transcription factors in mouse embryonic stem cells. J Cell Biochem. 113, 1122-31. Chook, Y.M., Blobel, G., 2001. Karyopherins and nuclear import. Curr Opin Struct Biol. 11, 703-15. Clark, P.J., et al., 2014. Wheel running alters patterns of uncontrollable stress-induced cfos mRNA expression in rat dorsal striatum direct and indirect pathways: A possible role for plasticity in adenosine receptors. Behav Brain Res. 272, 252- 63. Collins, R.T., 2nd, 2013. Cardiovascular disease in Williams syndrome. Circulation. 127, 2125-34. Crespi, B.J., Hurd, P.L., 2014. Cognitive-behavioral phenotypes of Williams syndrome are associated with genetic variation in the GTF2I gene, in a healthy population. BMC Neurosci. 15, 127. 
Crippa, V., et al., 2010. A role of small heat shock protein B8 (HspB8) in the autophagic removal of misfolded proteins responsible for neurodegenerative diseases. Autophagy. 6, 958-60.
Crittenden, J.R., Graybiel, A.M., 2011. Basal Ganglia disorders associated with imbalances in the striatal striosome and matrix compartments. Front Neuroanat. 5, 59.
Cross, S.H., et al., 1997. A component of the transcriptional repressor MeCP1 shares a motif with DNA methyltransferase and HRX proteins. Nat Genet. 16, 256-9.
Danoff, S.K., et al., 2004. TFII-I, a candidate gene for Williams syndrome cognitive profile: parallels between regional expression in mouse brain and human phenotype. Neuroscience. 123, 931-8.
Dassesse, D., et al., 2001. Functional striatal hypodopaminergic activity in mice lacking adenosine A(2A) receptors. J Neurochem. 78, 183-98.
Davies, M., Udwin, O., Howlin, P., 1998. Adults with Williams syndrome. Preliminary study of social, emotional and behavioural difficulties. Br J Psychiatry. 172, 273-6.
Davies, W., et al., 2004. Expression patterns of the novel imprinted genes Nap1l5 and Peg13 and their non-imprinted host genes in the adult mouse brain. Gene Expr Patterns. 4, 741-7.

De Graeve, F., et al., 2000. A murine ATFa-associated factor with transcriptional repressing activity. Oncogene. 19, 1807-19.
de Ruijter, A.J., et al., 2003. Histone deacetylases (HDACs): characterization of the classical HDAC family. Biochem J. 370, 737-49.
Del Campo, M., et al., 2006. Hemizygosity at the NCF1 gene in patients with Williams-Beuren syndrome decreases their risk of hypertension. Am J Hum Genet. 78, 533-42.
Denham, M., Dottori, M., 2011. Neural differentiation of induced pluripotent stem cells. Methods Mol Biol. 793, 99-110.
Denham, M., et al., 2012. Neurons derived from human embryonic stem cells extend long-distance axonal projections through growth along host white matter tracts after intra-cerebral transplantation. Front Cell Neurosci. 6, 11.
Deniaud, E., et al., 2009. Overexpression of transcription factor Sp1 leads to gene expression perturbations and cell cycle inhibition. PLoS One. 4, e7035.
Desgranges, Z.P., Roy, A.L., 2006. TFII-I: connecting mitogenic signals to cell cycle regulation. Cell Cycle. 5, 356-9.
DeSilva, U., et al., 2002. Generation and comparative analysis of approximately 3.3 Mb of mouse genomic sequence orthologous to the region of human chromosome 7q11.23 implicated in Williams syndrome. Genome Res. 12, 3-15.
Deutsch, A.J., et al., 2012. The nuclear orphan receptors NR4A as therapeutic target in cancer therapy. Anticancer Agents Med Chem. 12, 1001-14.
Dinsmore, J.H., Solomon, F., 1991. Inhibition of MAP2 expression affects both morphological and cell division phenotypes of neuronal differentiation. Cell. 64, 817-26.
Dixit, A., et al., 2013a. 7q11.23 Microduplication: a recognizable phenotype. Clin Genet. 83, 155-61.
Dixit, A., et al., 2013b. 7q11.23 Microduplication: a recognizable phenotype. Clin Genet. 83, 155-61.
Doetzlhofer, A., et al., 1999. Histone deacetylase 1 can repress transcription by binding to Sp1. Mol Cell Biol. 19, 5504-11.
Dottori, M., Pera, M.F., 2008. Neural differentiation of human embryonic stem cells. Methods Mol Biol. 438, 19-30.
Dowdle, J.A., et al., 2013. Mouse BAZ1A (ACF1) is dispensable for double-strand break repair but is essential for averting improper gene expression during spermatogenesis. PLoS Genet. 9, e1003945.
Doyle, T.F., et al., 2004. "Everybody in the world is my friend" hypersociability in young children with Williams syndrome. Am J Med Genet A. 124A, 263-73.
Durkin, M.E., et al., 2001. Integration of a c-myc transgene results in disruption of the mouse Gtf2ird1 gene, the homologue of the human GTF2IRD1 gene hemizygously deleted in Williams-Beuren syndrome. Genomics. 73, 20-7.
Eberl, H.C., et al., 2013. A map of general and specialized chromatin readers in mouse tissues generated by label-free interaction proteomics. Mol Cell. 49, 368-78.
Edelmann, L., et al., 2007. An atypical deletion of the Williams-Beuren syndrome interval implicates genes associated with defective visuospatial processing and autism. J Med Genet. 44, 136-43.
Eissenberg, J.C., Elgin, S.C., 2014. HP1a: a structural chromosomal protein regulating transcription. Trends Genet. 30, 103-10.
Enkhmandakh, B., et al., 2009. Essential functions of the Williams-Beuren syndrome-associated TFII-I genes in embryonic development. Proc Natl Acad Sci U S A. 106, 181-6.

Euteneuer, J., et al., 2014. Molecular and phenotypic characterization of atypical Williams-Beuren syndrome. Clin Genet. 86, 487-91.
Ewart, A.K., et al., 1993. Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome. Nat Genet. 5, 11-6.
Fei, Y., et al., 2009. Identification of novel genetic susceptibility loci for Behcet's disease using a genome-wide association study. Arthritis Res Ther. 11, R66.
Feng, Q., Zhang, Y., 2001. The MeCP1 complex represses transcription through preferential binding, remodeling, and deacetylating methylated nucleosomes. Genes Dev. 15, 827-32.
Figueira, R.J., Peabody, M.F., Lonstein, J.S., 2008. Oxytocin receptor activity in the ventrocaudal periaqueductal gray modulates anxiety-related behavior in postpartum rats. Behav Neurosci. 122, 618-28.
Frangiskakis, J.M., et al., 1996. LIM-kinase1 hemizygosity implicated in impaired visuospatial constructive cognition. Cell. 86, 59-69.
Fujita, N., et al., 2003a. MCAF mediates MBD1-dependent transcriptional repression. Mol Cell Biol. 23, 2834-43.
Fujita, N., et al., 2003b. Methyl-CpG binding domain 1 (MBD1) interacts with the Suv39h1-HP1 heterochromatic complex for DNA methylation-based transcriptional repression. J Biol Chem. 278, 24132-8.
Fujiwara, T., et al., 2006. Analysis of knock-out mice to determine the role of HPC-1/syntaxin 1A in expressing synaptic plasticity. J Neurosci. 26, 5767-76.
Gallinari, P., et al., 2007. HDACs, histone deacetylation and gene transcription: from molecular biology to cancer therapeutics. Cell Res. 17, 195-211.
Ghimouz, R., et al., 2011. The homeobox leucine zipper gene Homez plays a role in Xenopus laevis neurogenesis. Biochem Biophys Res Commun. 415, 11-6.
Gietz, R.D., Woods, R.A., 2002. Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350, 87-96.
Gocke, C.B., Yu, H., 2008. ZNF198 stabilizes the LSD1-CoREST-HDAC1 complex on chromatin through its MYM-type zinc fingers. PLoS One. 3, e3255.
Goergen, C.J., et al., 2011. Induced chromosome deletion in a Williams-Beuren syndrome mouse model causes cardiovascular abnormalities. J Vasc Res. 48, 119-29.
Golbabapour, S., et al., 2013. Gene silencing and Polycomb group proteins: an overview of their structure, mechanisms and phylogenetics. OMICS. 17, 283-96.
Gong, K.Q., et al., 2007. A Hox-Eya-Pax complex regulates early kidney developmental gene expression. Mol Cell Biol. 27, 7661-8.
Gosch, A., Pankau, R., 1994. Social-emotional and behavioral adjustment in children with Williams-Beuren syndrome. Am J Med Genet. 53, 335-9.
Gothelf, D., et al., 2006. Hyperacusis in Williams syndrome: characteristics and associated neuroaudiologic abnormalities. Neurology. 66, 390-5.
Gray, V., et al., 2006. In-depth analysis of spatial cognition in Williams syndrome: A critical assessment of the role of the LIMK1 gene. Neuropsychologia. 44, 679-85.
Groenewegen, H.J., 2003. The Basal Ganglia and Motor Control. Neural Plasticity. 10.
Grueneberg, D.A., et al., 1997. A multifunctional DNA-binding protein that promotes the formation of serum response factor/homeodomain complexes: identity to TFII-I. Genes Dev. 11, 2482-93.
Guemez-Gamboa, A., Coufal, N.G., Gleeson, J.G., 2014. Primary cilia in the developing and mature brain. Neuron. 82, 511-21.

Haas, B.W., et al., 2014. Regionally specific increased volume of the amygdala in Williams syndrome: evidence from surface-based modeling. Hum Brain Mapp. 35, 866-74.
Hakimi, M.A., et al., 2002. A core-BRAF35 complex containing histone deacetylase mediates repression of neuronal-specific genes. Proc Natl Acad Sci U S A. 99, 7420-5.
Hakimi, M.A., et al., 2003. A candidate X-linked mental retardation gene is a component of a new family of histone deacetylase-containing complexes. J Biol Chem. 278, 7234-9.
Hakre, S., et al., 2006. Opposing functions of TFII-I spliced isoforms in growth factor-induced gene expression. Mol Cell. 24, 301-8.
Hammond, P., et al., 2005. Discriminating power of localized three-dimensional facial morphology. Am J Hum Genet. 77, 999-1010.
Han, Y.G., et al., 2009. Dual and opposing roles of primary cilia in medulloblastoma development. Nat Med. 15, 1062-5.
Hatzfeld, M., et al., 2000. The function of plakophilin 1 in desmosome assembly and actin filament organization. J Cell Biol. 149, 209-22.
Hawk, J.D., et al., 2012. NR4A nuclear receptors support memory enhancement by histone deacetylase inhibitors. J Clin Invest. 122, 3593-602.
Hell, S.W., Wichmann, J., 1994. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt Lett. 19, 780-2.
Henrichsen, C.N., et al., 2011. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol. 7, e1001054.
Herrera, D.G., Robertson, H.A., 1996. Activation of c-fos in the brain. Prog Neurobiol. 50, 83-107.
Hinsley, T.A., et al., 2004. Comparison of TFII-I gene family members deleted in Williams-Beuren syndrome. Protein Sci. 13, 2588-99.
Hirata, H., et al., 2013. ZC4H2 mutations are associated with arthrogryposis multiplex congenita and intellectual disability through impairment of central and peripheral synaptic plasticity. Am J Hum Genet. 92, 681-95.
Hirota, H., et al., 2003. Williams syndrome deficits in visual spatial processing linked to GTF2IRD1 and GTF2I on chromosome 7q11.23. Genet Med. 5, 311-21.
Hobart, H.H., et al., 2010. Inversion of the Williams syndrome region is a common polymorphism found more frequently in parents of children with Williams syndrome. Am J Med Genet C Semin Med Genet. 154C, 220-8.
Hocking, D.R., Bradshaw, J.L., Rinehart, N.J., 2008. Fronto-parietal and cerebellar contributions to motor dysfunction in Williams syndrome: a review and future directions. Neurosci Biobehav Rev. 32, 497-507.
Hoffman, C.S., Winston, F., 1987. A ten-minute DNA preparation from yeast efficiently releases autonomous plasmids for transformation of Escherichia coli. Gene. 57, 267-72.
Hoogenraad, C.C., et al., 2002. Targeted mutation of Cyln2 in the Williams syndrome critical region links CLIP-115 haploinsufficiency to neurodevelopmental abnormalities in mice. Nat Genet. 32, 116-27.
Hovis, C.L., Butler, M.G., 1997. Photoanthropometric study of craniofacial traits in individuals with Williams syndrome. Clin Genet. 51, 379-87.

Howard, M.L., et al., 2012. Mutation of Gtf2ird1 from the Williams-Beuren syndrome critical region results in facial dysplasia, motor dysfunction, and altered vocalisations. Neurobiol Dis. 45, 913-22.
Ichimura, T., et al., 2005. Transcriptional repression and heterochromatin formation by MBD1 and MCAF/AM family proteins. J Biol Chem. 280, 13928-35.
Iizuka, K., et al., 2004. Deficiency of carbohydrate response element-binding protein (ChREBP) reduces lipogenesis as well as glycolysis. Proc Natl Acad Sci U S A. 101, 7281-6.
Iizuka, K., Horikawa, Y., 2008. ChREBP: a glucose-activated transcription factor involved in the development of metabolic syndrome. Endocr J. 55, 617-24.
Issa, L.L., et al., 2006. MusTRD can regulate postnatal fiber-specific expression. Dev Biol. 293, 104-15.
Jackowski, A.P., et al., 2009. Brain abnormalities in Williams syndrome: a review of structural and functional magnetic resonance imaging findings. Eur J Paediatr Neurol. 13, 305-16.
Jarvinen-Pasley, A., et al., 2008. Defining the social phenotype in Williams syndrome: a model for linking gene, the brain, and behavior. Dev Psychopathol. 20, 1-35.
Jenner, P., et al., 2009. Adenosine, adenosine A 2A antagonists, and Parkinson's disease. Parkinsonism Relat Disord. 15, 406-13.
Jiang, C.L., et al., 2002. MBD3L1 and MBD3L2, two new proteins homologous to the methyl-CpG-binding proteins MBD2 and MBD3: characterization of MBD3L1 as a testis-specific transcriptional repressor. Genomics. 80, 621-9.
Jiang, C.L., Jin, S.G., Pfeifer, G.P., 2004. MBD3L1 is a transcriptional repressor that interacts with methyl-CpG-binding protein 2 (MBD2) and components of the NuRD complex. J Biol Chem. 279, 52456-64.
Jochum, W., Passegue, E., Wagner, E.F., 2001. AP-1 in mouse development and tumorigenesis. Oncogene. 20, 2401-12.
Jung, J., et al., 2013. Quantifying RNA-protein interactions in situ using modified-MTRIPs and proximity ligation. Nucleic Acids Res. 41, e12.
Katsetos, C.D., Herman, M.M., Mork, S.J., 2003. Class III beta-tubulin in human development and cancer. Cell Motil Cytoskeleton. 55, 77-96.
Kelly, R.D., Cowley, S.M., 2013. The physiological roles of histone deacetylase (HDAC) 1 and 2: complex co-stars with multiple leading parts. Biochem Soc Trans. 41, 741-9.
Khan, M.M., et al., 2001. Role of PML and PML-RARalpha in Mad-mediated transcriptional repression. Mol Cell. 7, 1233-43.
Kim, D.W., et al., 1998. TFII-I enhances activation of the c-fos promoter through interactions with upstream elements. Mol Cell Biol. 18, 3310-20.
Kim, J.Y., et al., 2003. Dopaminergic neuronal differentiation from rat embryonic neural precursors by Nurr1 overexpression. J Neurochem. 85, 1443-54.
Kim, S.H., et al., 2012. Genetic association of the EGR2 gene with bipolar disorder in Korea. Exp Mol Med. 44, 121-9.
Kirov, G., et al., 2012. De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia. Mol Psychiatry. 17, 142-53.
Kovacs, K.J., 1998. c-Fos as a transcription factor: a stressful (re)view from a functional map. Neurochem Int. 33, 287-97.
Ku, M., et al., 2005. Positive and negative regulation of the transforming growth factor beta/activin target gene goosecoid by the TFII-I family of transcription factors. Mol Cell Biol. 25, 7144-57.

Kunapuli, P., et al., 2006. ZNF198, a zinc finger protein rearranged in myeloproliferative disease, localizes to the PML nuclear bodies and interacts with SUMO-1 and PML. Exp Cell Res. 312, 3739-51.
Kurakula, K., et al., 2014. NR4A nuclear receptors are orphans but not lonesome. Biochim Biophys Acta. 1843, 2543-55.
Kuzmichev, A., et al., 2002. Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev. 16, 2893-905.
Lam, P.P., et al., 2005. Transgenic mouse overexpressing syntaxin-1A as a diabetes model. Diabetes. 54, 2744-54.
Lazebnik, M.B., Tussie-Luna, M.I., Roy, A.L., 2008. Determination and functional analysis of the consensus binding site for TFII-I family member BEN, implicated in Williams-Beuren syndrome. J Biol Chem. 283, 11078-82.
Le, V.P., et al., 2011. Decreased aortic diameter and compliance precedes blood pressure increases in postnatal development of elastin-insufficient mice. Am J Physiol Heart Circ Physiol. 301, H221-9.
Levy, D., et al., 2011. Lysine methylation of the NF-kappaB subunit RelA by SETD6 couples activity of the histone methyltransferase GLP at chromatin to tonic repression of NF-kappaB signaling. Nat Immunol. 12, 29-36.
Leyfer, O.T., et al., 2006. Prevalence of psychiatric disorders in 4 to 16-year-olds with Williams syndrome. Am J Med Genet B Neuropsychiatr Genet. 141B, 615-22.
Li, D.Y., et al., 1997. Elastin point mutations cause an obstructive vascular disease, supravalvular aortic stenosis. Hum Mol Genet. 6, 1021-8.
Li, D.Y., et al., 1998a. Elastin is an essential determinant of arterial morphogenesis. Nature. 393, 276-80.
Li, D.Y., et al., 1998b. Novel arterial pathology in mice and humans hemizygous for elastin. J Clin Invest. 102, 1783-7.
Li, H.H., et al., 2009. Induced chromosome deletions cause hypersociability and other features of Williams-Beuren syndrome in mice. EMBO Mol Med. 1, 50-65.
Livak, K.J., Schmittgen, T.D., 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 25, 402-8.
Lombard, Z., et al., 2011. A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis. Biol Direct. 6, 30.
Lopez-Domenech, G., et al., 2012. The Eutherian Armcx genes regulate mitochondrial trafficking in neurons and interact with Miro and Trak2. Nat Commun. 3, 814.
Lucena, J., et al., 2010. Essential role of the N-terminal region of TFII-I in viability and behavior. BMC Med Genet. 11, 61.
Luo, R., et al., 2012. Genome-wide transcriptome profiling reveals the functional impact of rare de novo and recurrent CNVs in autism spectrum disorders. Am J Hum Genet. 91, 38-55.
Lupski, J.R., 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14, 417-22.
Lyst, M.J., Nan, X., Stancheva, I., 2006. Regulation of MBD1-mediated transcriptional repression by SUMO and PIAS proteins. EMBO J. 25, 5317-28.
Maheux, J., et al., 2005. Induction patterns of transcription factors of the nur family (nurr1, nur77, and nor-1) by typical and atypical antipsychotics in the mouse brain: implication for their mechanism of action. J Pharmacol Exp Ther. 313, 460-73.

Makeyev, A.V., et al., 2004. GTF2IRD2 is located in the Williams-Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix-loop-helix repeats. Proc Natl Acad Sci U S A. 101, 11052-7.
Makeyev, A.V., Bayarsaihan, D., 2009. New TFII-I family target genes involved in embryonic development. Biochem Biophys Res Commun. 386, 554-8.
Makeyev, A.V., Bayarsaihan, D., 2011. Molecular basis of Williams-Beuren syndrome: TFII-I regulated targets involved in craniofacial development. Cleft Palate Craniofac J. 48, 109-16.
Makeyev, A.V., et al., 2012. Diversity and complexity in chromatin recognition by TFII-I transcription factors in pluripotent embryonic stem cells and embryonic tissues. PLoS One. 7, e44443.
Malenfant, P., et al., 2012. Association of GTF2i in the Williams-Beuren syndrome critical region with autism spectrum disorders. J Autism Dev Disord. 42, 1459-69.
Marler, J.A., et al., 2005. Sensorineural hearing loss in children and adults with Williams syndrome. Am J Med Genet A. 138, 318-27.
Marler, J.A., et al., 2010. Auditory function and hearing loss in children and adults with Williams syndrome: cochlear impairment in individuals with otherwise normal hearing. Am J Med Genet C Semin Med Genet. 154C, 249-65.
Martens, M.A., Wilson, S.J., Reutens, D.C., 2008. Research Review: Williams syndrome: a critical review of the cognitive, behavioral, and neuroanatomical phenotype. J Child Psychol Psychiatry. 49, 576-608.
Martin, C., Zhang, Y., 2005. The diverse functions of histone lysine methylation. Nat Rev Mol Cell Biol. 6, 838-49.
Martins-de-Souza, D., et al., 2012. Proteomic analysis identifies dysfunction in cellular transport, energy, and protein metabolism in different brain regions of atypical frontotemporal lobar degeneration. J Proteome Res. 11, 2533-43.
Mass, E., Belostoky, L., 1993. Craniofacial morphology of children with Williams syndrome. Cleft Palate Craniofac J. 30, 343-9.
Masuda, T., et al., 2014. The transcription factor GTF2IRD1 regulates the topology and function of photoreceptors by modulating photoreceptor gene expression across the retina. J Neurosci. 34, 15356-68.
Matera, A.G., 1999. Nuclear bodies: multifaceted subdomains of the interchromatin space. Trends Cell Biol. 9, 302-9.
McGrath, J.A., et al., 1997. Mutations in the plakophilin 1 gene result in ectodermal dysplasia/skin fragility syndrome. Nat Genet. 17, 240-4.
Meier, K., Brehm, A., 2014. Chromatin regulation: how complex does it get? Epigenetics. 9, 1485-95.
Meng, Y., et al., 2002. Abnormal spine morphology and enhanced LTP in LIMK-1 knockout mice. Neuron. 35, 121-33.
Merla, G., et al., 2006. Submicroscopic deletion in patients with Williams-Beuren syndrome influences expression levels of the nonhemizygous flanking genes. Am J Hum Genet. 79, 332-41.
Merla, G., et al., 2010. Copy number variants at Williams-Beuren syndrome 7q11.23 region. Hum Genet. 128, 3-26.
Merla, G., et al., 2012. Supravalvular aortic stenosis: elastin arteriopathy. Circ Cardiovasc Genet. 5, 692-6.
Mervis, C.B., et al., 2003. Attentional characteristics of infants and toddlers with Williams syndrome during triadic interactions. Dev Neuropsychol. 23, 243-68.

Mervis, C.B., et al., 2012. Duplication of GTF2I results in separation anxiety in mice and humans. Am J Hum Genet. 90, 1064-70.
Metzger, E., et al., 2005. LSD1 demethylates repressive histone marks to promote androgen-receptor-dependent transcription. Nature. 437, 436-9.
Meyer-Lindenberg, A., et al., 2004. Neural basis of genetically determined visuospatial construction deficit in Williams syndrome. Neuron. 43, 623-31.
Meyer-Lindenberg, A., et al., 2005. Neural correlates of genetically abnormal social cognition in Williams syndrome. Nat Neurosci. 8, 991-3.
Meyer-Lindenberg, A., Mervis, C.B., Berman, K.F., 2006. Neural mechanisms in Williams syndrome: a unique window to genetic influences on cognition and behaviour. Nat Rev Neurosci. 7, 380-93.
Minc, E., Courvalin, J.C., Buendia, B., 2000. HP1gamma associates with euchromatin and heterochromatin in mammalian nuclei and chromosomes. Cytogenet Cell Genet. 90, 279-84.
Mitchell, R.L., Hanks, S.K., Verma, I.M., 1986. Proto-oncogene fos: an inducible multifaceted gene. Symp Fundam Cancer Res. 39, 99-113.
Mobbs, D., et al., 2007. Frontostriatal dysfunction during response inhibition in Williams syndrome. Biol Psychiatry. 62, 256-61.
Molina, J., et al., 2008. Abnormal social behaviors and altered gene expression rates in a mouse model for Potocki-Lupski syndrome. Hum Mol Genet. 17, 2486-95.
Morris, C.A., Thomas, I.T., Greenberg, F., 1993. Williams syndrome: autosomal dominant inheritance. Am J Med Genet. 47, 478-81.
Morris, C.A., et al., 2003. GTF2I hemizygosity implicated in mental retardation in Williams syndrome: genotype-phenotype analysis of five families with deletions in the Williams syndrome region. Am J Med Genet A. 123A, 45-59.
Morris, C.A., 2010. The behavioral phenotype of Williams syndrome: A recognizable pattern of neurodevelopment. Am J Med Genet C Semin Med Genet. 154C, 427-31.
Mulle, J.G., et al., 2014. Reciprocal duplication of the Williams-Beuren syndrome deletion on chromosome 7q11.23 is associated with schizophrenia. Biol Psychiatry. 75, 371-7.
Musharraf, A., et al., 2014. BOR-syndrome-associated Eya1 mutations lead to enhanced proteasomal degradation of Eya1 protein. PLoS One. 9, e87407.
Narlikar, G.J., Fan, H.Y., Kingston, R.E., 2002. Cooperation between complexes that regulate chromatin structure and transcription. Cell. 108, 475-87.
Neumann, I.D., 2008. Brain oxytocin: a key regulator of emotional and social behaviours in both females and males. J Neuroendocrinol. 20, 858-65.
Ng, H.H., et al., 1999. MBD2 is a transcriptional repressor belonging to the MeCP1 histone deacetylase complex. Nat Genet. 23, 58-61.
Nishibuchi, G., Nakayama, J., 2014. Biochemical and structural properties of heterochromatin protein 1: understanding its role in chromatin assembly. J Biochem. 156, 11-20.
Novina, C.D., et al., 1999. Regulation of nuclear localization and transcriptional activity of TFII-I by Bruton's tyrosine kinase. Mol Cell Biol. 19, 5014-24.
O'Leary, J., Osborne, L.R., 2011. Global analysis of gene expression in the developing brain of Gtf2ird1 knockout mice. PLoS One. 6, e23868.
O'Mahoney, J., et al., 1998a. Identification of a novel slow-muscle-fiber enhancer binding protein, MusTRD1. Mol Cell Biol. 18, 6641-52.
O'Mahoney, J.V., et al., 1998b. Identification of a novel slow-muscle-fiber enhancer binding protein, MusTRD1. Mol Cell Biol. 18, 6641-52.

O'Neill, D.J., et al., 2014. SETD6 controls the expression of estrogen-responsive genes and proliferation of breast carcinoma cells. Epigenetics. 9, 942-50.
Ochs, R.L., et al., 1995. Formation of nuclear bodies in hepatocytes of estrogen-treated roosters. Mol Biol Cell. 6, 345-56.
Osborne, L.R., et al., 2001. A 1.5 million-base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat Genet. 29, 321-5.
Osborne, L.R., Mervis, C.B., 2007. Rearrangements of the Williams-Beuren syndrome locus: molecular basis and implications for speech and language development. Expert Rev Mol Med. 9, 1-16.
Osborne, L.R., 2010. Animal models of Williams syndrome. Am J Med Genet C Semin Med Genet. 154C, 209-19.
Ounap, K., et al., 1998. Familial Williams-Beuren syndrome. Am J Med Genet. 80, 491-3.
Palacios-Verdu, M.G., et al., 2015. Metabolic abnormalities in Williams-Beuren syndrome. J Med Genet.
Palmer, S.J., et al., 2007. Expression of Gtf2ird1, the Williams syndrome-associated gene, during mouse development. Gene Expr Patterns. 7, 396-404.
Palmer, S.J., et al., 2010. Negative autoregulation of GTF2IRD1 in Williams-Beuren syndrome via a novel DNA binding mechanism. J Biol Chem. 285, 4715-24.
Palmer, S.J., et al., 2012. GTF2IRD2 from the Williams-Beuren critical region encodes a mobile-element-derived fusion protein that antagonizes the action of its related family members. J Cell Sci. 125, 5040-50.
Pankau, R., et al., 2001. Familial Williams-Beuren syndrome showing varying clinical expression. Am J Med Genet. 98, 324-9.
Parrott, A., et al., 2015. Aortopathy in the 7q11.23 microduplication syndrome. Am J Med Genet A. 167A, 363-70.
Pastorcic, M., Das, H.K., 2007. Analysis of transcriptional modulation of the presenilin 1 gene promoter by ZNF237, a candidate binding partner of the Ets transcription factor ERM. Brain Res. 1128, 21-32.
Pearen, M.A., Muscat, G.E., 2010. Minireview: Nuclear hormone receptor 4A signaling: implications for metabolic disease. Mol Endocrinol. 24, 1891-903.
Pereira, G.S., et al., 2005. Activation of adenosine receptors in the posterior cingulate cortex impairs memory retrieval in the rat. Neurobiol Learn Mem. 83, 217-23.
Perez Jurado, L.A., et al., 1998. A duplicated gene in the breakpoint regions of the 7q11.23 Williams-Beuren syndrome deletion encodes the initiator binding protein TFII-I and BAP-135, a phosphorylation target of BTK. Hum Mol Genet. 7, 325-34.
Petrini, I., et al., 2014. A specific missense mutation in GTF2I occurs at high frequency in thymic epithelial tumors. Nat Genet. 46, 844-9.
Phatnani, H.P., Greenleaf, A.L., 2006. Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev. 20, 2922-36.
Pober, B.R., 2010. Williams-Beuren syndrome. N Engl J Med. 362, 239-52.
Poirier, R., et al., 2008. Distinct functions of egr gene family members in cognitive processes. Front Neurosci. 2, 47-55.
Polly, P., et al., 2003a. hMusTRD1alpha1 represses MEF2 activation of the troponin I slow enhancer. J Biol Chem. 278, 36603-10.
Polly, P., et al., 2003b. hMusTRD1alpha1 represses MEF2 activation of the troponin I slow enhancer. J Biol Chem. 278, 36603-10.

Proulx, E., et al., 2010. Enhanced prefrontal serotonin 5-HT(1A) currents in a mouse model of Williams-Beuren syndrome with low innate anxiety. J Neurodev Disord. 2, 99-108.
Reymond, A., et al., 2007. Side effects of genome structural changes. Curr Opin Genet Dev. 17, 381-6.
Richardson, P.J., Kase, H., Jenner, P.G., 1997. Adenosine A2A receptor antagonists as new agents for the treatment of Parkinson's disease. Trends Pharmacol Sci. 18, 338-44.
Ring, C., et al., 2002. The role of a Williams-Beuren syndrome-associated helix-loop-helix domain-containing transcription factor in activin/nodal signaling. Genes Dev. 16, 820-35.
Rogner, U.C., et al., 2000. Control of neurulation by the nucleosome assembly protein-1-like 2. Nat Genet. 25, 431-5.
Roy, A.L., et al., 1991. Cooperative interaction of an initiator-binding transcription initiation factor and the helix-loop-helix activator USF. Nature. 354, 245-8.
Roy, A.L., et al., 1997. Cloning of an inr- and E-box-binding protein, TFII-I, that interacts physically and functionally with USF1. EMBO J. 16, 7091-104.
Roy, A.L., 2006. Transcription factor TFII-I conducts a cytoplasmic orchestra. ACS Chem Biol. 1, 619-22.
Roy, A.L., 2007. Signal-induced functions of the transcription factor TFII-I. Biochim Biophys Acta. 1769, 613-21.
Roy, A.L., 2012. Biochemistry and biology of the inducible multifunctional transcription factor TFII-I: 10 years later. Gene. 492, 32-41.
Rudolph, T., Beuch, S., Reuter, G., 2013. Lysine-specific histone demethylase LSD1 and the dynamic control of chromatin. Biol Chem. 394, 1019-28.
Sadler, L.S., et al., 1993. The Williams syndrome: evidence for possible autosomal dominant inheritance. Am J Med Genet. 47, 468-70.
Saint-Preux, F., et al., 2013. Chronic co-administration of nicotine and methamphetamine causes differential expression of immediate early genes in the dorsal striatum and nucleus accumbens of rats. Neuroscience. 243, 89-96.
Sakurai, T., et al., 2011. Haploinsufficiency of Gtf2i, a gene deleted in Williams Syndrome, leads to increases in social interactions. Autism Res. 4, 28-39.
Sanders, S.J., et al., 2011. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron. 70, 863-85.
Santos-Rosa, H., et al., 2002. Active genes are tri-methylated at K4 of histone H3. Nature. 419, 407-11.
Sarraf, S.A., Stancheva, I., 2004. Methyl-CpG binding protein MBD1 couples histone H3 methylation at lysine 9 by SETDB1 to DNA replication and chromatin assembly. Mol Cell. 15, 595-605.
Sasai, N., et al., 2013. The transcriptional cofactor MCAF1/ATF7IP is involved in histone gene expression and cellular senescence. PLoS One. 8, e68478.
Scherer, S.W., et al., 2005. Observation of a parental inversion variant in a rare Williams-Beuren syndrome family with two affected children. Hum Genet. 117, 383-8.
Schmickel, R.D., 1986. Contiguous gene syndromes: a component of recognizable syndromes. J Pediatr. 109, 231-41.
Schneider, T., et al., 2012. Anxious, hypoactive phenotype combined with motor deficits in Gtf2ird1 null mouse model relevant to Williams syndrome. Behav Brain Res. 233, 458-73.

Schubert, C., 2009. The genomic basis of the Williams-Beuren syndrome. Cell Mol Life Sci. 66, 1178-97. Segura-Puimedon, M., et al., 2014. Heterozygous deletion of the Williams-Beuren syndrome critical interval in mice recapitulates most features of the human disorder. Hum Mol Genet. 23, 6481-94. Shannon, P., et al., 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-504. Shen, T.H., et al., 2006. The mechanisms of PML-nuclear body formation. Mol Cell. 24, 331-9. Shiga, K., et al., 2012. A novel EGR2 mutation within a family with a mild demyelinating form of Charcot-Marie-Tooth disease. J Peripher Nerv Syst. 17, 206-9. Sobolik-Delmaire, T., et al., 2010. Plakophilin-1 localizes to the nucleus and interacts with single-stranded DNA. J Invest Dermatol. 130, 2638-46. Soderberg, O., et al., 2006. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat Methods. 3, 995-1000. Soderberg, O., et al., 2008. Characterizing proteins and their interactions in cells and tissues using the in situ proximity ligation assay. Methods. 45, 227-32. Somerville, M.J., et al., 2005. Severe expressive-language delay related to duplication of the Williams-Beuren locus. N Engl J Med. 353, 1694-701. Sridharan, R., et al., 2013. Proteomic and genomic approaches reveal critical functions of H3K9 methylation and heterochromatin protein-1gamma in reprogramming to pluripotency. Nat Cell Biol. 15, 872-82. Staels, B., 2010. Introduction on the ATVB Review Series "Nuclear receptors in metabolism and cardiovascular disease". Arterioscler Thromb Vasc Biol. 30, 1504-5. Stankiewicz, P., Lupski, J.R., 2002. Molecular-evolutionary mechanisms for genomic disorders. Curr Opin Genet Dev. 12, 312-9. Stromme, P., Bjornstad, P.G., Ramstad, K., 2002. Prevalence estimation of Williams syndrome. J Child Neurol. 17, 269-71. Tam, E., et al., 2008. 
The common inversion of the Williams-Beuren syndrome region at 7q11.23 does not cause clinical symptoms. Am J Med Genet A. 146A, 1797-806. Tantin, D., et al., 2004. Regulation of immunoglobulin promoter activity by TFII-I class transcription factors. J Biol Chem. 279, 5460-9. Tassabehji, M., et al., 1999. Williams syndrome: use of chromosomal microdeletions as a tool to dissect cognitive and physical phenotypes. Am J Hum Genet. 64, 118-25. Tassabehji, M., 2003. Williams-Beuren syndrome: a challenge for genotype-phenotype correlations. Hum Mol Genet. 12 Spec No 2, R229-37. Tassabehji, M., et al., 2005. GTF2IRD1 in craniofacial development of humans and mice. Science. 310, 1184-7. Tassabehji, M., Urban, Z., 2006. Congenital heart disease: Molecular diagnostics of supravalvular aortic stenosis. Methods Mol Med. 126, 129-56. Thompson, P.D., et al., 2007. GTF2IRD1 regulates transcription by binding an evolutionarily conserved DNA motif 'GUCE'. FEBS Lett. 581, 1233-42. Tipney, H.J., et al., 2004. Isolation and characterisation of GTF2IRD2, a novel fusion gene and member of the TFII-I family of transcription factors, deleted in Williams-Beuren syndrome. Eur J Hum Genet. 12, 551-60.


Todorovski, Z., et al., 2015. LIMK1 Regulates Long-Term Memory and Synaptic Plasticity via the Transcriptional Factor CREB. Mol Cell Biol. 35, 1316-28. Tordjman, S., et al., 2013. Presence of autism, hyperserotonemia, and severe expressive language impairment in Williams-Beuren syndrome. Mol Autism. 4, 29. Tsai, T.C., et al., 2005. NRIP, a novel nuclear receptor interaction protein, enhances the transcriptional activity of nuclear receptors. J Biol Chem. 280, 20000-9. Tsygankov, A.Y., 2009. TULA-family proteins: an odd couple. Cell Mol Life Sci. 66, 2949-52. Tussie-Luna, M.I., et al., 2001. Repression of TFII-I-dependent transcription by nuclear exclusion. Proc Natl Acad Sci U S A. 98, 7789-94. Tussie-Luna, M.I., et al., 2002a. Physical and functional interactions of histone deacetylase 3 with TFII-I family proteins and PIASxbeta. Proc Natl Acad Sci U S A. 99, 12807-12. Tussie-Luna, M.I., et al., 2002b. The SUMO ubiquitin-protein isopeptide ligase family member Miz1/PIASxbeta/Siz2 is a transcriptional cofactor for TFII-I. J Biol Chem. 277, 43185-93. Uchimura, Y., et al., 2006. Involvement of SUMO modification in MBD1- and MCAF1-mediated heterochromatin formation. J Biol Chem. 281, 23180-90. Valero, M.C., et al., 2000. Fine-scale comparative mapping of the human 7q11.23 region and the orthologous region on mouse chromosome 5G: the low-copy repeats that flank the Williams-Beuren syndrome deletion arose at breakpoint sites of an evolutionary inversion(s). Genomics. 69, 1-13. Van der Aa, N., et al., 2009. Fourteen new cases contribute to the characterization of the 7q11.23 microduplication syndrome. Eur J Med Genet. 52, 94-100. van der Maarel, S.M., et al., 1996. Cloning and characterization of DXS6673E, a candidate gene for X-linked mental retardation in Xq13.1. Hum Mol Genet. 5, 887-97. van Hagen, J.M., et al., 2007. Contribution of CYLN2 and GTF2IRD1 to neurological and cognitive symptoms in Williams Syndrome. Neurobiol Dis. 26, 112-24.
Vandeweyer, G., et al., 2012. The contribution of CLIP2 haploinsufficiency to the clinical manifestations of the Williams-Beuren syndrome. Am J Hum Genet. 90, 1071-8. Verdone, L., et al., 2006. Histone acetylation in gene regulation. Brief Funct Genomic Proteomic. 5, 209-21. Vicario, M., Skaper, S.D., Negro, A., 2014. The small heat shock protein HspB8: role in nervous system physiology and pathology. CNS Neurol Disord Drug Targets. 13, 885-95. Villa, R., et al., 2006. The methyl-CpG binding protein MBD1 is required for PML-RARalpha function. Proc Natl Acad Sci U S A. 103, 1400-5. Vullhorst, D., Buonanno, A., 2003a. Characterisation of general transcription factor 3, a transcription factor involved in slow muscle-specific gene expression. J Biol Chem. 278, 8370-8379. Vullhorst, D., Buonanno, A., 2003b. Characterization of general transcription factor 3, a transcription factor involved in slow muscle-specific gene expression. J Biol Chem. 278, 8370-9. Vullhorst, D., Buonanno, A., 2005. Multiple GTF2I-like repeats of general transcription factor 3 exhibit DNA binding properties. Evidence for a common origin as a sequence-specific DNA interaction module. J Biol Chem. 280, 31722-31. Wagner, E.F., 2002. Functions of AP1 (Fos/Jun) in bone development. Ann Rheum Dis. 61 Suppl 2, ii40-2.

Wang, H., et al., 2003. mAM facilitates conversion by ESET of dimethyl to trimethyl lysine 9 of histone H3 to cause transcriptional repression. Mol Cell. 12, 475-87. Warner, L.E., et al., 1998. Mutations in the early growth response 2 (EGR2) gene are associated with hereditary myelinopathies. Nat Genet. 18, 382-4. Warner, L.E., et al., 1999. Functional consequences of mutations in the early growth response 2 gene (EGR2) correlate with severity of human myelinopathies. Hum Mol Genet. 8, 1245-51. Weibrecht, I., et al., 2012. Visualising individual sequence-specific protein-DNA interactions in situ. N Biotechnol. 29, 589-98. Weis, K., et al., 1994. Retinoic acid regulates aberrant nuclear localization of PML-RAR alpha in acute promyelocytic leukemia cells. Cell. 76, 345-56. Wessel, A., et al., 2004. Risk of sudden death in the Williams-Beuren syndrome. Am J Med Genet A. 127A, 234-7. Widagdo, J., 2011. Molecular analysis of GTF2IRD1: a protein implicated in the neurobehavioural features of Williams-Beuren Syndrome. School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW, Australia. Widagdo, J., et al., 2012. SUMOylation of GTF2IRD1 regulates protein partner interactions and ubiquitin-mediated degradation. PLoS One. 7, e49283. Wolf, A., et al., 2010. Plakophilin 1 stimulates translation by promoting eIF4A1 activity. J Cell Biol. 188, 463-71. Wu, Y., Patterson, C., 1999. The human KDR/flk-1 gene contains a functional initiator element that is bound and transactivated by TFII-I. J Biol Chem. 274, 3207-14. Xiao, S., et al., 1998. FGFR1 is fused with a novel zinc-finger gene, ZNF198, in the t(8;13) leukaemia/lymphoma syndrome. Nat Genet. 18, 84-7. Xu, P.X., et al., 1999. Eya1-deficient mice lack ears and kidneys and show abnormal apoptosis of organ primordia. Nat Genet. 23, 113-7. Xu, P.X., et al., 2002. Eya1 is required for the morphogenesis of mammalian thymus, parathyroid and thyroid. Development. 129, 3033-44. Yan, X., et al., 2000.
Characterization and gene structure of a novel retinoblastoma-protein-associated protein similar to the transcription regulator TFII-I. Biochem J. 345 Pt 3, 749-57. Yang, W., Desiderio, S., 1997. BAP-135, a target for Bruton's tyrosine kinase in response to B cell receptor engagement. Proc Natl Acad Sci U S A. 94, 604-9. Yoshimura, K., et al., 2014. Retraction for Yoshimura et al., Distinct function of 2 chromatin remodeling complexes that share a common subunit, Williams syndrome transcription factor (WSTF). Proc Natl Acad Sci U S A. 111, 2398. You, A., et al., 2001. CoREST is an integral component of the CoREST-human histone deacetylase complex. Proc Natl Acad Sci U S A. 98, 1454-8. Young, E.J., et al., 2008. Reduced fear and aggression and altered serotonin metabolism in Gtf2ird1-targeted mice. Genes Brain Behav. 7, 224-34. Young, S.T., Porrino, L.J., Iadarola, M.J., 1991. Cocaine induces striatal c-fos-immunoreactive proteins via dopaminergic D1 receptors. Proc Natl Acad Sci U S A. 88, 1291-5. Yu, L., et al., 2005. Genetic and pharmacological inactivation of adenosine A2A receptor reveals an Egr-2-mediated transcriptional regulatory network in the mouse striatum. Physiol Genomics. 23, 89-102. Zhang, Y., et al., 1999. Analysis of the NuRD subunits reveals a histone deacetylase core complex and a connection with DNA methylation. Genes Dev. 13, 1924-35.


Zhao, C., et al., 2005. Hippocampal and visuospatial learning defects in mice with a deletion of frizzled 9, a gene in the Williams syndrome deletion interval. Development. 132, 2917-27. Zhong, X.Y., et al., 2009. SR proteins in vertical integration of gene expression from transcription to RNA processing to translation. Mol Cell. 35, 1-10. Ziolkowska, B., et al., 2012. Effects of morphine on immediate-early gene expression in the striatum of C57BL/6J and DBA/2J mice. Pharmacol Rep. 64, 1091-104. Zou, D., et al., 2004. Eya1 and Six1 are essential for early steps of sensory neurogenesis in mammalian cranial placodes. Development. 131, 5561-72.
