STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF THE CXXC-TYPE ZINC FINGER PROTEIN 5 (CXXC5)

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY

BY

GAMZE AYAZ ŞEN

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN BIOLOGY

NOVEMBER 2018

Approval of the thesis:

STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF THE CXXC-TYPE ZINC FINGER PROTEIN 5 (CXXC5) submitted by GAMZE AYAZ ŞEN in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology Department, Middle East Technical University by,

Prof. Dr. Halil Kalıpçılar Dean, Graduate School of Natural and Applied Sciences

Prof. Dr. Orhan Adalı Head of Department, Biological Sciences

Prof. Dr. Mesut Muyan Supervisor, Biological Sciences Dept., METU

Examining Committee Members: Prof. Dr. A. Elif Erson Bensan Biological Sciences Dept., METU

Prof. Dr. Mesut Muyan Biological Sciences Dept., METU

Asst. Prof. Dr. Murat Alper Cevher Molecular Biology and Genetics Dept., Bilkent University

Assoc. Prof. Dr. Nurcan Tunçbağ Informatics Institute, METU

Asst. Prof. Dr. Onur Çizmecioğlu Molecular Biology and Genetics Dept., Bilkent University

Date: 16.11.2018

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last name: Gamze Ayaz Şen

Signature:

To All Women Who Pursue Their Dreams In Science

ABSTRACT

STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF THE CXXC-TYPE ZINC FINGER PROTEIN 5 (CXXC5)

Ayaz Şen, Gamze Ph.D., Biology Department Supervisor: Prof. Dr. Mesut Muyan November 2018, 89 pages

Estrogen hormones, particularly 17β-estradiol (E2), are involved in the regulation of physiological and pathophysiological functions of many organs and tissues, including breast tissue. The expression of the CXXC-type zinc finger protein 5 (CXXC5) gene is regulated by E2 through estrogen receptor α. Because of its highly conserved zinc finger CXXC domain (ZF-CXXC), CXXC5 is considered to be a member of the ZF-CXXC family, whose members bind to non-methylated CpG dinucleotides in transcriptionally active DNA regions. This binding is thought to play critical roles in the epigenetic modulation of transcription through the prevention of cytosine methylation and the recruitment of DNA-modifying enzymes. The structure and function of CXXC5 and its role in cellular events are not yet clear. However, accumulating evidence suggests that CXXC5 is involved in transcription as a transcription factor, co-regulator and/or epigenetic factor.

In this PhD thesis, I successfully expressed and purified the full-length CXXC5 protein, with which I showed that CXXC5 is a non-methylated CpG DNA binding protein and that the ZF-CXXC domain is indeed responsible for the ability of the protein to interact with DNA. Since proteins exert their functions in the context of a dynamically changing network of interacting proteins, I envisioned that the identification of interacting protein partners of CXXC5 would be a critical step in the elucidation of the cellular function of the protein. To address this issue, I performed a proximity-dependent biotinylation assay (BioID) in a cell line model derived from breast adenocarcinoma. Of the proteins identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS), I validated by co-immunoprecipitation that CXXC5 interacts with MeCP2 (methyl-CpG binding protein 2), MAZ (Myc-associated zinc finger protein) and EMD (emerin). I found that the zinc finger domain of CXXC5 is necessary for protein interaction as well. The findings of this study could provide important insights into the mechanism of CXXC5 actions in E2-mediated cellular events.

Keywords: estrogen signaling, CXXC domain, non-methylated CpG binding protein, BioID, protein interaction

ÖZ

STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF THE CXXC-TYPE ZINC FINGER PROTEIN 5 (CXXC5)

Ayaz Şen, Gamze Ph.D., Department of Biology Supervisor: Prof. Dr. Mesut Muyan November 2018, 89 pages

Estrogen hormones, particularly 17β-estradiol (E2), play a role in the physiology and pathophysiology of many tissues, including breast tissue. The expression of CXXC-type zinc finger protein 5 (CXXC5) is regulated by E2 through estrogen receptor α. Because of a highly conserved zinc finger CXXC domain (ZF-CXXC), CXXC5 is considered to be a member of the ZF-CXXC protein family, which binds to non-methylated CpGs in transcriptionally active DNA regions. This binding is thought to play an epigenetic role by preventing cytosine methylation and by concentrating DNA-modifying enzymes in these regions. The structural and functional properties of the CXXC5 protein and its role in the cell are not yet known. However, the number of studies on CXXC5 has been increasing in recent years. Together, these studies suggest that CXXC5 plays a role in transcription, as an epigenetic factor and/or as a protein partner of chromatin-regulating complexes, during cell proliferation, differentiation, metabolism and/or death regulated by various signaling pathways.

In this thesis, the full-length CXXC5 protein was produced recombinantly and purified. The purified CXXC5 protein was shown to bind to non-methylated DNA. To identify interacting proteins, potential protein partners were screened in a cell model derived from human breast cancer cells by proximity-dependent biotin identification coupled with mass spectrometry (BioID-MS). Co-immunoprecipitation was used to verify the protein interactions in situ. Among the proteins obtained from the liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis, MeCP2, MAZ and EMD were confirmed to interact with CXXC5. The zinc finger CXXC domain was shown to be required for this binding. These results may contribute to the understanding of the role of CXXC5 in E2-regulated cell proliferation mechanisms.

Keywords: estrogen signaling, CXXC domain, non-methylated CpG DNA binding protein, BioID, protein interactions

ACKNOWLEDGEMENTS

I must first and foremost thank my advisor, Mesut Muyan, who has always supported and encouraged me during my PhD studies. He provided me with scientific support and guidance: I learned how to think critically and to carry out my scientific research independently.

I would like to thank my thesis committee members; Prof. Dr. A. Elif Erson Bensan, Asst. Prof. Murat Cevher, Assoc. Prof. Nurcan Tunçbağ and Asst. Prof. Onur Çizmecioğlu for their time and efforts to share their critical comments on my PhD thesis.

My special thanks are extended to my labmates. I want to express my very great appreciation to Pelin Yaşar, Sırma Damla User, Gizem Güpür, Çağla Ece Olgun, Burcu Karakaya, Gizem Kars, Negin Razizadeh, Gizem Turan and Kerim Yavuz individually for their friendship and kind support over the past years. In addition, I want to thank all the members of the Erson-Bensan Lab and the Banerjee Lab for their support and help. I would also like to thank Prof. Dr. Rengül Atalay and her lab members for their help in using the fluorescence microscope.

I would like to express my very great appreciation to Dr. Jinrong Min and his lab members for hosting me for a year at the University of Toronto and for a fruitful collaboration on my PhD study.

I do not know how to express my gratitude to my beloved husband Bilgecan, my parents Güler and Turan and my lovely sister Tuğba. Without them, none of the work I have done or will do would have any real meaning.

I gratefully thank TUBITAK for its generous support throughout my PhD education. I was supported by TUBITAK 2211-A and TUBITAK 2214-A scholarships and the TUBITAK-KBAG 315S045 grant. This PhD work was supported by the TUBITAK-KBAG 114Z243 grant.

TABLE OF CONTENTS

ABSTRACT

ÖZ

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTERS

1. INTRODUCTION
1.1. CXXC5 as an E2-ERα responsive gene in breast cancer
1.2. CXXC type Zinc Finger Protein 5 (CXXC5)
1.3. The Zinc Finger CXXC Protein Family
1.4. CpG Islands and ZF-CXXC Proteins
1.5. The aim of study

2. MATERIALS AND METHODS
PART I: TESTING THE DNA BINDING ABILITY OF THE FULL-LENGTH CXXC5 PROTEIN
2.1. Cloning of Sequences Encoding the CXXC5 Full Length Protein and the Zinc Finger CXXC Domain into pET28a-MHL, pET28GST-LIC and pET28-MKH8SUMO Vectors for Bacterial Expression
2.2. Recombinant Protein Expression Using The LEX Expression System
2.3. Purification of the Recombinant Full-Length CXXC5 Protein and the Zinc Finger CXXC Domain
2.4.1. Immobilized Metal Affinity Chromatography (IMAC) with the Use of His-Select Nickel Resins
2.4.2. Ion Exchange Chromatography
2.4.3. Gel Filtration (Size Exclusion) Chromatography
2.5. Isothermal Titration Calorimetry (ITC)
2.6. Electrophoretic Mobility Shift Assay (EMSA)
PART II: STRUCTURAL CHARACTERIZATION OF THE FULL-LENGTH CXXC5 PROTEIN
2.7. Protein Crystals Screening
2.8. X-ray Crystallography
PART III: IDENTIFICATION OF PROTEIN INTERACTION PARTNERS OF CXXC5
2.9. Cell Culture and Transfection
2.10. Proximity Dependent Biotinylation (BioID) Assay
2.11. Mass Spectrometry and Data Acquisition
2.12. Co-immunoprecipitation (Co-IP)
2.12.1. Western Blot
2.12.2. Immunocytochemistry (ICC)
2.13. Identification of the Protein Interaction Region(s) of CXXC5

3. RESULTS AND DISCUSSION
PART I: TESTING THE DNA BINDING ABILITY OF THE FULL-LENGTH RECOMBINANT CXXC5 PROTEIN
3.1. Choosing an Optimal Recombinant Protein Expression System
3.2. Purification of the Recombinant Full-Length CXXC5 and Its CXXC Domain
3.3. Testing the Binding Ability of the Full-Length CXXC5 and CXXC Domain to CpG Containing DNA
PART II: STRUCTURAL CHARACTERIZATION OF THE FULL-LENGTH CXXC5 PROTEIN
3.4. Crystallization of the Full-length CXXC5 Protein
PART III: IDENTIFICATION OF PROTEIN INTERACTION PARTNERS OF CXXC5
3.5. Synthesis of the CXXC5-BirA* Fusion Protein in MCF7 Cells
3.6. CXXC5 Associated Proteins in MCF7 Cell Line
3.7. Intracellular Localization of CXXC5 Protein Partners
3.8. Verification of CXXC5 protein partners by co-immunoprecipitation assay
3.8.1. Protein Interaction between MeCP2 and CXXC5
3.8.2. Protein Interaction between MAZ and CXXC5
3.8.3. Protein Interaction between EMD and CXXC5
3.9. Identification of protein interaction region(s) of CXXC5 and MeCP2

4. CONCLUSION AND FUTURE DIRECTIONS

REFERENCES

APPENDICES
A. VECTOR MAPS
B. PRIMER SEQUENCES
C. SECONDARY STRUCTURE PREDICTION OF CXXC5
D. LIST OF ALL PROTEINS IDENTIFIED BY LC/MS-MS ANALYSIS
E. IMMUNOPRECIPITATION OF OVEREXPRESSED CXXC5 BY USING VARIOUS CXXC5 ANTIBODIES
F. COMPARISON OF TRANSFECTION EFFICIENCY IN MCF7 AND HEK293 CELLS

CURRICULUM VITAE

LIST OF FIGURES

FIGURES

Figure 1: Schematic representation of the CXXC5 protein and the CXXC domain.

Figure 2: Sequence alignment of CXXC5 protein orthologues.

Figure 3: Sequence alignment of CXXC domains of ZF-CXXC family members.

Figure 4: Purification of recombinant proteins.

Figure 5: Verification of the recombinant full-length CXXC5 protein.

Figure 6: Isothermal titration calorimetry

Figure 7: Comparison of different surrounding DNA sequences in the binding of the CXXC5 protein to CpG dinucleotides

Figure 8: Electrophoretic mobility shift assay (EMSA)

Figure 9: Hemimethylated CpG DNA.

Figure 10: Protein crystallization

Figure 12: Prediction of intrinsically unstructured regions of CXXC5 protein by IUPred web server.

Figure 13: Prediction of protein binding regions of CXXC5 protein.

Figure 14: BioID.

Figure 15: BirA* biotinylates endogenous proteins in MCF7 cells.

Figure 16: Summary of identified proteins with LC-MS/MS

Figure 17: Immunocytochemistry of localization of the CXXC5 protein and its partners.

Figure 18: Interaction of MeCP2 and CXXC5.

Figure 19: Interaction of MAZ and CXXC5

Figure 20: Interaction of EMD and CXXC5

Figure 21: Schematic representation of CXXC5 truncation fragments.

Figure 22: Identification of interaction region(s) of CXXC5 with MeCP2 and MAZ

Figure 23: Identification of the interaction domain of MeCP2 with CXXC5.

LIST OF TABLES

TABLES

Table 1: CXXC5 primers for in-fusion PCR cloning

Table 2: CXXC5 cloning primers for BioID and pcDNA3.1(-) vectors

Table 3: Tag sequences (DNA and amino acid)

Table 4: Primers for generation of CXXC5 truncations

Table 5: Putative protein partners of CXXC5 from LC-MS/MS analysis

CHAPTER 1

INTRODUCTION

1.1. CXXC5 as an E2-ERα responsive gene in breast cancer

Estrogen hormones, particularly 17β-estradiol (E2), are involved in the regulation of physiological and pathophysiological functions of many organs and tissues, including breast tissue [1]. E2 also has a critical role in the initiation and development of breast cancer. The effects of E2 in breast tissue are mediated mainly by estrogen receptor α (ERα), a ligand-dependent transcription factor. Upon binding to E2, ERα regulates the expression of target genes through genomic signaling pathways. The binding of E2-ERα to specific DNA sequences called estrogen responsive elements (EREs) and the subsequent regulation of estrogen responsive genes constitute the ERE-dependent genomic signaling pathway. The modulation of gene expression through interactions of the E2-ERα complex with other transcription factors already bound to their cognate responsive elements is referred to as ERE-independent genomic signaling [1]–[3].

Because of the critical role of E2-ERα signaling in the physiology and pathophysiology of breast tissue, identifying key estrogen-regulated genes involved in cellular proliferation is critical for the development of new prognostic and/or therapeutic targets for breast cancer. Our previous microarray studies suggested that the expression of CXXC5 is modulated by E2-ERα [3]. In subsequent studies, we found that CXXC5 is indeed a bona fide E2-ERα responsive gene and that its expression is regulated through a direct interaction with an ERE sequence in the CXXC5 locus [4].

1.2. CXXC type Zinc Finger Protein 5 (CXXC5)

CXXC5 was first identified in hematopoietic stem cells in 2000 [5]. The CXXC5 gene localizes to the 5q31.2 chromosomal region, which is frequently deleted in acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS) [6], [7]. CXXC5 encodes a 322 amino acid protein with a molecular mass of 33 kDa. CXXC5 contains a nuclear localization signal (NLS) juxtaposed to a highly conserved, carboxyl-terminally located CXXC-type zinc finger domain, ZF-CXXC (Figure 1). Due to the presence of the CXXC domain, CXXC5 is considered to be a member of the zinc finger CXXC protein family, which binds to CpG dinucleotides in the CpG islands (CGIs) of transcriptionally active DNA regions. This binding prevents cytosine methylation through the recruitment of chromatin-modifying proteins and establishes an active transcription state [8].

Figure 1: Schematic representation of the CXXC5 protein and the CXXC domain. NLS, nuclear localization signal (The figure was generated with the use of IBS, http://ibs.biocuckoo.org/index.php)

Because of the NLS and the CXXC domain, CXXC5 could interact with DNA and act as a DNA-binding factor [6], [8], [9]. Studies have shown that CXXC5 participates in a wide variety of biological functions, including signal transduction, differentiation, angiogenesis and cellular energy metabolism.

Apart from our finding that CXXC5 is an estrogen responsive gene [4], CXXC5 was reported to be a novel retinoic acid responsive gene whose protein product induces the differentiation of leukemic cells [6]. Previous studies also suggested that CXXC5 is a critical target of bone morphogenetic protein 4 (BMP4), which is a regulator of vascular development and homeostasis through transcriptionally targeting Flk-1, a vascular endothelial growth factor receptor [10], [11]. By acting as a negative regulator of Wnt/β-catenin signaling through direct binding to the Dishevelled (DVL) protein via its C-terminal CXXC domain, CXXC5 is suggested to be a candidate tumor suppressor [12]. It is also suggested that CXXC5 negatively regulates osteoblast differentiation and bone formation through its interaction with DVL [13]. CXXC5 also regulates the differentiation of skeletal muscle by augmenting the expression of genes involved in the differentiation process [14]. Moreover, CXXC5 is reported to interact with the vitamin D receptor (VDR) and to regulate 1,25-dihydroxyvitamin D3 signaling [15].

CXXC5 is also suggested to be a critical component in DNA damage-induced p53 activation, apoptosis and growth arrest [16]. It appears that the interaction of CXXC5 with the protein kinase ATM (ataxia telangiectasia mutated), which is activated by DNA double-strand breaks, leads to enhanced phosphorylation of p53. Thus, CXXC5 may play a critical role in p53-induced cell death upon DNA damage signaling [16]. CXXC5 was also found to be associated with Smad proteins, facilitating the phosphorylation of Smad3 and the nuclear translocation of Smad4, to mediate TNFα (tumor necrosis factor α)-induced apoptosis [17]. Similarly, CXXC5 was shown to suppress cell cycle arrest and induce apoptosis mediated by TGFβ (transforming growth factor β) signaling in a hepatocellular carcinoma-derived cell line model [18].

These studies collectively indicate that CXXC5 is involved in the modulation of cellular proliferation, differentiation and apoptosis as a transcription factor and/or co-regulator. Moreover, CXXC5 was reported to be involved in regulating cellular energy metabolism as well. Cytochrome c oxidase (COX) is the enzyme critical for electron transfer from cytochrome c to oxygen in the electron transport chain. COX is composed of multiple subunits encoded by both nuclear and mitochondrial DNA. One of the nuclear-encoded subunits is COX4I2 (subunit 4 isoform 2), which is essential for the regulation of intracellular oxygen levels under varying environmental oxygen concentrations. CXXC5 was shown to be an important transcription factor involved in adaptation to changing oxygen conditions. Under high oxygen conditions, CXXC5 appears to function as a repressor by binding to the promoter of COX4I2, whereas under hypoxic conditions, transcript levels of COX4I2 increase as CXXC5 dissociates from the promoter region [19]. In addition, altered CXXC5 expression was observed to correlate with adverse prognosis and resistance to therapy in several human cancers, including acute myeloid leukemia (AML), hepatocellular carcinoma, breast cancer and prostate cancer [20]–[24].

According to all these studies, CXXC5 is a multifunctional protein. It is also highly conserved among orthologues, based on sequence homology (Figure 2). This strongly suggests that CXXC5 has a basic and essential cellular function.

Figure 2: Sequence alignment of CXXC5 protein orthologues. The amino acid sequences are highly conserved among species. The alignment was generated with ESPript (http://espript.ibcp.fr/ESPript/ESPript). All the sequences were obtained from NCBI (H. sapiens (Human): NP_001304140.1; M. musculus (Mouse): NP_598448.1; Rattus norvegicus (Rat): NP_001007629.1; Pan troglodytes (Chimpanzee): JAA39144.1; Gorilla gorilla (Gorilla): XP_004042663.1; Pongo abelii (Orangutan): NP_001126301.1; Macaca mulatta (Macaque): NP_001253535.1; Canis lupus familiaris (Dog): XP_005617324.1; Oryctolagus cuniculus (Rabbit): XP_008253335.2)

1.3. The Zinc Finger CXXC Protein Family

The CXXC domain, which is a zinc finger domain, selectively binds to unmodified CpG DNA. The CXXC-type zinc finger domain (ZF-CXXC) is characterized by the presence of a motif with eight conserved residues (CGxCxxC-n5-CGxCxxC repeats) that bind two zinc ions [25], [26]. There are twelve CXXC domain-containing proteins in this family: CFP1 (CXXC finger protein 1), MLL1 and MLL2 (mixed-lineage leukemia 1 and 2), KDM2A and KDM2B (lysine (K)-specific demethylase 2A and 2B), FBXL19 (F-box and leucine-rich repeat protein 19), MBD1 (methyl-CpG-binding domain protein 1), DNMT1 (DNA (cytosine-5)-methyltransferase 1), CXXC4 (CXXC-type zinc finger protein 4), CXXC5 (CXXC-type zinc finger protein 5), and TET1 and TET3 (Tet methylcytosine dioxygenase 1 and 3). These proteins, predominantly localized in the nucleus, are DNA binding proteins with chromatin-modifying properties [27]–[31]. Despite the sequence variations among proteins of this family, their ZF-CXXC domain structures are highly conserved [6] (Figure 3).

Figure 3: Sequence alignment of CXXC domains of ZF-CXXC family members. The alignment was generated with ESPript (http://espript.ibcp.fr/ESPript/ESPript). All the sequences were obtained from NCBI (CXXC5: NP_057547.5; CXXC4: NP_079488.2; TET1: NP_085128.2; TET3: NP_001274420.1; CFP1: NP_055408.2; MLL1: NP_005924; MLL2: NP_055542.1; KDM2A: NP_036440.1; KDM2B: NP_115979.3; FBXL19: NP_001093254.2; MBD1_3rd CXXC: DNMT1: NP_001124295.1)

The ZF-CXXC domain-containing proteins were initially divided into four groups based on their involvement in chromatin modification through interacting partners: 1) histone H3K4 methylation (CFP1, MLL1 and MLL2), 2) histone H3K36 demethylation (KDM2A and KDM2B), 3) DNA methylation (DNMT1) and 4) DNA hydroxymethylation (TET1, TET3, CXXC4 and CXXC5) [8]. Recent studies regrouped these proteins based on their DNA-binding specificities: 1) CpGpG binding (Class I), 2) CpG binding (Class II), 3) CpN (N, any nucleotide) binding (Class III) and 4) no or weak CpG binding (Class IV) [32]. CFP1, the only member of Class I, recognizes the CpGpG motif through its CXXC domain. MLL1, MLL2, FBXL19, KDM2A, KDM2B and the third CXXC domain of MBD1 are members of Class II; they specifically recognize non-methylated CpG-containing DNA, and methylation of the central CpG diminishes this binding. The CXXC domains of TET1, TET3, CXXC4 and CXXC5, which constitute Class III, bind to cytosine-containing DNA with a preference for CpG over CpH (H, any base other than G) [32]. Although there are conflicting results on whether the DNMT1 CXXC domain binds non-methylated or methylated DNA, Xu et al. reported that the CXXC domain of DNMT1, as a Class IV member, displays weak or no binding to DNA [32], [33].
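For readers who wish to work with this classification programmatically, the DNA-binding classes described above can be written as a small lookup table. The following Python sketch is illustrative only and is not part of the original study; the names `CXXC_CLASSES` and `binding_class` are hypothetical, and the class assignments simply restate the grouping attributed to [32] in the text.

```python
# DNA-binding classes of ZF-CXXC domains, restating the grouping in the text [32].
# The dictionary layout and function name are illustrative assumptions.
CXXC_CLASSES = {
    "I": {"specificity": "CpGpG",
          "members": ["CFP1"]},
    "II": {"specificity": "non-methylated CpG",
           "members": ["MLL1", "MLL2", "FBXL19", "KDM2A", "KDM2B", "MBD1"]},  # MBD1: third CXXC domain
    "III": {"specificity": "CpN, with CpG preferred over CpH",
            "members": ["TET1", "TET3", "CXXC4", "CXXC5"]},
    "IV": {"specificity": "weak or no CpG binding",
           "members": ["DNMT1"]},
}


def binding_class(protein: str) -> str:
    """Return the DNA-binding class (I to IV) of a ZF-CXXC family member."""
    for cls, info in CXXC_CLASSES.items():
        if protein in info["members"]:
            return cls
    raise KeyError(f"{protein} is not listed as a ZF-CXXC family member")


if __name__ == "__main__":
    cls = binding_class("CXXC5")
    print("CXXC5 belongs to Class", cls, "-", CXXC_CLASSES[cls]["specificity"])
```

The four member lists together contain the twelve family members named in this section.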

1.4. CpG Islands and ZF-CXXC Proteins

Epigenetic modifications are defined as heritable changes in gene expression and activity that arise only from the addition of a chemical group to DNA or to the histone components of nucleosomes, without any alteration in the DNA sequence [34]. DNA methylation, which results from the covalent transfer of a methyl group to the C-5 position of the cytosine ring of DNA, is involved in numerous processes, including X-inactivation, imprinting and the suppression of foreign DNA activity such as that of retroviral elements [35]. These modifications affect gene expression profiles in different tissues, at different developmental stages and in diverse diseases, including cancer [34], [36]. DNA methylation and histone modification are the major epigenetic regulators of gene transcription in mammalian cells. These two mechanisms are associated and can affect each other [36].

In vertebrates, DNA methylation is the main mechanism of gene silencing and it mostly occurs at CpG dinucleotides of the genome, that is, in the sequence context 5’-CG-3’. DNA methyltransferase enzymes (DNMTs) catalyze this covalent modification. Methylation of cytosine residues results in the recruitment of methyl-CpG-binding proteins (MBPs) that have transcriptionally repressive enzymatic activities [37]. Although most CpGs in mammalian genomic DNA are methylated [38], about 70% of human gene promoters are associated with non-methylated DNA sequences called CpG islands (CGIs) [39].

CpG islands are characterized by a high density of CpG dinucleotides, are mostly hypomethylated and are often found in the promoter regions of genes.
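The notion of a high CpG density can be made concrete with a short calculation. The Python sketch below is illustrative and not part of the original study: it counts CpG dinucleotides in a sequence and computes the GC content and the observed/expected CpG ratio. The thresholds mentioned in the comments (length of at least 200 bp, GC content above 50% and an observed/expected ratio above 0.6) are a widely used heuristic for calling CpG islands and are stated here as an assumption, not as a criterion used in this thesis; the function name `cpg_stats` is hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Compute simple CpG statistics for a DNA sequence (5'->3', one strand).

    The observed/expected ratio is (number of CpG) / (C count * G count / length).
    A commonly used heuristic (an assumption here) calls a region a CpG island when
    it is >= 200 bp long, has GC content > 0.5 and an observed/expected ratio > 0.6.
    """
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / length if length else 0.0
    expected = (c_count * g_count) / length if length else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {
        "length": length,
        "cpg_count": cpg_count,
        "gc_content": gc_content,
        "obs_exp": obs_exp,
    }


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not taken from the thesis.
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATATAT"))
```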

CXXC domain-containing proteins specifically bind to non-methylated cytosines in CpG island promoters. This binding prevents cytosine methylation and establishes a transcriptionally active chromatin state by interacting with histone-modifying proteins directly or through other regulatory proteins [40]. CpG binding proteins act as transcriptional regulators and demonstrate a unique DNA binding specificity for non-methylated CpG motifs [9]. Although studies on the functional features of CXXC5 are scarce, CXXC5 appears to participate in cellular events as a transcription factor, epigenetic regulator and/or protein partner in various signaling pathways. Our studies indicate that CXXC5 is also an estrogen responsive gene in cells derived from breast carcinomas. We therefore predict that CXXC5, as a non-methylated CpG binding protein, plays a fundamental role in estrogen-mediated cellular responses as well.

1.5. The aim of study

The identification of E2-ERα target genes, the elucidation of the mechanisms responsible for their expression and an understanding of the functions of their protein products are critically important in defining the roles of E2-ERα in target tissue physiology and pathology. This could in turn contribute to the development of novel approaches for the prevention and/or treatment of target tissue malignancies, including breast cancer.

Our laboratory studies encompass approaches directed at testing the prediction that CXXC5, synthesized as a primary response gene product, participates in the regulation of E2-mediated cellular proliferation and death. The aim of this study is to clarify the structural features and protein interaction partners of CXXC5 as a full-length protein. We aim to confirm the ability of the full-length CXXC5 protein to bind CpG-containing DNA, to which the CXXC domain of CXXC5 was recently shown to bind [32]. Furthermore, we want to determine the high-resolution crystal structure of the full-length CXXC5 protein in complex with CpG-containing DNA sequences.

In addition, we aim to elucidate the interaction partners of CXXC5 in order to understand its cellular function in breast cancer cell models. Since the functional features of proteins are a reflection of their structural properties, identifying interacting protein partners could lead us to an understanding of the functional role of CXXC5 in a cellular context. Elucidating the structural details of CXXC5 alone or in complex with interacting partners, including DNA, would be crucial for the development of a CXXC5 model as a basis for the elucidation of the molecular mechanisms underlying its physiological functions. Therefore, we aim to screen potential partners of CXXC5 in cells by the use of a recently developed method, the proximity-dependent biotinylation assay (BioID). We hope that a successful completion of the study will provide important insights into the role of CXXC5 in E2-mediated cellular events. This could in turn aid the development of novel or alternative treatment modalities to combat breast cancer.

CHAPTER 2

MATERIALS AND METHODS

PART I: TESTING THE DNA BINDING ABILITY OF THE FULL-LENGTH CXXC5 PROTEIN

2.1. Cloning of Sequences Encoding the CXXC5 Full Length Protein and the Zinc Finger CXXC Domain into pET28a-MHL, pET28GST-LIC and pET28-MKH8SUMO Vectors for Bacterial Expression

The In-Fusion cloning method allows the directional cloning of one or more DNA fragments into any vector (Clontech Kit User Manual). The cDNAs encoding the full-length (FL) CXXC5 sequence (1-969 bp) and the CXXC domain (748-927 bp) were generated by PCR and inserted into the pET28a-MHL, pET28GST-LIC and pET28-MKH8SUMO vectors linearized with the BseRI and BsaI restriction enzymes. Details of each vector are presented in Appendix A. The vectors are designed to place 6xHis, GST and SUMO tags in tandem at the 5’ end of the inserted target cDNA, which results in an amino-terminally 6xHis, GST and SUMO tagged target protein. The vectors also contain a kanamycin resistance gene for antibiotic selection. The presence of a chloramphenicol resistance gene allows counter selection to be carried out during protein expression. Primers for In-Fusion PCR cloning were designed according to the manufacturer’s instructions. In essence, the PCR-generated CXXC5 cDNA, amplified with oligomers containing sequences that overlap with the ends of each restriction enzyme-linearized vector, was joined to the vector. The primers used for CXXC5 cloning are shown in Table 1 (Appendix B). The reaction mixture was incubated for 15 min at 37°C, followed by 15 min at 50°C, and then placed on ice. At the end of the reaction, the bacterial transformation procedure was performed. All plasmids were verified by sequencing.
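The nucleotide coordinates of the inserts given above can be related to residue numbers with a simple codon calculation. The Python sketch below is illustrative and not part of the original protocol. It assumes 1-based, inclusive numbering that starts at the first base of the start codon, takes the residue numbers (322 residues for the full-length protein, residues 250 to 309 for the CXXC domain) from elsewhere in this thesis, and assumes that the 1-969 bp full-length insert includes the stop codon; the function name `residues_to_cds` is hypothetical.

```python
from typing import Tuple


def residues_to_cds(first_residue: int, last_residue: int) -> Tuple[int, int]:
    """Map a 1-based, inclusive residue range to 1-based cDNA coordinates."""
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("residue range must be 1-based and non-empty")
    start_nt = (first_residue - 1) * 3 + 1  # first base of the first codon
    end_nt = last_residue * 3               # last base of the last codon
    return start_nt, end_nt


if __name__ == "__main__":
    # CXXC domain, residues 250 to 309
    print(residues_to_cds(250, 309))        # (748, 927)

    # Full-length protein, 322 residues; adding 3 bases for an assumed stop codon
    start, end = residues_to_cds(1, 322)
    print((start, end + 3))                 # (1, 969)
```

Under these assumptions the calculation reproduces the 748-927 bp and 1-969 bp ranges stated in the text.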

2.2. Recombinant Protein Expression Using The LEX Expression System

The LEX system was designed and manufactured by the Structural Genomics Consortium at the University of Toronto to produce large amounts of recombinant protein by generating efficient air circulation in the culture (http://sgc.utoronto.ca/SGC-WebPages/Technology/lex.php). The amino-terminally 6xHis, GST and SUMO tagged full-length CXXC5 protein (322 amino acids) and the 6xHis tagged CXXC domain (60 amino acids; residues 250-309) were expressed in E. coli BL21(DE3)V2R-pRARE2 cells. A 250 ml flask containing LB medium supplemented with 50 µg/ml kanamycin and 34 µg/ml chloramphenicol was inoculated with a single colony of the transformed bacteria. The culture was grown overnight at 37°C with shaking at 200 rpm. A 50 ml aliquot of the overnight starter culture was then inoculated into 1600 ml of Terrific Broth (TB, Sigma T0918) supplemented with 0.8% glycerol, 50 µg/ml kanamycin, 17 µg/ml chloramphenicol, 50 µM ZnCl2 and 800 µl antifoam-204 (Sigma A-8311) in a 2 L flask and grown at 37°C with shaking at 200 rpm. When the OD600 reached 1.6, the temperature was reduced to 16°C and 250 µM IPTG was added to the culture to induce protein expression. Cultures were aerated overnight at 16°C with shaking at 200 rpm; cell pellets were then collected by centrifugation at 7000 rpm for 15 minutes and frozen immediately in liquid nitrogen for storage at -80°C.
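The final concentrations listed above translate into stock volumes through the dilution relation C1·V1 = C2·V2. The Python sketch below is illustrative and not part of the original protocol. The stock concentrations used in the example (1 M IPTG and 100 mM ZnCl2) are assumptions, since the text does not state them, and the small volume contributed by the added stock is neglected; the function name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(final_conc_um: float, culture_volume_ml: float,
                    stock_conc_mm: float) -> float:
    """Volume of stock (in µl) needed to reach a final concentration.

    Uses C1 * V1 = C2 * V2. With the final concentration in µM, the culture
    volume in ml and the stock concentration in mM, the result is in µl.
    The volume added by the stock itself is neglected.
    """
    if stock_conc_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc_um * culture_volume_ml / stock_conc_mm


if __name__ == "__main__":
    culture_ml = 1600.0  # culture volume stated in the text

    # Assumed stocks (not given in the text): 1 M IPTG and 100 mM ZnCl2.
    iptg_ul = stock_volume_ul(250.0, culture_ml, 1000.0)
    zncl2_ul = stock_volume_ul(50.0, culture_ml, 100.0)
    print(f"IPTG stock:  {iptg_ul:.0f} µl")
    print(f"ZnCl2 stock: {zncl2_ul:.0f} µl")
```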

2.3. Purification of the Recombinant Full-Length CXXC5 Protein and the Zinc Finger CXXC Domain

Frozen cell pellets were thawed and resuspended in 200 ml lysis buffer (50 mM Tris pH 8.0, 500 mM NaCl, 1 mM PMSF, 3 mM beta-mercaptoethanol, 50 µM ZnCl2 and Benzonase) and homogenized using an Ultra-Turrax T8 homogenizer at maximal setting for 60 seconds per pellet. The cells were then sonicated on ice with 5-second pulses at a setting of 8.5 (half-maximal frequency) and 7 seconds of rest between pulses. The total sonication time was 10 minutes. The lysate was then centrifuged at 15000 rpm for 1 hour at 4°C.

2.4.1. Immobilized Metal Affinity Chromatography (IMAC) with the Use of His-Select Nickel Resins

The cleared lysate was subsequently mixed with 10 ml of equilibrated His-Select Resin (Sigma) and incubated for 30 minutes at 4°C with continuous mixing. The mixture was then loaded onto an empty column and washed with 300 ml wash buffer (50 mM Tris pH 8.0, 2 M NaCl, 13 mM imidazole pH 8.0, 3 mM beta-mercaptoethanol, 50 µM ZnCl2). The protein was eluted from the resin with 20 ml of elution buffer (50 mM Tris pH 8.0, 300 mM NaCl, 250 mM imidazole pH 8.0 and 1 mM TCEP). The concentration of the eluted proteins was estimated with a NanoDrop spectrophotometer.

2.4.2. Ion Exchange Chromatography

Concentrated protein samples were further purified on a HiTrap™ SP FF column (GE Healthcare) using a salt gradient (Buffer A: 50 mM Tris pH 8.0, 1 mM TCEP; Buffer B: 50 mM Tris pH 8.0, 1 M NaCl, 1 mM TCEP). Fractions corresponding to protein peaks were loaded onto an SDS-PAGE gel (4-20% gradient) and the gel was stained with InstantBlue coomassie dye (Expedeon) for visualization. Protein fractions were collected and concentrated to 2 ml for gel filtration chromatography. The same procedure was repeated with a HiTrap™ Heparin HP column (GE Healthcare), which is designed for the purification of DNA binding proteins.

2.4.3. Gel Filtration (Size Exclusion) Chromatography

A Superdex™ 200 (10/300) gel filtration column (GE Healthcare) was used to further purify protein mixtures according to their sizes in 20 mM Tris pH 7.5, 150 mM NaCl and 1 mM TCEP buffer. Protein fractions were loaded onto an SDS-PAGE gel (4-20% gradient) and the gel was stained with InstantBlue coomassie dye (Expedeon) for visualization. The fractions that contained the putative FL-CXXC5 protein or the ZF-CXXC domain were collected and concentrated to 2 ml for Isothermal Titration Calorimetry (ITC). The protein concentration was estimated with a NanoDrop spectrophotometer (Thermo Fisher Inc. USA) using the extinction coefficient at 280 nm.

2.5. Isothermal Titration Calorimetry (ITC)

ITC is a physical technique that quantitatively measures the binding affinity, enthalpy change and binding stoichiometry of two or more interacting molecules in solution, as in protein-DNA and protein-protein interactions. ITC works by titrating one reactant into another under isothermal conditions. Signals are measured after a series of injections that release or absorb heat depending on the interaction (binding) of the two reactants [41]. When the binding of the two molecules reaches saturation, the heat change approaches zero. The DNA binding ability of the protein mixture containing the full-length CXXC5 following gel filtration chromatography was tested by ITC. All oligonucleotides used in ITC were purchased as single-stranded DNA from Integrated DNA Technologies (IDT). Each pair of single-stranded DNA was mixed in a 1:1 molar ratio, heated to 95°C and cooled down to room temperature to form double-stranded DNA. ITC measurements were recorded at 25°C using a VP-ITC MicroCalorimeter (MicroCal Inc.). 10 µl of 1 mM palindromic non-methylated CpG DNA (5’-GAGAGACCGGTCTCTC-3’), methylated CpG DNA (5’-GAGAGACmCGGTCTCTC-3’) or DNA without CpG sequence as control (5’-GAGAGACATGTCTCTC-3’), or of 1 mM non-palindromic non-methylated CpG DNA (5’-GTGATACCGGATCAGT-3’), methylated CpG DNA (5’-GTGATACmCGGATCAGT-3’) or DNA without CpG sequence as control (5’-GTGATACATGATCAGT-3’), was injected into a solution containing 20 mM Tris pH 7.5, 150 mM NaCl, 1 mM TCEP and 25 µM full-length CXXC5 protein. DNA was dissolved in the same buffer used for concentrating purified proteins. A total of 25 injections were performed at 180-second intervals using a reference power of 15 µcal/sec. Binding isotherms were plotted and analyzed using Origin Software (MicroCal Inc.). ITC measurements were fit to a one-site binding model.
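The titration scheme and the one-site model described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the Origin fitting routine used in this work: the function names are hypothetical, the cell volume of 1400 µl is an assumed nominal value for a VP-ITC instrument, volume displacement during injection is ignored, and any Kd passed to the function is a placeholder rather than a measured value.

```python
import math


def complex_conc(p_total, l_total, kd):
    """Concentration of a 1:1 complex for a single-site binding model.

    Solves [PL]^2 - (Pt + Lt + Kd)[PL] + Pt*Lt = 0 for the physically
    meaningful root. All arguments must share the same concentration unit.
    """
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def titration_ratios(p_cell_um=25.0, l_syringe_um=1000.0, inj_ul=10.0,
                     n_inj=25, cell_ul=1400.0):
    """DNA:protein molar ratio in the cell after each injection.

    cell_ul is an ASSUMED nominal cell volume; dilution and displaced
    volume are ignored to keep the sketch short.
    """
    protein_pmol = p_cell_um * cell_ul          # µM x µl = pmol
    dna_pmol_per_inj = l_syringe_um * inj_ul    # µM x µl = pmol
    return [i * dna_pmol_per_inj / protein_pmol for i in range(1, n_inj + 1)]


ratios = titration_ratios()
```

Under these assumptions, 25 injections of 10 µl carry the DNA:protein molar ratio well past 1:1, which is what allows the heat signal to approach zero at saturation.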

2.6. Electrophoretic Mobility Shift Assay (EMSA)

EMSA is used to study protein-DNA interactions. In EMSA, protein-DNA complexes migrate more slowly than the unbound linear DNA on a non-denaturing gel, causing a “shift”. 5% TBE ready gels (Bio-Rad Laboratories, Inc., USA) and 0.5X TBE running buffer prepared from a 10X stock solution were used. Each double-stranded DNA was used at 50 µM and mixed with protein at up to a 1:4 molar ratio. SybrGold dye was used to stain the gel for visualization.

PART II: STRUCTURAL CHARACTERIZATION OF THE FULL-LENGTH CXXC5 PROTEIN

2.7. Protein Crystals Screening

Determination of protein structure by X-ray crystallography relies on obtaining a single protein crystal suitable for diffraction data collection. For this reason, conditions for protein crystallization have to be determined empirically for each protein [42]. For DNA-protein co-crystallization, the purified FL-CXXC5 protein was mixed with a DNA fragment containing a non-methylated CpG dinucleotide sequence (5’-GTGATACCGGATCAGT-3’) at a molar ratio of 1:1 and then dispensed into 96-well deep well plates containing different screening buffers using a liquid handling robot (Crystal Phoenix, Art Robbins Instruments). We used the Hampton Research Natrix screening kit and the SGC-1, RW, Index and PEG/Ion screening buffers with the “sitting drop” method at 18°C.


2.8. X-ray Crystallography

After crystals of CXXC5 with non-methylated CpG DNA had formed, they were soaked in a cryoprotectant solution containing 20% glycerol before being flash-frozen. Crystals were analyzed by X-ray crystallography at the Structural Genomics Consortium (SGC), University of Toronto, to determine the three-dimensional structure of the full-length CXXC5 protein.

PART III: IDENTIFICATION OF PROTEIN INTERACTION PARTNERS OF CXXC5

2.9. Cell Culture and Transfection

MCF7 and HEK293 cell lines were a kind gift of Dr. Rengül Çetin Atalay (Middle East Technical University, Ankara, Turkey). Cells were cultured in phenol red free, high glucose (4.5 g/L) Dulbecco’s modified Eagle’s medium (DMEM, Lonza, Belgium, BE12-917F) supplemented with 10% fetal bovine serum (FBS, Lonza), 1% L-Glutamine (Lonza, Belgium, BE17-605E) and 1% Penicillin/Streptomycin (Lonza, Belgium). Both MCF7 and HEK293 cells were transiently transfected with Turbofect transfection reagent (Thermo Fisher Scientific). Unless otherwise specified, transfected cells were maintained for 48 hours for protein expression.

2.10. Proximity Dependent Biotinylation (BioID) Assay

To identify protein interaction partners of CXXC5, we screened potential interactors in the MCF7 cell line, derived from a breast adenocarcinoma, by the proximity-dependent biotinylation assay (BioID), which uses a promiscuous biotin ligase to detect direct or indirect protein-protein associations in living cells. Biotinylated proteins are then captured by streptavidin affinity and identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS). To carry out the BioID assay, we cloned the CXXC5 cDNA into the pcDNA3.1-MCS-BirA* (R118G)-HA vector obtained from Addgene (36047) using XhoI and EcoRI restriction enzymes. Primers used for overlapping PCR are shown in Table 2 (Appendix B). MCF7 cells (2.5 x 10^6), grown in 10 cm2 dishes (total of 10 dishes) for 48 hours, were transfected with vectors bearing none or the CXXC5 cDNA. 24 hours after transfection, 50 µM biotin (Sigma B4639, Germany) and 1 mM ATP (Adenosine 5’-triphosphate disodium salt hydrate, Sigma A2283, Germany) were added to the medium. 18 hours later, cells were collected, washed with cold PBS and then lysed at room temperature in lysis buffer (50 mM Tris, pH 7.4, 500 mM NaCl, 0.4% SDS, 5 mM EDTA, 2% TritonX, 1 mM DTT, 1x protease inhibitor). Cell lysates were then sonicated with 5-second pulses and 10 seconds of rest between pulses (7.5 min of active sonication) and centrifuged at 14000 rpm for 15 minutes at 4°C. Supernatants were incubated with 400 µl Streptavidin magnetic beads (NEB S1420S) overnight. Beads were collected and washed twice with Wash Buffer I (2% SDS in dH2O) for 10 minutes. Beads were then washed once with Wash Buffer II (2% deoxycholate, 1% TritonX, 50 mM NaCl, 50 mM Hepes pH 7.5, 1 mM EDTA) for 10 minutes, once with Wash Buffer III (0.5% NP-40, 0.5% deoxycholate, 1% TritonX, 500 mM NaCl, 1 mM EDTA, 10 mM Tris pH 8.0) for 10 minutes, and once with Wash Buffer IV (50 mM Tris, pH 7.4, 50 mM NaCl) for 30 minutes. All washing steps were carried out at room temperature. For WB analysis, 10% of the sample was eluted from the streptavidin beads by boiling at 95°C for 10 minutes in 50 µl of Laemmli-DTT sample buffer containing 500 nM biotin; the remaining 90% was sent for mass spectrometry analysis at Koç University, Istanbul, Turkey. Streptavidin-captured biotinylated proteins were separated by 10% SDS-PAGE followed by WB using an anti-biotin antibody (Abcam, ab533494) to detect biotinylated proteins and a CXXC5 antibody (Abcam, ab106533) to detect the CXXC5 protein.


2.11. Mass Spectrometry and Data Acquisition

Samples for LC-MS/MS were processed at the Koç University Cell Proteomics Laboratory (Istanbul, Turkey). Proteins bound to beads were digested with trypsin (Thermo Pierce MS Grade Trypsin Protease). Peptide purification and concentration were performed with C18 StageTips (Thermo-Fisher). Peptide analysis was performed on an Orbitrap LC-MS/MS mass spectrometer (Thermo-Fisher). Proteome Discoverer 1.4 software (Thermo-Fisher) was used to identify proteins.

2.12. Co-immunoprecipitation (Co-IP)

BioID coupled LC-MS/MS results revealed that CXXC5 putatively interacts with a number of proteins. Of these putative partners, we selected Methyl CpG Binding Protein 2 (MeCP2), MYC Associated Zinc Finger Protein (MAZ) and Emerin (EMD), based on their functional properties, to verify that they are indeed CXXC5 interactors. For the co-immunoprecipitation assay, PCR-amplified CXXC5, MeCP2 (kindly provided by Dr. K. Miyake, University of Yamanashi, Japan), EMD (HsCD00324605, obtained from Harvard Plasmid) or MAZ (HsCD00377948, obtained from Harvard Plasmid) cDNA was cloned, using NheI and BamHI restriction enzymes, into a pcDNA 3.1(-) vector bearing in-frame sequences at the 5’-end of the multiple cloning site that encode an amino-terminally located 3xFlag or HA tag. Tag sequences are shown in Table 3 (Appendix B). To carry out the Co-IP experiment, HEK293 cells were transfected with expression vectors bearing the 3xFlag-tagged CXXC5 cDNA alone or together with the HA-tagged vector or the HA-tagged vector bearing MeCP2, MAZ or EMD cDNA and grown for 48 h. We also transfected cells with the HA-tagged CXXC5 and 3xFlag-tagged MeCP2, MAZ or EMD to further confirm the protein interactions. Cell pellets were collected and lysed with NE-PER Nuclear Extraction Reagents (Thermo Scientific, 78833) that contained freshly added 1X protease inhibitor (Roche, 11 873 580 001) and 1X PhosSTOP (Roche, 04 906 837 001). Protein concentrations were measured with the Bradford Protein Assay (Thermo Fisher). To remove proteins that bind non-specifically to magnetic beads, 500 µg of total lysate was incubated with 25 µl Protein A and G conjugated magnetic beads at 4°C for 1 hour. The lysates were then placed in a magnetic field for 30 seconds to pull the beads to the side of the tube. The supernatant was transferred to a clean 1.5 ml microcentrifuge tube and the beads were discarded. The pre-cleared lysates were subsequently incubated with an anti-HA antibody (Abcam, ab9110) at 4°C overnight with gentle rotation. The next day, 25 µl Protein A and G magnetic beads were added to each tube and incubated for 1 hour at 4°C in order to capture the HA antibody bound to target proteins. Beads were then washed extensively with IP buffer containing 10 mM HEPES pH 7.5, 150 mM NaCl, 10 mM MgCl2, 0.5% Igepal, and protease and phosphatase inhibitors. Bead pellets were re-suspended in 30 µl of 2X SDS loading buffer (187.5 mM Tris pH 6.8, 6% (w/v) SDS, 30% glycerol, 150 mM DTT, 0.03% (w/v) bromophenol blue, 10% β-mercaptoethanol) and incubated at 95°C for 5 minutes. The sample was placed in a magnetic field for 30 seconds. The supernatant was loaded onto a 10-15% SDS-PAGE gel, followed by western blot analysis.

2.12.1. Western Blot

Equal volumes of samples were loaded into wells. The gel was subjected to electrophoresis for approximately 90 min at 100 V. Proteins were then transferred from the gel onto a PVDF membrane (Advansta, WesternBright™ PVDF-CL, L-08008-001) using a wet transfer system for 60 minutes at 100 V. The membrane was blocked overnight at 4°C in a buffer containing 5% skim milk (Bio-Rad, 170-6404) in 0.1% Tris Buffered Saline-Tween (TBS-T) before probing with the Flag antibody (Sigma F1834) or the HA antibody (Abcam, ab9110). The Flag or the HA antibody was diluted 1:1000 in the blocking buffer and the membrane was incubated with it for one hour at room temperature. After incubation, the membrane was washed three times with 0.1% TBS-T. An HRP-conjugated goat anti-mouse IgG secondary antibody (Advansta, R-05071-500) for the Flag antibody or an HRP-conjugated goat anti-rabbit IgG secondary antibody (Advansta, R-05072-500) for the HA antibody was prepared as a 1:4000 dilution in the blocking buffer and the membrane was incubated with it for one hour at room temperature. After three washes with 0.1% TBS-T, the membrane was incubated for two minutes with WesternBright ECL (Advansta, K-12045-D50) at a 1:1 ratio of luminol-enhancer reagent to peroxide reagent. Visualization was done with the ChemiDoc™ MP system (Bio-Rad, 1708280) and images were analyzed with Image Lab™ software (Bio-Rad). PageRuler™ Prestained Protein Ladder (Thermo Fisher, 26616) was used as the molecular weight marker.

2.12.2. Immunocytochemistry (ICC)

HEK293 cells were plated on coverslips and transfected with plasmids encoding HA-CXXC5 (500 ng) and/or 3xFlag MeCP2 (500 ng); HA-CXXC5 (500 ng) and/or 3xFlag MAZ (500 ng); HA-CXXC5 (500 ng) and/or 3xFlag EMD (500 ng) using Turbofect transfection reagent (Thermo Scientific, R0532) according to the manufacturer’s instructions. We also set up identical experiments using constructs bearing the reverse tags: that is, 3xFlag CXXC5 (500 ng) and/or HA-MeCP2 (500 ng) etc. After 48 hours, the coverslips were washed three times with 1X PBS and fixed with 3.7% formaldehyde for 30 minutes. The cells were then permeabilized with 0.4% Triton-X-100 for 10 minutes, followed by blocking of non-specific protein interactions for 1 hour with 10% normal goat serum (for the HA antibody) or 10% BSA (for the Flag antibody). The primary antibody against the HA (1:500, Abcam, ab9119) or the Flag (1:250, Sigma Aldrich, F-1804) tag was then added to cells for 2 hours. Cells were subsequently washed three times with PBS and incubated with either an Alexa Fluor 488-conjugated goat anti-mouse secondary antibody diluted in 3% BSA or an Alexa Fluor 594-conjugated goat anti-rabbit secondary antibody diluted in 2% NGS (1:1000) for 1 hour at room temperature. Cells were rinsed in PBS three times and mounted onto slides with mounting medium containing DAPI for nuclear staining (Abcam, ab104139). Slides were viewed on a Nikon Eclipse 50i Fluorescence Microscope. ImageJ software was used for image analysis.

2.13. Identification of the Protein Interaction Region(s) of CXXC5

To identify the region(s) of CXXC5 that interact with each putative partner (MeCP2, MAZ or EMD), I generated truncated cDNAs of CXXC5 (1-249, 101-249, 101-322 and 250-322) by PCR using cloning primers with NdeI, AgeI and BamHI restriction enzyme cut sites and cloned them into the pcDNA 3.1(-) vector containing sequences encoding an amino-terminal HA or 3xFlag tag. Since CXXC5 is a nuclear protein and contains a nuclear localization signal juxtaposed to the CXXC domain, we wanted to ensure that truncated CXXC5 variants could also translocate to the nucleus. To accomplish this, we also inserted in-frame sequences encoding the nuclear localization signal of SV40 T antigen 3’ to the sequence encoding the Flag or HA tag. Primer sequences for the generation of CXXC5 truncations are shown in Table 4 (Appendix B). PCR products were cloned with NheI and BamHI restriction enzymes into the pcDNA 3.1(-) vector, which contains sequences that encode an amino-terminal 3xFlag tag. Truncation mutants of CXXC5 were expressed in transiently transfected HEK293 cells to examine protein synthesis. The co-immunoprecipitation assay was performed as described in section 2.12.
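As a bookkeeping aid, the truncation boundaries listed above can be checked against the CXXC domain coordinates given in the methods (residues 250-309 of the 322-residue protein). The sketch below is illustrative only: the helper names are hypothetical, and residue numbering is assumed to be 1-based and inclusive.

```python
# Residue coordinates taken from the text; numbering assumed 1-based, inclusive.
FULL_LENGTH = 322
CXXC_DOMAIN = (250, 309)
TRUNCATIONS = [(1, 249), (101, 249), (101, 322), (250, 322)]


def slice_residues(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    return sequence[start - 1:end]


def contains_domain(fragment, domain=CXXC_DOMAIN):
    """True if the fragment fully spans the given domain."""
    return fragment[0] <= domain[0] and fragment[1] >= domain[1]


# Which constructs retain the complete CXXC domain?
retains_cxxc = {frag: contains_domain(frag) for frag in TRUNCATIONS}
```

With these coordinates, the 1-249 and 101-249 constructs lack the CXXC domain while the 101-322 and 250-322 constructs retain it, which is the contrast the truncation series relies on.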


CHAPTER 3

RESULTS AND DISCUSSION

PART I: TESTING THE DNA BINDING ABILITY OF THE FULL-LENGTH RECOMBINANT CXXC5 PROTEIN

3.1. Choosing an Optimal Recombinant Protein Expression System

To successfully carry out recombinant protein expression and purification, several factors should be taken into consideration, such as the host cells (bacteria, yeast, insect or mammalian cells), the type of affinity tag (His tag, GST tag or SUMO tag), the conditions of expression induction (IPTG concentration, temperature and incubation time) and the purification strategy. For these reasons, I initially carried out small-scale CXXC5 synthesis in bacteria, insect cells and yeast to obtain a soluble protein. I also screened different fusion tags (6xHis, GST or SUMO tag with 8xHis) conjugated to the full-length (FL) CXXC5 protein in order to increase the purity and the amount of the CXXC5 protein (data not shown). Of these approaches, the 6xHis tagged full-length CXXC5 protein and its CXXC domain were successfully expressed in and purified from bacteria.

3.2. Purification of the Recombinant Full-Length CXXC5 and Its CXXC Domain

Using a bacterial expression system, I expressed full-length CXXC5 or the CXXC domain and purified it with immobilized metal affinity chromatography (IMAC).


The recombinant protein was eluted from the IMAC resin column with an elution buffer containing 50 mM Tris pH 8.0, 250 mM imidazole pH 8.0 and 1 mM TCEP. To remove the high concentration of imidazole, the protein mixture was dialyzed overnight at 4°C against 50 mM Tris pH 8.0, 150 mM NaCl and 1 mM TCEP using 7K (CXXC5) or 3K (CXXC domain) MW cut-off SnakeSkin dialysis tubing (Thermo Fisher). During dialysis, TEV enzyme (1 µg TEV per 10 µg recombinant protein) produced at the Structural Genomics Consortium (SGC) (University of Toronto, Canada) was added directly to the protein mixture. The overnight-dialyzed protein samples were subsequently loaded onto a column containing His-Select resin (5 ml) and incubated for 3 minutes. Upon digestion with TEV, proteins without the tag flow through the column; thus, the flow-through fraction contains the target recombinant protein. Following dialysis, samples were loaded onto an SDS-PAGE gel (4-20% gradient) and stained with InstantBlue coomassie dye (Expedeon) for visualization.

The eluted protein mixture was concentrated using a 10K (CXXC5) or 3K (CXXC domain) MW cut-off concentrator. Cation exchange chromatography was then performed to further purify the protein mixture with a salt gradient (Buffer A: 50 mM Tris pH 8.0, 1 mM TCEP; Buffer B: 50 mM Tris pH 8.0, 1 M NaCl, 1 mM TCEP). Fractions corresponding to protein peaks were loaded onto an SDS-PAGE gel (4-20% gradient) for protein separation. The gel was then stained with InstantBlue coomassie dye for visualization. Fractions that contained the FL-CXXC5 protein or the CXXC domain were collected and concentrated to a 2 ml volume for gel filtration chromatography. The concentrated protein samples were then loaded onto a gel filtration (size exclusion) column in a buffer containing 20 mM Tris pH 7.5, 150 mM NaCl and 1 mM TCEP. Fractions corresponding to protein peaks were loaded onto an SDS-PAGE gel (4-20% gradient) for protein separation. The gel was then stained with InstantBlue coomassie dye for visualization (Figure 4). Analysis of the chromatogram and SDS-PAGE reveals that I obtained relatively pure populations of the recombinant FL-CXXC5 and CXXC domain proteins.


I then performed mass spectrometry to analyze the molecular weight of the recombinant FL-CXXC5 protein, which gave the highest peak at around 33 kDa (Figure 5A). I also carried out WB analysis with a CXXC5 antibody (Abcam, ab106533) to confirm this result (Figure 5B). After confirming the successful purification of the recombinant full-length CXXC5 protein, I tested the DNA binding ability of FL-CXXC5 using isothermal titration calorimetry (ITC).

Figure 4: Purification of recombinant proteins. The full-length CXXC5 protein (A) and the CXXC domain (B) were purified by gel filtration, and collected fractions (indicated by red lines) were loaded onto an SDS-PAGE gel and then stained with InstantBlue coomassie dye. “MW” indicates molecular masses in kDa.


Figure 5: Verification of the recombinant full-length CXXC5 protein. Mass spectrometry analysis was performed to verify the molecular weight of the purified recombinant CXXC5 protein (A). The same recombinant protein was also loaded onto 12% SDS-PAGE for WB. The CXXC5 antibody (Abcam, ab106533) was used for the detection of the protein (B).

3.3. Testing the Binding Ability of the Full-Length CXXC5 and CXXC Domain to CpG Containing DNA

Due to the presence of the zinc finger CXXC domain, the CXXC5 protein is considered to be a member of the ZF-CXXC protein family that binds to DNA bearing non-methylated CpG dinucleotides. Xu et al. (2018) recently showed that the CXXC domain of CXXC5 binds to non-methylated CpG DNA [32]. Therefore, I also used the CXXC domain as a positive control. I initially examined the binding of the FL-CXXC5 to a non-methylated CpG dinucleotide bearing DNA (5’-GTGATACCGGATCAGT-3’) in vitro using ITC. Results showed that the FL-CXXC5 binds to a DNA fragment containing a CpG dinucleotide (CG; Figure 6). Moreover, this binding was abolished when I used a DNA fragment containing the identical sequence but with a methylated CpG (5’-GTGATACmCGGATCAGT-3’; mCG; Figure 6). The binding of CXXC5 to the non-methylated CpG containing DNA fragment is specific, because I observed no binding when I used a DNA fragment containing a CATG sequence instead of the non-methylated CCGG (No CG; Figure 6). This suggests that the recombinant FL-CXXC5 specifically binds to non-methylated CpG containing DNA. I also tested the ability of FL-CXXC5 to bind to a DNA fragment containing the identical test binding sequence but with different surrounding sequences (5’-GAGAGAxxxxTCTCTC-3’ instead of 5’-GTGATAxxxxATCAGT-3’) to ensure that CXXC5 binds specifically to CpG dinucleotides independently of the surrounding sequences. Indeed, FL-CXXC5 bound to both DNA fragments with similar affinities (Figure 7).

Figure 6: Isothermal titration calorimetry. Binding affinities of the full-length CXXC5 protein and the CXXC domain to DNA sequences. The sequence of the central four nucleotides (CpG and/or non-CpG) embedded into the same surrounding sequence in DNA fragments is underlined. Data are represented as mean ± SE. NB denotes no detectable binding.


Figure 7: Comparison of different surrounding DNA sequences in the binding of the CXXC5 protein to CpG dinucleotides. Binding affinities of the full-length CXXC5 to different DNA sequences measured by ITC. CXXC5 binds to non-methylated CpG dinucleotide independently of surrounding DNA sequences.

ITC results collectively indicate that CXXC5 is a CpG binder, as recently shown for the recombinant CXXC domain of CXXC5 [32]. To further confirm the ITC results, I also tested the CpG DNA binding ability of FL-CXXC5 with the electrophoretic mobility shift assay (EMSA). To assess the amount of protein required for binding, protein was mixed at 1:0.5, 1:1, 1:2, 1:4 and 1:8 ratios with DNA (50 µM). 50 µM non-methylated (5’-GTGATACCGGATCAGT-3’; CpG) or methylated (5’-GTGATACmCGGATCAGT-3’; mCpG) DNA was mixed with the FL-CXXC5 protein and samples were run on a 5% native TBE gel. The gel was stained and subjected to UV spectrometry for visualization. Results revealed that FL-CXXC5, like the CXXC domain used as the positive control, binds to the non-methylated CpG containing DNA fragment but cannot bind to the DNA fragment bearing methylated CpG, as similarly observed with ITC (Figure 8).
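The protein concentrations implied by the EMSA titration can be worked out from the fixed DNA concentration. The short sketch below assumes that each "1:x" ratio is read as DNA:protein, an interpretation that is consistent with the Figure 8 caption, where 50 µM DNA is mixed with 200 µM CXXC domain; the function name is hypothetical.

```python
DNA_UM = 50.0                       # DNA concentration stated in the text (µM)
RATIOS = [0.5, 1.0, 2.0, 4.0, 8.0]  # "1:x" read as DNA:protein (assumption)


def protein_concentrations(dna_um, ratios):
    """Protein concentration (µM) needed for each DNA:protein molar ratio."""
    return [dna_um * r for r in ratios]


protein_um = protein_concentrations(DNA_UM, RATIOS)
```

Under this reading, the titration spans 25 to 400 µM protein, with the 1:4 point corresponding to 200 µM.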


Figure 8: Electrophoretic mobility shift assay (EMSA). 50 µM non-methylated (CG) DNA was mixed with the full-length CXXC5 protein at 1:0.5, 1:1, 1:2, 1:4 and 1:8 ratios to DNA. 50 µM methylated (mCG) DNA was mixed with the full-length CXXC5 protein. As a positive control, 50 µM non-methylated (CG) DNA was mixed with 200 µM CXXC domain. Samples were run on a 5% native TBE gel, stained and visualized with UV spectrometry. “M” represents the molecular marker.

Figure 9: Hemimethylated CpG DNA. 50 µM non-methylated (CG), methylated (mCG) or hemimethylated (hemi-mCG) DNA was mixed with the FL-CXXC5 as indicated. Samples were run on a 5% native TBE gel, stained and visualized with UV spectrometry. “M” represents the molecular marker.


In mammals, DNA methylation generally occurs symmetrically at CpG dinucleotides on both DNA strands. However, some CpGs in the genome can be hemimethylated, meaning that only one strand of DNA is methylated [43]. Therefore, we also tested whether FL-CXXC5 binds to a hemimethylated DNA (the upper strand: 5’-GTGATACCGGATCAGT-3’; the lower strand: 5’-GTGATACmCGGATCAGT-3’) compared to the fully methylated one (Figure 9). Interestingly, CXXC5 also binds to hemimethylated CpG, although with an apparently lower affinity.

PART II: STRUCTURAL CHARACTERIZATION OF THE FULL-LENGTH CXXC5 PROTEIN

3.4. Crystallization of the Full-length CXXC5 Protein

Protein crystallization is the process of forming protein crystals to be used for the elucidation of the three-dimensional structure of a protein by X-ray crystallography [44], [45]. To understand the structural features of the FL-CXXC5 protein in the absence (apo protein) or presence of a non-methylated CpG DNA (5’-GTGATACCGGATCAGT-3’) as ligand, I screened different buffer conditions. After screening for CXXC5 protein crystals with different commercially available screening buffers (SGC, RW2, Natrix, Index, PEG/Ion), I obtained FL-CXXC5 crystals in four different Hampton Research Natrix buffer conditions: 0.08 M Potassium chloride, 0.02 M Magnesium chloride hexahydrate, 0.04 M Sodium cacodylate trihydrate pH 6.0, 45% v/v (+/-)-2-Methyl-2,4-pentanediol, 0.012 M Spermine tetrahydrochloride (Figure 10a); 0.002 M Calcium chloride dihydrate, 0.05 M Sodium cacodylate trihydrate pH 6.0, 1.8 M Ammonium sulfate, 0.0005 M Spermine (Figure 10b); 0.01 M Magnesium chloride hexahydrate, 0.05 M HEPES sodium pH 7.0, 1.6 M Ammonium sulfate (Figure 10c); and 0.02 M Magnesium chloride hexahydrate, 0.05 M MOPS pH 7.0, 2 M Ammonium sulfate, 0.0005 M Spermine (Figure 10d).


Figure 10: Protein crystallization. The formation of the full-length CXXC5 protein crystals was observed under a light microscope with polarizing optics in different buffer conditions. (a) 0.08 M Potassium chloride, 0.02 M Magnesium chloride hexahydrate, 0.04 M Sodium cacodylate trihydrate pH 6.0, 45% v/v (+/-)-2-Methyl-2,4-pentanediol, 0.012 M Spermine tetrahydrochloride, (b) 0.002 M Calcium chloride dihydrate, 0.05 M Sodium cacodylate trihydrate pH 6.0, 1.8 M Ammonium sulfate, 0.0005 M Spermine, (c) 0.01 M Magnesium chloride hexahydrate, 0.05 M HEPES sodium pH 7.0, 1.6 M Ammonium sulfate, (d) 0.02 M Magnesium chloride hexahydrate, 0.05 M MOPS pH 7.0, 2 M Ammonium sulfate, 0.0005 M Spermine.

Protein structure determination by X-ray crystallography depends on obtaining a single protein crystal suitable for diffraction data collection [42]. X-ray crystallography is a technique used for visualizing the atomic and molecular structure of the molecules in a crystal, not only on their own but also in their molecular interactions. After protein crystals were obtained, they were pre-screened on the X-ray beamline at the Structural Genomics Consortium (SGC) Laboratory, University of Toronto, Canada (Figure 11). Two crystals possibly suitable for diffraction were then sent to a synchrotron X-ray beamline equipped with a microdiffractometer to obtain high-resolution diffraction data. Unfortunately, the data we obtained were not useful for structural analysis.

Figure 11: X-ray diffraction pattern of the CXXC5 protein complexed with non-methylated CpG DNA. White arrows indicate black dots (reflections), each of which represents the scattering of the protein crystal at one diffraction angle.

X-ray diffraction analysis is applied to a protein crystal, which should be of sufficient size, homogeneous and of high quality to permit accurate data analysis [45]. The protein crystal is therefore the prime element of the entire process. Protein concentration and purity, additives and the buffer conditions in which crystals are grown are some of the parameters that determine the quality of X-ray crystallography data. There are several possible reasons why I could not obtain FL-CXXC5 crystals useful for X-ray diffraction. Based on the amino acid sequence and secondary structure prediction (Appendix C), the amino-terminal region of CXXC5 tends to adopt a flexible, unstructured conformation. Based on the IUPred prediction tool, CXXC5 showed an ordered structure only in the region of the CXXC domain (Figure 12).

Amino-terminal regions of most cellular proteins contain intrinsically disordered/unstructured regions, which display very poor tertiary structural features under native conditions [46], [47].


Figure 12: Prediction of intrinsically unstructured regions of CXXC5 protein by IUPred web server (https://iupred2a.elte.hu). The prediction of intrinsically unstructured regions from amino acid sequences is dependent on the estimation of energy content of total pairwise inter-residue interactions. The score ranges from 0 (complete order) to 1 (complete disorder). A score above 0.5 is accepted as intrinsically unstructured regions.
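The thresholding rule described in the caption (a score above 0.5 marks a residue as intrinsically unstructured) can be turned into contiguous region calls with a short helper. This is a minimal sketch: the function name is hypothetical, the input is assumed to be a list of per-residue scores in sequence order, and the scores in the usage line are made-up values for illustration, not IUPred output for CXXC5.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position            # a disordered run begins here
        elif start is not None:
            regions.append((start, position - 1))  # run ended at previous residue
            start = None
    if start is not None:                   # run extends to the last residue
        regions.append((start, len(scores)))
    return regions


# Made-up scores for illustration only.
example = disordered_regions([0.9, 0.8, 0.3, 0.2, 0.6])
```

Applied to a real per-residue score table, the same helper would list the stretches that the plot shows above the 0.5 line.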

Based on these results, the amino-terminal region of CXXC5, comprising the first 150 amino acids, appears to be disordered. Unstructured proteins are thought to carry out basic functions in signal transduction, cell-cycle regulation and transcription [48] through the stabilization of their structure by interaction with protein-specific ligands including DNA, RNA or other proteins [49]. These unstructured proteins adopt a significantly folded structure upon binding to specific ligands [47]. The prediction of protein binding regions in CXXC5 also revealed that possible protein binding regions located at the amino-terminus correlate with the unstructured regions of the protein (Figure 13).

Figure 13: Prediction of protein binding regions of CXXC5 protein. Based on the ANCHOR prediction tool, probabilities of protein interaction sites were higher in the disordered regions.


These results suggest that the crystallization of the FL-CXXC5 protein could be more successful if and when regions of the amino-terminus become structured upon binding to interaction partners, for example proteins. For these reasons, I aimed to identify possible protein interaction partners of CXXC5.

PART III: IDENTIFICATION OF PROTEIN INTERACTION PARTNERS OF CXXC5

Since proteins perform their functions in a dynamically changing network of proteins whose identities and amounts vary with time and space, the identification of CXXC5 protein partners could also provide critical information about functions of CXXC5. To address this issue, I used the Proximity Biotinylation Assay (BioID) approach. BioID is used to define the spatio-temporal protein interactions in cells (Figure 14) [50].

Figure 14: BioID. This approach is based on the genetic fusion of a promiscuous E. coli biotin ligase enzyme (BirA*) to a targeting protein. In the presence of biotin, this fused enzyme biotinylates interacting or proximal proteins. After cell lysis, biotinylated proteins are captured by streptavidin magnetic beads and identified by mass spectrometry (The figure is reprinted from [51] with permission).

3.5. Synthesis of the CXXC5-BirA* Fusion Protein in MCF7 Cells

BioID harnesses a promiscuous biotin ligase to biotinylate proteins based on proximity. The ligase is genetically fused to a protein of interest and expressed in cells, where it biotinylates proteins in proximity (that is, within a ~10 nm labeling radius), enabling the selective isolation and identification of biotinylated proteins by biotin-affinity capture. Proteins identified by BioID are considered to be candidate interactors of the protein of interest. BirA (the DNA binding biotin ligase of Escherichia coli) is a 35 kDa protein. A BirA mutant (R118G, or BirA*), which is defective in both self-association and DNA binding, is used in the BioID approach [50], [52]. I genetically fused the Flag-CXXC5 cDNA to the 5’ end of the sequences encoding BirA*-HA present in a pcDNA vector (pcDNA3.1-MCS-Flag-CXXC5-BirA*(R118G)-HA). To ensure that the fusion of CXXC5 to BirA* does not affect the synthesis and the intracellular localization of CXXC5 in transiently transfected MCF7 cells, I first checked the synthesis and cellular localization of the CXXC5-BirA* fusion protein. To this end, the expression vector bearing BirA* (R118G)-HA (empty vector, EV) or the Flag-CXXC5-BirA*(R118G)-HA cDNA was transiently transfected into MCF7 cells. 24 hours after transfection, cells were exposed to 50 µM biotin and 1 mM ATP for 18 hours. Cells were then subjected to immunocytochemistry (ICC) or WB. For ICC, cells were fixed, permeabilized and incubated with an antibody against biotin (Abcam, ab533494) or an antibody against the Flag tag (Sigma, F1804) followed by an appropriate fluorophore-conjugated secondary antibody. For WB analysis, cells were lysed and total proteins were incubated with streptavidin magnetic beads overnight. After beads were washed extensively, biotinylated proteins were separated from the beads with heat (95°C for 5 min) in the presence of 1x Laemmli buffer. Biotinylated proteins were then subjected to SDS-PAGE and WB using the biotin or the CXXC5 antibody (Abcam, ab106533) followed by an HRP-conjugated secondary antibody for visualization (Figure 15A and B).

Figure 15: BirA* biotinylates endogenous proteins in MCF7 cells. MCF7 cells were transiently transfected with the pcDNA3.1-MCS-BirA*(R118G)-HA empty vector (EV) or pcDNA3.1-MCS-Flag-CXXC5-BirA*(R118G)-HA (CXXC5). 24 hours after transfection, cells were treated with or without biotin (50 µM) and ATP (1 mM) for 18 hours. (A) Western blot analysis showed that, in the presence of biotin, the amount of biotinylated proteins increases compared to the EV transfected group. Among the biotinylated proteins, CXXC5 was also biotinylated by the BirA* enzyme. The biotin antibody (Abcam, ab533494) was used to detect biotinylated proteins, whereas the CXXC5 antibody (Abcam, ab106533) was used to detect the CXXC5 protein. (B) ICC results showed the localization of the CXXC5 fusion protein in MCF7 cells. In the presence of biotin, biotinylated proteins were detected with the anti-biotin antibody (red) and the CXXC5 proteins were detected with the anti-Flag antibody (green). DAPI was used for nuclear staining.

3.6. CXXC5 Associated Proteins in MCF7 Cell Line

When I carried out the BioID assay in MCF7 cells using biotin-affinity capture followed by LC-MS/MS, a large number of proteins were identified from two biological replicates. As negative controls, I performed the same experiments with untransfected (UNT) or empty vector (EV) transfected MCF7 cells. For each biological experiment, there were two technical replicates for the mass spectrometry analysis. Only the proteins common to both technical replicates of each biological experiment were included in the subsequent in silico analysis. To remove background proteins, proteins obtained from untransfected or empty vector transfected cells were excluded from the set identified in the CXXC5-BirA* synthesizing cells. Based on these results, 104 proteins were specific to the CXXC5-BirA* synthesizing cells (Figure 16A). These 104 proteins (listed in Appendix D) were clustered based on their biological processes using the DAVID bioinformatics tool (DAVID Bioinformatics Resources 6.8), which grouped them into proteins involved in the regulation of transcription, chromosome organization and mRNA metabolic process (Figure 16B).
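The filtering described above amounts to a few set operations, which can be sketched as follows. This is only an illustrative sketch in Python, not the analysis pipeline used in this study: the function names and the toy protein identifiers are hypothetical, and because the text does not state how the two biological replicates were combined, the sketch assumes that their reproducible proteins were pooled (union).

```python
def reproducible(technical_replicates):
    """Proteins detected in every technical replicate of one biological experiment."""
    return set.intersection(*map(set, technical_replicates))


def pool(biological_experiments):
    """Pool reproducible proteins across biological experiments (assumed rule: union)."""
    pooled = set()
    for technical_replicates in biological_experiments:
        pooled |= reproducible(technical_replicates)
    return pooled


def bait_specific(bait, *controls):
    """Proteins found with the bait but in none of the control conditions."""
    background = set().union(*(pool(control) for control in controls))
    return pool(bait) - background


# Toy data (invented for illustration, not the thesis results):
# each condition is a list of biological experiments,
# each experiment is a list of technical replicates.
cxxc5_bira = [
    [{"CXXC5", "MECP2", "KRT1", "ACTB"}, {"CXXC5", "MECP2", "ACTB"}],
    [{"CXXC5", "MAZ", "ACTB"}, {"CXXC5", "MAZ", "ACTB", "TUBB"}],
]
untransfected = [[{"ACTB", "KRT1"}, {"ACTB"}]]
empty_vector = [[{"ACTB", "TUBB"}, {"ACTB", "TUBB"}]]

specific = bait_specific(cxxc5_bira, untransfected, empty_vector)
print(sorted(specific))  # ['CXXC5', 'MAZ', 'MECP2']
```

Requiring agreement between technical replicates before subtracting the controls keeps one-off identifications out of the candidate list, while subtracting both UNT and EV removes proteins captured independently of the CXXC5 moiety.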

Figure 16: Summary of proteins identified with LC-MS/MS. Venn diagram showing the overlap of proteins from untransfected (UNT), empty vector (EV) and CXXC5-BirA* cDNA transfected groups (A). Proteins obtained from LC-MS/MS were clustered based on their biological processes using the DAVID bioinformatics tool (DAVID Bioinformatics Resources 6.8). The annotated clusters with an enrichment score above 1.3 were included in the functional annotation clustering (B).

Proteins identified by LC-MS/MS were ranked by their percent coverage values from two biological replicates (Table 5). These proteins are transcription factors, chromatin regulators, nuclear lamins and members of the NuRD complex. For example, GATAD2A and GATAD2B (also known as p66α and p66β), ZNF512B (zinc finger protein 512B) and CHD8 (Chromodomain-helicase-DNA-binding protein 8) are involved in nucleosome remodeling and deacetylase (NuRD) complex mediated ATP-dependent chromatin remodeling [53]–[55]. Similarly, RUVBL1 (RuvB-like 1), which possesses helicase activity, is involved in ATP-dependent nucleosome remodeling [56]. ADNP (Activity-dependent neuroprotective protein) was identified as a novel element in the SWI/SNF chromatin-remodeling complex [57]. MeCP2 (Methyl-CpG-binding protein 2) is a multifunctional protein that regulates chromatin reorganization and possesses transcriptional activity [58]. EMD and TMPO are proteins involved in the structural organization of the nuclear envelope [59]. NUMA1 (Nuclear mitotic apparatus protein 1) is a microtubule binding protein that associates with microtubules during mitosis and is required for chromosome separation [60], [61]. GRHL2 (Grainyhead-like transcription factor 2) is a transcription factor that modulates the expression of genes involved in epithelial differentiation [62]. MAZ (Myc-associated zinc finger protein) is another transcription factor, with dual roles in transcription initiation and termination [63]. SNW1 (SNW domain-containing protein 1) is a member of the SNW gene family, which functions as a coactivator that enhances transcription [64] and as a splicing factor [65].

In the BioID system, proteins in close proximity to the protein of interest are also biotinylated. A positive result might reflect a direct or indirect physical interaction, or simply proximity without physical contact. Therefore, detection of a biotinylated protein is not direct evidence of physical interaction. Interactions should be verified with additional approaches, including co-immunoprecipitation and/or co-immunocytochemistry, to show that the detected proteins indeed interact in cells [52].

To confirm the in situ interaction of CXXC5 with some of the identified proteins, I initially set out to perform co-immunoprecipitation studies using the CXXC5 antibody and nuclear extracts of MCF7 cells. Even though the CXXC5 antibody we use effectively detects endogenous CXXC5 in ICC and WB ([4] and unpublished results), the immunoprecipitation of endogenous CXXC5 proved to be difficult. The same CXXC5 antibody effectively immunoprecipitated the overexpressed 3xFlag-CXXC5 introduced exogenously into MCF7 cells (Appendix E), which suggests that the inefficient precipitation of endogenous CXXC5 is due to its low level of synthesis in MCF7 cells. To circumvent this problem, I decided to use transiently transfected mammalian cells as models to assess the interaction of CXXC5 with putative protein partners. I used HEK293 cells, which are derived from transformed human embryonic kidney cells, rather than MCF7 cells for co-immunoprecipitation studies, because HEK293 cells have a remarkably higher transfection efficiency than MCF7 cells (Appendix F). Based on these observations, I cloned the cDNAs of the proteins listed in Table 5 into mammalian expression vectors for co-localization and co-immunoprecipitation approaches.

Table 5: Putative protein partners of CXXC5 from LC-MS/MS analysis

Accession #   Protein   ΣCoverage (%)   Σ# Proteins   Σ# Unique Peptides   Σ# Peptides   Σ# PSMs
Q7LFL8        CXXC5     45.03           12            22                   22            108
P42166        TMPO      43.37           2             18                   25            46
Q86YP4        GATAD2A   33.97           11            17                   18            43
Q14980        NUMA1     31.22           22            54                   54            62
Q6ISB3        GRHL2     23.84           2             9                    9             22
Q8WXI9        GATAD2B   21.92           1             8                    9             21
Q13573        SNW1      19.22           4             7                    7             12
Q9H2P0        ADNP      18.87           2             15                   15            26
Q96KM6        ZNF512B   13.00           1             7                    7             12
Q9Y265        RUVBL1    12.28           5             4                    4             7
P50402        EMD       10.24           2             2                    2             3
P51608        MeCP2     8.23            4             4                    4             4
Q9HCK8        CHD8      7.25            7             12                   13            24
P56270        MAZ       6.08            5             1                    2             6

Σ# PSMs: the total number of identified peptide sequences (peptide-spectrum matches) for the protein.

3.7. Intracellular Localization of CXXC5 Protein Partners

Among the proteins listed in Table 5, we selected the chromatin modifier MeCP2, the transcription factor MAZ and the nuclear membrane protein EMD as members of distinct groups, according to the DAVID analysis described in section 3.6, to assess whether these proteins are indeed CXXC5 interactors. To validate the interactions between CXXC5 and MeCP2, MAZ or EMD, I initially examined the intracellular location of these proteins with ICC, using HEK293 cells transiently transfected with pcDNA-3xFlag-CXXC5 and/or pcDNA-HA-MeCP2, pcDNA-HA-MAZ or pcDNA-HA-EMD for 48 hours (Figure 17). As expected, all tagged proteins were localized in the nucleus and showed an overlapping pattern. DAPI was used to stain the nucleus (data not shown). EMD, as an inner nuclear membrane protein, displayed mostly nuclear membrane staining as well as intranuclear staining.

Figure 17: Immunocytochemical localization of the CXXC5 protein and its putative partners. Representative fluorescent images of HEK293 cells transiently transfected with 3xFlag-CXXC5 and HA-MeCP2, HA-MAZ or HA-EMD for 48 hours and stained with both the Flag and HA antibodies. DAPI staining is not shown. Scale bar represents 50 µm (40X).

3.8. Verification of CXXC5 protein partners by co-immunoprecipitation assay

After verifying that HA tagged MeCP2, MAZ and EMD are nuclear proteins in transfected cells, as is 3xFlag-CXXC5, I carried out co-immunoprecipitation experiments. I transiently transfected HEK293 cells with an expression vector bearing no cDNA as a control, or with expression vectors bearing an HA tagged putative partner cDNA and/or the 3xFlag-CXXC5 cDNA. I also used a reversed tag strategy to ensure that co-immunoprecipitation is not affected by the nature of the tag. Nuclear extracts of transiently transfected cells were subjected to immunoprecipitation using the HA (or the Flag) antibody together with protein A and G magnetic beads. The interaction of CXXC5 with these proteins was verified by western blot using the Flag (or the HA) antibody. I found that the HA-MeCP2, -MAZ or -EMD protein co-precipitated with 3xFlag-CXXC5, as was also observed with the reverse tagged proteins.

3.8.1. Protein Interaction between MeCP2 and CXXC5

Cytosine methylation at CpG dinucleotides is one of the important epigenetic mechanisms that control gene expression in vertebrates. Proteins with a methyl-CpG binding domain (MBD) bind to methylated CpG and regulate transcription [66]. One of the MBD family members is MeCP2 (Methyl-CpG-Binding Protein 2). Even though MeCP2 was initially identified as a methyl-CpG binding protein, subsequent studies suggested that MeCP2 is a multifunctional nuclear protein involved in several cellular processes, including large-scale chromatin reorganization and architecture as well as transcription regulation [58], [67]. MeCP2 is an intrinsically disordered protein that has five domains: an N-terminal domain (NTD), a methyl-binding domain (MBD), an intervening domain (ID), a transcriptional repression domain (TRD) and a carboxyl-terminal domain (CTD) containing α and β sub-regions [68]. The MBD, TRD and CTD are considered to be the main functional domains. The MBD is responsible for the protein’s ability to bind to methylated or non-methylated CpG dinucleotides [69]. MeCP2 acts as a chromatin remodeller by interacting, through its TRD, with corepressor complexes containing the transcriptional repressor mSin3A and histone deacetylases [70]. It was also reported that the TRD of MeCP2 directly interacts with the c-Ski and N-CoR co-repressors [71]. The group II WW domain binding region within the carboxyl-terminal domain of MeCP2 is involved in the binding of the protein to splicing factors such as FBP11 and HYPC [72]. MeCP2 is expressed in almost all tissues, with the highest expression in the brain. Mutations in MeCP2 cause various neurological disorders such as Rett syndrome and autism [73].

Synthesis of the tagged MeCP2 and CXXC5 proteins, alone or together, in HEK293 cells was verified by western blot analysis using either the HA or the Flag antibody (Figure 18A). To determine whether MeCP2 and CXXC5 are interacting partners, I carried out co-immunoprecipitation (Co-IP) of the MeCP2 protein with the HA antibody in nuclear lysates of transiently transfected HEK293 cells. 3xFlag-CXXC5 was detected with the Flag antibody in the HA precipitated lysates but not in the IgG precipitated lysates. This result suggests that CXXC5 and MeCP2 are interacting partners (Figure 18B).

Figure 18: Interaction of MeCP2 and CXXC5. The synthesis of the indicated constructs was verified by western blot using either the HA or the Flag antibody. HDAC1 was used as a loading control (A). HEK293 cells were transiently co-transfected with the indicated constructs and subjected to co-immunoprecipitation with the HA antibody or the isotype-matched IgG, and the precipitates were analyzed by western blotting (WB) using the Flag antibody. 50 µg of nuclear extract was used as an input control; 500 µg of nuclear extract was subjected to immunoprecipitation. Interaction was also assessed with the reversed co-immunoprecipitation approach (B). “MW” indicates molecular masses in kDa. The figure shows the whole western blot membrane.

3.8.2. Protein Interaction between MAZ and CXXC5

MAZ (Myc-associated Zinc Finger Protein) is a transcription factor with C2H2-type zinc finger motifs that bind GC-rich promoters of target genes to control transcription initiation and termination [63]. It has been shown that MAZ is overexpressed in breast cancer and that the knockdown of MAZ suppresses proliferation in breast cancer cell models. This suggests that MAZ is involved in cellular proliferation [74].

Synthesis of the HA or 3xFlag tagged MAZ or CXXC5, alone or together, in transiently transfected HEK293 cells was verified by WB analysis using the HA or the Flag antibody (Figure 19A). To determine whether MAZ and CXXC5 are interacting partners, I carried out co-immunoprecipitation of the MAZ protein with the HA antibody. 3xFlag-CXXC5 was detected with the Flag antibody in the HA, but not in the IgG, precipitated lysates. This result suggests that MAZ is a CXXC5 interacting protein partner (Figure 19B).

Figure 19: Interaction of MAZ and CXXC5. The synthesis of the indicated constructs was verified by western blot using either the HA or the Flag antibody. HDAC1 was used as a loading control (A). HEK293 cells were transiently co-transfected with the indicated constructs and subjected to co-immunoprecipitation with the HA antibody or an isotype-matched IgG, and the precipitates were analyzed by WB using the Flag antibody. 50 µg of nuclear lysate was used as an input control; 500 µg of nuclear extract was subjected to immunoprecipitation. Interaction was also assessed with the reversed co-immunoprecipitation approach (B). “MW” indicates molecular masses in kDa.

3.8.3. Protein Interaction between EMD and CXXC5

Emerin (EMD) is a member of the LEM domain family of inner nuclear membrane proteins [75]. Studies indicate that EMD is involved in the structural organization of chromatin, nuclear membrane assembly, and gene regulation [75]–[77] through interaction with various transcription factors [78], [79].

It appears that EMD facilitates repressive chromatin formation at the nuclear periphery by directly interacting with and increasing the catalytic activity of histone deacetylase 3 (HDAC3), the catalytic subunit of the nuclear co-repressor (NCoR) complex [76].

Synthesis of the HA or 3xFlag tagged EMD or CXXC5, alone or together, in transiently transfected HEK293 cells was verified by WB analysis using the HA or the Flag antibody (Figure 20A). The human EMD protein has many phosphorylation and O-GlcNAc sites. EMD, like other nuclear envelope proteins, is hyperphosphorylated during the cell cycle to regulate nucleoskeletal reorganization and nuclear envelope disassembly [80]. In WB analyses using the HA or the Flag antibody, I observed three EMD species with distinct molecular masses (Figure 20A). The EMD species with the fastest electrophoretic migration corresponds to the predicted molecular mass (32 kDa) of EMD. The EMD species with higher molecular masses are likely post-translationally modified forms.

To determine whether EMD and CXXC5 are interacting partners, I carried out immunoprecipitation of the HA-EMD protein with the HA antibody using nuclear lysates of HEK293 cells transiently co-transfected with the expression vectors. 3xFlag-CXXC5 was detected with the Flag antibody in the HA, but not in the IgG, precipitated lysates, suggesting that CXXC5 and EMD interact in cells. Immunoprecipitation of HA-CXXC5 with the HA antibody also revealed that only the unmodified 3xFlag-EMD protein interacts with CXXC5 (Figure 20B).

Figure 20: Interaction of EMD and CXXC5. The synthesis of the indicated constructs was verified by western blot using either the HA or the Flag antibody. HDAC1 was used as a loading control (A). HEK293 cells were transiently co-transfected with the respective constructs and subjected to co-immunoprecipitation with the HA antibody or the isotype-matched IgG. The precipitates were analyzed by western blotting (WB) using the Flag antibody. 50 µg of nuclear lysate was used as an input control; 500 µg of nuclear extract was subjected to immunoprecipitation. Interaction was also assessed with the reversed co-immunoprecipitation approach (B). “MW” indicates molecular weight marker in kDa.

3.9. Identification of protein interaction region(s) of CXXC5 and MeCP2

To identify which region(s) of CXXC5 is responsible for interacting with MeCP2 or MAZ, I generated, by PCR, cDNAs that encode amino- and/or carboxyl-terminally truncated CXXC5 fragments (Figure 21A). CXXC5 localizes to the nucleus through a nuclear localization signal present at the immediate amino-terminus of the CXXC domain [16]. To ensure that the truncated CXXC5 variants also localize to the nucleus, I inserted a sequence encoding a nuclear localization signal (NLS) derived from the SV40 T antigen [81] (Appendix B) at the 5’ end of each truncated CXXC5 variant cDNA. To examine the synthesis of these variants, I performed WB analysis with the Flag antibody using nuclear extracts of transiently transfected HEK293 cells. The proteins were synthesized with the predicted molecular masses (Figure 21B).

Figure 21: Schematic representation of CXXC5 truncation fragments. Each mutant was generated by PCR and cloned into pcDNA3.1(-) bearing sequences for an amino-terminal 3xFlag tag and a nuclear localization signal (NLS). All plasmids were verified by sequencing (A). Synthesis of the proteins was verified by western blot analysis using the Flag antibody. The predicted molecular mass of each mutant is indicated in kDa (B).

To carry out Co-IP experiments, HEK293 cells were transiently transfected with the expression vector bearing 3xFlag-CXXC5 variants alone or co-transfected with the HA tagged MeCP2 or MAZ cDNA and grown for 48 hours. The HA tagged MeCP2 or MAZ protein was immunoprecipitated with the HA antibody. Results revealed that the variants CXXC5(1-249) and CXXC5(101-249), which lack the carboxyl-terminal ZF-CXXC domain, were not detectable in the precipitates, while CXXC5(101-322) and CXXC5(250-322) were detectable with the Flag antibody in the HA, but not in the IgG, precipitated lysates, indicating that the carboxyl-terminal region of CXXC5 is necessary for the interaction with both MeCP2 (Figure 22A) and MAZ (Figure 22B). Since there is an apparent decrease in the amount of CXXC5(250-322) precipitated compared to CXXC5(101-322), our results also suggest that residues 101-249, adjacent to the ZF-CXXC domain, participate in the stability of the interaction between the ZF-CXXC domain of CXXC5 and MeCP2 or MAZ.
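The deletion-mapping reasoning used here can be written down explicitly. The following Python sketch is only an illustration of that logic: the function names are hypothetical, and the Co-IP readout is reduced to a yes/no value per construct, so the weaker signal of the 250-322 fragment is deliberately not modeled. The residue ranges and outcomes are those stated in the text.

```python
# Co-IP outcome for each 3xFlag-CXXC5 truncation variant, as reported in the text:
# (first residue, last residue) -> detected in the HA precipitate?
coip = {
    (1, 249): False,
    (101, 249): False,
    (101, 322): True,
    (250, 322): True,
}


def shared_region(results):
    """Residue range present in every construct that co-precipitated, or None."""
    binders = [rng for rng, bound in results.items() if bound]
    if not binders:
        return None
    start = max(first for first, _ in binders)
    end = min(last for _, last in binders)
    return (start, end) if start <= end else None


def explains_non_binders(region, results):
    """True if every non-binding construct lacks at least part of the region."""
    start, end = region
    return all(
        first > start or last < end
        for (first, last), bound in results.items()
        if not bound
    )


region = shared_region(coip)
print(region)                              # (250, 322)
print(explains_non_binders(region, coip))  # True
```

The shared region of the binding constructs is residues 250-322, and neither non-binding construct contains it, which is the basis for concluding that the carboxyl-terminal region is necessary for the interaction.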

Figure 22: Identification of interaction region(s) of CXXC5 with MeCP2 and MAZ. Co-IP was performed with 3xFlag tagged truncated CXXC5 (1-249, 101-249, 101-322 and 250-322) and the HA tagged MeCP2 (A) or MAZ (B) proteins. HEK293 cells were transiently co-transfected with the respective constructs, as in the experiments with the full-length proteins, and subjected to co-immunoprecipitation with the HA antibody or the isotype-matched IgG; the precipitates were analyzed by western blotting (WB) using the Flag antibody. 50 µg of nuclear extract was used as an input control; 500 µg of nuclear extract was subjected to co-immunoprecipitation. The HA antibody was used in WB to verify immunoprecipitation of the HA tagged MeCP2 or MAZ. Molecular masses of truncated CXXC5 proteins are indicated in kDa.

To address whether we could also locate a region within the MeCP2 protein involved in the interaction with CXXC5, we used a MeCP2 variant. A previous detailed study showed that the carboxyl-terminus of MeCP2 is required for the interaction with the WW domains of the FBP11 and HYPC proteins, as the truncation of the carboxyl-terminal 86 amino acids of MeCP2, which also results from a frameshift mutation present in a large group of Rett syndrome patients, abrogates interactions with FBP11 and HYPC [72] (Figure 23A). Based on the importance of the carboxyl-terminus of MeCP2 in protein interactions, I initially generated a cDNA encoding this MeCP2 truncation variant, MeCP2(1-400), by PCR and cloned it into a pcDNA3.1(-) vector bearing sequences encoding the 3xFlag or the HA tag.

Figure 23: Identification of the interaction domain of MeCP2 with CXXC5. Schematic representation of the full-length and truncated MeCP2 proteins (A). Co-IP was performed with 3xFlag tagged CXXC5 and the HA tagged full-length MeCP2 (FL) or MeCP2(1-400) protein in transiently transfected HEK293 cells. Nuclear lysates were subjected to co-immunoprecipitation with the HA antibody or the isotype-matched IgG, and the precipitates were analyzed by WB using the Flag antibody. 10% of the nuclear lysate was used as an input control; the remainder was subjected to immunoprecipitation (B). We also carried out co-immunoprecipitation using constructs with reversed tags (C). Molecular masses are indicated.

To determine whether the carboxyl-terminal region of MeCP2 is necessary for the interaction with CXXC5, I carried out immunoprecipitation of HA-MeCP2(1-400) with the HA antibody in nuclear extracts of transiently transfected HEK293 cells. The truncation effectively abrogated the interaction of MeCP2(1-400) with CXXC5 (Figure 23B). This observation was further confirmed by using constructs with reversed tags (Figure 23C). These results suggest that the 86 amino acid-long region located at the carboxyl-terminus of MeCP2 is required for the interaction with CXXC5.

CHAPTER 4

CONCLUSION AND FUTURE DIRECTIONS

According to Cancer Facts & Figures 2018, reported by the American Cancer Society, breast cancer is one of the most commonly diagnosed cancers (excluding skin cancer) and is the second leading cause of cancer death among women. Estrogen signaling is important for both the initiation and the progression of breast cancer. Because of the critical role of E2-ERα signaling in the physiology and pathophysiology of breast tissue, identifying key genes regulated by estrogens is critical for the development of new prognostic and/or therapeutic targets for breast cancer. Our previous studies suggested that the expression of the CXXC5 gene is regulated by E2-ERα through a direct interaction with an ERE sequence in the CXXC5 locus.

The CXXC-type zinc finger protein 5 (CXXC5), one of the members of zinc finger CXXC protein family, appears to participate in cellular events as a transcription factor, epigenetic regulator and/or protein partner of various signaling pathways.

In this dissertation, my goal was to carry out an initial characterization of the CXXC5 protein. To achieve this, I used three distinct approaches for: 1) Verification of DNA binding ability, 2) Tertiary structure prediction through protein crystallization, and 3) Identification of interaction partners.

PART I: The CXXC5 protein is a non-methylated CpG DNA binder

1) To perform functional and structural analyses of CXXC5, I initially aimed to express and purify CXXC5 as a full-length recombinant protein. After testing many different expression and purification strategies, I successfully expressed and purified the amino-terminally 6xHis tagged full-length CXXC5 protein using a bacterial expression system.

2) I then carried out studies aimed at testing the DNA binding ability of the recombinant CXXC5 protein using ITC and EMSA approaches.

3) Our results showed that the CXXC5 protein binds to a DNA fragment containing non-methylated CpG through its zinc finger CXXC domain but cannot bind to a DNA fragment bearing methylated CpG. These results indicate that CXXC5 is a CpG DNA binding protein and could indeed be involved in various transcriptional events as a transcription factor, co-regulator and/or epigenetic factor. Our ongoing studies are currently addressing these issues.

4) I also observed that FL-CXXC5 binds to hemi-methylated DNA. Although the importance of this observation is unclear, an intriguing possibility is that CXXC5 could also be involved in the maintenance of replication, as suggested for DNMTs [82], [83], as well as in the monoallelic expression patterns of imprinted genes, which are lost in human cancer [84].

PART II: Crystallization of the recombinant full-length CXXC5 protein complexed with a non-methylated CpG DNA ligand.

1) To solve the tertiary structure of CXXC5, I carried out crystallization of the full-length CXXC5 protein complexed with a non-methylated CpG DNA ligand. Even though I obtained protein crystals, we could not obtain data useful for structural analysis because of the low resolution of the diffraction data.

2) Possible reasons that we could not succeed in solving the crystal structure of CXXC5 are:

a. The amino-terminal region of CXXC5 tends to adopt a flexible loop conformation.

b. The amino-terminus of CXXC5 displays very poor tertiary structural features under native conditions because of its disordered/unstructured regions.

c. The purity and quality of the recombinant CXXC5 protein were not sufficient to obtain crystals of adequate size, homogeneity and/or quality to permit accurate data analysis.

3) These results suggest that the crystallization of the full-length CXXC5 protein could be more successful if and when the amino-terminal regions become ordered upon binding to an interaction partner. We are planning to solve the crystal structure of CXXC5 in a complex with an interacting protein partner.

PART III: Identification of interaction partners of CXXC5

1) To identify protein interaction partners of CXXC5, we used the Proximity Biotinylation Assay (BioID) approach, which is used to define spatio-temporal protein interactions in cells.

2) We identified 104 putative proteins that are direct or indirect interaction partners of the CXXC5 protein. Of these proteins, our initial analyses indicated that CXXC5 interacts with the MeCP2, MAZ and EMD proteins. We are currently verifying other potential interacting partners identified with BioID.

3) Unlike in silico analyses for the prediction of CXXC5 interaction regions, which suggested that amino-terminal regions of CXXC5 are involved in protein binding, our co-immunoprecipitation experiments indicate that the ZF-CXXC domain is required not only for non-methylated CpG DNA binding but also for protein interactions of CXXC5. This suggests that the ZF-CXXC domain is a bi-functional domain. However, we only showed the interaction of the ZF-CXXC domain with the MeCP2 and MAZ proteins. There might be other protein partners of CXXC5 that interact with the amino-terminal region(s) as well.

4) Our initial studies on the identification of a sub-region within MeCP2 responsible for the ability of the protein to interact with CXXC5 showed that the carboxyl-terminal 86 amino acids are critical for the interaction with CXXC5. In our ongoing studies with a number of truncated and/or mutated protein variants, we are further delineating the specific region(s) and residue(s) responsible for protein interactions between CXXC5 and MeCP2.

5) We will also aim to perform similar experiments to identify the interaction region(s) for the MAZ and EMD proteins.

REFERENCES

[1] J. Huang, X. Li, C. Maguire, R. Hilf, R. Bambara, and M. Muyan, “Binding of estrogen receptor beta to estrogen response element in situ is independent of estradiol and impaired by its amino terminus.,” Mol. Endocrinol., vol. 19, no. 11, pp. 2696–712, 2005.

[2] K. J. Hamilton, Y. Arao, and K. S. Korach, “Estrogen hormone physiology: Reproductive findings from estrogen receptor mutant mice,” Reproductive Biology, vol. 14, no. 1. pp. 3–8, 2014.

[3] S. L. Nott, Y. Huang, X. Li, B. R. Fluharty, X. Qiu, W. V. Welshons, S. Y. Yeh, and M. Muyan, “Genomic responses from the estrogen-responsive element-dependent signaling pathway mediated by estrogen receptor alpha are required to elicit cellular alterations,” J. Biol. Chem., vol. 284, no. 22, pp. 15277–15288, 2009.

[4] P. Yaşar, G. Ayaz, and M. Muyan, “Estradiol-Estrogen Receptor α Mediates the Expression of the CXXC5 Gene through the Estrogen Response Element-Dependent Signaling Pathway,” Sci. Rep., vol. 6, p. 37808, Nov. 2016.

[5] Q. Zhang, M. Ye, X. Wu, S. Ren, M. Zhao, C. Zhao, G. Fu, Y. Shen, H. Fan, G. Lu, M. Zhong, X. Xu, Z. Han, J. Zhang, J. Tao, Q. Huang, J. Zhou, G. Hu, J. Gu, S. Chen, and Z. Chen, “Cloning and Functional Analysis of cDNAs with Open Reading Frames for 300 Previously Undefined Genes Expressed in CD34+ Hematopoietic Stem/Progenitor Cells,” Genome Res., pp. 1546–1560, 2000.

[6] F. Pendino, E. Nguyen, I. Jonassen, B. Dysvik, A. Azouz, M. Lanotte, E. Ségal-Bendirdjian, and J. R. Lillehaug, “Functional involvement of RINF, retinoid-inducible nuclear factor (CXXC5), in normal and tumoral human myelopoiesis,” Blood, vol. 113, no. 14, pp. 3172–3181, 2009.

[7] M. B. Treppendahl, L. Möllgård, E. Hellström-Lindberg, P. Cloos, and K. Grønbæk, “Downregulation but lack of promoter hypermethylation or somatic mutations of the potential tumor suppressor CXXC5 in MDS and AML with deletion 5q,” Eur. J. Haematol., vol. 90, no. 3, pp. 259–260, 2013.

[8] H. K. Long, N. P. Blackledge, and R. J. Klose, “ZF-CxxC domain-containing proteins, CpG islands and the chromatin connection,” Biochem. Soc. Trans., vol. 41, no. 3, pp. 727–740, 2013.

[9] J. H. Lee, K. S. Voo, and D. G. Skalnik, “Identification and characterization of the DNA binding domain of CpG-binding protein.,” J. Biol. Chem., vol. 276, no. 48, pp. 44669–44676, 2001.

[10] T. Andersson, E. Södersten, J. K. Duckworth, A. Cascante, N. Fritz, P. Sacchetti, I. Cervenka, V. Bryja, and O. Hermanson, “CXXC5 is a novel BMP4-regulated modulator of Wnt signaling in neural stem cells,” J. Biol. Chem., vol. 284, no. 6, pp. 3672–3681, 2009.

[11] H. Y. Kim, D. H. Yang, S. W. Shin, M. Y. Kim, J. H. Yoon, S. Kim, H. C. Park, D. W. Kang, D. Min, M. W. Hur, and K. Y. Choi, “CXXC5 is a transcriptional activator of Flk-1 and mediates bone morphogenic protein-induced endothelial cell differentiation and vessel formation,” FASEB J., vol. 28, no. 2, pp. 615–626, 2014.

[12] M. S. Kim, S. K. Yoon, F. Bollig, J. Kitagaki, W. Hur, N. J. Whye, Y. P. Wu, M. N. Rivera, J. Y. Park, H. S. Kim, K. Malik, D. W. Bell, C. Englert, A. O. Perantoni, and S. B. Lee, “A novel wilms tumor 1 (WT1) target gene negatively regulates the WNT signaling pathway,” J. Biol. Chem., vol. 285, no. 19, pp. 14585–14593, 2010.

[13] H. Y. Kim, J. Y. Yoon, J. H. Yun, K. W. Cho, S. H. Lee, Y. M. Rhee, H. S. Jung, H. J. Lim, H. Lee, J. Choi, J. N. Heo, W. Lee, K. T. No, D. Min, and K. Y. Choi, “CXXC5 is a negative-feedback regulator of the Wnt/β-catenin pathway involved in osteoblast differentiation,” Cell Death Differ., vol. 22, no. 6, pp. 912–920, 2015.

[14] G. Li, X. Ye, X. Peng, Y. Deng, W. Yuan, Y. Li, X. Mo, X. Wang, Y. Wan, X. Liu, T. Chen, Z. Jiang, X. Fan, X. Wu, and Y. Wang, “CXXC5 regulates differentiation of C2C12 myoblasts into myocytes,” J. Muscle Res. Cell Motil., vol. 35, no. 5–6, pp. 259–265, 2014.

[15] P. A. Marshall, Z. Hernandez, I. Kaneko, T. Widener, C. Tabacaru, I. Aguayo, and P. W. Jurutka, “Discovery of novel vitamin D receptor interacting proteins that modulate 1,25-dihydroxyvitamin D3 signaling,” J. Steroid Biochem. Mol. Biol., vol. 132, no. 1–2, pp. 147–159, 2012.

[16] M. Zhang, R. P. Wang, Y. Y. Wang, F. C. Diao, F. Lu, D. Gao, D. Y. Chen, Z. H. Zhai, and H. B. Shu, “The CXXC finger 5 protein is required for DNA damage-induced p53 activation,” Sci. China, Ser. C Life Sci., vol. 52, no. 6, pp. 528–538, 2009.

[17] X. Wang, P. Liao, X. Fan, Y. Wan, Y. Wang, Y. Li, Z. Jiang, X. Ye, X. Mo, K. Ocorr, Y. Deng, X. Wu, and W. Yuan, “CXXC5 Associates with Smads to Mediate TNF-α Induced Apoptosis,” Curr. Mol. Med., vol. 13, no. 8, pp. 1385–1396, 2013.

[18] X. Yan, J. Wu, Q. Jiang, H. Cheng, J. D. J. Han, and Y. G. Chen, “CXXC5 suppresses hepatocellular carcinoma by promoting TGF-β-induced cell cycle arrest and apoptosis,” J. Mol. Cell Biol., vol. 10, no. 1, pp. 48–59, 2018.

[19] S. Aras, O. Pak, N. Sommer, R. Finley, M. Hüttemann, N. Weissmann, and L. I. Grossman, “Oxygen-dependent expression of cytochrome c oxidase subunit 4-2 gene expression is mediated by transcription factors RBPJ, CXXC5 and CHCHD2,” Nucleic Acids Res., vol. 41, no. 4, pp. 2255–2266, 2013.

[20] A. Astori, H. Fredly, T. Aloysius, L. Bullinger, V. M.-D. Mas, P. de la Grange, F. Delhommeau, K. M. Hagen, C. Récher, I. Dusanter-Fourt, S. Knappskog, J. R. Lillehaug, F. Pendino, and Ø. Bruserud, “CXXC5 (Retinoid-Inducible Nuclear Factor, RINF) is a Potential Therapeutic Target in High-Risk Human Acute Myeloid Leukemia,” Oncotarget, vol. 4, no. 9, pp. 1438–1448, 2013.

[21] S. Tan, H. Li, W. Zhang, Y. Shao, Y. Liu, H. Guan, J. Wu, Y. Kang, J. Zhao, Q. Yu, Y. Gu, K. Ding, M. Zhang, W. Qian, Y. Zhu, H. Cai, C. Chen, P. E. Lobie, X. Zhao, J. Sun, and T. Zhu, “NUDT21 negatively regulates PSMB2 and CXXC5 by alternative polyadenylation and contributes to hepatocellular carcinoma suppression,” Oncogene, vol. 37, no. 35, pp. 4887–4900, 2018.

[22] S. Knappskog, L. M. Myklebust, C. Busch, T. Aloysius, J. E. Varhaug, P. E. Lønning, J. R. Lillehaug, and F. Pendino, “RINF (CXXC5) is overexpressed in solid tumors and is an unfavorable prognostic factor in breast cancer,” Ann. Oncol., vol. 22, no. 10, pp. 2208–2215, 2011.

[23] L. Fang, Y. Wang, Y. Gao, and X. Chen, “Overexpression of CXXC5 is a strong poor prognostic factor in ER+ breast cancer,” Oncol. Lett., vol. 16, no. 1, pp. 395–401, 2018.

[24] I. Benedetti, A. M. De Marzo, J. Geliebter, and N. Reyes, “CXXC5 expression in prostate cancer: implications for cancer progression,” Int. J. Exp. Pathol., vol. 98, no. 4, pp. 234–243, 2017.

[25] M. D. Allen, C. G. Grummitt, C. Hilcenko, S. Y. Min, L. M. Tonkin, C. M. Johnson, S. M. Freund, M. Bycroft, and A. J. Warren, “Solution structure of the nonmethyl-CpG-binding CXXC domain of the leukaemia-associated MLL histone methyltransferase,” EMBO J., vol. 25, no. 19, pp. 4503–4512, 2006.

[26] K. S. Voo, D. L. Carlone, B. M. Jacobsen, A. Flodin, and D. G. Skalnik, “Cloning of a mammalian transcriptional activator that binds unmethylated CpG motifs and shares a CXXC domain with DNA methyltransferase, human trithorax, and methyl-CpG binding domain protein 1,” Mol. Cell. Biol., vol. 20, no. 6, pp. 2108–2121, 2000.

[27] C. Xu, C. Bian, R. Lam, A. Dong, and J. Min, “The structural basis for selective binding of non-methylated CpG islands by the CFP1 CXXC domain,” Nat. Commun., vol. 2, p. 227, 2011.

[28] M. Ko, J. An, H. S. Bandukwala, L. Chavez, T. Äijö, W. A. Pastor, M. F. Segal, H. Li, K. P. Koh, H. Lähdesmäki, P. G. Hogan, L. Aravind, and A. Rao, “Modulation of TET2 expression and 5-methylcytosine oxidation by the CXXC domain protein IDAX,” Nature, vol. 497, no. 7447, pp. 122–126, 2013.

[29] H. Zhang, X. Zhang, E. Clark, M. Mulcahey, S. Huang, and Y. G. Shi, “TET1 is a DNA-binding protein that modulates DNA methylation and gene transcription via hydroxylation of 5-methylcytosine,” Cell Res., vol. 20, no. 12, pp. 1390–1393, 2010.

[30] Y. Xu, C. Xu, A. Kato, W. Tempel, J. G. Abreu, C. Bian, Y. Hu, D. Hu, B. Zhao, T. Cerovina, J. Diao, F. Wu, H. H. He, Q. Cui, E. Clark, C. Ma, A. Barbara, G. J. C. Veenstra, G. Xu, U. B. Kaiser, X. S. Liu, S. P. Sugrue, X. He, J. Min, Y. Kato, and Y. G. Shi, “Tet3 CXXC domain and dioxygenase activity cooperatively regulate key genes for xenopus eye and neural development,” Cell, vol. 151, no. 6, pp. 1200–1213, 2012.

[31] J. C. Zhou, N. P. Blackledge, A. M. Farcas, and R. J. Klose, “Recognition of CpG Island Chromatin by KDM2A Requires Direct and Specific Interaction with Linker DNA,” Mol. Cell. Biol., vol. 32, no. 2, pp. 479–489, 2012.

[32] C. Xu, K. Liu, M. Lei, A. Yang, Y. Li, T. R. Hughes, and J. Min, “DNA Sequence Recognition of Human CXXC Domains and Their Structural Determinants,” Structure, vol. 26, no. 1, p. 85–95.e3, 2018.

[33] C. Frauer, A. Rottach, D. Meilinger, S. Bultmann, K. Fellinger, S. Hasenöder, M. Wang, W. Qin, J. Söding, F. Spada, and H. Leonhardt, “Different binding properties and function of CXXC zinc finger domains in Dnmt1 and Tet1,” PLoS One, vol. 6, no. 2, 2011.

[34] Q. W. Chen, X. Y. Zhu, Y. Y. Li, and Z. Q. Meng, “Epigenetic regulation and cancer (review),” Oncol. Rep., vol. 31, no. 2, pp. 523–532, 2014.

[35] H. M. Rowe, M. Friedli, S. Offner, S. Verp, D. Mesnard, J. Marquis, and T. Aktas, “De novo DNA methylation of endogenous retroviruses is shaped by KRAB-ZFPs/KAP1 and ESET,” Development, vol. 140, no. 3, pp. 519–529, 2013.

[36] B. E. Bernstein, A. Meissner, and E. S. Lander, “The Mammalian Epigenome,” Cell, vol. 128, no. 4, pp. 669–681, 2007.

[37] P. A. Defossez and I. Stancheva, “Biological functions of methyl-CpG-binding proteins,” Prog. Mol. Biol. Transl. Sci., vol. 101, pp. 377–398, 2011.

[38] Z. D. Smith and A. Meissner, “DNA methylation: roles in mammalian development,” Nat. Rev. Genet., vol. 14, p. 204, Feb. 2013.

[39] S. Saxonov, P. Berg, and D. L. Brutlag, “A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters,” Proc. Natl. Acad. Sci., vol. 103, no. 5, pp. 1412–1417, 2006.

[40] J. P. Thomson, P. J. Skene, J. Selfridge, T. Clouaire, J. Guy, S. Webb, A. R. W. Kerr, A. Deaton, R. Andrews, K. D. James, D. J. Turner, R. Illingworth, and A. Bird, “CpG islands influence chromatin structure via the CpG-binding protein Cfp1,” Nature, vol. 464, no. 7291, pp. 1082–1086, 2010.

[41] M. R. Duff, Jr., J. Grubbs, and E. E. Howell, “Isothermal Titration Calorimetry for Measuring Macromolecule-Ligand Affinity,” J. Vis. Exp., no. 55, pp. 2–5, 2011.

[42] T. Skarina, X. Xu, E. Evdokimova, and A. Savchenko, “High-throughput crystallization screening,” Methods Mol. Biol., vol. 1140, pp. 159–168, 2014.

[43] J. Sharif and H. Koseki, “Hemimethylation: DNA’s lasting odd couple,” Science, vol. 359, no. 6380, pp. 1102–1103, 2018.

[44] A. McPherson, “Protein crystallization in the structural genomics era,” Journal of Structural and Functional Genomics. pp. 3–12, 2004.

[45] A. McPherson and J. A. Gavira, “Introduction to protein crystallization,” Acta Crystallogr. Sect. F Struct. Biol. Commun., vol. 70, no. 1, pp. 2–20, 2014.

[46] Z. Dosztányi, V. Csizmok, P. Tompa, and I. Simon, “IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content,” Bioinformatics, vol. 21, no. 16, pp. 3433–3434, 2005.

[47] V. N. Uversky, “What does it mean to be natively unfolded?,” Eur. J. Biochem., vol. 269, no. 1, pp. 2–12, 2002.

[48] L. M. Iakoucheva, C. J. Brown, J. D. Lawson, Z. Obradović, and A. K. Dunker, “Intrinsic disorder in cell-signaling and cancer-associated proteins,” J. Mol. Biol., vol. 323, no. 3, pp. 573–584, 2002.

[49] A. K. Dunker, C. J. Brown, J. D. Lawson, L. M. Iakoucheva, and Z. Obradović, “Intrinsic disorder and protein function,” Biochemistry, vol. 41, no. 21, pp. 6573–6582, 2002.

[50] K. J. Roux, D. I. Kim, M. Raida, and B. Burke, “A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells,” J. Cell Biol., vol. 196, no. 6, pp. 801–810, 2012.

[51] R. Varnaitė and S. A. MacNeill, “Meet the neighbors: Mapping local protein interactomes by proximity-dependent labeling with BioID,” Proteomics, vol. 16, no. 19, pp. 2503–2518, 2016.

[52] E. N. Firat-Karalar and T. Stearns, “Probing mammalian centrosome structure using BioID proximity-dependent biotinylation,” in Methods in Cell Biology, 2015, pp. 153–170.

[53] A. Y. Lai and P. A. Wade, “Cancer biology and NuRD: a multifaceted chromatin remodelling complex,” Nat. Rev. Cancer, vol. 11, no. 8, pp. 588–596, 2011.

[54] S. L. Kloet, H. I. Baymaz, M. Makowski, V. Groenewold, P. W. T. C. Jansen, M. Berendsen, H. Niazi, G. J. Kops, and M. Vermeulen, “Towards elucidating the stability, dynamics and architecture of the nucleosome remodeling and deacetylase complex by using quantitative interaction proteomics,” FEBS J., vol. 282, no. 9, pp. 1774–1785, May 2015.

[55] B. A. Thompson, V. Tremblay, G. Lin, and D. A. Bochar, “CHD8 Is an ATP-Dependent Chromatin Remodeling Factor That Regulates β-Catenin Target Genes,” Mol. Cell. Biol., vol. 28, no. 12, pp. 3894–3904, Jun. 2008.

[56] K. Mayes, Z. Qiu, A. Alhazmi, and J. W. Landry, “ATP-Dependent Chromatin Remodeling Complexes as Novel Targets for Cancer Therapy,” in Advances in Cancer Research, 2014, pp. 183–233.

[57] S. Mandel and I. Gozes, “Activity-dependent neuroprotective protein constitutes a novel element in the SWI/SNF chromatin remodeling complex,” J. Biol. Chem., vol. 282, no. 47, pp. 34448–34456, 2007.

[58] F. Babbio, I. Castiglioni, C. Cassina, M. B. Gariboldi, C. Pistore, E. Magnani, G. Badaracco, E. Monti, and I. M. Bonapace, “Knock-down of methyl CpG-binding protein 2 (MeCP2) causes alterations in cell proliferation and nuclear lamins expression in mammalian cells,” BMC Cell Biol., vol. 13, pp. 1–12, 2012.

[59] M. Goldberg, E. Nili, G. Cojocaru, Y. B. Tzur, M. Brandies, G. Rechavi, Y. Gruenbaum, and J. Amos, “Functional organization of the nuclear lamina,” Gene Ther. Mol. Biol., vol. 4, pp. 143–158, Dec. 1999.

[60] D. A. Compton and D. W. Cleveland, “NuMA is required for the proper completion of mitosis,” J. Cell Biol., vol. 120, no. 4, pp. 947–957, 1993.

[61] M. Price and D. E. Pettijohn, “Redistribution of the Nuclear Mitotic Apparatus Protein (NuMA) during Mitosis and Nuclear Assembly. Properties of purified NuMA protein,” Exp. Cell Res., vol. 166, no. 2, pp. 295–311, 1986.

[62] M. Werth, K. Walentin, A. Aue, J. Schonheit, A. Wuebken, N. Pode-Shakked, L. Vilianovitch, B. Erdmann, B. Dekel, M. Bader, J. Barasch, F. Rosenbauer, F. C. Luft, and K. M. Schmidt-Ott, “The transcription factor grainyhead-like 2 regulates the molecular composition of the epithelial apical junctional complex,” Development, vol. 137, no. 22, pp. 3835–3845, Nov. 2010.

[63] S. A. Bossone, C. Asselin, A. J. Patel, and K. B. Marcu, “MAZ, a zinc finger protein, binds to c-MYC and C2 gene sequences regulating transcriptional initiation and termination,” Proc. Natl. Acad. Sci., vol. 89, no. 16, pp. 7452–7456, Aug. 1992.

[64] P. Folk, F. Půta, and M. Skružný, “Transcriptional coregulator SNW/SKIP: The concealed tie of dissimilar pathways,” Cell. Mol. Life Sci., vol. 61, no. 6, pp. 629–640, 2004.

[65] N. Sato, M. Maeda, M. Sugiyama, S. Ito, T. Hyodo, A. Masuda, N. Tsunoda, T. Kokuryo, M. Hamaguchi, M. Nagino, and T. Senga, “Inhibition of SNW1 association with spliceosomal proteins promotes apoptosis in breast cancer cells,” Cancer Med., vol. 4, no. 2, pp. 268–277, Feb. 2015.

[66] T. C. Roloff, H. H. Ropers, and U. A. Nuber, “Comparative study of methyl-CpG-binding domain proteins,” BMC Genomics, vol. 4, pp. 1–9, 2003.

[67] H. Shikun He, “Is MeCP2 a Gene Suppressor or Activator?,” J. Biomol. Res. Ther., vol. 03, no. 02, pp. 2–3, 2014.

[68] K. C. Hite, V. H. Adams, and J. C. Hansen, “Recent advances in MeCP2 structure and function,” Biochem. Cell Biol., vol. 87, no. 1, pp. 219–227, Feb. 2009.

[69] T. C. Galvao, “Structure-specific binding of MeCP2 to four-way junction DNA through its methyl CpG-binding domain,” Nucleic Acids Res., vol. 33, no. 20, pp. 6603–6609, Nov. 2005.

[70] X. Nan, H.-H. Ng, C. A. Johnson, C. D. Laherty, B. M. Turner, R. N. Eisenman, and A. Bird, “Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex,” Nature, vol. 393, no. 6683, pp. 386–389, May 1998.

[71] K. Kokura, S. C. Kaul, R. Wadhwa, T. Nomura, M. M. Khan, T. Shinagawa, T. Yasukawa, C. Colmenares, and S. Ishii, “The Ski Protein Family Is Required for MeCP2-mediated Transcriptional Repression,” J. Biol. Chem., vol. 276, no. 36, pp. 34115–34121, Sep. 2001.

[72] J. P. Buschdorf and W. H. Stratling, “A WW domain binding region in methyl-CpG-binding protein MeCP2: impact on Rett syndrome,” J. Mol. Med., vol. 82, no. 2, pp. 135–143, Feb. 2004.

[73] C. Bueno, R. Tabares-Seisdedos, J. M. Moraleda, and S. Martinez, “Rett syndrome mutant neural cells lacks MeCP2 immunoreactive bands,” PLoS One, vol. 11, no. 4, pp. 1–14, 2016.

[74] Z. H. Yu, S. M. Lun, R. He, H. P. Tian, H. J. Huang, Q. S. Wang, X. Q. Li, and Y. M. Feng, “Dual function of MAZ mediated by FOXF2 in basal-like breast cancer: Promotion of proliferation and suppression of progression,” Cancer Lett., vol. 402, pp. 142–152, 2017.

[75] S. Vlcek and R. Foisner, “Lamins and lamin-associated proteins in aging and disease,” Curr. Opin. Cell Biol., vol. 19, no. 3, pp. 298–304, 2007.

[76] J. Demmerle, A. J. Koch, and J. M. Holaska, “The nuclear envelope protein emerin binds directly to histone deacetylase 3 (HDAC3) and activates HDAC3 activity,” J. Biol. Chem., vol. 287, no. 26, pp. 22080–22088, 2012.

[77] J. M. Berk, K. E. Tifft, and K. L. Wilson, “The nuclear envelope LEM-domain protein emerin,” Nucleus, vol. 4, no. 4, pp. 298–314, Jul. 2013.

[78] J. M. K. Mislow, J. M. Holaska, M. S. Kim, K. K. Lee, M. Segura-Totten, K. L. Wilson, and E. M. McNally, “Nesprin-1α self-associates and binds directly to emerin and lamin A in vitro,” FEBS Lett., vol. 525, no. 1–3, pp. 135–140, Aug. 2002.

[79] E. Markiewicz, K. Tilgner, N. Barker, M. van de Wetering, H. Clevers, M. Dorobek, I. Hausmanowa-Petrusewicz, F. C. S. Ramaekers, J. L. V Broers, W. M. Blankesteijn, G. Salpingidou, R. G. Wilson, J. A. Ellis, and C. J. Hutchison, “The inner nuclear membrane protein Emerin regulates β-catenin activity by restricting its accumulation in the nucleus,” EMBO J., vol. 25, no. 14, pp. 3275–3285, Jul. 2006.

[80] Y. Hirano, M. Segawa, F. S. Ouchi, Y. Yamakawa, K. Furukawa, K. Takeyasu, and T. Horigome, “Dissociation of Emerin from Barrier-to-autointegration Factor Is Regulated through Mitotic Phosphorylation of Emerin in a Xenopus Egg Cell-free System,” J. Biol. Chem., vol. 280, no. 48, pp. 39925–39933, Dec. 2005.

[81] D. Kalderon, B. L. Roberts, W. D. Richardson, and A. E. Smith, “A short amino acid sequence able to specify nuclear location,” Cell, vol. 39, no. 3, pp. 499–509, Dec. 1984.

[82] P. Bashtrykov, G. Jankevicius, A. Smarandache, R. Z. Jurkowska, S. Ragozin, and A. Jeltsch, “Specificity of dnmt1 for methylation of hemimethylated CpG sites resides in its catalytic domain,” Chem. Biol., vol. 19, no. 5, pp. 572–578, 2012.

[83] C. Xu and V. G. Corces, “Nascent DNA methylome mapping reveals inheritance of hemimethylation at CTCF/cohesin sites,” Science, vol. 359, no. 6380, pp. 1166–1170, 2018.

[84] P. Jelinic and P. Shaw, “Loss of imprinting and cancer,” J. Pathol., vol. 211, no. 3, pp. 261–268, Feb. 2007.

APPENDIX A

VECTOR MAPS

pET28-MHL Vector (GenBank accession EF456735)

T7 promoter                 4984-5000
N-terminal tag              5071-5124
N-terminal cloning site     5110-5124
C-terminal cloning site     7140-7154
T7 terminator               7254-7300
f1 origin                   12-467
aph coding sequence         563-1375
pBR322 origin               2084
lacI coding sequence        3518-4597

https://www.thesgc.org/sites/default/files/toronto_vectors/pET28-MHL.pdf

pET28a-LIC Vector (GenBank accession EF442785)

T7 promoter                     4984-5000
N-terminal tag coding sequence  5071-5127
N-terminal cloning site         5113-5127
C-terminal cloning site         7143-7157
T7 terminator                   7257-7303
f1 origin                       12-467
aph coding sequence             563-1375
pBR322 origin                   2084
lacI coding sequence            3518-4597
sacB coding sequence            5649-7067

https://www.thesgc.org/sites/default/files/toronto_vectors/pET28a-LIC.pdf

pET28-MKH8SUMO Vector (GenBank accession N/A)

T7 promoter                 4980-5002
N-terminal tag              5070-5447
Upstream BsaI site          5454-5449
Downstream BseRI            7140-7154
T7 terminator               7254-7300
f1 origin                   12-467
KmR coding sequence         562-1374
ColE1 replicon              1485-2099
lacI coding sequence        3517-4596
sacB coding sequence        5969-7387

https://www.thesgc.org/sites/default/files/toronto_vectors/pET28-MKH8SUMO.pdf

pcDNA3.1 MCS-BirA(R118G) (Addgene accession #36047)

https://www.addgene.org/36047/

pcDNA 3.1 (-) (Invitrogen #V79520)

https://tools.thermofisher.com/content/sfs/vectors/pcdna3.1-.pdf

APPENDIX B

PRIMER SEQUENCES

Table 1: CXXC5 primers for in-fusion PCR cloning

Vector          Primer            Sequence
pET28-MHL       CXXC5 FP          TTGTATTTCCAGGGCTCGAGCCTCGGCGGTGGC*
                CXXC5 REP         CAAGCTTCGTCATCACTGAAACCACCGGAAGGCGGC
pET28-MHL       CXXC 250-309 FP   TTGTATTTCCAGGGCGCCTCTGCCATCAGCTCC
                CXXC 250-309 REP  CAAGCTTCGTCATCACTTCTCCAGAGCAGCGGAAG
pET28GST-LIC    CXXC5 FP          GTTCCGCGTGGTAGTTCGAGCCTCGGCGGTGGC
                CXXC5 REP         CAAGCTTCGTCATCACTGAAACCACCGGAAGGCGGC
pET28-MKH8SUMO  CXXC5 FP          TTGTATTTCCAGGGCTCGAGCCTCGGCGGTGGC
                CXXC5 REP         CAAGCTTCGTCATCACTGAAACCACCGGAAGGCGGC

Table 2: CXXC5 cloning primers for BioID and pcDNA3.1(-) vectors

Primer                       Sequence
XhoI_NcoI_NheI_CXXC5 FP      CGCATATACTCGAGATTACCATGGAAGCTAGCATGTCGAGCCTCGGCGGTGGCTC
EcoRI_PolyA_BamHI_CXXC5 REP  CGCATGGGATCCTTTATTAGAATTCCTGAAACCACCGGAAGGCGG

Table 3: Tag sequences (DNA and amino acid)

Tag     DNA sequence / amino acid sequence
FLAG    gattacaaggatgacgacgataag
        DYKDDDDK
3xFLAG  gactacaaagaccatgacggtgattataaagatcatgacatcgactacaaagacgatgacgacaag
        DYKDHDGDYKDHDIDYKDDDDK
HA      tacccatacgatgttccagattacgct
        YPYDVPDYA
NLS     cccaagaaaaagaggaaggtgggctcacccaagaaaaagaggaaggtggcggctca
        PKKKRKVGSPKKKRKVGSS
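The DNA and amino acid forms of the epitope tags in Table 3 can be cross-checked by translating each coding sequence with the standard genetic code. The following is a minimal sketch rather than part of the original workflow; the function names `translate` and `check_tags` are hypothetical, and only the FLAG, 3xFLAG, and HA entries are covered.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    seq = dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

# Tag sequences copied from Table 3: (DNA, expected amino acid sequence).
TAGS = {
    "FLAG": ("gattacaaggatgacgacgataag", "DYKDDDDK"),
    "3xFLAG": (
        "gactacaaagaccatgacggtgattataaagatcatgacatcgactacaaagacgatgacgacaag",
        "DYKDHDGDYKDHDIDYKDDDDK",
    ),
    "HA": ("tacccatacgatgttccagattacgct", "YPYDVPDYA"),
}

def check_tags(tags=TAGS) -> dict:
    """Return, for each tag, whether its DNA translates to the listed peptide."""
    return {name: translate(dna) == peptide for name, (dna, peptide) in tags.items()}

if __name__ == "__main__":
    for name, ok in check_tags().items():
        print(name, "consistent" if ok else "MISMATCH")
```

The same `translate` helper could be applied to any in-frame insert before ordering primers; sequences whose length is not a multiple of three raise an error instead of being silently truncated.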

Table 4: Primers for generating CXXC5 truncations

Primer                           Sequence
FP_NheI_AgeI_1-100_C5            CGCATGCTAGCACCGGTTCGAGCCTCGGCGGTGGCTCC
FP_NheI_AgeI_101-249_C5          CGCATGCTAGCACCGGTTCTGCTGACAAGGCCACTGCG
FP_NheI_AgeI_250-322_C5          CGCATGCTAGCACCGGTGCCTCTGCCATCAGCTCCGGC
REP_BamHI_PolA_EcoRI_1-100_C5    CGCATGGGATCCTTTATTAGAATTCCTCTCCGCCCATCATGCTGCC
REP_BamHI_PolA_EcoRI_101-249_C5  CGCATGGGATCCTTTATTAGAATTCCAGCTCTCCCTGCATGGG

*Sequences in bold are specific to the CXXC5 gene
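The cloning primers in Tables 2 and 4 are named after the restriction sites they carry, so a quick consistency check is to confirm that each named site actually occurs in the primer. The sketch below assumes the standard recognition sequences for these enzymes; the helper name `missing_sites` and the selection of primers are illustrative only.

```python
# Recognition sequences (5'-3') of the enzymes named in the primer labels.
RECOGNITION_SITES = {
    "XhoI": "CTCGAG",
    "NcoI": "CCATGG",
    "NheI": "GCTAGC",
    "AgeI": "ACCGGT",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}

# Primer name -> (sequence as listed in Tables 2 and 4, enzymes named in the label).
PRIMERS = {
    "XhoI_NcoI_NheI_CXXC5 FP": (
        "CGCATATACTCGAGATTACCATGGAAGCTAGCATGTCGAGCCTCGGCGGTGGCTC",
        ["XhoI", "NcoI", "NheI"],
    ),
    "EcoRI_PolyA_BamHI_CXXC5 REP": (
        "CGCATGGGATCCTTTATTAGAATTCCTGAAACCACCGGAAGGCGG",
        ["EcoRI", "BamHI"],
    ),
    "FP_NheI_AgeI_1-100_C5": (
        "CGCATGCTAGCACCGGTTCGAGCCTCGGCGGTGGCTCC",
        ["NheI", "AgeI"],
    ),
    "REP_BamHI_PolA_EcoRI_1-100_C5": (
        "CGCATGGGATCCTTTATTAGAATTCCTCTCCGCCCATCATGCTGCC",
        ["BamHI", "EcoRI"],
    ),
}

def missing_sites(sequence: str, enzymes: list) -> list:
    """Return the enzymes whose recognition site is absent from the primer."""
    seq = sequence.upper()
    return [enz for enz in enzymes if RECOGNITION_SITES[enz] not in seq]

if __name__ == "__main__":
    for name, (sequence, enzymes) in PRIMERS.items():
        absent = missing_sites(sequence, enzymes)
        print(name, "OK" if not absent else f"missing: {', '.join(absent)}")
```

All six recognition sequences are palindromic, so the check does not depend on whether a primer is written as the forward or the reverse strand.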

APPENDIX C

SECONDARY STRUCTURE PREDICTION OF CXXC5

SPIDER2: Sequence-based prediction of local and non-local structural features of CXXC5 protein

H: Helix
E: extended strand in parallel and/or anti-parallel β-sheet conformation

Summary
Best template: 4hp1C, p-value 4.91e-09
- 71 (22%) residues are modeled
- 267 (82%) positions predicted as disordered
Secondary struct: 12% H, 2% E, 85% C
Solvent access: 23% E, 70% M, 6% B

APPENDIX D

LIST OF ALL PROTEINS IDENTIFIED BY LC-MS/MS ANALYSIS

Accession  Description  ΣCoverage  Σ# Proteins  Σ# Unique Peptides  Σ# Peptides  Σ# PSMs

O60814 H2B1K 74,6 3 1 16 53

P42167 LAP2B 46,92 5 10 17 34

Q7LFL8 CXXC5 45,03 12 22 22 108

P42166 LAP2A 43,37 2 18 25 46

G5E9C0 G5E9C0 34,92 11 19 19 27

Q86YP4 P66A 33,97 11 17 18 43

Q14980-2 NUMA1 31,22 22 54 54 62

P19338 NUCL 27,04 6 12 12 23

P05549 AP2A 26,54 14 8 16 38

P37802 TAGL2 26,13 3 3 3 6

P08727 K1C19 25,75 9 3 12 28

P22087 FBRL 24,92 10 6 6 12

P62851 RS25 24 1 3 3 6

Q6ISB3 GRHL2 23,84 2 9 9 22

P22626 ROA2 22,38 3 6 6 11

Q8WXI9 P66B 21,92 1 8 9 21

O43395 PRPF3 21,82 2 12 12 19

P12956 XRCC6 20,69 3 10 10 18

Q86V81 THOC4 20,23 2 3 3 4

Q02543 RL18A 19,89 5 3 3 7

P17844 DDX5 19,87 14 11 11 20

Q9UIG0 BAZ1B 19,82 2 26 26 53

Q13573 SNW1 19,22 4 7 7 12

Q9H2P0 ADNP 18,87 2 15 15 26

Q96AG4 LRC59 18,24 1 4 4 7

O60264 SMCA5 17,68 3 15 15 34

P32519 ELF1 17,61 11 7 7 15

P46013 KI67 17,57 3 35 35 66

P39880 CUX1 16,68 11 18 18 33

O94842 TOX4 16,43 10 8 8 17

P55010 IF5 16,01 7 6 6 11

Q9UQR0 SCML2 15,71 3 7 7 15

Q9BR76 COR1B 15,13 4 5 5 10

P50990 TCPQ 14,78 4 6 6 12

Q15365 PCBP1 14,61 2 1 3 6

Q7Z589 EMSY 14,07 13 11 11 17

P31151 S10A7 13,86 1 1 1 2

Q5T3J3 LRIF1 13,26 1 6 6 10

Q96QC0 PP1RA 13,09 1 9 9 14

Q96KM6 Z512B 13 1 7 7 12

Q14839 CHD4 12,45 3 15 19 39

Q9Y265 RUVB1 12,28 5 4 4 7

Q9UHF7 TRPS1 12,18 7 12 12 25

Q8NC51 PAIRB 11,76 4 4 4 6

Q8IYB3 SRRM1 10,95 6 7 7 14

Q9ULH7 MKL2 10,94 6 6 6 11

O60832 DKC1 10,31 6 3 3 6

O43390 HNRPR 9,64 6 4 5 10

Q16630 CPSF6 9,44 4 3 3 6

Q9UQR1 ZN148 9,32 1 4 5 8

P08107 HSP71 9,2 10 3 4 6

P11142 HSP7C 8,82 16 4 5 10

P14923 PLAK 8,59 5 5 5 10

Q6MZP7 LIN54 8,54 5 4 4 8

P51608 MECP2 8,23 4 4 4 8

Q9HCK8 CHD8 8,21 6 12 14 28

Q9NYF8 BCLF1 8,15 10 6 6 10

O00712 NFIB 8,1 8 2 2 4

C9JPD0 C9JPD0 7,69 4 1 1 1

Q6P4R8 NFRKB 7,54 4 8 8 14

P51610 HCFC1 7,52 5 12 12 22

Q15233 NONO 7,43 5 3 3 6

P22670 RFX1 7,05 1 4 4 9

Q13263 TIF1B 6,83 3 4 4 8

Q6PJG2 EMSA1 6,6 3 4 4 8

P11388 TOP2A 6,4 5 8 8 11

Q92766 RREB1 6,4 8 8 8 12

Q03164 KMT2A 6,35 5 16 16 30

Q9UGU5 HMGX4 6,16 3 2 2 3

Q9UPN9 TRI33 6,03 4 5 5 9

Q96C00 ZBTB9 5,92 3 2 2 3

Q9NR30 DDX21 5,75 2 4 4 6

Q08211 DHX9 5,75 2 6 6 10

Q14157 UBP2L 5,7 11 4 4 8

Q9H0A0 NAT10 5,46 5 4 4 7

Q9NRL2 BAZ1A 5,14 3 6 6 12

P49116 NR2C2 4,87 3 2 2 4

Q13148 TADBP 4,35 3 1 1 2

Q13416 ORC2 4,33 1 2 2 4

Q8WWQ0 PHIP 4,17 6 5 5 10

Q5QJE6 TDIF2 4,1 1 2 2 4

P12270 TPR 4,06 1 7 7 15

Q9BTC0 DIDO1 3,93 2 6 6 14

Q14676 MDC1 3,88 4 4 4 8

Q13435 SF3B2 3,69 4 3 3 6

Q12830 BPTF 3,61 6 9 9 17

Q9UGU0 TCF20 3,52 2 4 4 6

P42224 STAT1 3,47 4 2 2 4

Q86UE4 LYRIC 3,44 2 1 1 2

Q8WYP5 ELYS 3,18 3 4 4 7

A6NHR9 SMHD1 3,14 4 5 5 9

Q02880 TOP2B 3,01 3 3 4 8

P78347 GTF2I 2,91 5 2 2 6

P28347 TEAD1 2,82 14 1 1 2

Q14938 NFIX 2,79 12 1 1 4

Q8WWM7 ATX2L 2,79 9 2 2 4

O96028 NSD2 2,78 7 2 2 3

Q6PKG0 LARP1 2,65 5 2 2 4

Q9Y2K7 KDM2A 2,5 5 2 2 4

Q7Z3K3 POGZ 2,2 6 3 3 6

P35251 RFC1 1,83 3 2 2 3

Q8N163 CCAR2 1,52 3 1 1 2

Q14966 ZN638 1,47 4 2 2 4

Q9UMN6 KMT2B 1,18 1 2 2 4
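For downstream filtering or ranking, the rows of this table can be loaded into typed records. The sketch below is illustrative only: it assumes seven whitespace-separated fields per row in the order accession, description, ΣCoverage, Σ# Proteins, Σ# Unique Peptides, Σ# Peptides, Σ# PSMs, and it converts the decimal comma used for coverage. The names `ProteinHit` and `parse_hits` are hypothetical.

```python
from dataclasses import dataclass

FIELDS_PER_ROW = 7  # assumed column count, see the lead-in above

@dataclass
class ProteinHit:
    accession: str
    description: str
    coverage: float          # ΣCoverage in percent
    n_proteins: int
    n_unique_peptides: int
    n_peptides: int
    n_psms: int

def parse_hits(text: str) -> list:
    """Parse table rows; lines that do not hold whole rows are skipped."""
    hits = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, stray page numbers, and partial rows.
        if len(tokens) < FIELDS_PER_ROW or len(tokens) % FIELDS_PER_ROW != 0:
            continue
        # A line may carry several rows run together, so walk it in chunks.
        for i in range(0, len(tokens), FIELDS_PER_ROW):
            accession, description, coverage, *counts = tokens[i:i + FIELDS_PER_ROW]
            hits.append(
                ProteinHit(
                    accession,
                    description,
                    float(coverage.replace(",", ".")),  # decimal comma to point
                    *map(int, counts),
                )
            )
    return hits

if __name__ == "__main__":
    sample = "Q7LFL8 CXXC5 45,03 12 22 22 108\nQ14980-2 NUMA1 31,22 22 54 54 62\n"
    for hit in sorted(parse_hits(sample), key=lambda h: h.n_psms, reverse=True):
        print(hit.accession, hit.description, hit.coverage, hit.n_psms)
```

Sorting by PSM count, as in the usage lines, is one possible ranking; coverage or unique peptide count could be used instead, depending on the question being asked.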

APPENDIX E

IMMUNOPRECIPITATION OF OVEREXPRESSED CXXC5 BY USING VARIOUS CXXC5 ANTIBODIES

Nuclear extracts of cells transfected with 3xFlag-tagged CXXC5 were subjected to co-immunoprecipitation with CXXC5 antibodies from two suppliers, CST (Cell Signaling Technology, D1O4P) and Abcam (ab106533), or with isotype-matched IgG. The Flag antibody (Sigma) was also used as a positive control. As an input control, 10% of the cell lysate was used.

APPENDIX F

COMPARISON OF TRANSFECTION EFFICIENCY IN MCF7 AND HEK293 CELLS

MCF7 and HEK293 cells were transfected with the pAC-GFP vector (2 µg/well). After 24 h, the cells were visualized under a fluorescence microscope.

CURRICULUM VITAE

PERSONAL INFORMATION

Surname, Name: Ayaz, Gamze
Nationality: Turkish (TC)
Date and Place of Birth: 11 September 1986, Ankara
Phone: +90 312 210 76 64
Fax: +90 312 210 79 76
Email: [email protected]

EDUCATION

Degree       Institution                                Year of Graduation
MSc          Hacettepe University Biology Department    2012
BSc          Hacettepe University Biology Department    2009
High School  Bahçelievler Deneme High School, Ankara    2004

RESEARCH EXPERIENCE AND EMPLOYMENT RECORD

Visiting PhD Student, June 2016 – June 2017
Chromatin Structural Biology & Epigenetics Lab, Structural Genomics Consortium, University of Toronto

Visiting Scientist, 4–18 October 2015
Insect Cell Expression and Purification, European Molecular Biology Laboratory (EMBL), Grenoble Outstation, France

Teaching Assistant, February 2013 – May 2013
Medical Biology Department, Ondokuz Mayıs University Medical School, Samsun, Turkey

Regulatory Affairs Specialist, September 2012 – January 2013
Bio-Gen Pharmaceutical, Ankara, Turkey

Study Site Coordinator, September 2011 – June 2012
Clinical Trial Department, Atlas Medical Services Staff Outsourcing Ltd., Ankara, Turkey

Summer Internship, August 2008 – September 2008
Virology Laboratory, Refik Saydam National Public Health Agency

Summer Internship, July 2008 – August 2008
Department of Biochemistry, Gulhane Military Medical Academy (GATA), Ankara, Turkey

FOREIGN LANGUAGES

Advanced English

PUBLICATIONS

1. Ayaz G, Yaşar P, Tunçbağ N, & Muyan M. (2018) CXXC5 interacting partners identified with BioID includes transcription factors and epigenetic regulators (In preparation).

2. Ayaz G, Yaşar P, Razizadeh N, Karakaya B, Kahraman D, Atalay R, Şahin Ö, Min J & Muyan M. (2018) CXXC5 plays a critical role in estrogen-mediated cellular proliferation (In preparation).

3. Zhang H*, Devoucoux M*, Li L*, Ayaz G*, Cheng H, Tempel W, Dong C, Loppnau P, Cote J and Min J. Structural basis for EPC1-mediated recruitment of MBTD1 into the TIP60/NuA4 acetyltransferase complex. Nucleic Acids Research, NAR-02431-2018 (*Co-first author; in review)

4. Ayaz G*, Yaşar P*, Olgun ÇE, Karakaya B, Kars G, Razizadeh N, Yavuz K, Turan G and Muyan M. Dynamic transcriptional events mediated by estrogen receptor alpha. Frontiers in Bioscience, Landmark, 24, 245-276, January 1, 2019 (*Co-first author)

5. Yaşar P, Ayaz G and Muyan M (2016). Estradiol-Estrogen Receptor α Mediates the Expression of the CXXC5 Gene through the Estrogen Response Element-Dependent Signaling Pathway. Scientific Reports, 6, 37808; doi: 10.1038/srep37808 (2016)

6. Yaşar P*, Ayaz G*, User SD, Güpür G and Muyan M (2016). Molecular Mechanism of Estrogen-Estrogen Receptor Signaling. Reproductive Medicine and Biology, Reprod Med Biol 2016; 1–17 (DOI: 10.1002/rmb2.12006) (*Co-first author)

7. Muyan M, Güpür G, Yaşar P, Ayaz G, User SD, Kazan HH, Huang Y (2015). Modulation of Estrogen Response Element-Driven Gene Expressions and Cellular Proliferation with Polar Directions by Designer Transcription Regulators. PLoS ONE 10(8): e0136423. (doi:10.1371/journal.pone.0136423)

8. Ayaz G and Muyan M (2015). CARD10 (Caspase recruitment domain family, member 10). Atlas Genet Cytogenet Oncol Haematol. 2015; 19(6):376-378

9. Ayaz G and Muyan M (2014). Modulator of Apoptosis 1 (MOAP1). Atlas Genet Cytogenet Oncol Haematol. 2014; 18(9):650-651

SCHOLARSHIPS AND GRANTS

Biostruct-X, Grant No: 9711, June 2015
EMBL, Grenoble, France

EMBO Travel Grant, September 2014
PEPC9 Practical Course, EMBO

PhD Full Scholarship, November 2013 – September 2017
The Scientific and Technological Research Council of Turkey (TUBITAK 2211-A)

International Research Scholarship, June 2016 – June 2017
The Scientific and Technological Research Council of Turkey (TUBITAK 2214-A)
