UNIVERSITY OF CALIFORNIA

Los Angeles

Dissecting transcriptional control by Klf4 in somatic cell reprogramming

A dissertation submitted in partial satisfaction of the

requirements for the degree Doctor of Philosophy

in Biological Chemistry

by

Huajun Zhou

2017

ABSTRACT OF THE DISSERTATION

Dissecting transcriptional control by Klf4 in somatic cell reprogramming

by

Huajun Zhou

Doctor of Philosophy in Biological Chemistry

University of California, Los Angeles, 2017

Professor Gregory S. Payne, Chair

Ectopic expression of four transcription factors, Oct4, Sox2, Klf4, and c-Myc, coverts somatic cells directly into induced pluripotent stem cells (iPSCs), which are functionally equivalent to embryonic stem cells (ESCs). The discovery of iPSC has been reshaping the methodology of disease modeling and drug screening in the past decade, and provides tremendous promise for regenerative medicine. However, the mechanism underlying this conversion process, reprogramming, is not yet fully understood. I aimed to dissect the reprogramming process, by characterizing the functional domains of one reprogramming factor

Klf4. The transcriptional activation domain (TAD) of Klf4 was revealed to be critical for reprogramming. To search for the factors that mediates the functionality of Klf4 TAD, I identified transcriptional coactivators CBP/p300 and complex as the physical interaction partners of Klf4 TAD, and further showed that this interaction is functionally required

ii

for Klf4 mediated transcriptional activation in reprograming. Clathrin heavy chain, initially identified as a physical interaction partner of Klf4 TAD, was shown to be not required for Klf4 transcriptional activation. Clathrin heavy chain was furthered characterized for its potential transcriptional activation activity in CHC-TFE3, a chromosomal fusion discovered in renal cell carcinoma. Clathrin heavy chain, as a conventional vesicle coat involved cellular membrane trafficking, was further studied with its accessory protein Irc6 in budding yeast.

iii

The dissertation of Huajun Zhou is approved.

Michael F. Carey

Siavash K. Kurdistani

Esteban C. Dell'Angelica

Kathrin Plath

Gregory S. Payne, Committee Chair

University of California, Los Angeles

2017

iv

This work is dedicated to my parents. Thank you for your love and support.

v

TABLE OF CONTENTS

TABLE OF CONTENTS ...... vi

FIGURES AND TABLES ...... viii

ACKNOWLEDGEMENTS ...... xi

VITA ...... xiii

CHAPTER 1 ...... 1

Abstract ...... 2

The discovery of iPSC ...... 2

Nuclear reprogramming ...... 4

Molecular mechanisms of reprogramming ...... 5

Epigenetic changes in reprogramming ...... 8

Overview of Oct4, Sox2, and Klf4 ...... 14

Molecular mechanism of transcriptional control by Oct4, Sox2, and Klf4 ...... 16

Concluding remarks ...... 18

References ...... 19

CHAPTER 2 ...... 28

Abstract ...... 29

Introduction ...... 29

Results ...... 31

Discussion ...... 40

Contributions ...... 43

Materials and Methods ...... 44

References ...... 83

vi

CHAPTER 3 ...... 88

Abstract ...... 89

Introduction ...... 89

Results ...... 95

Discussion ...... 100

Materials and Methods ...... 103

References ...... 121

CHAPTER 4 ...... 126

Abstract ...... 127

Introduction ...... 127

Results ...... 129

Discussion ...... 134

Materials and Methods ...... 137

References ...... 155

CHAPTER 5 ...... 157

Functional significance of CBP/p300 in H3K27ac domain establishment ...... 158

Structural role of Mediator in 3D organization of genome ...... 159

Interwoven relationships of multiple factors in enhancer-promoter interaction and

transcriptional activation ...... 160

Concluding remarks ...... 161

References ...... 162

vii

FIGURES AND TABLES

Figure 2-1 Klf4 regions important for reprogramming...... 56

Figure 2-2 TAD sequence requirements for transcriptional activation and reprogramming...... 58

Figure 2-3 bound differentially to wild-type TAD compared to reprogramming defective mutants...... 60

Figure 2-4 Direct and functional interactions of p300/CBP with Klf4 TAD...... 62

Figure 2-5 Direct and functional interactions of Mediator with Klf4 TAD...... 64

Figure 2-6 Genome-wide expression effects of wild-type and I106A Klf4 TAD...... 66

Table 2-1 Number of significantly up- or down- regulated in cells expressing full-length

Klf4 (FL), wild-type (WT) or I106A TAD fused to Klf4 aa210-483 compared to cells expressing tdTomato (TOM) or WT compared to I106A. Significance determined as described in Methods.

...... 68

Figure 2-S1 Characterization of Klf4 constructs...... 69

Figure 2-S2 Klf4 TAD transcriptional activation activity...... 71

Figure 2-S3 Binding of CBP/p300 domains to Klf4 TAD and overexpression of CBP and p300.

...... 73

Figure 2-S4 Purified Mediator complex...... 75

Figure 2-S5 Overexpression of Med25 ACID domain, MED14, and Med15 in Gal4-TAD transcriptional activation assays...... 77

Figure 2-S6 Principal component analysis of the transcriptomes of cells expressing full-length

Klf4 (FL), wild-type (WT) or I106A TAD fused to Klf4 aa210-483, or tdTomato (TOM)...... 79

viii

Table 2-S1 Mass spectrometry result of GST-TAD affinity purification in normalized spectral abundance factor (NSAF). (Spreadsheet appended)...... 81

Table 2-S2 Protein complexes detected by Klf4 TAD affinity binding...... 81

Table 2-S3 Expression changes in selected MET-related genes in MEF expressing Klf4 constructs...... 82

Figure 3-1 Diagrams of current understandings of TFE3 regulation network and CHC-TFE3 fusion proteins...... 107

Figure 3-2 The interaction between Klf4 and clathrin does not appear to be functional in reprogramming and transcriptional activation...... 109

Figure 3-3 CHC 1-579 harbors transcriptional activation activity...... 111

Figure 3-4 Effects of ectopic expression of CHC-TFE3 on cell growth and cell cycle...... 113

Figure 3-S1 CHC inhibition or overexpression in dual luciferase assay...... 115

Figure 3-S2 CHC constructs exhibit transactivation activity in mESCs...... 117

Figure 3-S3 CHC harbors transactivation activity in the context of CHC-TFE3...... 119

Figure 4-1 G protein-like domain of Irc6p does not appear to function dominant-negatively. .. 141

Figure 4-2 Screening and testing for mutations that affects the functionality of Irc6 C terminal region...... 143

Figure 4-3 The last two amino acids of Irc6p are responsible for binding to AP-1 and AP-2. .. 146

Figure 4-S1 chs6Δ irc6Δ::TRP1 exhibits worse cell growth than chs6Δ irc6Δ::HIS3 on CCFW.

...... 148

Figure 4-S2 Immuno-blots of 3xFLAG-Irc6p and Irc6p structure, related to Figure 4-2...... 150

ix

Figure 4-S3 analysis of the negative genetic interaction partners of IRC6...... 152

Table 4-S1 List of IRC6 negative genetic interaction genes from (Costanzo et al., 2016) other studies used for gene ontology analysis. (Spreadsheet appended) ...... 154

Table 4-S2 Strains used in this study. (Spreadsheet appended) ...... 154

x

ACKNOWLEDGEMENTS

I would like to give my sincere acknowledgements to my mentor Gregory S. Payne.

Thank him for accepting me as his student and giving me full freedom to try my ideas and design experiments on my own. Whenever I have questions and concerns, academic or anything else, he was always there to provide advice. He is not only a principle investigator, but also a life coach who is very considerate of his students and does whatever he can to help his students. I would also like to thank Kathrin Plath for giving me opportunities to work on gene regulation in reprogramming, which is one of my favorite research topics. I would like to thank my committee members and all my colleagues for their contributions of ideas on my projects and instructions on experiments.

I would also like to thank my parents for their emotional support in pursuing my PhD degree, and to thank all my friends that I have met at UCLA, without whom my life would be much more boring. I would like to give my special thanks to my ex-boyfriend Jason Diwa, who helped me getting through many tough times.

Finally, I would like to thank the inventor of Pomodoro time management. It’s unfortunate that I didn’t know this time management technique until recently. It greatly increased my productivity and helped me getting through tough dissertation writing, paper reading, and multi-tasking. I wish I could have known this earlier.

Chapter 2 is a version of a manuscript in preparation for publication authored by Huajun

Zhou, Ryan Schmidt, Bernadette Papp, Ajay Vashischt, Robin McKee, Constantinos Chronis,

Suman Kalyan Pradhan, Alissa Minkovsky, Michael Carey, James Wohlschlegel, Gregory

Payne, and Kathrin Plath.

Chapter 3 is a version of a manuscript in preparation for publication authored by Huajun

xi

Zhou and Gregory Payne.

Chapter 4 is a version of a manuscript in preparation for publication authored by Huajun

Zhou and Gregory Payne.

xii

VITA

Education

September, 2006 – June, 2011 Fudan Univerisity

B.S. in Biological Sciences

Honors and Awards

September, 2011 – August, 2015 China Scholarship Council scholarship

May, 2011 Fudan University Outstanding Graduate

October, 2009 FosumPharma Scholarship

October, 2010 Fudan University Outstanding Graduate

October, 2008 Fudan University Excellent Undergraduate Student Scholarship 3rd Prize

October, 2007 Fudan University Excellent Undergraduate Student Scholarship 2nd Prize

xiii

CHAPTER 1

INTRODUCTION

1

Abstract

Induced pluripotent stem cells (iPSCs), which arise by the ectopic expression of several transcription factors in somatic cells, have triggered a surge of attention and follow-up studies in the past decade since its discovery. iPSCs are functionally equivalent to embryonic stem cells

(ESCs) in terms of the ability to differentiate into any cell type of the organism, but appear to be more useful than ESCs, given the convenience of obtaining disease-specific iPSCs directly from patients. iPSCs thus have huge potential in disease modeling, drug screening, and futuristic regenerative medicine. Despite the heady progress in the application field of iPSCs, the mechanism of how the pluripotent cell state is achieved (i.e. reprogramming) is still under intensive investigation. We study the mechanism of reprogramming in order to understand fundamental questions regarding cell plasticity, cell identity, and cell fate.

The discovery of iPSC

In the 1950s, Conrad Waddington proposed a model that illustrates embryonic development as a ball rolling down a hill to its final differentiated state. In this model, the differentiated state is not able to turn into another differentiated state, nor is it able to revert to a pluripotent state (Waddington, 1957). In 1958, however, John Gurdon demonstrated that a somatic nucleus can be “rejuvenated” by generating a frog from an oocyte transplanted with a somatic nucleus, demonstrating that the nucleus of a somatic cell contains all the genetic information required to generate an organism (Gurdon et al., 1958). In 1987, researchers also found that cell fate can be manipulated across different differentiated states, as ectopic expression of MyoD, a master regulator of muscle development, converted mouse embryonic fibroblasts to myoblasts (Davis et al., 1987). Later on, studies showed additional cell fate transitions, such as over-expression of GATA1 converting myeloblasts to megakaryocyte and

2

erythrocyte precursors (Kulessa et al., 1995), and CCAAT/enhancer-binding protein-α (CEBPα) or CEBPβ converting B lymphocytes to macrophages (Xie et al., 2004). These findings inspired

Shinya Yamanaka and his colleagues to hypothesize that there may exist one or several embryonic stem cell master regulator genes that are sufficient to convert a somatic cell to a pluripotent state (Takahashi and Yamanaka, 2016). This finally led to the remarkable discovery of iPSCs, the pluripotent stem cells induced by ectopic expression of defined factors in somatic cells (Takahashi and Yamanaka, 2006). Given the prospect of broad application of iPSC in regenerative medicine, disease modeling, and drug discovery, the past decade has seen the flourishing growth of studies related to iPSC. The first iPSC-derived medical treatment was a retinal epithelium autograft carried out in 2014 (Cyranoski, 2014), and in 2017 a surgery utilizing an HLA-compatible iPSC-derived retinal epithelium transplantation was successfully achieved (Cyranoski, 2017). Even though the regenerative medical application of iPSC is not growing as fast as initially expected due to safety issues concerning the carcinogenic potential of non-fully-reprogrammed iPSC or mutations in iPSC, iPSC has become a powerful tool for disease modeling and drug discovery, especially in combination with high-efficiency gene- editing tool CRISPR/Cas9. Initiatives have been undertaken to establish cell banks of patient- derived iPSC from a wide variety of diseases, ranging from monogenic diseases to complex disorders, and from early-onset to late-onset diseases (reviewed in Avior et al., 2016), providing researchers ever growing opportunities to study diseases. On the other hand, reprogramming per se is an interesting biological process worth studying, as it can help us to understand fundamental questions regarding cell state, cell fate, and cell identity.

3

Nuclear reprogramming

John Gurdon for the first time successfully obtained sexually mature Xenopus laevis from transplantation of a single somatic nucleus into the oocyte, demonstrating that a somatic nucleus contains all the genetic information to make an organism and can be reprogrammed into a pluripotent state, that is, a state capable of generating any cell type of the body (Gurdon et al.,

1958). This somatic cell nuclear transfer (SCNT) technology was successfully implemented in mammalian cells in the late 1990s (Wakayama et al., 1998; Wilmut et al., 1997), and paved way for the discovery of iPSC. In the original paper describing generation of iPSC, mouse embryonic fibroblasts were reprogrammed by retroviral delivery of four transcription factors: Oct4, Sox2,

Klf4, and c-Myc (Takahashi and Yamanaka, 2006). Shortly thereafter, it was shown that human iPSC can be generated with ectopic expression of the same set of transcription factors (Takahashi et al., 2007) or with OCT4, SOX2, NANOG, and LIN28 (Yu et al., 2007), and that the proto- oncogene c-Myc is dispensable for the reprogramming cocktail (Nakagawa et al., 2008; Wernig et al., 2008b). Oct4, Sox2, Klf4, and c-Myc (so-called Yamanaka factors) can be replaced by a variety of genes (reviewed in Theunissen and Jaenisch, 2014). Of note, reprogramming has also been achieved using just small molecule compounds (Hou et al., 2013).

The reprogramming efficiency of iPSC is typically between 0.001-4%, depending on the cells of origin and methods of delivering the genes (reviewed in Robinton and Daley, 2012).

Reprogramming efficiency can be improved with the co-delivery of a variety of genes and by culture in conditions supplemented with defined molecules (reviewed in Ebrahimi, 2015). Even though reprogramming efficiency is generally much lower compared to SCNT (45-50% in ~3.5 days; Markoulaki et al., 2008), the technical difficulty of reprogramming by defined factors is

4

also much lower, and thus it quickly gained popularity and has gradually become a routine approach for iPSC generation and stem cell research.

Molecular mechanisms of reprogramming

Understanding the molecular mechanism of reprogramming helps to elucidate fundamental questions regarding cell fate, cell identity, and cell state. Even though great effort has been directed at the study of the mechanism of reprogramming since the birth of iPSC, the precise molecular mechanism underlying reprogramming is still far from being understood.

Nevertheless, accumulated studies have gradually shaped several models and provide important insights into the molecular mechanism of reprogramming.

Deterministic and stochastic reprogramming

In a typical reprogramming experiment using defined factors, only a small fraction of cells progress to the pluripotent state within a two to four week period, raising the question whether only a subset of the infected cells can become iPSCs, or whether all the infected cells can become iPSCs but with different latency. As an approach to eliminate the heterogeneity of cell populations that result from variations in virus-transduction efficiency, a drug-inducible single OSKM (Oct4, Sox2, Klf4, and c-Myc) cassette expression system was established (Hanna et al., 2009). Even with identical cell types and genomes, cells still reprogram at different rates, but ultimately almost all cells will successfully reprogram upon continued growth, demonstrating that reprogramming is a stochastic process (Hanna et al., 2009). Recently, deterministic reprogramming, that is, cells reprogrammed in a synchronized manner with nearly 100% efficiency, has been discovered in a subset of murine hematopoietic stem cells (Guo et al., 2014), in B cells with transient over-expression of C/EBP⍺ (Di Stefano et al., 2014), and in Mbd3- deleted mouse embryonic fibroblast (Rais et al., 2013). From these studies, it appears that cell-

5

cycle and epigenetic modification are the two underlying factors that contribute to the stochastic reprogramming. These deterministic reprogramming models provide researchers a novel platform to study the mechanism of reprogramming with synchronized cell populations, and thus will facilitate a better understanding of reprogramming.

Two-phase reprogramming

The reprogramming process can be generally divided into two phases, where two major transcriptional wave changes occur. In the early phase that occurs over the first few days of reprogramming, genes involved in proliferation, metabolism, and cytoskeletal organization are up-regulated (Cacchiarelli et al., 2015; Hansson et al., 2012; Polo et al., 2012). A mesenchymal- to-epithelial transition (MET) (Cacchiarelli et al., 2015; Li et al., 2010; Samavarchi-Tehrani et al., 2010) marked by expression of E-cadherin and repression of Snai1 and Snai2 is also observed in this phase, and this transition is necessary for successful reprogramming (Li et al.,

2010). In this process, Klf4 is generally responsible for the activation of epithelial genes, whereas Oct4, Sox2, and c-Myc take charge of repressing most mesenchymal genes (Li et al.,

2010).

The second wave of transcriptional change occurs relatively late in the reprogramming process, separated from the first wave by a period of less obvious transcriptional change (Polo et al., 2012). Genes involved in embryonic development and pluripotency maintenance are expressed in this phase (Brambrink et al., 2008; Polo et al., 2012; Stadtfeld et al., 2008). By single-cell expression analysis, the gene expression program occurs in a more hierarchical manner in this later phase, with Sox2 being the upstream factor for the gene expression hierarchy

(Buganim et al., 2012). The activation of pluripotency-associated genes Utf1, Dppa2, Esrrb, and

Lin28 appears to be a better predictor for full reprogramming than Oct4 (Buganim et al., 2012).

6

The two-phase model is straightforward, as it reveals the trajectory of reprogramming by profiling gene expression, epigenetic states, and genomic binding of transcription factors of cells at different stages. The stochastic nature of reprogramming makes it difficult to obtain synchronized cell population for analysis, but it has been possible to purify cell populations that are in progress to become iPSCs using particular stage-specific cell surface markers such as

Thy1 and SSEA1 (Polo et al., 2012). Analysis of the two-phase model has also been simplified by profiling two intermediate cell groups: cells that express ectopic Yamanaka factors for only

48 hours, and cells “trapped” in a pre-iPSC stage at which cells fail to express pluripotency marker gene Nanog, but maintain a colony-like morphology (Chronis et al., 2017; Mikkelsen et al., 2008). These two cell populations are relatively homogeneous and thus facilitate analysis.

Seesaw model of reprogramming

An interesting phenomenon of reprogramming is that reprogramming factors can be those that are not highly expressed in ESC, as revealed by a study in which Oct4 and Sox2 can be replaced in reprogramming by their respective developmental lineage specifiers (Shu et al.,

2013). In ESCs, Oct4, Sox2, and Nanog occupy their own and each other’s promoters, forming an autoregulatory circuit, which is thought to facilitate the stabilization of pluripotency

(reviewed in Jaenisch and Young, 2008). It was also proposed that pluripotency may be viewed as an inherently “precarious” condition in which mutually exclusive lineage specifiers continually compete to induce their own lineage differentiation, rather than an intrinsically stable

“ground state” (Loh and Lim, 2011). Supporting the latter view, it was found in early studies that overexpression of Oct4 and Sox2 in mESCs can trigger the mesendodermal and neuroectodermal differentiation, respectively (Kopp et al., 2008; Niwa et al., 2000). Oct4 and Sox2 themselves function as lineage specifiers, as they promote their respective lineages (Oct4 for mesendoderm

7

and Sox2 for neuroectorderm) and repress the other, during early stages of embryonic development (Niwa, Miyazaki, & Smith, 2000; Thomson et al., 2011; Z. Wang, Oron, Nelson,

Razis, & Ivanova, 2012). In the seesaw model, based on the assumptions that Sox2 and Oct4 can mutually activate each other, and they promote their own lineage specification and repress the other’s, mathematical modeling predicts a pluripotency state, located between mesendoderm state and neuroectoderm state, that is reachable when mesendodermal genes and ectodermal genes are both properly induced (Shu et al., 2013). Thus, it is not surprising that adjusting the relative expression level of Sox2 and Oct4 can affect the efficiency of reprogramming

(Papapetrou et al., 2009; Yamaguchi, Hirano, Nagata, & Tada, 2011) and the epigenetic properties of iPS cells (Carey et al., 2011). It remains to be understood, however, how the lineage specifiers, in the absence of both Oct4 and Sox2, can finally activate the core pluripotency circuitry composed of Oct4 and Sox2. It is likely that lineage specifiers themselves form a core regulatory circuitry involving Oct4 or Sox2 in their respective cell lineages.

Epigenetic changes in reprogramming

Change of cell identity, cell state, and cell fate, is accompanied by changes in epigenetic states. Nuclear reprogramming resets the somatic epigenome to an ES-like state (Maherali et al.,

2007; Mikkelsen et al., 2008). Manipulation of epigenetic states, either by small molecules or by maneuvering of the expression of epigenetic modifiers, has significant influences on the outcome of reprogramming. In addition, efforts have been made to dissect the step-wise epigenetic changes during the process of reprogramming, in order to capture the interplay between transcription factors and epigenetic modifications that underlie the transcriptional control of reprogramming. Next, I will review these two aspects of epigenetic reprogramming in more detail.

8

Interplay between epigenetic modifiers and reprogramming

Given the association between epigenetic states and cell identity, it is not surprising that modifying the epigenetic states by addition of small molecules or co-expression of enzymes involved in epigenetic modification can affect reprogramming. For example: histone deacetylase

(HDAC) inhibitors Valproic acid, trichostatin A, and butyrate can enhance induction of pluripotency (Huangfu et al., 2008; Mali et al., 2010); histone acetylation transferase inhibitor

C646 hampers reprogramming (Mali et al., 2010); DNA methyltransferase inhibitor 5-aza- cytidine significantly enhances reprogramming (Mali et al., 2010; Mikkelsen et al., 2008); depletion of H3K9 methyltransferases enhances reprogramming as well (Chen et al., 2013;

Sridharan et al., 2013; see Buganim et al., 2013 for a detailed review). As these modifiers may cause large scale changes in the epigenetic state, it remains of interest to dissect which genes are preferentially affected, and why there is such a preference for these genes. For example, H3K79 methyltransferase DOTL1 has an inhibitory effect on reprogramming, as it methylates H3K79 in the gene body of most fibroblast-related genes associated with epithelial to mesenchymal transition, to keep them from being silenced (Onder et al., 2012). Given that H3K79 methylation is associated with the transcriptional activation, it remains to be answered why DOTL1 preferably activates fibroblast genes but not pluripotency genes.

The overall histone modification level of ESC seems to be a “guide” for which histone modifier should be used to enhance reprogramming. For example, global H3K79 methylation level is higher in MEFs than in ESCs (Sridharan et al., 2013). Inhibition of H3K79 methyltransferase DOTL1 during reprogramming may reduce the global H3K79 methylation level so that it will resemble the level in ESCs. Similarly, the overall histone acetylation level is higher in ESCs than in MEFs (Sridharan et al., 2013). Suppression of histone deacetylases

9

(HDACs) by butyrate accelerates reprogramming, possibly because cells will gain a high level of total histone acetylation more quickly. The same relationship also applies to the case of H3K9 trimethylation (Sridharan et al., 2013). However, questions still remain why the overall change of chromatin modification preferentially occurs in specific regions of the genome. Treatment of uninfected MEFs with Valproic acid is not sufficient to reprogram cells (Huangfu et al., 2008), suggesting that ectopic expression of Yamanaka factors may guide where the epigenetic change can happen.

The timing of inhibitor treatment is also important. For Valproic acid or butyrate, reprogramming efficiency is higher when the cells are “pulse” treated at a later stage than at an earlier stage, though both conditions lead to increased reprogramming efficiency (Mali et al.,

2010). A more dramatic difference was observed for 5-aza-cytidine. The early and middle-stage treatment of 5-aza-cytidine caused reduced reprogramming efficiency, whereas late-stage treatment radically enhanced reprogramming (Mikkelsen et al., 2008). Interestingly, DNA methylome does not exhibit significant changes until very late in reprogramming (Polo et al.,

2012). The overall histone acetylation level also seems to only be drastically increased in iPSCs or ESCs, not in the intermediate-stage pre-iPSCs (Sridharan et al., 2013). Thus, the timing of inhibitor treatment may have to coincide with the time when dramatic epigenetic changes occur.

Dissecting step-wise epigenetic changes in reprogramming

In early studies, DNA methylation and two histone marks H3K4me3 and H3K27me3 were profiled in starting cell types, intermediate cell types, and final cell type iPSC or ESC

(Hussein et al., 2014; Koche et al., 2011; Mikkelsen et al., 2008; Polo et al., 2012), to dissect the relationship between epigenetic modification and transcriptional control in reprogramming. In vertebrates, DNA methylation is thought to contribute to the formation of heterochromatin and

10

transcriptional silencing (Rose and Klose, 2014); H3K4me3 is considered as an active chromatin mark that positively regulates transcription by recruiting histone acetyltransferase, nucleosome remodeling enzymes, and TFIID to the transcription start sites (Rose and Klose, 2014;

Ruthenburg et al., 2007), and H3K27me3 is thought to function as a repressive mark that inhibits transcription by promoting chromatin compaction (Aldiri and Vetter, 2012). Even though transcriptional changes of some genes can be correlated with the proposed functionality of

H3K4me3 and H3K27me3 nicely during reprogramming, the overall correlation does not seem strong, as most of the genes that are activated early in reprogramming are those that already have

H3K4me3 marks in mouse embryonic fibroblasts (MEFs), with or without H3K27me3 (Koche et al., 2011; Polo et al., 2012). This suggests that transcription also depends on other chromatin modifications and/or specific contexts. As for DNA methylation, the DNA methylome is reset predominantly late in reprogramming (Polo et al., 2012). A comparable number of genomic loci are methylated and demethylated, and pluripotency-associated genes Nanog and Oct4 are among the demethylated gene list, coinciding with late transcriptional activation of Nanog and Oct4 during reprogramming (Polo et al., 2012). Interestingly, Nanog is reported to recruit Tet1, a methylcytosine hydroxylase involved in DNA demethylation, to the loci of Oct4 and another pluripotency gene Esrrb, to “prime” their expression before the pluripotency establishment

(Costa et al., 2013). As DNA methylation can affect histone modification (reviewed in Rose and

Klose, 2014) and the binding of a large number of transcription factors (Yin et al., 2017), DNA demethylation may be viewed as an active process required for successful reprogramming.

To comprehensively characterize the epigenetic changes that occur during reprogramming, recently a total of 9 chromatin modifications plus a histone variant H3.3 were mapped by chromatin immuno-precipitation sequencing (ChIP-seq) in cells representing 4 stages

11

of reprogramming: MEFs, MEFs in which OSKM are induced for 48 hours, pre-iPSCs, and

ESCs (Chronis et al., 2017). The combinatorial patterns of the 9 histone marks and H3.3 were analyzed by a multivariate Hidden Markov Model, as an approach to annotate the “chromatin states” of the genome (Ernst and Kellis, 2012; Ernst et al., 2011). The “chromatin states” include active and poised promoters, enhancers of varying activity levels, repressed regions, etc. Among all the chromatin states, enhancer states were identified as the most dynamic chromatin states during reprogramming, whereas promoter states appear to be conservative throughout reprogramming, suggesting that enhancer states are tightly associated with cell identity.

In addition to the chromatin modifications and H3.3, ChIP-seq of the four Yamanaka factors and other chromatin regulators was also carried out, to discover the interplay between chromatin states and these transcription factors and regulators (Chronis et al., 2017). Only a small subset of pluripotency-specific enhancers (PE) were occupied by Oct4, Sox2, and Klf4

(OSK) at the 48- stage. In addition, at the 48-hr stage, OSK predominately occupied the MEF- specific enhancers (ME) that are enriched for OSK DNA-binding motifs. ATAC-seq, which determines the chromatin accessibility of the genome (Buenrostro et al., 2013), revealed that these ME are in the “open” chromatin state in MEFs, whereas all PE are “closed” chromatin in

MEFs. This indicates that OSK may preferentially bind to open chromatin. This result is in contrast to another study, in which OSKM mostly bound to closed chromatin lacking any histone modifications at 48-hr stage (Soufi et al., 2012). However, two studies used cells of different species: human versus mouse, indicating a potential difference in reprogramming mechanisms between two species.

The ME, bound or unbound by OSK, already show reduced levels of H3K27ac, a marker of active enhancer (Creyghton et al., 2010), and p300, the histone acetyltransferase that

12

acetylates H3K27 (Jin et al., 2011), at the 48-hr stage. This suggests that OSK mediate the silencing of ME early in reprogramming, directly or indirectly. The direct silencing may be caused by the recruitment of histone deacetylase Hdac1 to the OSK-bound ME, supported by the fact that Hdac1 interacts with Oct4, Sox4, and Klf4, and the ChIP-seq signal of Hdac1 increased at these OSK-bound ME, but not at OSK-unbound ME. The indirect silencing may be a result of

OSK-guided redistribution of ME-bound MEF transcription factors, which may interact with p300, to other genomic loci, evidenced by the ChIP-seq results of those MEF transcription factors at the 48-hr stage. The silencing of ME is important for successful reprogramming, as over-expression of the MEF transcriptional factor Fra1 dramatically hampers reprogramming

(Chronis et al., 2017).

Interestingly, individual expression of Oct4 or Klf4 can also silence ME at the 48-hr stage, as indicated by the reduction of H3K27ac level. However, Oct4 silences both Oct4-bound and unbound ME, while Klf4 only silences Klf4-unbound ME, indicating different molecular mechanisms between these two transcription factors (Chronis et al., 2017). An appealing explanation for this observation is that Klf4 may directly recruit p300 (Evans et al., 2007;

Geiman et al., 2000; Zhang et al., 2010b) to counter-act the down-regulation of H3K27ac at

Klf4-bound ME, but Oct4 may not.

Finally, activation of the PE that are bound by OSK only late in reprogramming also seems critical for the pluripotency establishment. These PE sites have a lower Oct4 motif density than the PE sites engaged by Oct4 early at the 48-hr stage, but are enriched with the motif of

Esrrb, the pluripotency factor activated late in reprogramming. Over-expression of Esrrb along with OSKM drastically enhanced reprogramming, possibly through cooperative binding of Oct4 and Esrrb at those late PE sites early in reprogramming (Chronis et al., 2017).

13

Overview of Oct4, Sox2, and Klf4

Oct4 is one of the earliest discovered transcriptional factors shown to be pivotal for formation of pluripotent stem cells (Nichols et al., 1998). Oct4 alone is sufficient for reprogramming when the overexpression is combined with chemicals (Li et al., 2011; Zhu et al.,

2010), and when Oct4 is fused to the trans-activation domain of VP16 (Wang et al., 2011). In addition, Oct4, along with ASF1A, a histone chaperone that is specifically enriched in oocyte metaphase II, is sufficient to reprogram human adult dermal fibroblasts to iPSC, when cultured with oocyte-specific paracrine growth factor GDF9 (Gonzalez-Muñoz et al., 2014). These studies suggest a fundamental role of Oct4 in pluripotency establishment.

Sox2 was originally discovered as a partner of Oct4 for activating FGF-4, an essential signaling molecule for mouse post-implantation development, by synergistic interaction with

Oct4 on the FGF-4 enhancer in embryonic stem cells (Yuan et al., 1995). Sox2 was later shown to co-regulate multiple genes with Oct4 (Botquin et al., 1998; Nishimoto et al., 1999), and be required for epiblast and extraembryonic ectoderm formation (Avilion et al., 2003). Interestingly, over-expression of Sox2 in ESCs inhibits Oct4:Sox2 target genes, including Oct4, Nanog, and

FGF-4 (Boer et al., 2007), and triggers differentiation into cells that express markers of neuroectoderm, mesoderm, and trophectoderm (Kopp et al., 2008). This inhibition effect requires the C terminus of Sox2, which contains the transcriptional activation domain (Boer et al., 2007).

However, the mechanism of inhibition is still not clear. Later studies show that Sox2 promotes neural ectodermal differentiation and represses mesendodermal differentiation (Thomson et al.,

2011; Wang et al., 2012). In addition, Sox2 plays a fundamental role in neural progenitor stem

14

cell establishment, as over-expression of just Sox2 is sufficient to convert fibroblasts to multipotent neural progenitor stem cells (Ring et al., 2012).

Krüppel-like factor 4 (Klf4) belongs to Krüppel-like family of transcription factors, characterized by three Cys2His2 zinc finger domains at the carboxyl-terminus. Klf4 is also called gut-enriched Krüppel-like factor (GKLF), as it is expressed in epithelial cells of the gastrointestinal tract, the skin, the lungs, and several other organs (Garrett-Sinha et al., 1996;

Shields et al., 1996; Ton-That et al., 1997). Before the discovery of a role in pluripotency maintenance, Klf4 was better known for its association with cancer. Alterations of Klf4 has been found in a multitude of cancer types, and the roles of Klf4 in tumorigenesis have been under intense investigation. In general, Klf4 seems to be a tumor suppressor gene, as it inhibits tumor growth through inhibiting cyclin-dependent kinases and activating cyclin-dependent kinase inhibitors. It also interferes with pathways such as RAS signaling and WNT signaling, which are often associated with oncogenesis. Klf4 also promotes apoptosis in cancer cells, and inhibits epithelial-to-mesenchymal transition (EMT), which is important for metastasis. However, oncogenic activity of Klf4 was also observed in some types of cells, such as skin cells. In addition, while inhibiting the tumor cell growth by activating the cyclin-dependent kinase inhibitors such as p21, Klf4 also suppresses p53. This may transform cells in the absence of p21

(all reviewed in Tetreault et al., 2013). Klf4 was later found highly expressed in embryonic stem cells, and is down-regulated when cells are undergoing differentiation (Bruce et al., 2007; Hall et al., 2009). Klf4 is essential, as knockout of Klf4 is perinatal lethal in mouse (McConnell and

Yang, 2010). Overexpression of Klf4 promotes self-renewal of ESC and is sufficient to reprogram epiblast-derived stem cells (EpiSC) to iPSCs (Guo et al., 2009; Hall et al., 2009; Li et al., 2005) . This phenotype seems to be mainly mediated by Klf4 transactivation of Nanog,

15

which is also a master regulator of pluripotency (Jiang et al., 2008; Silva et al., 2009; Zhang et al., 2010a). Of note, Klf4 can be replaced by Klf2 and Klf5 for cellular reprogramming

(Nakagawa et al., 2008), and these three transcription factors appear to be functionally redundant for pluripotency maintenance as well (Jiang et al., 2008).

Molecular mechanism of transcriptional control by Oct4, Sox2, and Klf4

Oct4 and Sox2

The mechanism of how Oct4 and Sox2 regulate gene expression has been elusive, despite over twenty years since the discovery of those two genes. Given the co-regulation of many target genes by Oct4 and Sox2 in ES cells, researchers often study the mechanism of transcriptional control of these two proteins together, using the promoters/enhancers of their co-target genes

Fgf-4 and Nanog. It has been proposed that histone acetyl-transferase p300 appears to mediate

SOX2:OCT4 transcriptional activation of FGF-4 in ESCs, as ectopic expression of a virus gene

E1A, which sequesters p300, can inhibit this transcriptional activation (Nowling et al., 2003).

However, p300 does not seem to be a for Oct4:Sox2 in targeting Nanog, as co- expression of Oct4 and Sox2 in 293 or NIH3T3 cells, in which p300/CBP remain abundantly expressed, failed to activate a Nanog promoter reporter construct (Rodda et al., 2005). A later study found that p300 can acetylate Sox2, to induce its nuclear exportation and degradation by ubiquitination (Baltus et al., 2009), indicating a negative role of p300 in regulating Sox2.

Recently, efforts have been made to identify the transcriptional coactivator of Oct4:Sox2 biochemically, and surprisingly XPC DNA repair complex (XPC-SED23-CETN2) (Cattoglio et al., 2015; Fong et al., 2011) and dyskerin (DKC1) ribonucleoprotein complex (Fong et al., 2014), but not Mediator complex, a well-known transcriptional coactivator (Fong et al., 2011), were identified as the two coactivators for Oct4:Sox2 to activate Nanog. XPC complex is reported to

16

form a direct interaction with Oct4 and Sox2, and DKC1 complex may be recruited to Oct4 and

Sox2 via XPC complex (Cattoglio et al., 2015). Interestingly, the deletion of the C-terminal region of XPC, which contains all the functional domains, does not seem to affect pluripotency of ESCs (Ito et al., 2014). In addition, Set1a (Fang et al., 2016) and Wdr5 (Ang et al., 2011) , components of human Set1 COMPASS complex that methylates H3K4, were identified to interact with Oct4, suggesting a chromatin remodeling role for Oct4. The seemingly controversial discoveries of the co-activators of Oct4:Sox2 suggest that the transcriptional control by Oct4 and Sox2 may be executed by weak interactions with a number of co-activators and specific cellular and genomic context may affect these interactions.

Klf4

Before the discovery of Klf4 as a pluripotency establisher, Klf4 was better known as a tumor-suppressor gene or an oncogene depending on specific cellular context as mentioned earlier. Klf4 has been found to directly activate cyclin-dependent kinase inhibitor 1A (also called p21) (Rowland et al., 2005), epithelial gene CDH1 (E-cadherin) (Li et al., 2010), and directly suppresses the expression of p53 (Rowland et al., 2005), cyclin D1, D2, B1 (Klaewsongkram et al., 2007; Shie et al., 2000; Yoon and Yang, 2004), and mesenchymal genes SNAI and SLUG (Li et al., 2010; Tiwari et al., 2013; Yori et al., 2011).

The transcriptional activation activity lies in the acidic transcriptional activation domain near the N terminus of Klf4 (Geiman et al., 2000; Yet et al., 1998), possibly mediated by histone acetyltransferase p300 and its paralog CBP (Geiman et al., 2000). Evidence has shown that Klf4 can be acetylated by CBP/p300 at K225 and K229, the mutation of which negatively affects the transcriptional activation activity of Klf4 (Evans et al., 2007; Zhang et al., 2010b).

Phosphorylation of Klf4 S470 in the zinc finger region enhances the interaction between Klf4

17

and p300 (Zhang et al., 2010b). This also seems to be important for the transcriptional activation activity of Klf4, as overexpression of p300 does not further enhance the transactivation activity of Klf4 S470A mutant (Zhang et al., 2010b). The acidic transactivation domain of Klf4 seems to be a hotspot for protein interaction, as small ubiquitin-like protein SUMO-1 and MAP kinases

ERK1 and ERK2 have also been reported to interact with this domain (Du et al., 2010; Kim et al., 2012). Depletion of SUMO-1 reduces the trans-activation activity of Klf4, indicating that

SUMO-1 regulates the transactivation activity of Klf4 (Du et al., 2010). ERK1 and ERK2 can phosphorylate Ser123 of Klf4 and induce its ubiquitination and subsequent degradation (Kim et al., 2012).

In terms of the mechanism of transcriptional repression, it does not seem to be well characterized, but some evidence suggests that it may be mediated by the histone deacetylase

HDAC3 (Evans et al., 2007; Yoon and Yang, 2004). The mechanism how Klf4 switches between transcriptional activation and repression remains unknown.

Concluding remarks

The discovery of iPSC not only has provided a powerful tool for studying human diseases, but also has accelerated the pace of studying the underlying mechanism of cell fate, cell state, and cell identity transition using reprogramming as a model. To further dissect the mechanism of reprogramming factors’ control of transcriptional and epigenetic changes during reprogramming, I focused on investigating protein interaction partners of Klf4, and how they contribute to the pluripotency establishment. Chapter 2 characterizes the functional role of Klf4 transactivation domain in reprogramming, and identifies transcriptional co-activator Mediator complex and CBP/p300 as functional interaction partners of the Klf4 transactivation domain.

Chapter 3 characterizes the roles of clathrin heavy chain, another interaction partner of the Klf4

18

transactivation domain, in the transcriptional control related to oncogenesis. Chapter 4 is an extended study of clathrin heavy chain, focusing on the mechanism of a clathrin accessory protein in controlling clathrin-coated vesicle formation.

References

Aldiri, I., and Vetter, M.L. (2012). PRC2 during vertebrate organogenesis: a complex in transition. Dev. Biol. 367, 91–99.

Ang, Y.-S., Tsai, S.-Y., Lee, D.-F., Monk, J., Su, J., Ratnakumar, K., Ding, J., Ge, Y., Darr, H., Chang, B., et al. (2011). Wdr5 Mediates Self-Renewal and Reprogramming via the Embryonic Stem Cell Core Transcriptional Network. Cell 145, 183–197.

Avilion, A.A., Nicolis, S.K., Pevny, L.H., Perez, L., Vivian, N., and Lovell-Badge, R. (2003). Multipotent cell lineages in early mouse development depend on SOX2 function. Genes Dev. 17, 126–140.

Avior, Y., Sagi, I., and Benvenisty, N. (2016). Pluripotent stem cells in disease modelling and drug discovery. Nature Reviews Molecular Cell Biology 17, 170–182.

Baltus, G.A., Kowalski, M.P., Zhai, H., Tutter, A.V., Quinn, D., Wall, D., and Kadam, S. (2009). Acetylation of sox2 induces its nuclear export in embryonic stem cells. Stem Cells 27, 2175– 2184.

Boer, B., Kopp, J., Mallanna, S., Desler, M., Chakravarthy, H., Wilder, P.J., Bernadt, C., and Rizzino, A. (2007). Elevating the levels of Sox2 in embryonal carcinoma cells and embryonic stem cells inhibits the expression of Sox2:Oct-3/4 target genes. Nucleic Acids Res. 35, 1773– 1786.

Botquin, V., Hess, H., Fuhrmann, G., Anastassiadis, C., Gross, M.K., Vriend, G., and Schöler, H.R. (1998). New POU dimer configuration mediates antagonistic control of an osteopontin preimplantation enhancer by Oct-4 and Sox-2. Genes Dev. 12, 2073–2090.

Brambrink, T., Foreman, R., Welstead, G.G., Lengner, C.J., Wernig, M., Suh, H., and Jaenisch, R. (2008). Sequential expression of pluripotency markers during direct reprogramming of mouse somatic cells. Cell Stem Cell 2, 151–159.

Bruce, S.J., Gardiner, B.B., Burke, L.J., Gongora, M.M., Grimmond, S.M., and Perkins, A.C. (2007). Dynamic transcription programs during ES cell differentiation towards mesoderm in serum versus serum-freeBMP4 culture. BMC Genomics 8, 365.

Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y., and Greenleaf, W.J. (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218.

19

Buganim, Y., Faddah, D.A., and Jaenisch, R. (2013). Mechanisms and models of somatic cell reprogramming. Nat. Rev. Genet. 14, 427–439.

Buganim, Y., Faddah, D.A., Cheng, A.W., Itskovich, E., Markoulaki, S., Ganz, K., Klemm, S.L., van Oudenaarden, A., and Jaenisch, R. (2012). Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222.

Cacchiarelli, D., Trapnell, C., Ziller, M.J., Soumillon, M., Cesana, M., Karnik, R., Donaghey, J., Smith, Z.D., Ratanasirintrawoot, S., Zhang, X., et al. (2015). Integrative Analyses of Human Reprogramming Reveal Dynamic Nature of Induced Pluripotency. Cell 162, 412–424.

Carey, B.W., Markoulaki, S., Hanna, J.H., Faddah, D.A., Buganim, Y., Kim, J., Ganz, K., Steine, E.J., Cassady, J.P., Creyghton, M.P., et al. (2011). Reprogramming factor stoichiometry influences the epigenetic state and biological properties of induced pluripotent stem cells. Cell Stem Cell 9, 588–598.

Cattoglio, C., Zhang, E.T., Grubisic, I., Chiba, K., Fong, Y.W., and Tjian, R. (2015). Functional and mechanistic studies of XPC DNA-repair complex as transcriptional coactivator in embryonic stem cells. Proc. Natl. Acad. Sci. U.S.a. 112, E2317–E2326.

Chen, J., Liu, H., Liu, J., Qi, J., Wei, B., Yang, J., Liang, H., Chen, Y., Chen, J., Wu, Y., et al. (2013). H3K9 methylation is a barrier during somatic cell reprogramming into iPSCs. Nat. Genet. 45, 34–42.

Chronis, C., Fiziev, P., Papp, B., Butz, S., Bonora, G., Sabri, S., Ernst, J., and Plath, K. (2017). Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442– 459.e20.

Costa, Y., Ding, J., Theunissen, T.W., Faiola, F., Hore, T.A., Shliaha, P.V., Fidalgo, M., Saunders, A., Lawrence, M., Dietmann, S., et al. (2013). NANOG-dependent function of TET1 and TET2 in establishment of pluripotency. Nature 495, 370–374.

Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. U.S.a. 107, 21931–21936.

Cyranoski, D. (2014). Japanese woman is first recipient of next-generation stem cells. Nature.

Cyranoski, D. (2017). Japanese man is first to receive “reprogrammed” stem cells from another person. Nature.

Davis, R.L., Weintraub, H., and Lassar, A.B. (1987). Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 51, 987–1000.

Di Stefano, B., Sardina, J.L., van Oevelen, C., Collombet, S., Kallin, E.M., Vicent, G.P., Lu, J., Thieffry, D., Beato, M., and Graf, T. (2014). C/EBPα poises B cells for rapid reprogramming into induced pluripotent stem cells. Nature 506, 235–239.

20

Du, J.X., McConnell, B.B., and Yang, V.W. (2010). A small ubiquitin-related modifier- interacting motif functions as the transcriptional activation domain of Krüppel-like factor 4. J. Biol. Chem. 285, 28298–28308.

Ebrahimi, B. (2015). Reprogramming barriers and enhancers: strategies to enhance the efficiency and kinetics of induced pluripotency. Cell Regen (Lond) 4, 10.

Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216.

Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49.

Evans, P.M., Zhang, W., Chen, X., Yang, J., Bhakat, K.K., and Liu, C. (2007). Kruppel-like factor 4 is acetylated by p300 and regulates gene transcription via modulation of histone acetylation. J. Biol. Chem. 282, 33994–34002.

Fang, L., Zhang, J., Zhang, H., Yang, X., Jin, X., Zhang, L., Skalnik, D.G., Jin, Y., Zhang, Y., Huang, X., et al. (2016). H3K4 Methyltransferase Set1a Is A Key Oct4 Coactivator Essential for Generation of Oct4 Positive Inner Cell Mass. Stem Cells 34, 565–580.

Fong, Y.W., Ho, J.J., Inouye, C., and Tjian, R. (2014). The dyskerin ribonucleoprotein complex as an OCT4/SOX2 coactivator in embryonic stem cells. Elife 3, e03573.

Fong, Y.W., Inouye, C., Yamaguchi, T., Cattoglio, C., Grubisic, I., and Tjian, R. (2011). A DNA repair complex functions as an Oct4/Sox2 coactivator in embryonic stem cells. Cell 147, 120– 131.

Garrett-Sinha, L.A., Eberspaecher, H., Seldin, M.F., and de Crombrugghe, B. (1996). A gene for a novel zinc-finger protein expressed in differentiated epithelial cells and transiently in certain mesenchymal cells. J. Biol. Chem. 271, 31384–31390.

Geiman, D.E., Ton-That, H., Johnson, J.M., and Yang, V.W. (2000). Transactivation and growth suppression by the gut-enriched Krüppel-like factor (Krüppel-like factor 4) are dependent on acidic amino acid residues and protein-protein interaction. Nucleic Acids Res. 28, 1106–1113.

Gonzalez-Muñoz, E., Arboleda-Estudillo, Y., Otu, H.H., and Cibelli, J.B. (2014). Cell reprogramming. Histone chaperone ASF1A is required for maintenance of pluripotency and cellular reprogramming. Science 345, 822–825.

Guo, G., Yang, J., Nichols, J., Hall, J.S., Eyres, I., Mansfield, W., and Smith, A. (2009). Klf4 reverts developmentally programmed restriction of ground state pluripotency. Development 136, 1063–1069.

Guo, S., Zi, X., Schulz, V.P., Cheng, J., Zhong, M., Koochaki, S.H.J., Megyola, C.M., Pan, X., Heydari, K., Weissman, S.M., et al. (2014). Nonstochastic reprogramming from a privileged somatic cell state. Cell 156, 649–662.

21

GURDON, J.B., ELSDALE, T.R., and FISCHBERG, M. (1958). Sexually mature individuals of Xenopus laevis from the transplantation of single somatic nuclei. Nature 182, 64–65.

Hall, J., Guo, G., Wray, J., Eyres, I., Nichols, J., Grotewold, L., Morfopoulou, S., Humphreys, P., Mansfield, W., Walker, R., et al. (2009). Oct4 and LIF/Stat3 Additively Induce Krüppel Factors to Sustain Embryonic Stem Cell Self-Renewal. Cell Stem Cell 5, 597–609.

Hanna, J., Saha, K., Pando, B., van Zon, J., Lengner, C.J., Creyghton, M.P., van Oudenaarden, A., and Jaenisch, R. (2009). Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601.

Hansson, J., Rafiee, M.R., Reiland, S., Polo, J.M., Gehring, J., Okawa, S., Huber, W., Hochedlinger, K., and Krijgsveld, J. (2012). Highly Coordinated Proteome Dynamics during Reprogramming of Somatic Cells to Pluripotency. Cell Rep 2, 1579–1592.

Hou, P., Li, Y., Zhang, X., Liu, C., Guan, J., Li, H., Zhao, T., Ye, J., Yang, W., Liu, K., et al. (2013). Pluripotent stem cells induced from mouse somatic cells by small-molecule compounds. Science 341, 651–654.

Huangfu, D., Maehr, R., Guo, W., Eijkelenboom, A., Snitow, M., Chen, A.E., and Melton, D.A. (2008). Induction of pluripotent stem cells by defined factors is greatly improved by small- molecule compounds. Nat. Biotechnol. 26, 795–797.

Hussein, S.M.I., Puri, M.C., Tonge, P.D., Benevento, M., Corso, A.J., Clancy, J.L., Mosbergen, R., Li, M., Lee, D.-S., Cloonan, N., et al. (2014). Genome-wide characterization of the routes to pluripotency. Nature 516, 198–206.

Ito, S., Yamane, M., Ohtsuka, S., and Niwa, H. (2014). The C-terminal region of Xpc is dispensable for the transcriptional activity of Oct3/4 in mouse embryonic stem cells. FEBS Lett. 588, 1128–1135.

Jaenisch, R., and Young, R. (2008). Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567–582.

Jiang, J., Chan, Y.-S., Loh, Y.-H., Cai, J., Tong, G.-Q., Lim, C.-A., Robson, P., Zhong, S., and Ng, H.-H. (2008). A core Klf circuitry regulates self-renewal of embryonic stem cells. Nat. Cell Biol. 10, 353–360.

Jin, Q., Yu, L.-R., Wang, L., Zhang, Z., Kasper, L.H., Lee, J.-E., Wang, C., Brindle, P.K., Dent, S.Y.R., and Ge, K. (2011). Distinct roles of GCN5/PCAF-mediated H3K9ac and CBP/p300- mediated H3K18/27ac in transactivation. Embo J. 30, 249–262.

Kim, M.O., Kim, S.-H., Cho, Y.-Y., Nadas, J., Jeong, C.-H., Yao, K., Kim, D.J., Yu, D.-H., Keum, Y.-S., Lee, K.-Y., et al. (2012). ERK1 and ERK2 regulate embryonic stem cell self- renewal through phosphorylation of Klf4. Nat. Struct. Mol. Biol. 19, 283–290.

Klaewsongkram, J., Yang, Y., Golech, S., Katz, J., Kaestner, K.H., and Weng, N.-P. (2007). Krüppel-like factor 4 regulates B cell number and activation-induced B cell proliferation. J.

22

Immunol. 179, 4679–4684.

Koche, R.P., Smith, Z.D., Adli, M., Gu, H., Ku, M., Gnirke, A., Bernstein, B.E., and Meissner, A. (2011). Reprogramming factor expression initiates widespread targeted chromatin remodeling. Cell Stem Cell 8, 96–105.

Kopp, J.L., Ormsbee, B.D., Desler, M., and Rizzino, A. (2008). Small increases in the level of Sox2 trigger the differentiation of mouse embryonic stem cells. Stem Cells 26, 903–911.

Kulessa, H., Frampton, J., and Graf, T. (1995). GATA-1 reprograms avian myelomonocytic cell lines into eosinophils, thromboblasts, and erythroblasts. Genes Dev. 9, 1250–1262.

Li, R., Liang, J., Ni, S., Zhou, T., Qing, X., Li, H., He, W., Chen, J., Li, F., Zhuang, Q., et al. (2010). A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell 7, 51–63.

Li, Y., McClintick, J., Zhong, L., Edenberg, H.J., Yoder, M.C., and Chan, R.J. (2005). Murine embryonic stem cell differentiation is promoted by SOCS-3 and inhibited by the zinc finger transcription factor Klf4. Blood 105, 635–637.

Li, Y., Zhang, Q., Yin, X., Yang, W., Du, Y., Hou, P., Ge, J., Liu, C., Zhang, W., Zhang, X., et al. (2011). Generation of iPSCs from mouse fibroblasts with a single gene, Oct4, and small molecules. Cell Res. 21, 196–204.

Loh, K.M., and Lim, B. (2011). A precarious balance: pluripotency factors as lineage specifiers. Cell Stem Cell 8, 363–369.

Maherali, N., Sridharan, R., Xie, W., Utikal, J., Eminli, S., Arnold, K., Stadtfeld, M., Yachechko, R., Tchieu, J., Jaenisch, R., et al. (2007). Directly reprogrammed fibroblasts show global epigenetic remodeling and widespread tissue contribution. Cell Stem Cell 1, 55–70.

Mali, P., Chou, B.K., Yen, J., Ye, Z., Zou, J., Dowey, S., Brodsky, R.A., Ohm, J.E., Yu, W., Baylin, S.B., et al. (2010). Butyrate Greatly Enhances Derivation of Human Induced Pluripotent Stem Cells by Promoting Epigenetic Remodeling and the Expression of Pluripotency-Associated Genes. Stem Cells 28, 713–720.

Markoulaki, S., Meissner, A., and Jaenisch, R. (2008). Somatic cell nuclear transfer and derivation of embryonic stem cells in the mouse. Methods 45, 101–114.

McConnell, B.B., and Yang, V.W. (2010). Mammalian Krüppel-like factors in health and diseases. Physiological Reviews 90, 1337–1381.

Mikkelsen, T.S., Hanna, J., Zhang, X., Ku, M., Wernig, M., Schorderet, P., Bernstein, B.E., Jaenisch, R., Lander, E.S., and Meissner, A. (2008). Dissecting direct reprogramming through integrative genomic analysis. Nature 454, 49–55.

Nakagawa, M., Koyanagi, M., Tanabe, K., Takahashi, K., Ichisaka, T., Aoi, T., Okita, K., Mochiduki, Y., Takizawa, N., and Yamanaka, S. (2008). Generation of induced pluripotent stem

23

cells without Myc from mouse and human fibroblasts. Nat. Biotechnol. 26, 101–106.

Nichols, J., Zevnik, B., Anastassiadis, K., Niwa, H., Klewe-Nebenius, D., Chambers, I., Schöler, H., and Smith, A. (1998). Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell 95, 379–391.

Nishimoto, M., Fukushima, A., Okuda, A., and Muramatsu, M. (1999). The gene for the embryonic stem cell coactivator UTF1 carries a regulatory element which selectively interacts with a complex composed of Oct-3/4 and Sox-2. Mol. Cell. Biol. 19, 5453–5465.

Niwa, H., Miyazaki, J., and Smith, A.G. (2000). Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat. Genet. 24, 372–376.

Nowling, T., Bernadt, C., Johnson, L., Desler, M., and Rizzino, A. (2003). The co-activator p300 associates physically with and can mediate the action of the distal enhancer of the FGF-4 gene. J. Biol. Chem. 278, 13696–13705.

Onder, T.T., Kara, N., Cherry, A., Sinha, A.U., Zhu, N., Bernt, K.M., Cahan, P., Marcarci, B.O., Unternaehrer, J., Gupta, P.B., et al. (2012). Chromatin-modifying enzymes as modulators of reprogramming. Nature 483, 598–602.

Polo, J.M., Anderssen, E., Walsh, R.M., Schwarz, B.A., Nefzger, C.M., Lim, S.M., Borkent, M., Apostolou, E., Alaei, S., Cloutier, J., et al. (2012). A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632.

Rais, Y., Zviran, A., Geula, S., Gafni, O., Chomsky, E., Viukov, S., Mansour, A.A., Caspi, I., Krupalnik, V., Zerbib, M., et al. (2013). Deterministic direct reprogramming of somatic cells to pluripotency. Nature 502, 65–70.

Ring, K.L., Tong, L.M., Balestra, M.E., Javier, R., Andrews-Zwilling, Y., Li, G., Walker, D., Zhang, W.R., Kreitzer, A.C., and Huang, Y. (2012). Direct reprogramming of mouse and human fibroblasts into multipotent neural stem cells with a single factor. Cell Stem Cell 11, 100–109.

Robinton, D.A., and Daley, G.Q. (2012). The promise of induced pluripotent stem cells in research and therapy. Nature 481, 295–305.

Rodda, D.J., Chew, J.-L., Lim, L.-H., Loh, Y.-H., Wang, B., Ng, H.-H., and Robson, P. (2005). Transcriptional regulation of nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737.

Rose, N.R., and Klose, R.J. (2014). Understanding the relationship between DNA methylation and histone lysine methylation. Biochim. Biophys. Acta 1839, 1362–1372.

Rowland, B.D., Bernards, R., and Peeper, D.S. (2005). The KLF4 tumour suppressor is a transcriptional repressor of p53 that acts as a context-dependent oncogene. Nat. Cell Biol. 7, 1074–1082.

Ruthenburg, A.J., Allis, C.D., and Wysocka, J. (2007). Methylation of lysine 4 on histone H3: intricacy of writing and reading a single epigenetic mark. Mol. Cell 25, 15–30.

24

Samavarchi-Tehrani, P., Golipour, A., David, L., Sung, H.-K., Beyer, T.A., Datti, A., Woltjen, K., Nagy, A., and Wrana, J.L. (2010). Functional Genomics Reveals a BMP-Driven Mesenchymal-to-Epithelial Transition in the Initiation of Somatic Cell Reprogramming. Cell Stem Cell 7, 64–77.

Shie, J.L., Chen, Z.Y., Fu, M., Pestell, R.G., and Tseng, C.C. (2000). Gut-enriched Krüppel-like factor represses cyclin D1 promoter activity through Sp1 motif. Nucleic Acids Res. 28, 2969– 2976.

Shields, J.M., Christy, R.J., and Yang, V.W. (1996). Identification and characterization of a gene encoding a gut-enriched Krüppel-like factor expressed during growth arrest. J. Biol. Chem. 271, 20009–20017.

Shu, J., Wu, C., Wu, Y., Li, Z., Shao, S., Zhao, W., Tang, X., Yang, H., Shen, L., Zuo, X., et al. (2013). Induction of pluripotency in mouse somatic cells with lineage specifiers. Cell 153, 963– 975.

Silva, J., Nichols, J., Theunissen, T.W., Guo, G., van Oosten, A.L., Barrandon, O., Wray, J., Yamanaka, S., Chambers, I., and Smith, A. (2009). Nanog is the gateway to the pluripotent ground state. Cell 138, 722–737.

Soufi, A., Donahue, G., and Zaret, K.S. (2012). Facilitators and impediments of the pluripotency reprogramming factors' initial engagement with the genome. Cell 151, 994–1004.

Sridharan, R., Gonzales-Cope, M., Chronis, C., Bonora, G., McKee, R., Huang, C., Patel, S., Lopez, D., Mishra, N., Pellegrini, M., et al. (2013). Proteomic and genomic approaches reveal critical functions of H3K9 methylation and heterochromatin protein-1γ in reprogramming to pluripotency. Nat. Cell Biol. 15, 872–882.

Stadtfeld, M., Maherali, N., Breault, D.T., and Hochedlinger, K. (2008). Defining molecular cornerstones during fibroblast to iPS cell reprogramming in mouse. Cell Stem Cell 2, 230–240.

Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.

Takahashi, K., and Yamanaka, S. (2016). A decade of transcription factor-mediated reprogramming to pluripotency. Nature Reviews Molecular Cell Biology 17, 183–193.

Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., and Yamanaka, S. (2007). Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872.

Tetreault, M.-P., Yang, Y., and Katz, J.P. (2013). Krüppel-like factors in cancer. Nat. Rev. Cancer 13, 701–713.

Theunissen, T.W., and Jaenisch, R. (2014). Molecular control of induced pluripotency. Cell Stem Cell 14, 720–734.

25

Thomson, M., Liu, S.J., Zou, L.-N., Smith, Z., Meissner, A., and Ramanathan, S. (2011). Pluripotency factors in embryonic stem cells regulate differentiation into germ layers. Cell 145, 875–889.

Tiwari, N., Meyer-Schaller, N., Arnold, P., Antoniadis, H., Pachkov, M., van Nimwegen, E., and Christofori, G. (2013). Klf4 Is a Transcriptional Regulator of Genes Critical for EMT, Including Jnk1 (Mapk8). PLoS ONE 8, e57329.

Ton-That, H., Kaestner, K.H., Shields, J.M., Mahatanankoon, C.S., and Yang, V.W. (1997). Expression of the gut-enriched Krüppel-like factor gene during development and intestinal tumorigenesis. FEBS Lett. 419, 239–243.

Waddington, C.H. (1957). The Strategy of the Genes, a Discussion of Some Aspects of Theoretical Biology, by C. H. Waddington,... With an Appendix [Some Physico-chemical Aspects of Biological Organisation] by H. Kacser,...

Wakayama, T., Perry, A.C., Zuccotti, M., Johnson, K.R., and Yanagimachi, R. (1998). Full-term development of mice from enucleated oocytes injected with cumulus cell nuclei. Nature 394, 369–374.

Wang, Y., Chen, J., Hu, J.-L., Wei, X.-X., Qin, D., Gao, J., Zhang, L., Jiang, J., Li, J.-S., Liu, J., et al. (2011). Reprogramming of mouse and human somatic cells by high-performance engineered factors. EMBO Rep. 12, 373–378.

Wang, Z., Oron, E., Nelson, B., Razis, S., and Ivanova, N. (2012). Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10, 440– 454.

Wernig, M., Lengner, C.J., Hanna, J., Lodato, M.A., Steine, E., Foreman, R., Staerk, J., Markoulaki, S., and Jaenisch, R. (2008a). A drug-inducible transgenic system for direct reprogramming of multiple somatic cell types. Nat. Biotechnol. 26, 916–924.

Wernig, M., Meissner, A., Cassady, J.P., and Jaenisch, R. (2008b). c-Myc is dispensable for direct reprogramming of mouse fibroblasts. Cell Stem Cell 2, 10–12.

Wilmut, I., Schnieke, A.E., McWhir, J., Kind, A.J., and Campbell, K.H. (1997). Viable offspring derived from fetal and adult mammalian cells. Nature 385, 810–813.

Xie, H., Ye, M., Feng, R., and Graf, T. (2004). Stepwise reprogramming of B cells into macrophages. Cell 117, 663–676.

Yet, S.F., McA'Nulty, M.M., Folta, S.C., Yen, H.W., Yoshizumi, M., Hsieh, C.M., Layne, M.D., Chin, M.T., Wang, H., Perrella, M.A., et al. (1998). Human EZF, a Krüppel-like zinc finger protein, is expressed in vascular endothelial cells and contains transcriptional activation and repression domains. J. Biol. Chem. 273, 1026–1031.

Yin, Y., Morgunova, E., Jolma, A., Kaasinen, E., Sahu, B., Khund-Sayeed, S., Das, P.K., Kivioja, T., Dave, K., Zhong, F., et al. (2017). Impact of cytosine methylation on DNA binding

26

specificities of human transcription factors. Science 356, eaaj2239.

Yoon, H.S., and Yang, V.W. (2004). Requirement of Krüppel-like factor 4 in preventing entry into mitosis following DNA damage. J. Biol. Chem. 279, 5035–5041.

Yori, J.L., Seachrist, D.D., Johnson, E., Lozada, K.L., Abdul-Karim, F.W., Chodosh, L.A., Schiemann, W.P., and Keri, R.A. (2011). Krüppel-like Factor 4 Inhibits Tumorigenic Progression and Metastasis in a Mouse Model of Breast Cancer. Neoplasia 13, 601–IN605.

Yu, J., Vodyanik, M.A., Smuga-Otto, K., Antosiewicz-Bourget, J., Frane, J.L., Tian, S., Nie, J., Jonsdottir, G.A., Ruotti, V., Stewart, R., et al. (2007). Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920.

Yuan, H., Corbi, N., Basilico, C., and Dailey, L. (1995). Developmental-specific activity of the FGF-4 enhancer requires the synergistic action of Sox2 and Oct-3. Genes Dev. 9, 2635–2645.

Zhang, P., Andrianakos, R., Yang, Y., Liu, C., and Lu, W. (2010a). Kruppel-like factor 4 (Klf4) prevents embryonic stem (ES) cell differentiation by regulating Nanog gene expression. J. Biol. Chem. 285, 9180–9189.

Zhang, R., Han, M., Zheng, B., Li, Y.-J., Shu, Y.-N., and Wen, J.-K. (2010b). Krüppel-like factor 4 interacts with p300 to activate mitofusin 2 gene expression induced by all-trans retinoic acid in VSMCs. Acta Pharmacol. Sin. 31, 1293–1302.

Zhu, S., Li, W., Zhou, H., Wei, W., Ambasudhan, R., Lin, T., Kim, J., Zhang, K., and Ding, S. (2010). Reprogramming of human primary somatic cells by OCT4 and chemical compounds. Cell Stem Cell 7, 651–655.

27

CHAPTER 2

KLF4 RECRUITS MEDIATOR AND CBP/P300 TO INDUCE REPROGRAMMING

28

Abstract

Expression of transcription factors Oct4, Sox2, c-Myc and Klf4 converts somatic cells into induced pluripotent stem cells (iPSCs), which closely resemble embryonic stem cells

(ESCs). To understand the molecular basis of reprogramming, Klf4 was characterized for domains important for iPSC formation, defining the transcriptional activation domain (TAD) as a key element in reprogramming. The effects of different TAD mutations on reprogramming and transcriptional activation were strongly correlated. In particular, a single residue mutation

(I106A) within the TAD disrupted both reprogramming and transcriptional activation activity measured with reporter constructs and by genome-wide analysis. To determine binding partners of Klf4 TAD involved in reprogramming, wild-type and I106A TADs were used to affinity- purify interacting proteins from mouse ESC nuclear extracts. Histone acetyltransferases CBP and p300 and subunits of the Mediator complex were identified as factors that interacted specifically with the wild-type TAD. Mediator and CBP/p300 facilitated transcriptional activation by Klf4

TAD. My results provide evidence that the Klf4 TAD provides a critical link to transcriptional co-activator machinery to promote gene expression changes required for somatic cell reprogramming.

Introduction

Differentiated cells can be reprogrammed to a pluripotent state by overexpression of four transcription factors - Oct4, Sox2, Klf4, and c-Myc (Takahashi and Yamanaka, 2006). Such induced Pluripotent Stem Cells (iPSCs) closely resemble embryonic stem cells (ESCs), sharing the potential to differentiate into any somatic cell type. This property has led to wide use of iPSCs for disease modeling and drug screening, and also provides enormous promise for regenerative medicine (Avior et al., 2016). Thus, understanding the molecular mechanisms of

29

reprogramming has the potential for broad impact by providing insights into manipulating cell identity and enhancing reprogramming efficiency.

The transcription factor Kruppel-like factor 4 (Klf4) plays a key role in the reprogramming process, although the mechanism of action is complicated and not fully resolved.

Depletion of Klf4 in ESCs, along with functionally redundant Klf family members, leads to loss of self-renewal, suggesting a fundamental requirement for core Klf circuitry in pluripotency

(Jiang et al., 2008). However, the capacity of Klf4 to promote the pluripotent state appears to be at odds with reported anti-proliferative activity and a role in the terminal differentiation of epithelial cells in the gut (McConnell and Yang, 2010; Sen et al., 2012; Shields et al., 1996;

Tetreault et al., 2016). Furthermore, KLF4 has been reported to function as an oncogene or a tumor suppressor, depending upon the type of cancer (Kaczynski et al., 2003). Thus, cellular context appears to serve as a critical determinant of Klf4 function.

Even within the reprogramming process, Klf4 displays a bimodal function profile (Polo et al., 2012). Early in reprogramming, Klf4 is associated with silencing of somatic genes, often in concert with Oct4 and Sox2. In contrast, as reprogramming progresses, Klf4 activates pluripotency genes, initially acting together with Oct4 and Sox2 then shifting to a set of genes predominantly distinct from those occupied by Oct4 and Sox2.

Klf4 contains two well-characterized domains that are required for transcriptional activation (Geiman et al., 2000; Yet et al., 1998). At the C-terminus is a zinc finger DNA binding domain that specifically recognizes a GC-rich motif (Chen et al., 2008; Shields and Yang, 1998).

Near the N-terminus is a transcriptional activation domain (TAD) that is dependent upon acidic and hydrophobic residues (Du et al., 2010; Geiman et al., 2000). However, the functional significance of these and other Klf4 regions in reprogramming has not been established.

30

Here, we demonstrate an important role for the Klf4 TAD in reprogramming and describe reprogramming-defective TAD mutants. These mutants displayed a tight correlation between reprogramming and transcriptional activation in the context of both reporter constructs and genome-wide expression. I identified Mediator complex and CBP/p300 as specific interaction partners of functional TAD. Both Mediator and CBP/p300 promoted TAD transcriptional transactivation. These results provide evidence that transcription activation mediated through

TAD interactions with Mediator and p300 contributes to the role of Klf4 in reprogramming.

Results

Klf4 Regions Required for Reprogramming

To identify regions of Klf4 that function in somatic cell reprogramming, deletion mutants were tested for the ability to replace the full-length protein in an assay for reprogramming of murine embryonic fibroblasts (MEF). Klf4 constructs were expressed in MEFs along with Oct4 and Sox2 and reprogramming was quantified as the number of colonies expressing the pluripotency marker Nanog (Figure 2-1A). All mutants described here exhibited expression levels, viral infection efficiencies, and subcellular localization that were similar to wild-type Klf4

(Figure 2-S1A,B). The N-terminal FLAG epitope used to detect Klf4 did not affect reprogramming efficiency (Figure 2-S1C,D). In addition, virus titration revealed that the levels of infection used in our experiments approach saturation of reprogramming efficiency, suggesting that small fluctuations in viral titer do not account for substantial differences in reprogramming efficiency (Figure 2-S1E)

Several regions within Klf4 were critical for reprogramming activity. Deletion of the highly conserved C-terminal DNA binding domain, consisting of 3 tandem C2H2 zinc fingers, abolished iPSC colony formation (1-396, Figure 2-1B). Deletion of the N-terminal 89 amino

31

acids (aa) did not alter reprogramming efficiency. However, extending the deletion to include the adjacent 21aa led to a dramatic, but not complete, reduction in the number of iPSC colonies

(Figure 2-1B). This region (residues 90-110) contains a well-characterized acidic TAD (Du et al.,

2010; Geiman et al., 2000). Reprogramming activity was further reduced by N-terminal truncation to aa170 and fully eliminated when the truncation reached aa209 (Figure 2-1B).

These results delineate a region required for reprogramming to aa90-209.

Internal deletion of the TAD (aa90-110) reduced reprogramming, however sequential deletions encompassing the subregions between aa111 and 209 were innocuous (Figure 2-1B,

S1F-H). Taken together, these results identify the DNA binding domain and TAD (aa90-110) as critical Klf4 elements in reprogramming and suggest that sequences between aa111 and aa209 provide a type of accessory activity that becomes apparent only in the absence of TAD.

Klf4 TAD exhibits specific reprogramming activity.

To further probe the reprogramming capacity of Klf4 TAD, aa90-110 were appended to the N-terminus of two fragments, one substantially defective in inducing Nanog+ colonies

(aa170-483) and the other (aa210-483) completely defective. When added to Klf4 170-483,

TAD elevated reprogramming to nearly wild-type levels. Klf4 TAD also restored reprogramming activity to aa210-483, though at levels below those of full-length Klf4 (Figure 2-

1C). These results provide evidence that the Klf4 TAD contains activity that is sufficient to induce Nanog+ colonies in the reprogramming assay, but functions more effectively with accessory sequences.

Two well-characterized TADs, from Sp1 and VP16 transcription factors (Courey and

Tjian, 1988; Hirai et al., 2010), were tested in the same assay to assess the specificity of Klf4

TAD reprogramming activity. These activation domains are enriched for distinct classes of

32

amino acids: Sp1 TAD is glutamine-rich, while VP16 TAD is acidic, similar to Klf4 aa90-110.

The VP16 TAD can be subdivided into N- and C-terminal regions, which independently activate transcription (Ikeda et al., 2002). VP16N inhibited the residual reprogramming of aa170-483 and did not restore reprogramming to aa210-483 (Figure 2-1C, D). Sp1 TAD was inactive when added to Klf4 aa210-483 (Figure 2-1C). In the context of aa210-483, VP16C yielded a low level of reprogramming (Figure 2-1D). VP16C core, which is a subregion of the C-terminal TAD that directly contacts transcription coactivators (Jonker et al., 2005; Langlois et al., 2008), was somewhat more active, but still less effective at reprogramming than Klf4 TAD. By these assays the Klf4 TAD functioned most effectively of the tested transcriptional activation domains, consistent with specific function(s) in reprogramming. The partial activity of VP16C and core raise the possibility that these regions may share properties with Klf4 that can contribute to reprogramming.

Klf4 TAD sequences required for transcriptional activation activity.

Characterization of transcriptional activation by Klf4 TAD in specialized cell types has revealed requirements for acidic and hydrophobic residues (Du et al., 2010; Geiman et al., 2000).

Based on these studies, we designed Klf4 TAD mutants and assessed transcriptional activation in

MEF, the starting cell type in the reprogramming assays, and murine ESCs (mESC), which closely resemble iPSC. Klf4 sequences fused to the Gal4 DNA binding domain were expressed in cells together with a reporter plasmid carrying 5 tandem GAL4 DNA binding motifs located upstream of the firefly luciferase gene. All Gal4-Klf4 constructs were expressed at similar levels

(Figure 2-S2A). The aa90-110 region functioned as a potent TAD in both MEF and mESC

(Figure 2-S2B-D), comparable in mESC to VP16N, C, and core (Figure 2-S2C-D). In contrast,

Klf4 aa111-209 displayed very low transcriptional activation (Figure 2-S2B, C). In mESC,

33

mutation of the three glutamate residues in the first half of the region to alanines (3E/A) reduced transactivation ~10-fold, whereas more severe effects were observed with mutations of acidic

(2D/A) or hydrophobic residues (LI/AA, 2L/A) in the second half of the region, even in the case of a single amino acid changes, L101A and I106A (Figure 2-2A, S2C). Similar results were observed in MEFs (Figure 2-S2B). These findings generally conform to previous reports using other cell types, although the Klf4 TAD glutamates appear to be comparatively less important in

MEF and mESC (Du et al., 2010; Geiman et al., 2000).

Klf4 TAD residues important for transcription activation are critical for reprogramming

Klf4 TAD mutations tested in the transcriptional activation assay were assessed for effects on reprogramming. For this purpose, the mutations were introduced into Klf4 aa90-483, which reprograms MEF with an efficiency similar to full-length Klf4 (aa1-483, Figure 2-2B).

The effects of the 2D/A, LI/AA, and 2L/A mutations on reprogramming were similar to what was observed when the TAD was deleted entirely (Figure 2-2B, compare to 111-483, Figure 2-

1B). As with transactivation, the 3EA mutant yielded a level of reprogramming that was intermediate between wild-type and other mutants (Figure 2-2B). Overall, there was a tight correlation between effects of mutations on reprogramming by aa90-483 and on GAL4-fusion transactivation activity, with a Pearson correlation coefficient of 0.94 (Figure 2-S2E).

We also assessed the effects of two TAD mutations (2L/A and I106A) on the ability of the TAD to confer reprogramming activity to Klf4 aa210-483, which is inactive by itself. Both mutations abolished reprogramming conferred by the TAD in this context (Figure 2-2C), mirroring the severe defects in TAD transcriptional activation caused by these mutations (Figure

2-2B).

34

Transcriptional activation activity of native Klf4 correlates with reprogramming efficiency.

The sequence requirements for transcriptional activation were also investigated in the native context of full-length Klf4. In this approach, Klf4 constructs were tested using a reporter plasmid containing five tandem Klf4 binding motifs upstream of the luciferase gene. As shown in Figure 2-2D, expression of full-length Klf4 (aa1-483) increased luciferase levels about 55-fold compared to expression of the negative control, GFP, or a Klf4 construct lacking the DNA binding domain (aa1-396). Similar to the reprogramming results, N-terminal truncation to aa90 had minimal effects on transcriptional activation whereas more extensive truncations, aa170-483 and aa210-483, progressively reduced activity. Fusion of Klf4 TAD (aa90-110) to both aa170-

483 and aa210-483 significantly increased luciferase expression, although the effect was substantially more robust in the case of 170-483. The 2L/A TAD mutation severely compromised activity in all contexts where it was tested. Consistent with results using GAL4 fusions, the transcriptional activation by Klf4 constructs correlated closely with reprogramming activity, with a Pearson correlation coefficient of 0.9337 (p = 0.0015) (Figure 2-2E). Taken together, the common sequence requirements for reprogramming and transcriptional activation by the Klf4 TAD are consistent with a predominant role for transcriptional activation in mediating reprogramming activity.

CBP/p300 and Mediator complex interact with Klf4 TAD.

To probe for Klf4 TAD interaction partners involved in reprogramming, affinity purification and mass spectrometry were employed to identify proteins in mESC nuclear extracts that bound to wild-type but not I106A or 2L/A reprogramming-defective mutants of TAD.

Included in the proteins that displayed the greatest binding differential between wild-type and mutant TAD were the closely related histone acetyltransferases p300 and CREB binding protein

35

(CBP), Mediator subunits, and Pol II subunits (Figure 2-3A, Table 2-S1). Also detected were proteins known to interact with Mediator (Gcn1l1, Trim 11) or p300 (Tcl1a) (Ebmeier and

Taatjes, 2010; Ishikawa et al., 2006; Pekarsky et al., 2008). The SAGA histone acetyltransferase complex and general transcription factors TFIIH, TFIIF, and TFIIE were moderately enriched in the wild-type Klf4 TAD sample (Table 2-S2). Considering that CBP/p300 and Mediator complex displayed strong differential binding and are essential for maintenance of pluripotency (Fang et al., 2014; Kagey et al., 2010), I focused on these transcriptional co-activators. Supporting the mass spectrometry results, p300 and selected Mediator subunits differentially associated with wild-type TAD as monitored by immunoblotting of affinity-purified samples (Figure 2-3B).

Furthermore, both p300 and Mediator were associated with Klf4 that was immunoprecipitated from mESC nuclear extract (Figure 2-3C), providing evidence for interactions at endogenous expression levels in vivo. The interactions detected by co-immunoprecipitation were specific as judged by the lack of association of DNA helicase Zranb3.

CBP/p300 binding and TAD transcription activation.

CBP and p300 histone acetyltransferases are enriched at gene promoters and enhancers, and are associated with active gene transcription (Wang et al., 2009). They play essential and redundant roles in maintaining ESC pluripotency (Fang et al., 2014). CBP/p300 were previously reported to interact with Klf4 (Evans et al., 2007; Geiman et al., 2000; Zhang et al., 2010) and to be required for Klf4 transcriptional activation (Geiman et al., 2000). In addition, over-expression of p300 stimulates Klf4-mediated transcriptional activation (Wang et al., 2012).

Direct binding assays were used to assess interactions between the Klf4 TAD and

CBP/p300. Purified full-length p300 was tested for direct interaction with Klf4 TAD or the

I106A mutant. p300 associated only with wild-type Klf4 TAD (Figure 2-4A), indicating that the

36

direct binding to TAD is dependent on I106. To further define the basis of p300 interaction with

TAD, I assayed TAD binding to purified versions of the CBP/p300 TAZ1, KIX, TAZ2, and

NCBD domains, which have been reported to interact with acidic TADs of p53 and Klf1 that are related in sequence to the Klf4 TAD (Mas et al., 2011; Teufel et al., 2007). Klf4 TAD specifically interacted with TAZ1, KIX, and TAZ2 domains of both p300 and CBP, with TAZ1 and KIX binding most dramatically inhibited by I106A (Figure 2-4B,C). In the more complex milieu of bacterial extracts, TAZ2 also displayed clearly defective binding to I106A TAD

(Figure 2-S3A). By these assays, Klf4 TAD contacts multiple domains in p300/CBP, all sensitive to the reprogramming-defective I106A mutation. Consistent with prior reports that p300 functions in Klf4 transcriptional activation (Geiman et al., 2000; Wang et al., 2012), over- expression of p300 and CBP enhanced Klf4 TAD transactivation activity (Figure 2-4D, S3B).

As shown in Figure 2-1D, VP16C was more active in reprogramming than VP16N, a difference that correlated with specific binding of p300 to VP16C (Figure 2-4A)(Ikeda et al.,

2002). Similarly, p300 over-expression was significantly more effective in stimulating transcriptional activity of VP16 TADC compared to VP16N (Figure 2-4D, S3B). Thus, p300/CBP binding is a feature that corresponds to the reprogramming abilities of Klf4 and the

VP16 C-terminal TADs.

Specific mediator subunits involved in functional interaction with Klf4 TAD.

Mediator is a multifunctional complex that plays important roles in transcriptional activation, in part by bridging transcriptional activators to the basal transcription machinery to facilitate pre-initiation complex assembly (Malik and Roeder, 2010). As predicted by this model, purified Mediator bound to wild-type Klf4 TAD but not the I106A mutant (Figure 2-5A,

S4A,B). To identify which Mediator subunits contact Klf4 TAD, all Mediator subunits were

37

radiolabeled by in vitro translation and tested for specific interaction with Klf4 TAD. Three subunits, Med14, 15, and 25, interacted with wild-type Klf4 but not the I106A mutant or GST alone (Figure 2-5B). In the case of Med25, Klf4 TAD binds to the activator interaction domain

(ACID), a common target for other activators (Figure 2-5C).

As one approach to determine the functional significance of Mediator subunit interactions with Klf4 TAD, I examined the effect of Med25 ACID domain expression on TAD-mediated transcriptional activation (Figure 2-5D, S5A). Increasing levels of Med25 ACID domain inhibited activation by Klf4 TAD and, as previously reported, VP16 TAD (Mittler et al., 2003;

Yang et al., 2004). In contrast, there was no effect of Med25 ACID on activation by the adenovirus E1A TAD, which interacts with Med23 (Boyer et al., 1999; Stevens et al., 2002).

Analogous results were observed using mESC (Figure 2-S5B). As another approach, the consequences of Med14 and Med15 overexpression on Klf4 TAD activity were monitored.

Consistent with a specific functional interaction, overexpressed Med15 significantly enhanced

Klf4 TAD activity whereas there was minimal effect on E1A TAD (Figure 2-5E, S5C). The impact of MED14 overexpression was similar to that of E1A, leaving the significance of Med14 interaction with Klf4 TAD uncertain. Med25 also did not affect Klf4 TAD activity when overexpressed (Figure 2-S5D).

MEF gene targets of Klf4 TAD.

I probed global gene expression changes caused by the reprogramming-inactivating

I106A Klf4 TAD mutation using RNA sequencing. To focus specifically on the effects of the

TAD, I analyzed MEF cells overexpressing wild-type or I106A TAD (aa90-110) fused to reprogramming-deficient Klf4 aa210-483. Cells overexpressing full-length Klf4 and transcriptionally inactive fluorescent protein tdTomato were also included in the analysis.

38

Transcription profiles were obtained after 48 hours of overexpression, a time that corresponds to an early, relatively homogeneous stage in reprogramming (Chronis et al., 2017; Polo et al., 2012;

Soufi et al., 2012). One key role for Klf4 in reprogramming is regulation of genes required for mesenchymal to epithelial transition (MET), an early step in fibroblast reprogramming (Li et al.,

2010). Accordingly, in our experiments, full-length Klf4 induced expression of MET-related genes such as E-cadherin, Pkp1, and Pkp3, and down-regulated genes such as Tgfb3 and Snail2 which oppose MET (Table 2-S3).

Compared to full-length Klf4, the wild-type TAD-aa210-483 fusion elicited a lower extent of global transcriptional change (Table 2-1), consistent with the reduced reprogramming and transcriptional activation activities of the fusion (Figure 2-1C-D, 2D). Overall, fewer genes were significantly up or down-regulated (Table 2-1), and within the set of up- and down- regulated genes in FL, the scale of change was smaller (Figure 2-6A). Most of the up-regulated and down-regulated genes by WT were up and down-regulated by full-length Klf4 (Figure 2-

6B). This trend applied to MET-related genes (Table 2-S3). The I106A mutation further reduced the number and extent of transcriptional changes (Figure 2-6A, Table 2-1, Table 2-S3), although there were a substantial number of gene expression changes that persisted in response to the I106A mutant. Principal component analysis also reflects the significant changes between full-length and wild-type TAD, and the effects of the I106A mutation (Figure 2-S6)

Of the gene expression changes in I106A TAD cells, 136 up-regulated genes were not as highly expressed and 63 down-regulated genes were not as highly repressed, when compared to

WT (Table 2-1 last column). The affected up-regulated genes were significantly enriched for

Gene Ontology terms such as skin development and cell-cell adhesion, that reflect epithelial cell pathways consistent with MET (Figure 2-6D). These terms were also highly enriched in genes

39

up-regulated by full-length Klf4 (Figure 2-6C). In the down-regulated genes affected by I106A, no terms were detected as over-represented. Together, these findings provide evidence that the

Klf4 TAD activates a core set of genes targeted by full-length Klf4. Furthermore, the data suggest that the I106A mutation affects the ability of TAD to activate a subset of these genes including those related to the MET pathway, consistent with the inhibitory effects of I106A on

TAD-mediated reprogramming.

Discussion

Our findings define the Klf4 TAD as an important determinant of somatic cell reprogramming and provide insights into transcriptional mechanisms responsible for the function of Klf4 in generating a pluripotent state. In particular, I identified the broad spectrum coactivators, p300/CBP and the Mediator Complex, as direct interaction partners of Klf4 TAD, with binding dependent on TAD residues critical for transcriptional activation and reprogramming. Direct association with both of these co-activators, known to be required for maintaining the pluripotent state of ESC (Fang et al., 2014; Kagey et al., 2010), mechanistically distinguishes Klf4 from Oct4 and Sox2 roles in the pluripotency transcription (Fong et al., 2011;

Gao et al., 2012; Pardo et al., 2010; Rodda et al., 2005; van den Berg et al., 2010).

Comprehensive characterization of chromatin states and Oct4, Sox2, Klf4, and c-Myc occupancy during reprogramming indicates that Oct4, Sox2, and Klf4 (OSK) shift from somatic to pluripotency enhancers and mediate both somatic enhancer inactivation at early stages and progressive pluripotency enhancer activation (Chronis et al., 2017). Klf4 interactions with p300/CBP and Mediator are well-suited for central roles in these regulatory transitions.

Recruitment of p300/CBP to Klf4-occupied pluripotency and MET gene enhancers earmarks the elements for H3K27 acetylation, a modification that distinguishes active from inactive and

40

poised enhancers. Mediator binding to Klf4 provides a mechanism to bridge Klf4-occupied enhancers to the core transcription machinery at the promoters of target pluripotency and MET genes (Kagey et al., 2010). CBP/p300 also mediates long range enhancer-promoter interactions in ESC (Fang et al., 2014), and interacts with Mediator (Aranda-Orgilles et al., 2016). This network of interactions may be particularly significant at super-enhancers, the extended regions of densely packed active enhancers associated with expression of pluripotency genes in ESC

(Hnisz et al., 2013; Whyte et al., 2013). OSK bind to super-enhancers at early reprogramming stages and increase occupancy at later stages (Chronis et al., 2017). However, whereas Oct4 and

Sox2 are enriched to similar extents at constituent and super-enhancers, Klf4 is preferentially enriched at super-enhancers together with p300, H3K27ac, and Mediator (Whyte et al., 2013), suggesting that Klf4 can serve as a co-activator interaction hub, connecting active enhancers with promoters of genes critical to establishing the pluripotent state.

Klf4 TAD-mediated interactions with p300/CBP and Mediator are also likely to play a role in somatic enhancer inactivation. In the early stages of reprogramming, OSK contribute to somatic enhancer inactivation by redirecting somatic transcription factors from somatic enhancers to OSK-occupied transient and pluripotency enhancers (Chronis et al., 2017). In this process, Klf4 interaction with Mediator at pluripotency enhancers would provide a potent binding platform to promote stable interaction with somatic TF. Similarly, relocated Klf4 has the potential to help sequester p300/CPB acetyltransferases from somatic enhancers, allowing deacetylation H3K27ac, and subsequent enhancer silencing. Consistent with this scenario, in one class of somatic enhancers in MEFs overexpressing Klf4 by itself, H3K27ac levels fell at the subclass not occupied by Klf4 but persisted at occupied enhancers where p300/CBP would be expected to reside with Klf4 (Chronis et al., 2017).

41

I demonstrate direct interaction of Klf4 TAD with at least three sites on p300/CBP and three subunits in Mediator, allowing for high avidity association through multivalent interactions between enhancer-bound Klf4 and the co-activator targets. Klf4 was previously reported to interact with the p300/CBP CH3 domain, which encompasses TAZ2, and weakly with a fragment that spans the NCBD domain (Evans et al., 2009; Geiman et al., 2000). In our analysis, the Klf4 TAD binds to TAZ1, TAZ2, and KIX domains in both p300 and CBP, with binding dependent on TAD residue I106, providing evidence that the interactions are functionally important. The more extensive binding detected in our experiments is likely due to use of purified proteins. The TAD of Klf1 binds the same three CBP domains and NCBD (Mas et al.,

2011). Notably, Klf1 and Klf4 TADs are homologous to each other and to regions in Klf2 and

Klf5, including hydrophobic residues at positions corresponding to Klf4 I106. These sequence similarities suggest that binding to multiple p300/CBP domains is a common feature of these Klf proteins. The multivalent mode of p300/CBP engagement is likely to be important for the pluripotent state, as Klf2 can substitute for Klf4 in reprogramming and Klf2,4, and 5 are functionally redundant in maintaining ESC pluripotency (Jiang et al., 2008; Nakagawa et al.,

2008).

Based on I106-dependent binding to individual subunits, the Klf4 TAD interaction with

Mediator that I discovered is mediated by binding to Med25, Med15, and Med14 subunits. In the case of Med25, Klf4 TAD bound to the ACID domain, a common interaction site for other transcriptional activation domains (reviewed in Landrieu et al., 2015). ACID domain overexpression depressed Klf4 TAD transcriptional activation, providing evidence that the interaction can take place in cells and suggesting that Med25 binding is functionally significant.

The significance of the Med15 interaction was supported by the finding that Klf4 TAD activity

42

was stimulated by overexpression of full-length Med15. In contrast, I did not observe stimulation of Klf4 TAD transcriptional activation upon overexpression of Med14 or Med25. In agreement with our results with Med25, overexpression also does not promote VP16 TAD activity (Mittler et al., 2003; Yang et al., 2004). Differential effects of Mediator subunit overexpression on TAD binding partners could reflect subunit-specific constraints on assembly into the Mediator complex and/or distinct abilities of different subunits to shift levels of particular Mediator isoforms.

Overall, our study provides evidence that p300/CBP and Mediator binding are important mechanisms by which Klf4 TAD functions in somatic cell reprogramming. Further analysis will be needed to assess whether there are additional Klf4 TAD interaction partners that also contribute. Candidates include the SAGA complex and general transcription factors such as

TFIIH that were moderately enriched by Klf4 TAD affinity purification. It is also worth emphasizing that, in addition to the TAD and DNA binding domains, our results clearly implicate other regions of Klf4 in reprogramming. This suggests that Klf4 presents an extensive platform for multiple factors that drive the epigenetic and transcriptional changes necessary to program somatic cells to pluripotency.

Contributions

Figure 2-1 A-C, Figure 2-S1 A-E and Figure 2-S2 B,C were from Ryan Jason Schmidt’s dissertation. The experiment in Figure 2-2B was conducted by Robin Mckee, and the experiments in Figure 2-2C and Figure 2-S1 F-H were conducted by Bernadett Papp. RNA-seq was conducted by Huajun Zhou and Constantinos Chronis. Shan Sabri, Constantinos Chronis, and Huajun Zhou collaborated for the RNA-seq data analysis. PCA analysis was solely conducted by Shan Sabri. Mass spectrometry was conducted by Ajay Vashisht. FLAG-p300

43

purification was conducted by Suman Kalyan Pradhan. Huajun Zhou conducted all the rest of the experiments and analyses.

Materials and Methods

Retrovirus Production

Retroviruses carrying mutant Klf4 constructs were produced according to the protocol of

Takahashi and Yamanaka (Takahashi and Yamanaka, 2006) with minor modifications. Klf4 variants were FLAG-tagged and cloned into pMXs. The Sp1 (NCBI Accession: NP_038700) transactivation domain fusion construct contains residues 145-494. The VP16 (NCBI Accession:

NP_044650) transactivation domain TADn fusion construct contains residues 413-455, TADc fusion contains 453-490, TADc core contains 470-485. For each virus, a 10 cm plate of Plat-E cells at ~40% confluence was transfected with 12.5 ug of plasmid using PEI overnight. The following morning, the transfection mixture was removed and replaced with 8 ml of mES media

(KO DMEM, 15% fetal bovine serum (FBS), recombinant leukemia inhibitory factor (Lif), b- mercaptoethanol, 1x penicillin/streptomycin, L-glutamine, and non-essential amino acids) .

Roughly 24 h later, viral supernatent was collected and stored at 4°C. An additional 8 ml of media was added to the cells and collected the following day. Viral supernatents were pooled, aliquoted, frozen in liquid nitrogen, and stored at -80°C.

Reprogramming

MEFs, harvested from E14.5 embryos, were seeded onto 6-well plates in MEF media

(DMEM, 10% FBS, b-mercaptoethanol, 1x penicillin/streptomycin, L-glutamine, and non- essential amino acids). Roughly 24 h later, viral supernatent was collected and stored at 4°C. An additional 8 ml and allowed to expand to ~50% confluence. For each reprogramming experiment, media was removed and replaced with 1 ml of infection mixture overnight. This

44

mixture contained 250 µl of each viral supernatent (Oct4, Sox2, and Klf4 variant), 250 µl of mES media containing 15% FBS, and 1 µg/ml polybrene. This mixture was replaced the following morning with mES media containing 15% FBS. After 2 days, reprogramming cultures were split

1:5 onto 22x22 mm glass coverslips (Fisher Scientific) and into separate wells to monitor factor expression by Western blotting and immunofluorescence. 5 days after initial viral infection, media was changed to mES media containing 15% KSR instead of 15% FBS. Media was changed every 3 days until the experiment was stopped 12 days post-infection.

Immunoblotting

For each reprogramming experiment, cells in a single well of a 6-well plate were harvested 5 days post-infection for analysis by immunoblotting to monitor protein expression.

Cell pellets were disrupted by sonication in 250 µl lysis buffer containing 1% SDS in 1xPBS with 0.5 mM DTT and cOmplete protease inhibitor (Roche). Lysate was cleared by centrifugation and the supernatant mixed with NuPAGE® 4x LDS sample buffer (ThermoFisher

Scientific) and NuPAGE®10x sample reducing agent NuPAGE® to 1X concentrations and separated on a 4-12% Bis-Tris polyacrylamide gel (Invitrogen). Protein was transferred to nitrocellulose membrane (Whatman) and analyzed by immunoblotting. The following antibodies and dilutions were used: α-FLAG (Sigma, F1804) 1:1,000; α-GAPDH 1:10,000 (Fitzgerald,

10R-G109a), IRDye 800 donkey anti-mouse IgG 1:20,000 (LI-COR). For all the other immunoblots, the following antibodies and dilutions were used, Med6 1:1,000(Santa Cruz, sc-

366562); Cdk8 1:1,000 (Santa Cruz, sc-1521); Med26 1: 200 (Santa Cruz, sc-48776), Med23

1:1,000 (BD Pharmingen, 550429); Med1 1:2,500 (Bethyl Laboratories, A300-793A); p300

1:500 (Santa Cruz, sc-585); Med15 1:1,000 (Abgent, AW5502); Zranb3 1:1,000 (Bethyl

45

Laboratories, A303-033A); Klf4 1:500 (R&D Systems, AF3158); GAL4 antibody 1:1,000 (a gift from Michael Carey).

Immunofluoresence

To monitor infection efficiency, factor expression, and subcellular localization, at 4 days post-infection, cells were split onto 12 mm circle glass coverslips (Fisher Scientific) and analyzed by immunofluorescence. Cells were washed in 1xPBS, fixed with 4% paraformaldehyde in 1xPBS, and permeablized with 0.5% Triton X-100 in 1xPBS. Coverslips were blocked with 0.2% fish skin gelatin, 0.2% Tween-20, and 5% goat serum in 1xPBS.

Antibodies were diluted in blocking buffer and wash steps were carried out with 1xPBS + 0.2%

Tween-20. Coverslips were mounted onto glass slides using Aqua-Poly/Mount (Polysciences).

The following antibodies and dilutions were used: α-FLAG (Sigma, F1804) 1:200; Alexa Fluor

546 goat anti- mouse IgG 1:1,000 (Invitrogen, A-11003). To measure reprogramming, coverslips fixed 12 days post-infection were immunostained for the presence of Nanog as described above, using α-Nanog (Abcam, ab80892; eBioscience 14-5761 clone eBioMLC-51; Cosmo Bio,

RCAB0002P-F) 1:200; Alexa Fluor 488 goat anti-rabbit IgG 1:1,000 (Invitrogen, A-11008).

Nanog+ colonies were counted using an upright fluorescence microscope (Zeiss Axio Imager).

Cell clusters containing at least 5 Nanog+ cells were deemed to be iPS colonies.

Dual Luciferase Assay

Putative transactivation domains and mutants were cloned into pBXG1(Emami and

Carey, 1992) to generate GAL4 fusion proteins. For experiments in Figure 2-S2B,C, 2.5 x 104

V6.5 mES or primary E14.5 MEF cells were added in 750 µl media to a transfection mixture containing 50 µl OPTI- MEM (ThermoFisher Scientific), 7.5 µl 1 mg/ml PEI pH=7.2, 200 ng pBXG1 expression vector, 40 ng pGL4.75 (Promega), and 800 ng G5E4T luc reporter vector

46

(Zhang et al., 2002) in a 24-well plate. Cells were harvested 36 h post-transfection by trypsinization and transferred into a 96-well plate. Experiments were carried out in triplicate for each construct tested. Luciferase readings were made according to the Dual-Luciferase Reporter

Assay System manufacturer’s instructions (Promega, E1910).

For the dual luciferase assay in Figure 2-2A, 2 x 105 V6.5 mESCs were plated per well in a gelatin-treated 6-well plate. 1440 ng G5E4T-luc and 72 ng pGL4.75 (Promega) and 720 ng pBXG1 GAL4 fusion expression plasmids were transfected into each well using Lipofectamine

2000. 40 hours post transfection, cells were trypsinized. 2.5% of the total cells were used for luciferase assay with Dual-Glo luciferase reagent (Promega, E2920). The remainder of the cells were lysed and analyzed by immunoblotting for GAL4-fusion expression.

For dual luciferase assay carried out in 4D and 5E, 4 x 104 HeLa cells were seeded per well in a 24-well plate the day before transfection. 2 ng pBXG1 GAL4 fusion expression vector,

100 ng G5E4T-luc reporter, 2 ng pGL4.75 and over-expression construct (amount indicated in figures) were transfected into cells via Lipofectamine 3000 reagent. Cell were lysed for dual luciferase assay 2 days post-transfection using Dual-Luciferase Reporter Assay System

(Promega, E1910). For dual luciferase assay in Figure 2-5D, similar procedures were carried out, except that 5 ng pRL-TK was used instead of pGL4.75, and cells were harvested 24-hour post- transfection using Dual-Glo luciferase reagent (Promega, E2920). The GAL4-E1A(121-223) is gift from Arnold Berk. Med15 over-expression plasmid was constructed by inserting the mouse

Med15 coding sequence into pcDNA3.1 N-FLAG/Hyg plasmid. MED14, Med25 ACID domain, p300, and CBP over-expression plasmids were gifts from Michael Carey.

For native Klf4 luciferase assay in Figure 2-2D, NIH3T3 cells were seeded at 20,000 cells per well in a 12-well plate the day before transfection. 1600 ng pMX-Klf4 variants, 450 ng

47

pRL-TK, 400 ng pGL4.27 containing 5 tandem Klf4 binding motifs, were transfected into each well with 15 µl 1 mg/ml PEI pH=7.2. Cells were harvested by trypsinization 48 hours later and subjected to luciferase measurement by the Dual-Glo luciferase reagent (Promega E2920).

Production of GST Fusion Proteins

Klf4 wild-type and mutant transactivation domains were cloned into pGEX-4T-1 (GE

Healthcare). Plasmids were transformed into BL21-CodonPlus(DE3)-RIL E.coli (Stratagene) and a single colony was used to inoculate an overnight culture in LB ampicillin. The following morning, the culture was diluted 1:100 and grown at 37°C to OD600~0.8. IPTG was added to a final concentration of 1 mM and the culture was grown overnight at 15°C. The cells were harvested by centrifugation at 6000 rpm for 10 minutes at 4°C, and the resultant pellet were resuspended in lysis buffer containing 1xPBS, 5% glycerol, 1 mM DTT, and protease inhibitor cocktail (Sigma P8340) and disrupted with sonication pulses. After centrifugation, Triton X-100 was added to the supernatant to a final concentration of 0.1%. Lysate was incubated with glutathione sepharose beads (GE Healthcare) for 1 h at 4°C with end-over-end rotation. Beads were washed 3 x 5 minutes with wash buffer containing 1xPBS, 5% glycerol, and 1 mM DTT.

Purified protein was eluted by three consecutive applications of an equal-bead volume of 50mM

Tris-HCl pH 8.0, 100 mM NaCl, 1 mM DTT, and 20% glycerol with 10 mM reduced glutathione. Purification was monitored by Coomassie staining of fractions separated on an SDS-

PAGE gel. Aliquots were frozen in liquid nitrogen and stored at -80°C. mESC Nuclear Extract preparation

V6.5 mESC were cultured in five 10-cm plates with feeder cells, then split into ten 15-cm plates with gelatin coating without feeders for 2 to 3 days. Nuclear extract was prepared according to Dignam et al. (Dignam et al., 1983) with modifications. In brief, ten 15cm plates of

48

confluent V6.5 ES cells (~7-10 x 108 cells) were harvested by trypsinization and centrifugation, then washed once in PBS 1mM EDTA. Cells were resuspended in 5 packed cell pellet volumes of Buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, 0.5 mM phenylmethylsulfonyl fluoride (PMSF), and 1x protease inhibitor cocktail (Sigma P8340)) and incubated 10min on ice. Cells were pelleted by centrifuging at 1000g for 10 minutes [you haven’t given the centrifugation conditions in any of the methods above] and gently resuspended in two volumes of Buffer A. Cells were lysed by 35 strokes in a Dounce homogenizer with the tight pestle. Lysates were subjected to centrifugation for 15min at 2000g. The resulting nuclear pellet was resuspended in five volumes of Buffer A, sedimented for 20 min at 4800g, and resuspended in 3ml Buffer C (20 mM HEPES pH 7.9, 25% (v/v) glycerol, 0.42 M NaCl, 1.5 mM

MgCl2, 0.2 mM EDTA, 0.5 mM PMSF, 0.5 mM DTT , and 1x protease inhibitor cocktail) using a Dounce homogenizer with the tight pestle. The suspension was placed on a rotator for 30 minutes at 4°C then cleared by centrifugation at 12100g for 30min. The supernatant was dialyzed against 1L Buffer D (20 mM HEPES pH 7.9, 20% (v/v) glycerol, 0.1 M KCl, 0.2 mM EDTA, 0.5 mM PMSF, and 0.5 mM DTT) for 5 hours in Spectra/Por 12-14,000 MWCO tubing. The final protein concentration was measured by Bradford assay (Bio-Rad).

Affinity Purification

GST fusion proteins were dialyzed four times against 20 mM HEPES pH7.9, 150 mM

NaCl, 20% glycerol, 0.5 mM DTT in a Spectra/Por Dialysis Membrane with MWCO: 12-14,000 for 2 hours at 4°C to remove residual glutathione. Proteins were concentrated to ~6 mg/ml using

Amicon Ultra – 4 Centrifugal Filters, and 2.6 mg of each protein was coupled to 120 µl Affi-Gel

15 resin (Bio-Rad) overnight at 4°C. Resin was washed once with wash buffer (20 mM HEPES pH 7.9, 1 M NaCl, 5% Glycerol, 0.5% TritonX-100, 1 mM DTT), incubated with 12 µl 1 M

49

pH8.0 ethanolamine for 1 hour at 4°C to inactivate unreacted resin, and washed 4 times each with wash buffer then Buffer D (20 mM HEPES pH 7.9, 0.1 M KCl, 0.2 mM EDTA, 20% glycerol, 1 mM DTT, 1 mM PMSF). Coupling was monitored by analyzing flow through by absorbance at A280 as well as SDS-PAGE and staining with Coomassie blue. Coupled resin was incubated with 2.5 mg of V6.5 mES nuclear extract at 4°C for 2 hours via end-over-end rotation.

Resin was washed 4 times with Buffer D, and bound proteins eluted by 2 consecutive treatments with 125 µl 100 mM Tris pH 8.5, 8 M Urea, at room temperature for 5 minutes via end-over-end rotation. 5 µl of each elution was used for immunoblotting and the first eluates were subjected to mass spectrometry.

Mass spectrometry

Purified proteins bound to GST-KLF4-wt and mutant samples were reduced, alkylated and digested on beads by lys-C and trypsin proteases (Kaiser and Wohlschlegel, 2005;

Wohlschlegel, 2009). The peptide mixture was cleaned using C18 tips and fractionated online using a 75 µM inner diameter fritted fused silica capillary column with a 5 µM pulled electrospray tip and packed in-house with 17 cm of 1.9 µM reversed phase particles (ReproSil-Pur

C18-AQ - Dr. Maisch GmbH HPLC). The gradient was delivered by an easy-nLC 1000 ultra high- pressure liquid chromatography (UHPLC) system (Thermo Scientific). MS/MS spectra were collected on a Q-Exactive mass spectrometer (Thermo Scientific) as described (Kelstrup et al.,

2012; Michalski et al., 2011). Data analysis was performed using the ProLuCID and DTASelect2 implemented in the Integrated Proteomics Pipeline - IP2 (Integrated Proteomics Applications,

Inc., San Diego, CA) (Cociorva et al., 2007; Tabb et al., 2002; Xu et al., 2006). Protein and peptide identifications were further filtered with DTASelect and required minimum of two unique peptides per protein and a peptide-level false positive rate of less than 5% as estimated by

50

a decoy database strategy (Elias and Gygi, 2007). Normalized spectral abundance factor (NSAF) values were calculated as described (Florens et al., 2006).

Mediator complex purification and direct binding

3xFLAG-Med29 V6.5 ES cells were cultured in five 10-cm plates with feeder cells, then spit into ten gelatin-coated 15-cm plates without feeders. Following overnight attachment, doxycycline was added to media to a final concentration of 2 µg/ml. After 48 hours, cells were harvested and nuclear extract was prepared as described above, except that the nuclear pellet was not washed in Buffer A and the second centrifugation step was carried out at 17000g for 20 minutes. The nuclear extract in Buffer C was directly incubated with 60 µl packed Anti-FLAG

M2 affinity gel (Sigma, A2220) for 4 hours at 4°C. The gel was washed five times with wash buffer (10 mM HEPES pH7.9, 0.4 M KCl, 1.5 mM MgCl2, 0.05% NP40, 0.5 mM PMSF, and 1x protease inhibitor cocktail), and once with wash buffer containing 0.1 M KCl. Bound protein was eluted by 3 consecutive incubations with 60 µl elution buffer (10mM HEPES pH7.9, 0.3 M KCl,

1.5 mM MgCl2, 0.05% NP40, 0.5 mM PMSF, 1x protease inhibitor cocktail, 300 µg/ml 3xFLAG peptide) for 30 minutes at 4°C. 13 µl eluate and 37 µl binding buffer (20 mM HEPES-KOH pH

7.9, 20% glycerol, 0.1 M KCl, 1.35 mg/ml BSA, 0.05% NP40, 0.5 mM PMSF, and 0.5 mM

DTT, and 1x protease inhibitors cocktail) were added to 30 µl packed glutathione beads pre- bound with GST fusions and incubated for 2 hours at 4°C. Beads were washed four times with wash buffer (20 mM HEPES-KOH pH 7.9, 20% glycerol, 0.1 M NaCl, 0.05% NP40, 1 mM

PMSF, and 1 mM DTT), and eluted twice with 30 µl elution buffer (50 mM Tris-HCl pH 8.0,

100 mM NaCl, 10 mM GSH (reduced), 1 mM DTT, 0.05% NP40).

51

Klf4 co-immunoprecipitation

1.4 mg mESC nuclear extract was treated with 50U Benzonase (Sigma, E1014) in the presence of 2 mM MgCl2 at room temperature for 10 minutes, and then incubated with 10 µg

Klf4 antibody (R&D Systems, AF3158) or goat IgG (Santa Cruz, SC-2028) at 4°C for 1 hour.

Samples were then incubated with 30 µl Gammabound G Sepharose (GE Healthcare) for 1 hour and washed four times with 500 µl wash buffer (20 mM HEPES-KOH pH 7.9, 0.1 M NaCl, 0.1%

NP40, 0.5 mM PMSF, and 0.5 mM DTT) and eluted by three incubations with 30 µl elution buffer (0.1 M Glycine-HCl pH 2.7, 0.1% NP40). Pooled eluates were neutralized with 10 µl 1M

Tris-HCl pH 8.5. p300 Purification and direct binding

FLAG-epitope-tagged human p300 was expressed in Sf9 cells by infecting recombinant baculovirus for 72 hrs. All purification steps were carried out at 4C unless otherwise specified.

Extract was prepared from infected cells by 30 strokes of tight pestle using Kontes glass homogenizer in lysis buffer (20 mM Tris–HCl pH 8.0, 5 mM MgCl2, 10% glycerol, 1 mM

PMSF, 10 mM b-mercaptoethanol, 0.1% Tween 20) containing 0.3 M KCl and the “complete” protease inhibitor cocktail (Roche). Lysate was then clarified by centrifugation for 25 min at

30,000 x g. The supernatant was mixed with buffer-equilibrated M2 anti-FLAG antibody agarose (SIGMA) for 4 hr on a rotating wheel. Beads were washed, 3 times with 0.35 M and, 2 times with 0.15 M KCl buffer. Finally, FLAG-p300 was eluted from the beads with 0.15 M KCl buffer containing 0.1 mg/ml FLAG-peptide at room temperature for 10 mins. Aliquots were kept at -80°C until further use. 6.5 µl p300 protein was added to 30 µl packed glutathione beads pre- bound with GST fusion proteins, and then 43.5 µl 20 mM binding buffer was added (20 mM

HEPES-KOH pH 7.9, 0.1 M NaCl, 20% glycerol, 1.35 mg/ml BSA, 0.05% NP40, 0.5 mM

52

PMSF, and 0.5 mM DTT). Samples were incubated at 4°C for 2 hours and washed four times with 500 µl binding buffer lacking BSA. Proteins were eluted with two treatments of 30 µl elution buffer (50 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM GSH (reduced), 1 mM DTT,

0.05% NP40).

CBP/p300 fragment construction and purification

Mouse CBP TAZ1 (aa345-439), TAZ2 (aa1758-1892), KIX (aa586-672), NCBD

(aa2019-2115) and human p300 TAZ1 (aa313–433), KIX (aa564–658), TAZ2 (aa1717–1815),

NCBD (aa2039– 2141), IHD (aa420–520) were cloned into pET-28a to append a 6xHis tag at the N terminus. Expression in 50ml cultures was induced by adding 1mM IPTG and incubating at 15 °C overnight. For the bacterial lysate affinity binding, the cell pellet was sonicated in 1 ml

20 mM HEPES pH 7.9, 0.1 M NaCl, 1 mM PMSF, 1 mM DTT, 0.1% NP40, and 1x protease inhibitor cocktail, including 10 µM Zn(Ac)2 for TAZ1 and TAZ2. After centrifugation, 200 µl supernatant was directly incubated with 30 µl GST-Klf4 TAD coupled to glutathione beads for

1.5 hours at 4°C. Beads were washed four times with sonication buffer and eluted with two treatments of 30 µl elution buffer (50 mM Tris-HCl pH 8.0, 0.1 M NaCl, 10 mM reduced GSH,

0.1% NP40, 1 mM DTT). For protein domain purification, cell pellet was sonicated in 1 ml 20 mM HEPES pH 7.9, 0.5 M NaCl, 1 mM PMSF, 0.5 mM DTT, 0.1% NP40, 20 mM Imidazole and 1x protease inhibitor cocktail, including 10 µM Zn(Ac)2 for TAZ1 and TAZ2. After centrifugation, the supernatant was incubated with 100 µl packed Ni-NTA beads at 4°C for 30 minutes. Beads were washed four times with the sonication buffer without Zn(Ac)2, and once with 20 mM HEPES pH 7.9, 0.1 M NaCl, 0.1% NP40. Proteins were eluted by four treatments of 100 µl Elution buffer (20 mM HEPES pH 7.9, 20% glycerol, 0.1 M NaCl, 250 mM Imidazole,

0.1% NP40, 1 mM PMSF).

53

35S-labeled Mediator subunits in vitro translation

The human or mouse Mediator subunit constructs were provided by Michael Carey, except FLAG-Med6 and FLAG-Med9 were gifts from Joan Conaway & Ronald Conaway

(Addgene plasmid # 15411, 15407), MED13 cDNA (BC140891) and MED16 cDNA

(BC017282) were purchased from TransOmic Technologies, and the Med25 coding sequence was amplified from a mouse cDNA library and cloned into pET-28a vector with an N-terminal

6xHis tag. TNT Quick Coupled Transcription/Translation kit (Promega, #L1170) and EasyTag

L-[35S]-Methionine (PerkinElmer, #NEG709A500UC) was used for in vitro transcription and translation. For each subunit, 7 µl of translation protein mix was added to 20 µl packed glutathione beads pre-bound with GST fusion proteins, and 43 µl 20 mM binding buffer was added (20 mM HEPES-KOH pH 7.9, 0.1 M NaCl, 0.1% NP40, 0.5 mM PMSF, and 0.5 mM

DTT). Samples were incubated at 4°C for 1 hour, washed four times with 500 µl binding buffer, and heated at 100°C in 20 µl 2x Laemmli Sample Buffer for 5 minutes. Samples were analyzed by SDS-PAGE.

Correlation between reprogramming efficiency and transactivation activity

For Figure 2-2E, p value was calculated nonparametrically from a permutation test. In brief, a distribution of Pearson correlation coefficients was generated from 10000 sets of randomized data, and the p value was taken from the position of the real correlation coefficient in the distribution. For Figure 2-S2E, Pearson correlation coefficient r and p were generated based on reprogramming efficiency and the log10 scale of transactivation activity by Prism 7 software.

54

RNA-seq

P2 MEFs from 3 embryos were transduced with pMX-Klf4, 90-110+210-483 WT,

I106A, and tdTomato for 12 hours with Polybrene. Cells were harvested 48 hours post- transduction for extraction of RNA with Tryzol and purified with Qiagen RNeasy Mini Kit as per manufacturer instructions. RNA was treated on column with 30 kunitz units of DNase prior to elution.

Differential gene expression was determined by edgeR package using generalized linear model (GLM) pipeline with the likelihood ratio test method (Robinson et al., 2009). Significance threshold is set as fold change ≧2, and FDR ≦0.05.

55

Figure 2-1 Klf4 regions important for reprogramming.

A) Overview of reprogramming assay. Klf4 wild-type or variants were delivered into MEFs along with Oct4 and Sox2 on retroviral vectors on day 0 (d0). Cells were fixed on day 12-15 and immunostained for pluripotency marker Nanog. B) The diagrammed Klf4 deletion mutants (left) were tested for reprogramming activity (right), expressed as the percentage of Nanog+ colonies compared to wild-type control (1-483). The DNA binding domain (3xZF, green) and characterized TAD (AD, red) are shown. Constructs are labelled by the included or deleted (∆)

Klf4 residue numbers. Graph includes data from separate reprogramming experiments; each sample was normalized to the wild-type control from the same experiment. Error bars represent standard deviation (SD). Error bar for 1-483 was calculated from the experiment exhibiting the largest standard deviation. C) Indicated constructs were tested for reprogramming activity as in

B). Colored regions in the construct diagrams represent Klf4 activation domain aa90-110 (AD; red), VP16 transcriptional activation domain N terminal region aa413-455 (VP16N; purple), and the Sp1 transcriptional activation domain aa145-494 (Sp1; blue). D) Indicated constructs were tested for reprogramming activity, expressed as the number of Nanog+ colonies. Purple regions in the construct diagrams represent the VP16 transcriptional activation N-terminal domain as in

C, the VP16 transcriptional activation C-terminal domain aa453-490 (VP16C), and the core sequence aa470-485 of VP16C (unlabeled). Each shade in the bar graphs represents a technical replicate.

56

pMX-Oct4 Figure 1 A pMX-Sox2 pMX-Klf4 variant

Nanog+ d0 ~2 weeks

MEF iPSC

B AD 3xZF 1-483 AD 1-396 AD 3xZF 90-483 3xZF 111-483 3xZF 170-483 3xZF 210-483 3xZF 290-483 3xZF 90-110 AD 3xZF 111-144 AD 3xZF 145-169 AD 3xZF 170-209

0 50 100 150 % Nanog+ colonies relative to 1-483

Ryan Jason Schmidt C AD 3xZF 1-483

3xZF 170-483

AD 3xZF 90-110+170-483

VP16N 3xZF VP16N+170-483 3xZF 210-483 AD 3xZF 90-110+210-483 VP16N 3xZF VP16N+210-483 Sp1 3xZF Sp1+210-483

0 D 50 100 150 % Nanog+ colonies relative to 1-483 D Ryan Jason Schmidt AD 3xZF 1-483

3xZF 210-483

AD 3xZF 90-110+210-483

VP16N 3xZF VP16N+210-483 RepA VP16C 3xZF VP16C+210-483 RepB 3xZF VP16Ccore+210-483

0 25 50 75 100100200300400 Number of Nanog+ colonies

57

Figure 2-2 TAD sequence requirements for transcriptional activation and reprogramming.

A) Sequence requirements for transcriptional activation by the Klf4 TAD. Wild-type Klf4 TAD

(aa90-110) and indicated mutant sequences were fused to the Gal4 DNA binding domain and tested for transcriptional activation in ESCs using a dual luciferase assay. Left panel: wild-type

(WT) TAD sequence and mutations (red). Right panel: relative transcriptional activation activity, expressed as relative luciferase units (RLU) compared to WT. RLU: relative luciferase units. DBD: Gal4 DNA binding domain. Error bar represent SD. * indicate p value compared to

WT by ratio T-test. * p≤ 0.05, ** p≤ 0.01. B) Sequence requirements for reprogramming by the

Klf4 TAD. The indicated mutations (in red) were introduced into Klf4 aa90-483 and the resulting constructs were tested for reprogramming of MEF as in 2-1A. C) Effects of mutations on TAD-dependent reprogramming. The indicated mutations were introduced into Klf4 aa90-

110 fused to aa210-483 and tested for reprogramming. D) Sequence requirements for transcriptional activation of a Klf4-dependent reporter. The indicated Klf4 constructs were assayed for transcriptional activation in NIH3T3 cells using a dual luciferase assay. Data is normalized to the GFP control. Error bar represents SD. Ratio t-tests were carried out between pairs as indicated. * p≤ 0.05, ** p≤ 0.01, ***p≤ 0.001, ****p≤ 0.0001. E) Correlated effects of the indicated mutations on reprogramming by Klf4 constructs (from Figure 2-1C and 2B,C) and transcriptional transactivation of a Klf4-dependent reporter (Figure 2-2D). Error bars represent

SD. r is Pearson correlation coefficient. p value is generated nonparametrically as described in

Methods. 1, 90-110+170-483. 2, 1-483. 3, 90-183. 4, 2L/A 90-483. 5, 170-483. 6, 90-110+210-

483. 7, 2L/A 90-110+210-483. 8, 210-483.

58

Figure 2 Klf4 90-110 DBD A ARRETEEFNDLLDLDFILSNS WT ARRETEEFNDLADLDFALSNS LI/AA ** ARRATAAFNDLLDLDFILSNS 3E/A ** ARRETEEFNDAADLDFILSNS 2L/A ** ARRETEEFNDLLALAFILSNS 2D/A ARRETEEFNDLADLDFILSNS L101A * ARRETEEFNDLLDLDFALSNS I106A *

0.0 0.5 1.0 1.5 B RLU normalized to WT

AD 3xZF 1-483 90 ARRETEEFNDLLDLDFILSNS 110 90-483 90 ARRATAAFNDLLDLDFILSNS 110 3E/A 90-483 90 ARRETEEFNDLLALAFILSNS 110 2D/A 90-483 90 ARRETEEFNDLADLDFALSNS 110 LI/AA 90-483 90 ARRETEEFNDAADLDFILSNS 110 2L/A 90-483

0 50 100 150 Nanog+ colony number Robin Mckee C I106A 3xZF I106A 90-110+210-483 AD 3xZF 90-110+210-483

3xZF 210-483

0 20 40 60 80 Nanog+ colony number Bernadett Papp

2L/A 3xZF 2L/A 90-110+210-483

AD 3xZF 90-110 +210-483

3xZF 210-483

0 50 100 150 Nanog+ colony number Bernadett Papp D E

GFP )

1-396 80

RLU r = 0.9337 ( 1-483 y p = 0.0015 **** t 170-483 60 1 ivi

**** t 90-110+170-483 **** 2 **** 3 2L/A 90-110+170-483 40 ion ac 210-483 t * iva

90-110 +210-483 t 4 * 20 2L/A 90-110+210-483 5 7 90-483 6

ransac 8

**** T 0 2L/A 90-483 0 50 100 150

0 20 40 60 80 Reprogramming efficiency% RLU normalized to GFP

59

Figure 2-3 Proteins bound differentially to wild-type TAD compared to reprogramming

defective mutants.

A) Wild-type (WT), I106A, 2L/A versions of TAD fused to glutathione-S-transferase (GST), or

GST alone were immobilized on glutathione-Sepharose and used to affinity purify proteins from mESC nuclear extract. Bound proteins were identified by mass spectrometry. Binding is expressed as NSAF (normalized spectral abundance factor) values, and the proteins are ranked by the extent of differential binding, determined by subtracting NSAF values of I106A from wild-type (WT-I106A) for each protein. B) GST alone, GST fused to wild-type Klf4 TAD (GST-

TAD) or the indicated mutants (GST-TAD 2L/A, GST-TAD I106A) were used to affinity purify proteins from mESC nuclear extract as in A. Bound proteins were eluted with sequential washes

(E1 and E2 are shown), subjected to SDS-PAGE, and immunoblotted with antibodies specific for p300 or Mediator (Med1, Cdk8). C) mESC nuclear extracts were subjected to immunoprecipitation with Klf4 antibody or an unrelated antibody (IgG). Immunoprecipitates were analyzed by SDS-PAGE and immunoblotting with antibodies to p300, the indicated

Mediator subunits, Klf4, and Zranb3. Input (IN) sample represents 1% of the sample subjected to immunoprecipitation. 17% of the immunoprecipitated sample was analyzed by immunoblotting.

60

Figure 3 A B GST GST-TAD GST-TAD 2L/A GST-TAD I106A WT- Gene WT I106A 2LA GST I106A E1 E2 E1 E2 E1 E2 E1 E2 Gcn1l1 215.1 31.5 111.9 12.9 183.6 Med27 180.8 0.0 0.0 0.0 180.8 p300 Polr2i 170.6 0.0 0.0 0.0 170.6 Gtf2h5 169.7 0.0 0.0 0.0 169.7 Ep300 223.2 55.4 83.1 18.6 167.7 Crebbp 202.4 39.1 76.5 8.5 163.3 Med1 Med30 158.0 0.0 76.3 0.0 158.0 Cdk8 248.0 92.7 67.4 0.0 155.4 Med10 267.8 113.2 100.7 51.1 154.6 Trim11 172.6 19.3 17.2 0.0 153.3 Cdk8 Med17 222.8 70.6 68.1 10.6 152.2 Med14 209.3 57.6 41.9 0.0 151.6 Tcl1a 141.9 0.0 0.0 0.0 141.9 Med4 267.8 127.3 100.7 38.3 140.5 Polr2j 139.7 0.0 0.0 60.0 139.7 Med19 181.1 47.0 69.6 28.3 134.1 Med12 156.9 28.1 34.3 0.0 128.8 Med31 245.3 116.6 77.8 0.0 128.7 Med29 161.5 38.4 119.5 0.0 123.1 Med8 218.6 95.9 99.5 0.0 122.6 2410002F23Rik 198.0 80.7 71.8 0.0 117.3 C αKlf4 IgG IN Med9 169.7 53.8 71.8 0.0 115.9 Med18 115.9 0.0 0.0 0.0 115.9 p300 Med23 181.7 66.9 32.2 0.0 114.8 Med24 190.6 77.1 58.3 0.0 113.5 Polr2d 113.2 0.0 47.8 48.6 113.2 Med1 Abce1 301.8 191.3 153.1 0.0 110.5 Glud1 108.0 0.0 0.0 0.0 108.0 Plk1 346.4 240.7 225.4 131.6 105.7 Med23 Med16 144.7 40.0 23.7 0.0 104.8 Med11 166.5 63.4 84.5 0.0 103.2 Med13 111.0 15.8 26.6 0.0 95.2 CDK8 Ccz1 92.1 0.0 56.6 0.0 92.1 Timm13 211.4 120.6 71.5 72.6 90.8 Pdrg1 90.6 0.0 0.0 0.0 90.6 Med21 167.4 79.6 47.2 71.9 87.8 Med26 Med13l 92.6 5.2 12.3 0.0 87.5 Stk24 268.8 182.0 106.3 16.6 86.7 Med15 Skp1 320.4 234.4 208.4 190.5 86.0 Med1 123.0 37.1 25.4 0.0 85.9 Med15 139.8 54.2 56.9 8.9 85.6 Klf4 Tada1 83.9 0.0 0.0 0.0 83.9 Gtf2f2 80.7 0.0 0.0 0.0 80.7

Zranb3 CBP/p300 Mediator subunits Pol II subunits

Ajay Vashisht

61

Figure 2-4 Direct and functional interactions of p300/CBP with Klf4 TAD.

A) Full-length Flag-p300 was tested for binding to immobilized GST or the indicated GST fusions to TAD wild-type (WT) or I106A mutant, VP16N (VPN), VP16C (VPC), or full length

VP16 TAD (VPfl). Eluted proteins were analyzed by immunoblotting with Flag antibody (top panel) or visualized by fluorescence (bottom panel). Input sample (IN) represents 8% of the sample subjected to affinity binding. 18% of each eluted sample was analyzed by immunoblotting and by fluorescence. B,C) Purified 6xHistidine-tagged CBP/p300 domains were incubated with GST, GST-TAD (aa90-110; WT), or GST-TAD I106A (I106A) immobilized on glutathione-Sepharose. Bound proteins were eluted, separated by SDS-PAGE and visualized by Coomassie blue staining. Input (IN) sample represents 5% of the sample used for affinity binding. 7% of each eluted sample was analyzed by SDS-PAGE. Arrows indicate bands corresponding to the CBP/p300 domains. D) Gal4 fusions to Klf4, VP16N (VPN) or

VP16C (VPC) TADs were transfected into HeLa cells together with the indicated DNA amounts of vector (pcDNA) or p300/CBP overexpression constructs. Transcriptional activation activity was assayed using a dual luciferase assay. Luciferase expression levels were normalized to that in pcDNA samples. Error bars represent SD. **P ≤ 0.01 *** P ≤ 0.001 **** P ≤ 0.0001 as determined by unpaired t-test.

62

A D

IN GST WT I106A VPN VPC VPfl **** *** 2.5 ** αFLAG ** 2.0

GST-fusion 1.5

1.0

0.5

0.0

Klf4 16ngKlf4 p300 16ng CBP VPN 16ng p300 VPC 16ng p300 Klf4 16ng pcDNA VPN 16ng pcDNAVPC 16ng pcDNA

B

CBP TAZ1 KIX TAZ2 NCBD IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A

C p300 TAZ1 IHD KIX TAZ2 NCBD IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A

63

Figure 2-5 Direct and functional interactions of Mediator with Klf4 TAD.

A) Mediator complex purified from a nuclear extract of ES cells expressing 3xFLAG-Med29 was tested for binding to GST, GST-Klf4 TAD (WT) or GST-Klf4 TAD I106A (I106A) as in

4A. Bound proteins were analyzed by immunoblotting with antibodies to the indicated subunits or staining with Ponceau S. Input (IN) sample represents 1% of eluted samples. 17% of eluted samples were analyzed by immunoblotting. B) Individual Mediator subunits were labelled with

35S-methionine by in vitro translation and incubated with GST, GST-Klf4 TAD (WT) or GST-

Klf4 TAD I106A (I106A) immobilized on glutathione-Sepharose. Eluted proteins were analyzed by SDS-PAGE and autoradiography. Input (IN) sample represents 10% of the sample used for affinity binding. 30% of each eluted sample was analyzed by SDS-PAGE. C) The Med25 ACID domain (aa402-591) was labelled by in vitro translation and tested for binding to GST or the indicated GST fusions as in 5B. D) GAL-4 fusions to Klf4, VP16 full-length TAD, or E1A

(aa121-223) TAD were transfected into HeLa cells with increasing amounts of Med25 ACID overexpression plasmid DNA (8 or 16ng) and assayed for transcriptional activation using a dual luciferase assay. Luciferase expression is normalized to ACID 0 ng in each set. Error bar represents SD. *P≤0.05, **P ≤ 0.01 by unpaired t-test. E) Gal4-Klf4 TAD or Gal4-E1A TAD expression plasmids were co-transfected into HeLa cells with the indicated DNA amounts (in ng) of MED14 and Med15 overexpression constructs and assayed for transcriptional activation using a dual luciferase assay. Luciferase expression is normalized to MED14 0 ng (-) or Med15 0 ng

(-) in each set. **P≤0.01 by ratio t-test.

64

A B IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A MED1 MED14 MED24 Med1 MED4 MED15 Med25 Med23 Med6 MED16 MED26 Med26 MED7 MED17 MED27 Cdk8 MED8 MED18 MED28

Med29 (FLAG) Med9 Med19 MED29

GST-fusion Med10 Med20 Med30

Med11 MED21 Med31 C IN GST WT I106A MED12 MED22 CDK8

Med25 ACID MED13 MED23 Cyclin C

D E ** ns 1.5 * ** 2.0 **

1.5 1.0 **

U U 1.0 RL RL 0.5 0.5

0.0 0.0

ACID ACID ACID ACID ACID ACID ACID ACID ACID 0 8 Klf4 + + + - - - + + + - - - A A 16 10 0 10 8 A 1 1 10 16 E1 E1 1 VP16 0VP16 8 E1 E1A - - - + + + - - - + + + 90- 90- VP16 16 90- MED14 ------8 16 - 8 16 Med15 - 8 16 - 8 16 ------

65

Figure 2-6 Genome-wide gene expression effects of wild-type and I106A Klf4 TAD.

(A) RNA sequencing analysis of MEFs expressing full-length Klf4 (FL), wild-type (WT) or

I106A TAD fused to Klf4 aa210-483 (I106A), or tdTomato (TOM). Log2RPKM box plots of the genes at least 2-fold up-regulated in FL (left panel) or down-regulated in FL (right) are shown for each indicated sample. **** p<2.2x10^-16, by paired Mann-Whitney test. B) Venn diagrams of up-regulated (left) and down-regulated (right) genes at least by two fold in WT and FL cells, compared to TOM. C) Common biological process enrichment in genes upregulated in FL, WT, and I106A. X axis is adjusted p value (q) in log scale. D) Gene ontology of genes upregulated by

WT compared to I106A was determined as in C.

66

A Up-regulated genes Down-regulated genes B Up-regulated genes Down-regulated genes

**** 14 **** FL WT FL WT **** **** **** **** **** 10 ****

12 **** **** 8 10 958 536 121 703 231 68

8 6 RPKM 2

log

6 4 4 2 2 0 0

TOM FL WT I106A TOM FL WT I106A

C regulation of cellular component movement

cell-cell adhesion

epithelial cell differentiation FL enzyme linked receptor protein signaling pathway WT epithelial cell proliferation I7A homophilic cell adhesion via plasma membrane adhesion molecules

0 5 10 15 20

-log10q D

skin development positive regulation of cell proliferation cell-cell adhesion regulation of anatomical structure size positive regulation of secretion regulation of body fluid levels blood circulation

0 1 2 3 4

-log10q

67

Table 2-1 Number of genes significantly up- or down- regulated in cells expressing full- length Klf4 (FL), wild-type (WT) or I106A TAD fused to Klf4 aa210-483 compared to cells

expressing tdTomato (TOM) or WT compared to I106A. Significance determined as

described in Methods.

FL/TOM WT/TOM I106A/TOM WT/I106A Significant 2373 956 809 199 Significantly up 1439 657 584 136 Significantly down 934 299 225 63

68

Figure 2-S1 Characterization of Klf4 constructs.

Data correspond to reprogramming experiments in Figure 2-1B. A) Expression levels of select

Klf4 mutants. Lysates from cells expressing the indicated Klf4 constructs, all carrying an N- terminal FLAG tag, were analyzed by immunoblotting with FLAG or GAPDH antibody. Stars mark non-specific bands. Numbers beneath each lane indicate the band intensity ratio of FLAG to GAPDH. Lysates were prepared from the reprogramming culture at day 5. B) Localization of select Klf4 constructs. Cells fixed at reprogramming day 4 were immunolabelled with FLAG antibody and stained with DAPI to label the nucleus. C) N-terminal FLAG epitope does not alter the reprogramming efficiency of full-length Klf4. Error bars represent SD. D) FLAG epitope does not affect reprogramming by 90-483 3E/A or 111-483. Error bars represent SD. E)

Relationship between reprogramming and virus titer. Media from cells expressing retrovirus carrying wild-type Klf4 was used to prepare dilutions for reprogramming. The fraction of 1 on the x-axis represents the amount of media normally used for reprogramming. Error bars represent

SD. Cells were fixed on day 12 in (C-E). F-H) Results from additional reprogramming experiments with the indicated constructs. Samples were fixed on day 12 (F), day 13 (G), day 11

(H).

69

A

1-483 90-48390-483 90-4833E/A 90-483LI/AA 2D/A

* * FLAG

GAPDH

1.00 0.85 1.18 2.24 0.90 Ryan Jason Schmidt

B 1-483 90-483 90-483 90-483 90-483 3E/A LI/AA 2D/A

DAPI

FLAG

merge

Ryan Jason Schmidt C 150 D 200 E 200 s s 150 s e 100 i 150 on l

100 o c

100 50 +

50 nog Nanog colonie

Nanog+ colonie 50 a N 0 0 0 0 0.2 0.4 0.6 0.8 1 1-483 1-483 1-483 90-483 11 11 fraction of virus added A FLAG-1-483 Ryan Jason Schmidt 3E/ Ryan Jason Schmidt FLAG- Ryan Jason Schmidt 3E/A FLAG-90-483

F 60 G H 150 200 ) )

) 150 40 100 1 1

100 20 50 Nanog+ (d12 Nanog+ (d13 Nanog+ (d 50

0 0 0

10 10 1 1 1-483 1-483 1-144 1-483 90- 90- 11 145-169 170-209 Bernadett Papp Bernadett Papp Bernadett Papp

70

Figure 2-S2 Klf4 TAD transcriptional activation activity.

Data correspond to experiments in Figure 2-2. A) Expression levels of Klf4 TAD-Gal4 fusions were analyzed by immunoblotting of cell lysates from samples used for transcriptional activation assay. Mutations indicated as in Figure 2-2A. Asterisk indicates position of non-specific band.

Arrows indicate specific bands. In some cases the specific band is inferred from broadening and increased intensity of the non-specific band compared to the GFP lane. B-D) The indicated Klf4 sequences (numbered) or tandem VP16N TAD (2xVP16N) were fused to the Gal4 DNA binding domain and tested for transcriptional activation using a dual luciferase assay in MEFs (B) and mESCs (C,D). The Gal4 DNA binding domain by itself was also included as a control (DBD).

RLU are normalized to DBD. Numbers over each bar indicate the average measured value. Error bars represent SD. E) Correlated effects of the indicated mutations on reprogramming by Klf4 aa90-483 (from Figure 2-2B) and transcriptional activation by the GAL4-TAD fusion (Figure 2-

2A). Pearson correlation coefficient r and p values were calculated as described in Methods. 1,

WT. 2, 3E/A. 3, 2D/A. 4, LI/AA. 5, 2L/A.

71

A GFP WT LI/AA 3E/A 2D/A 2L/A L101A I106A GFP DBD WT * *

B C

2xVP16N 1748 2xVP16N 2716

111-209 5.78 111-209 13.7

90-110 L/I 1.66 90-110 L/I 2.88

90-110 3E/A 20.2 90-110 3E/A 156.8

90-110 244.6 90-110 1374

DBD DBD 1

1 10 100 1000 10000 1 10 100 1000 10000

RLU Ryan Jason Schmidt RLU Ryan Jason Schmidt D E 10 y t r = 0.9392 iv

VP16 TADc Core t 1 1 p = 0.0178

VP16 TADc 0.1 ion ac t 2

iva 3 90-110 t 0.01

0.001 5 4

DBD ransac T 0.0001 0 0 50 100 150 500 1000 1500 2000 2500 Reprogramming efficiency% RLU normalized to DBD

72

Figure 2-S3 Binding of CBP/p300 domains to Klf4 TAD and overexpression of CBP and

p300.

A) Extracts of E. coli expressing the indicated 6xHistidine-tagged CBP/p300 domains were incubated with GST, GST-TAD (aa90-110; WT), or GST-TAD I106A (I106A) immobilized on glutathione-Sepharose. Bound proteins were eluted, separated by SDS-PAGE and visualized by staining with Coomassie blue. Input (IN) sample represents 0.75% of the sample used to affinity binding. 7% of each eluted sample was analyzed by SDS-PAGE. Arrows indicate bands corresponding to CBP/p300 domains. B) qRT-PCR analysis of p300 and CBP expression in

HeLa cells used for the luciferase assay in Figure 2-4D. Values are normalized to the vector- alone sample (pcDNA). Error bars represent SD. *P ≤ 0.05 by t-test.

73

A CBP TAZ1 KIX TAZ2 NCBD IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A

p300 TAZ1 IHD KIX TAZ2 NCBD IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A IN GST WT I106A

B A

A * 4 * 15

3 10

2 5 1

0 0 Fold change relative to pcDN Fold change relative to pcDN

16 ng 16 ng 16 ng A P A CB p300 16 ng pcDN pcDN

74

Figure 2-S4 Purified Mediator complex.

Mediator was purified from nuclear extracts of an ES cell line that expresses doxycycline- inducible 3xFLAG-Med29 from the Col1a locus. The purified sample was analyzed by SDS-

PAGE and protein visualized by silver stain (A) or immunoblotting with antibodies to the indicated Mediator subunits (B).

75

A B

Med13/12 245 Med1 190 Med14 Med23 Med1 135 Med23 Med15 100 Med24 Med25/16 Med26 Med17 80 Cdk8 Med26

3xFLAG-Med29 58 Cdk8

46

32

76

Figure 2-S5 Overexpression of Med25 ACID domain, MED14, and Med15 in Gal4-TAD

transcriptional activation assays.

A) FLAG-tagged Med25 ACID domain expression levels were determined by qRT-PCR (left) normalized to RRM2 mRNA levels and expressed relative to levels in the ACID 0 ng sample

(left) or immunoblotting with FLAG antibody (right). Samples analyzed from cells transfected for luciferase assays in Figure 2-5D with the indicated amounts ACID domain overexpression plasmid DNA. **p≤0.01 by unpaired t-test. B) Gal4-Klf4 TAD (Klf4) or Gal4-E1A TAD

(E1A) expression plasmids were co-transfected into mESC cells with the indicated DNA amounts of Med25 ACID domain overexpression plasmids and assayed for transcriptional activation using a dual luciferase assay. Luciferase expression is normalized to the level in the

Klf4 TAD 0ng ACID sample. Error bars represent SD. *p≤0.05 by ratio t-test. C) qRT-PCR analysis of MED14 and Med15 expression in HeLa cells used for the luciferase assay in Figure

2-5E. Expression levels are normalized to RRM2 internal control. Primers used to detect mouse

Med15 did not detect Hela MED15 in the negative control. Notice that ectopic expression levels of MED14 and Med15 are similar. Error bars represent SD. ****P ≤ 0.0001 by t-test. D) Over- expression of Med25 on GAL4-Klf4 TAD transactivation activity determined by dual luciferase assay in HeLa cells. ** P≤ 0.01 by unpaired t-test.

77

A A 200 ** g n 150 g

n 16 0

D D I 100 ** I AC AC

50 FLAG-ACID

0 Fold change relative to pcDN

ACID 0 ngACID 8 ng ACID 16 ng

B ) 1.5 * ns 0 ng ACID ACID

32 ng ACID 1.0

10 0ng 64 ng ACID

1 RLU 0.5

0.0

(compared t o 90- Klf4 E1A

0.020 N.S. C 2 1.5 2 2.0 D RRM **** RRM ** o o t t

0.015 1.5 ve ve i i

at 1.0 at l l U re re 0.010

1.0 RL ion ion 0.5 ress ress p p 0.5 0.005 ex ex

15 D14 d

E 0.0 0.0 0.000 Me M

ACID 16 ng 16 ng A A 16 ng pcDN MED14 16 ng pcDN Med15 16 ng 16 ng pcDNA 16 ng Med25

78

Figure 2-S6 Principal component analysis of the transcriptomes of cells expressing full-

length Klf4 (FL), wild-type (WT) or I106A TAD fused to Klf4 aa210-483, or tdTomato

(TOM).

79

20

10

FL I106A PC2 TOM 0 WT

−10

−20

−50 −25 0 25 PC1

Shan Sabri

80

Table 2-S1 Mass spectrometry result of GST-TAD affinity purification in normalized

spectral abundance factor (NSAF). (Spreadsheet appended).

Table 2-S2 Protein complexes detected by Klf4 TAD affinity binding.

Average NSAF value of subunits of protein complex involved in transcriptional regulation and histone modification.

WT- WT I106A 2L/A GST I106A Mediator 158.3 56.0 52.4 9.3 102.3 Pol II 92.2 29.4 23.4 33.4 62.7 TFIIF 68.0 11.3 13.4 10.2 56.7 SAGA/PCAF 82.8 37.4 36.6 22.8 45.4 TFIIH 65.3 20.2 16.5 10.9 45.1 TFIID 69.5 50.2 33.9 28.4 19.3 TFIIE 32.0 13.0 11.6 11.8 18.9 FACT 64.2 55.3 67.5 65.0 8.9 PAF 32.0 23.5 21.8 28.1 8.5 ATAC 59.3 54.5 30.8 24.2 4.8 Integrator 33.5 31.2 23.5 27.2 2.3 ASF1 19.7 18.7 16.7 16.9 1.0 TIP60 61.3 60.4 37.7 32.1 0.9 TFIIB 0.0 0.0 0.0 0.0 0.0 SEC 12.0 13.0 6.3 6.3 -1.0 CHD-family 22.1 24.4 21.9 14.7 -2.3 Swi/SNF/pBAF 55.9 58.3 61.1 70.1 -2.4 MOZ/MORF 16.2 20.5 0.0 0.9 -4.3 MLL3/4 48.2 53.8 36.9 44.8 -5.6 HBO1 8.1 14.2 2.3 2.3 -6.1 MOF 10.8 17.9 4.9 7.5 -7.1 MLL1/2 56.0 66.2 44.7 52.0 -10.2 Set1a/b 77.0 91.0 63.1 73.5 -14.0 NSL 48.7 65.0 47.1 49.2 -16.3 Ino80 25.3 45.0 24.2 23.2 -19.7 NuRD 152.6 172.9 155.0 155.1 -20.3

81

Table 2-S3 Expression changes in selected MET-related genes in MEF expressing Klf4

constructs.

Log2 ratio of expression changes (Upper panel) and RPKM values (lower panel), determined by

RNA sequencing, in the indicated comparisons of cells expressing full-length Klf4 (FL), wild- type (WT) or I106A TAD fused to Klf4 aa210-483, or tdTomato (TOM).

log2FC FL/TOM WT/TOM I106A/TOM WT/I106A Cdh1 4.68 1.62 1.55 0.06 Pkp1 5.49 1.69 0.15 1.55 Pkp3 2.87 1.15 1.82 -0.67 Dsp 7.55 3.61 2.66 0.96 Krt19 1.27 -0.30 -2.10 1.80 Tgfb3 -2.26 -1.27 -0.89 -0.38 Snai2 -1.93 -0.65 -0.28 -0.36

log2RPKM FL WT I106A TOM

Cdh1 0.96 0.15 0.16 0.04

Pkp1 5.75 2.28 1.25 1.04

Pkp3 0.47 0.16 0.26 0.06

Dsp 5.07 1.65 1.10 0.21

Krt19 4.05 2.65 1.37 2.76

Tgfb3 4.76 5.73 6.16 6.81

Snai2 3.81 5.05 5.45 5.52

82

References

Aranda-Orgilles, B., Saldaña-Meyer, R., Wang, E., Trompouki, E., Fassl, A., Lau, S., Mullenders, J., Rocha, P.P., Raviram, R., Guillamot, M., et al. (2016). MED12 Regulates HSC- Specific Enhancers Independently of Mediator Kinase Activity to Control Hematopoiesis. Cell Stem Cell 19, 784–799.

Avior, Y., Sagi, I., and Benvenisty, N. (2016). Pluripotent stem cells in disease modelling and drug discovery. Nature Reviews Molecular Cell Biology 17, 170–182.

Boyer, T.G., Martin, M.E., Lees, E., Ricciardi, R.P., and Berk, A.J. (1999). Mammalian Srb/Mediator complex is targeted by adenovirus E1A protein. Nature 399, 276–279.

Chen, X., Xu, H., Yuan, P., Fang, F., Huss, M., Vega, V.B., Wong, E., Orlov, Y.L., Zhang, W., Jiang, J., et al. (2008). Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117.

Chronis, C., Fiziev, P., Papp, B., Butz, S., Bonora, G., Sabri, S., Ernst, J., and Plath, K. (2017). Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442– 459.e20.

Cociorva, D., L Tabb, D., and Yates, J.R. (2007). Validation of tandem mass spectrometry database search results using DTASelect. Curr Protoc Bioinformatics Chapter 13, Unit13.4– 13.4.14.

Courey, A.J., and Tjian, R. (1988). Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 55, 887–898.

Dignam, J.D., Lebovitz, R.M., and Roeder, R.G. (1983). Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 11, 1475–1489.

Du, J.X., McConnell, B.B., and Yang, V.W. (2010). A small ubiquitin-related modifier- interacting motif functions as the transcriptional activation domain of Krüppel-like factor 4. J. Biol. Chem. 285, 28298–28308.

Ebmeier, C.C., and Taatjes, D.J. (2010). Activator-Mediator binding regulates Mediator-cofactor interactions. Proc. Natl. Acad. Sci. U.S.a. 107, 11283–11288.

Elias, J.E., and Gygi, S.P. (2007). Target-decoy search strategy for increased confidence in large- scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214.

Emami, K.H., and Carey, M. (1992). A synergistic increase in potency of a multimerized VP16 transcriptional activation domain. Embo J. 11, 5005–5012.

Evans, P.M., Chen, X., Zhang, W., and Liu, C. (2009). KLF4 Interacts with -Catenin/TCF4 and Blocks p300/CBP Recruitment by -Catenin. Mol. Cell. Biol. 30, 372–381.

83

Evans, P.M., Zhang, W., Chen, X., Yang, J., Bhakat, K.K., and Liu, C. (2007). Kruppel-like factor 4 is acetylated by p300 and regulates gene transcription via modulation of histone acetylation. J. Biol. Chem. 282, 33994–34002.

Fang, F., Xu, Y., Chew, K.-K., Chen, X., Ng, H.-H., and Matsudaira, P. (2014). Coactivators p300 and CBP maintain the identity of mouse embryonic stem cells by mediating long-range chromatin structure. Stem Cells 32, 1805–1816.

Florens, L., Carozza, M.J., Swanson, S.K., Fournier, M., Coleman, M.K., Workman, J.L., and Washburn, M.P. (2006). Analyzing chromatin remodeling complexes using shotgun proteomics and normalized spectral abundance factors. Methods 40, 303–311.

Fong, Y.W., Inouye, C., Yamaguchi, T., Cattoglio, C., Grubisic, I., and Tjian, R. (2011). A DNA repair complex functions as an Oct4/Sox2 coactivator in embryonic stem cells. Cell 147, 120– 131.

Gao, Z., Cox, J.L., Gilmore, J.M., Ormsbee, B.D., Mallanna, S.K., Washburn, M.P., and Rizzino, A. (2012). Determination of protein interactome of transcription factor Sox2 in embryonic stem cells engineered for inducible expression of four reprogramming factors. J. Biol. Chem. 287, 11384–11397.

Geiman, D.E., Ton-That, H., Johnson, J.M., and Yang, V.W. (2000). Transactivation and growth suppression by the gut-enriched Krüppel-like factor (Krüppel-like factor 4) are dependent on acidic amino acid residues and protein-protein interaction. Nucleic Acids Res. 28, 1106–1113.

Hirai, H., Tani, T., and Kikyo, N. (2010). Structure and functions of powerful transactivators: VP16, MyoD and FoxA. Int. J. Dev. Biol. 54, 1589–1596.

Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934– 947.

Ikeda, K., Stuehler, T., and Meisterernst, M. (2002). The H1 and H2 regions of the activation domain of herpes simplex virion protein 16 stimulate transcription through distinct molecular mechanisms. 7, 49–58.

Ishikawa, H., Tachikawa, H., Miura, Y., and Takahashi, N. (2006). TRIM11 binds to and destabilizes a key component of the activator-mediated cofactor complex (ARC105) through the ubiquitin-proteasome system. FEBS Lett. 580, 4784–4792.

Jiang, J., Chan, Y.-S., Loh, Y.-H., Cai, J., Tong, G.-Q., Lim, C.-A., Robson, P., Zhong, S., and Ng, H.-H. (2008). A core Klf circuitry regulates self-renewal of embryonic stem cells. Nat. Cell Biol. 10, 353–360.

Jonker, H.R.A., Wechselberger, R.W., Boelens, R., Folkers, G.E., and Kaptein, R. (2005). Structural Properties of the Promiscuous VP16 Activation Domain †. Biochemistry 44, 827–839.

Kaczynski, J., Cook, T., and Urrutia, R. (2003). Sp1- and Krüppel-like transcription factors.

84

Genome Biol. 4, 206.

Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L., Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435.

Kaiser, P., and Wohlschlegel, J. (2005). Identification of ubiquitination sites and determination of ubiquitin-chain architectures by mass spectrometry. Meth. Enzymol. 399, 266–277.

Kelstrup, C.D., Young, C., Lavallee, R., Nielsen, M.L., and Olsen, J.V. (2012). Optimized fast and sensitive acquisition methods for shotgun proteomics on a quadrupole orbitrap mass spectrometer. J. Proteome Res. 11, 3487–3497.

Landrieu, I., Verger, A., Baert, J.-L., Rucktooa, P., Cantrelle, F.-X., Dewitte, F., Ferreira, E., Lens, Z., Villeret, V., and Monté, D. (2015). Characterization of ERM transactivation domain binding to the ACID/PTOV domain of the Mediator subunit MED25. Nucleic Acids Res. 43, 7110–7121.

Langlois, C., Mas, C., Di Lello, P., Jenkins, L.M.M., Legault, P., and Omichinski, J.G. (2008). NMR structure of the complex between the Tfb1 subunit of TFIIH and the activation domain of VP16: structural similarities between VP16 and p53. J. Am. Chem. Soc. 130, 10596–10604.

Malik, S., and Roeder, R.G. (2010). The metazoan Mediator co-activator complex as an integrative hub for transcriptional regulation. Nat. Rev. Genet. 11, 761–772.

Mas, C., Lussier-Price, M., Soni, S., Morse, T., Arseneault, G., Di Lello, P., Lafrance-Vanasse, J., Bieker, J.J., and Omichinski, J.G. (2011). Structural and functional characterization of an atypical activation domain in erythroid Kruppel-like factor (EKLF). Proc. Natl. Acad. Sci. U.S.a. 108, 10484–10489.

McConnell, B.B., and Yang, V.W. (2010). Mammalian Krüppel-like factors in health and diseases. Physiological Reviews 90, 1337–1381.

Michalski, A., Damoc, E., Hauschild, J.-P., Lange, O., Wieghaus, A., Makarov, A., Nagaraj, N., Cox, J., Mann, M., and Horning, S. (2011). Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell Proteomics 10, M111.011015.

Mittler, G., Stühler, T., Santolin, L., Uhlmann, T., Kremmer, E., Lottspeich, F., Berti, L., and Meisterernst, M. (2003). A novel docking site on Mediator is critical for activation by VP16 in mammalian cells. Embo J. 22, 6494–6504.

Nakagawa, M., Koyanagi, M., Tanabe, K., Takahashi, K., Ichisaka, T., Aoi, T., Okita, K., Mochiduki, Y., Takizawa, N., and Yamanaka, S. (2008). Generation of induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nat. Biotechnol. 26, 101–106.

Pardo, M., Lang, B., Yu, L., Prosser, H., Bradley, A., Babu, M.M., and Choudhary, J. (2010). An expanded Oct4 interaction network: implications for stem cell biology, development, and

85

disease. Cell Stem Cell 6, 382–395.

Pekarsky, Y., Palamarchuk, A., Maximov, V., Efanov, A., Nazaryan, N., Santanam, U., Rassenti, L., Kipps, T., and Croce, C.M. (2008). Tcl1 functions as a transcriptional regulator and is directly involved in the pathogenesis of CLL. Proc. Natl. Acad. Sci. U.S.a. 105, 19643–19648.

Polo, J.M., Anderssen, E., Walsh, R.M., Schwarz, B.A., Nefzger, C.M., Lim, S.M., Borkent, M., Apostolou, E., Alaei, S., Cloutier, J., et al. (2012). A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2009). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140.

Rodda, D.J., Chew, J.-L., Lim, L.-H., Loh, Y.-H., Wang, B., Ng, H.-H., and Robson, P. (2005). Transcriptional regulation of nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737.

Sen, G.L., Boxer, L.D., Webster, D.E., Bussat, R.T., Qu, K., Zarnegar, B.J., Johnston, D., Siprashvili, Z., and Khavari, P.A. (2012). ZNF750 is a p63 target gene that induces KLF4 to drive terminal epidermal differentiation. Dev. Cell 22, 669–677.

Shields, J.M., and Yang, V.W. (1998). Identification of the DNA sequence that interacts with the gut-enriched Krüppel-like factor. Nucleic Acids Res. 26, 796–802.

Shields, J.M., Christy, R.J., and Yang, V.W. (1996). Identification and characterization of a gene encoding a gut-enriched Krüppel-like factor expressed during growth arrest. J. Biol. Chem. 271, 20009–20017.

Soufi, A., Donahue, G., and Zaret, K.S. (2012). Facilitators and impediments of the pluripotency reprogramming factors' initial engagement with the genome. Cell 151, 994–1004.

Stevens, J.L., Cantin, G.T., Wang, G., Shevchenko, A., Shevchenko, A., and Berk, A.J. (2002). Transcription control by E1A and MAP kinase pathway via Sur2 mediator subunit. Science 296, 755–758.

Tabb, D.L., McDonald, W.H., and Yates, J.R. (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1, 21–26.

Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.

Tetreault, M.-P., Weinblatt, D., Shaverdashvili, K., Yang, Y., and Katz, J.P. (2016). KLF4 transcriptionally activates non-canonical WNT5A to control epithelial stratification. Sci Rep 6, 26130.

Teufel, D.P., Freund, S.M., Bycroft, M., and Fersht, A.R. (2007). Four domains of p300 each bind tightly to a sequence spanning both transactivation subdomains of p53. Proceedings of the National Academy of Sciences 104, 7009–7014.

86

van den Berg, D.L.C., Snoek, T., Mullin, N.P., Yates, A., Bezstarosti, K., Demmers, J., Chambers, I., and Poot, R.A. (2010). An Oct4-centered protein interaction network in embryonic stem cells. Cell Stem Cell 6, 369–381.

Wang, W.-P., Tzeng, T.-Y., Wang, J.-Y., Lee, D.-C., Lin, Y.-H., Wu, P.-C., Chen, Y.-P., Chiu, I.-M., and Chi, Y.-H. (2012). The EP300, KDM5A, KDM6A and KDM6B chromatin regulators cooperate with KLF4 in the transcriptional activation of POU5F1. PLoS ONE 7, e52556.

Wang, Z., Zang, C., Cui, K., Schones, D.E., Barski, A., Peng, W., and Zhao, K. (2009). Genome- wide mapping of HATs and HDACs reveals distinct functions in active and inactive genes. Cell 138, 1019–1031.

Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., and Young, R.A. (2013). Master transcription factors and mediator establish super- enhancers at key cell identity genes. Cell 153, 307–319.

Wohlschlegel, J.A. (2009). Identification of SUMO-conjugated proteins and their SUMO attachment sites using proteomic mass spectrometry. Methods Mol. Biol. 497, 33–49.

Xu, T., Venable, J.D., Park, S.K., Cociorva, D., Lu, B., Liao, L., Wohlschlegel, J., Hewel, J., and Yates, J.R.I. (2006). ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Molecular & Cellular Proteomics 5, S174–S174.

Yang, F., DeBeaumont, R., Zhou, S., and Näär, A.M. (2004). The activator-recruited cofactor/Mediator coactivator subunit ARC92 is a functionally important target of the VP16 transcriptional activator. Proc. Natl. Acad. Sci. U.S.a. 101, 2339–2344.

Yet, S.F., McA'Nulty, M.M., Folta, S.C., Yen, H.W., Yoshizumi, M., Hsieh, C.M., Layne, M.D., Chin, M.T., Wang, H., Perrella, M.A., et al. (1998). Human EZF, a Krüppel-like zinc finger protein, is expressed in vascular endothelial cells and contains transcriptional activation and repression domains. J. Biol. Chem. 273, 1026–1031.

Zhang, L., Adams, J.Y., Billick, E., Ilagan, R., Iyer, M., Le, K., Smallwood, A., Gambhir, S.S., Carey, M., and Wu, L. (2002). Molecular engineering of a two-step transcription amplification (TSTA) system for transgene delivery in prostate cancer. Mol. Ther. 5, 223–232.

Zhang, R., Han, M., Zheng, B., Li, Y.-J., Shu, Y.-N., and Wen, J.-K. (2010). Krüppel-like factor 4 interacts with p300 to activate mitofusin 2 gene expression induced by all-trans retinoic acid in VSMCs. Acta Pharmacol. Sin. 31, 1293–1302.

87

CHAPTER 3

POTENTIAL ROLE OF CLATHRIN HEAVY CHAIN IN RENAL CELL CARCINOMA

TUMORIGENESIS

88

Abstract

TFE/MiTF family transcription factors play an essential role in nutrient sensing, organelle biogenesis, and energy metabolism, in addition to their established roles in the development of osteoclasts, melanocytes, and mast cells. TFE/MiTF genes also contribute to the tumorigenesis of different types of cancers, such as renal cell carcinomas, melanomas, and alveolar sarcomas. A variety of gene fusions to TFE/MiTF due to chromosomal translocations have been found in these cancer cells. The mechanisms underlying these malignant transformations by TFE3/MiTF genes, however, are not yet fully understood. Clathrin heavy chain (CHC or CLTC), a vesicle coat protein involved in membrane trafficking, is one of the fusion partners of TFE/MiTF transcription factors and was speculated to harbor transcriptional activation activity to activate TFE3 target genes. Here, I show that CHC does accommodate transcriptional activation activity by dual luciferase assay. Even though CHC physically interacts with Klf4, CHC does not affect the transcriptional activation activity of Klf4. Nor does the transactivation activity of CHC require Klf4. CHC-TFE3 fusion also exhibits robust transactivation activity, and it induces cell-cycle arrest of HEK293T cells, similar to PRCC-

TFE3 and ASPL-TFE3, two other frequent fusions in translocation renal cell carcinoma. These findings indicate a contribution of CHC transactivation activity to renal cell carcinoma.

Introduction

Clathrin heavy chain (CHC or CLTC) is a vesicle coat protein well known for its functionality in membrane trafficking (Kirchhausen et al., 2014). In addition to this canonical role of clathrin, clathrin heavy chain has been discovered in multiple other cellular activities. For instance, clathrin is required to bridge the kinetochore fibers in mitosis (Royle et al., 2005).

Clathrin has also been reported to regulate the transcriptional activation activity of p53 in

89

nucleus (Enari et al., 2006; Ohata et al., 2009; Ohmori et al., 2008). In addition, clathrin is also frequently fused with anaplastic lymphoma kinase (ALK), a proto-oncogene, in a variety of tumors (Armstrong et al., 2004). Of note, rare fusions of CHC with the microphthalmia transcription factor (MiT) family of transcription factors, TFEB and TFE3, have been reported in two cases of translocation renal cell carcinoma (tRCC) (Argani et al., 2003; Durinck et al.,

2015). The mechanism of the oncogenic activity of CHC-ALK has been postulated as: 1) the over-activation of ALK activity caused by the trimerization of the fusion protein, an attribute from CHC that mediates the formation of clathrin lattice on vesicles, and 2) the constitutive over- expression of this fusion gene by the CHC promoter (Armstrong et al., 2004). The mechanism of how CHC-TFE3/TFEB contributes to the tumorigenesis, however, remains unclear. It has been proposed that CHC-TFE3 may harbor strong transcriptional activation activity from CHC, in addition to the constitutive expression of this fusion gene (Argani et al., 2003).

The TFE/MiTF family includes four structurally related genes, TFE3, TFEB, TFEC, and

MiTF. They share highly conserved leucine zipper (LZ) and basic helix-loop-helix (bHLH) domains, which mediate their homo- or heterodimerization and binding to consensus E-box sequence motif (CA[C/T] GTG), and an acidic transactivation domain that mediates transcriptional activation (Haq and Fisher, 2011; Kauffman et al., 2014). The high sequence similarity within the family members confers the functional similarity and redundancy for these genes. For example, TFE3 and MiTF redundantly regulate osteoclast and mast cell genesis

(Martina et al., 2014b). TFEB, TFE3, and some isoforms of MiTF were found to share functionality in promoting autophagy, lysosome biogenesis, and cellular debris clearance

(Martina et al., 2014c; Sardiello et al., 2009; Settembre et al., 2011). Further, TFE3 and TFEB can mutually compensate in regulating whole-body energy metabolism (Pastore et al., 2017).

90

The functional similarity is also reflected by their oncogenic activity in specific groups of cancers. For instance, chromosomal translocation of TFE3, TFEB, or MiTF are found in all translocation renal cell carcinomas (tRCC) (Chahoud et al., 2016), which account for the majority of adolescent and pediatric renal cell carcinomas (Geller et al., 2015; Haq and Fisher,

2011). MiTF can also be replaced by TFE3 or TFEB in supporting the survival of a cell line derived from clear cell sarcoma (Davis et al., 2006). A germline mutation of MiTF confers the carrier with a genetic susceptibility to not only melanoma but also renal cell carcinoma

(Bertolotto et al., 2011). Overexpression of either TFE3, TFEB, or MiTF was observed in pancreatic ductal adenocarcinoma (PDA) (Perera et al., 2015). The similarity of TFE/MiTF genes and their prevalence in specific groups of cancers suggest a common underlying molecular mechanism of tumorigenesis by these genes.

The TFE/MiTF family of genes is regulated by mammalian (or mechanistic) target of rapamycin complex 1 (mTORC1) (Martina and Puertollano, 2013; Martina et al., 2014c; 2014a;

Roczniak-Ferguson et al., 2012; Settembre et al., 2012), a protein complex that senses changes in energy, stress, growth factors, and nutrients, and regulates protein synthesis, cell growth, and cell proliferation (Laplante and Sabatini, 2012). As illustrated in Figure 3-1A, in fully-fed cells, the v-ATPase on lysosomes promotes the Ragulator complex to activate heterodimer RagA/B-

RagC/D GTPases (Bar-Peled et al., 2012). Active Rag GTPases, consisting of GTP-loaded

RagA/B and GDP-loaded RagC/D, recruit mTORC1 to the lysosomes (Bar-Peled et al., 2012;

Sancak et al., 2010; 2008), where mTORC1 is activated by Rheb GTPase (Saucedo et al., 2003;

Stocker et al., 2003). Active Rag GTPases also recruit TFE3 to the lysosomes, and the recruitment requires S112 and R113 of TFE3 (Martina et al., 2014a). Then TFE3 is phosphorylated by the activated mTORC1 at S321 (Martina et al., 2014c). The phosphorylated

91

S321 provides a binding site for cytosolic chaperone 14-3-3, which subsequently sequesters

TFE3 in the cytosol (Martina et al., 2014c). When the nutrient level is low or mTORC1 is inactivated by Torin-1, TFE3 is dephosphorylated and translocated to the nucleus to activate a range of genes, including genes for lysosomal biogenesis, autophagy, and lysosomal exocytosis, adapting the cell to the starvation conditions (Martina et al., 2014c). Notably, RagD is also one of the target genes being upregulated (Di Malta et al., 2017). RagD appears to be a limiting factor for Rag GTPase activity, as deletion of RagD impaired mTORC1 signaling upon amino acid stimulation, whereas ectopic expression of RagD rescued mTORC1 activation (Di Malta et al., 2017). Interestingly, tumor suppressor gene folliculin (FLCN), a repressor of TFE3 (Hong et al., 2010), was also upregulated upon TFE3 activation (Di Malta et al., 2017; Martina et al.,

2014c). FLCN negatively regulates TFE3 via functioning as a potential GTPase-activating protein (GAP) for RagC/D-GTP, because GDP-bound RagC/D is required for the recruitment of mTORC1 (Petit et al., 2013; Tsun et al., 2013). Thus, activation of TFE3 positively regulates mTORC1, by both enhancing catabolic processes, including lysosomes and autophagy, and upregulating RagD and FLCN, two genes involved in mTORC1 activation. Activated mTORC1 phosphorylates and inactivates TFE3, forming a negative feedback loop of the regulation of

TFE3 (Figure 3-1B). Of note, TFEB and MiTF are regulated in a similar way as TFE3 (Martina and Puertollano, 2013; Martina et al., 2014c).

The oncogenic activity of TFE fusion genes has been demonstrated in several studies.

Injection of PRCC-TFE3-transformed NIH3T3 cells into nude mice resulted in aggressive tumor growth (Weterman et al., 2001b); knock-down of NonO-TFE3 in UOK109 cell line (NonO-

TFE3) abolished virtually all colony growth (Davis et al., 2006) in culture; silencing of TFE3 in

HCR-59 cell line (Luc7L3-TFE3) reduced tumor cell proliferation (Di Malta et al., 2017).

92

Notably, kidney-specific overexpression of TFEB in mice resulted in papillary-RCC (pRCC) and hepatic metastases (Calcagnì et al., 2016). Even though MiTF family transcription factors have been long recognized for contributing to oncogenesis, the molecular mechanism remains elusive.

Higher transcriptional activation activity of PRCC-TFE3 and ASPL-TFE3 compared to that of native TFE3 has been observed (Kobos et al., 2013; Skalsky et al., 2001), suggesting a model of abnormal activation of TFE3 downstream genes. SFPQ-TFE3, however, is proposed to transform renal cells by sequestering tumor suppressor p53 in the cytoplasm (Mathur and Samuels, 2007).

In addition, PRCC-TFE3 was also reported to interact with MAD2B, a mitotic checkpoint protein, with reduced affinity, leading to a mitotic check point defect (Weterman et al., 2001a).

Efforts have also been made to identify the transcriptional targets of TFE3. A proto-oncogene,

MET, was identified as a direct target of ASPL-TFE3 (Kobos et al., 2013; Tsuda et al., 2007).

MET is a receptor tyrosine kinase that activates a wide range of signaling pathways, including

PI3K, MAPK, and AKT (Organ and Tsao, 2011). MET is also significantly over-expressed in pRCC compared to matched normal samples. Furthermore, MET dominant active mutants, which constitutively activate the downstream pathways, have been frequently detected in pRCC, the transcriptional profile of which is clustered with translocation-RCC (Durinck et al., 2015), indicating a common MET signaling over-activation in these two subtypes of RCCs. In support of the oncogenic activity of TFE3, Rag GTPases component RagD was identified as another direct target of TFE3. The upregulation of RagD leads to the activation of mTORC1 pathway, as revealed by the phosphorylation of mTORC1 downstream protein S6K at residue T389 (Di

Malta et al., 2017). On the other hand, however, cyclin-dependent kinase inhibitor 1A (p21), which suppresses cell cycle progression, was also identified as a direct target of TFE3 (Ishiguro and Yoshida, 2016; Medendorp et al., 2009). Ectopic expression of PRCC-TFE3 in HEK293

93

cells and ASPL-TFE3 in HEK293 and mesenchymal stem cells (MSCs) induced cell cycle arrest and senescence of MSCs (Ishiguro and Yoshida, 2016; Medendorp et al., 2009). Transfection of

PRCC-TFE3 into the cell line derived from PRCC-TFE3 RCC did not further up-regulate p21 or arrest cell cycle, suggesting that p21-mediated inhibition of cell cycle progression is bypassed in the tumor cell line. Given the counteracting roles of p21 and MET/mTORC1 in cell cycle progression, TFE3 likely accelerates cell senescence, which contributes the cancer development

(Blagosklonny, 2011). Finally, in a transgenic mouse model of TFEB overexpression, Wnt/β- catenin signaling pathway was hyper-activated in the affected kidney (Calcagnì et al., 2016).

Though there is no evidence showing that Wnt/β-catenin signaling pathway components are direct targets of TFE3/MiTF family genes, TFE3/MiTF appears to mediate WNT signaling via co-regulation of WNT signaling target gene DCT (Yasumoto et al., 2002) and acting as a downstream target of TCF, a component of WNT signaling pathway (Davis et al., 2003;

Widlund et al., 2002).

Notably, most fusions to TFE3 lack the first three exons of native TFE3 transcript

(Kauffman et al., 2014), which contains the serine residues required for Rag GTPase interaction and cytosolic sequestration (Martina and Puertollano, 2013; Martina et al., 2014c). Therefore, I hypothesized that the TFE3 fusion proteins are likely to be constitutively localized in the nucleus, and thus to escape the negative regulation by mTORC1. CHC-TFE3 may also harbor transcriptional activation activity, as the acidic transactivation domain of TFE3 is not present in the fusion protein, yet the renal cell carcinoma samples were stained positive for PMEL17 (also known as HMB45), a direct transcriptional target of MiTF in melanocytes (Argani et al., 2003;

Du et al., 2004).

94

In this chapter, I discovered that clathrin heavy chain harbors transcriptional activation activity, as revealed by both GAL4 based and native TFE3 binding motif based luciferase assays.

Further, I observed cell-cycle arrest by CHC-TFE3 in HEK293T cells, suggesting that p21 was transcriptionally activated by CHC-TFE3.

Results

Clathrin heavy chain interacts with Klf4 transactivation domain, but does not appear to functionally affect the reprogramming and transactivation activities of Klf4.

In the initial attempt to isolate the interaction partner of the transactivation domain of

Klf4, transactivation and reprogramming deficient LI/AA mutant of GST-Klf4-TAD was set as the negative control instead of 2LA and I106A (See Chapter 2 for details). Clathrin heavy chain was highly enriched in the GST-Klf4-TAD wild-type sample, but not in the LI/AA mutant. This raised the possibility that clathrin heavy chain may mediate the transactivation and reprogramming activity of Klf4 during reprogramming. Examination of the Klf4 transactivation domain revealed a canonical clathrin-box motif: LφXφ[D/E] (φ represents large hydrophobic residues, X represents any amino acid), which binds to the amino terminal domain of clathrin heavy chain (Lemmon and Traub, 2012) (Figure 3-2A). The leucine residue of LI/AA mutant is at the second position of this motif, whereas the isoleucine residue is not within this motif (both colored in red in Figure 3-2A). Single mutation of the leucine would be predicted to abolish the interaction between Klf4 TAD and CHC, while single mutation of the isoleucine would not. To test this, I purified the amino terminal domain (NTD, aa1-363) of CHC from bacteria, and tested its binding to the leucine mutant (L101A) and isoleucine mutant (I106A). As expected, NTD was not enriched in the L101A bound fraction but enriched in the I106A bound fraction (Figure 3-

2B). If CHC mediates the transcriptional activation activity of Klf4 TAD, the I106A mutant but

95

not L101A mutant should be able to activate luciferease expression in the dual luciferase assay.

Surprisingly, both I106A and L101A exhibit poor luciferase expression, compared to the wildtype TAD (Figure 2-2A in Chapter 2). Furthermore, I106A mutation also affects the reprogramming activity of Klf4 (Figure 2-2C in Chapter 2). This result prompted us to include

GST-Klf4-TAD I106A as the negative control in the mass spectrometry to search for the factors enriched in the wildtype but not in the I106A mutant, as a way to more specifically identify potential reprogramming factors.

Even though clathrin heavy chain was also detected in the co-immunoprecipitation of

Klf4 from mouse embryonic stem cell (mESC) nuclear extract (Figure 3-2C), my results suggest that clathrin heavy chain is not likely to function as a mediator of transcriptional activation by

Klf4. However, it is possible that CHC acts as a negative regulator of Klf4, given that CHC is predominantly localized in the cytoplasm, allowing the interaction between Klf4 and CHC to sequester Klf4 in the cytoplasm. A study has shown that inhibition of endocytosis by the clathrin heavy chain inhibitor PITSTOP, which blocks interaction with clathrin box motifs, enhances the reprogramming efficiency in human cells (Qin et al., 2014). The interpretation was that clathrin- mediated endocytosis is required for TGF-β signaling, which hampers reprogramming. This interpretation was supported by the evidence that PITSTOP reduced TGF-β signaling (Qin et al.,

2014). Given the possibly that PITSTOP addition also disrupts the interaction between Klf4 and

CHC to allow Klf4 to enter nucleus, I set out to test whether co-expression of a cytosolic form of

Klf4 1-349, which does not contain the nuclear localization signal and the zinc finger domain

(Shields and Yang, 1997), allows more full-length Klf4 to enter nucleus to enhance reprogramming efficiency by competing for clathrin heavy chain binding. Co-expression of Klf4

1-349 with OSK (Oct4, Sox2, and Klf4) did not produce significantly more Nanog+ colonies

96

than co-expression of tdTomato in two experiments (Figure 3-2D). This evidence does not support the notion that CHC negatively regulates reprogramming via sequestering Klf4 in the cytoplasm.

To test whether CHC can affect the transactivation activity of Klf4, CHC was knocked down or overexpressed in the dual luciferase assay of GAL4-Klf4 TAD. Knock-down of CHC marginally up-regulated the luciferase signal of Klf4 TAD, and over-expression of a GFP-tagged

CHC marginally down-regulated the luciferase signal of Klf4 TAD (Figure 3-2E; Western blotting of knockdown and overexpression of CHC are shown in Figure 3-S1A). Thus, it seems that CHC may play a negative role in the transcriptional regulation by Klf4. However, knock- down and overexpression of CHC also up-regulated and down-regulated the transactivation activity of VP16 TADc (VPC) and TAD of E1A (Figure 3-2E), which do not carry clathrin box motifs, suggesting that the negative effect by CHC is not specific to Klf4. As an alternative approach to repress CHC, I used the small molecule PITSTOP2. PITSTOP2 had an acute effect on the luciferase readout, even after only 30 minutes of administration. Both firefly and renilla luciferase showed a tremendous decrease, rendering the normalized data hard to interpret (Figure

3-S1B). I conclude that CHC does not specifically mediate the transcriptional activation activity of Klf4.

Clathrin heavy chain exhibits transcriptional activation activity

Previous studies suggest that CHC mediates the transcriptional activation activity of p53

(Enari et al., 2006; Ohmori et al., 2008). We asked whether CHC by itself can activate transcription. CHC full-length protein and CHC segments were fused to GAL4 DNA binding domain, and were tested for transactivation activity in HeLa cells. Interestingly, the segment

833-1406, which mediates the transactivation activity of p53 (Ohmori et al., 2008), and full-

97

length CHC (1-1675), do not exhibit robust transactivation activity in our assay, whereas CHC 1-

579 and 1-932 exhibit strong transcriptional activation activity, about 20-30 fold above the

GAL4 DNA binding domain (DBD) control (Figure 3-3A upper panel). Mutation of the CHC lysine residues that mediate the interaction between Klf4 and CHC (Figure 3-3B) to glutamates

(1-579 KK) did not affect the transactivation activity of CHC, suggesting that Klf4 does not mediate the transactivation activity of CHC. Lack of robust transactivation activity in CHC 1-

479 or any constructs ending before 479 (1-100 and 1-389) suggested that transactivation domain might lie in the 479-579 region. However, other segments covering 479-579 (472-838, 472-579, and 453-579) did not exhibit robust activity as well (Figure 3-3A lower panel). Splitting the 479-

579 region into two parts (453-529, 530-579) also failed to recover potent transactivation activity

(Figure 3-3A lower panel). A 9-amino-acid TAD-like sequence (547-556, predicted online by the program described in Piskacek et al., 2007) also failed to produce vigorous signal. These results suggest that the N-terminal region before 479 and 479-579 region are both required for the transcriptional activation. The lack of transactivation activity in full-length CHC may result from the trimerization of CHC, which may hamper the nuclear localization of CHC (Ybe et al., 2013).

In sum, some segments of CHC do exhibit transactivation activity, and regions before 479 and between 479 and 579 seem to be both required for this activity. This activity may be inhibited in the full length context. Similar results were observed in dual luciferase assay carried out in mouse embryonic stem cells (mESCs), and western blotting using GAL4 antibody revealed similar expression levels of GAL4 fusions (Figure 3-S2).

CHC-TFE3 exhibits transcriptional activation activity

CHC fusion to transcription factors has been detected in two cases of renal cell carcinoma

(Argani et al., 2003; Durinck et al., 2015). In both cases, CHC 1-932 segment was fused to the

98

conserved C terminal region of TFE3 and TFEB, both of which lack the conserved acidic transactivation domain but retain the basic helix-loop-helix domain and leucine zipper domain for dimerization and DNA binding (Figure 3-1C). It was suggested that CHC 1-932 segment may harbor transcriptional activation activity, given the evidence of nuclear localization of CHC-

TFE3 fusion and abnormal expression of melanocytic protein HMB45 (targeted by MiTF) in the renal cell carcinoma samples (Argani et al., 2003). In support of this, in our GAL4 luciferase assay, CHC 1-932 does exhibit transactivation activity (Figure 3-3A). To test whether CHC 1-

932 also retains transactivation activity in the context of CHC-TFE3 fusion, a different dual luciferase assay was conducted. CHC-TFE3 fusion protein was expressed and a luciferase plasmid harboring 3 tandem µE3 sites for TFE3 binding was used as the reporter plasmid.

Indeed, CHC-TFE3 exhibits higher transactivation activity than TFE3 296-575, the fragment of

TFE3 in the CHC-TFE3 fusion (Figure 3-4A). The residual transactivation activity in TFE3 296-

575 may come from a second transactivation domain in the C terminal region of TFE3 (Artandi et al., 1995). Unlike the case of PRCC-TFE3 or ASPL-TFE3, the transactivation activity of

CHC-TFE3 is lower than native TFE3 protein (TFE3 1-575) (Figure 3-4A) (Kobos et al., 2013;

Weterman et al., 2000). (Proteins expression levels were determined by using an antibody against the C terminus of TFE3 at two DNA transfection amounts (Figure 3-S3B), and were plotted with luciferase signal in Figure 3-S3A. Assuming the linear relationship between protein level and luciferase signal, the luciferase signal ranking is TFE3 1-575 > CHC-TFE3 > TFE3

296-575 when protein level is above TFE3 1-575 at 2 ng.) Therefore, CHC 1-932 harbors transactivation activity in the context of CHC-TFE3 fusion, though not as strong as the native

TFE3 acidic transactivation domain.

99

Ectopic expression of CHC-TFE3 arrests HEK293T cell cycle.

To test the oncogenic activity of CHC-TFE3, CHC-TFE3 fusion was transfected into

HEK293T cells, and a cell viability assay was conducted to assess cell growth. Unexpectedly, both ectopic expression of CHC-TFE3 and full-length TFE3 caused reduced cell numbers, compared to CHC 1-932 (Figure 3-4B). Reduced cell growth, however, does not distinguish between increased apoptosis and cell-cycle arrest. Previous studies reported that PRCC-TFE3 and ASPL-TFE3 activate p21 (cyclin-dependent kinase 1A) in HEK293 cells to induce cell cycle arrest. To test whether the reduced cell numbers were due to cell cycle arrest, CHC-TFE3 transfected cells were fixed 48 hr post-transfection and stained with propidium iodide for cell cycle analysis by flow cytometry. A plasmid containing dominant negative p53 (p53DN) and shRNA against RB (shRB) was also included as a control. As shown in Figure 3-4C, p53DN_shRB exhibits lower percentage of G0/G1 phase cells and higher percentage of S phase cells compared to the vector control as expected. Both CHC-TFE3 and TFE3 1-575 sample have higher percentage of G0/G1 phase cells, compared to the vector control. TFE3 296-575 and CHC

1-932, however, appear to have similar G0/G1 phase cell percentage with the vector control. The differences are subtle, possibly because all cells were used for FACS analysis but only a subset were successfully transfected. These results suggest that CHC-TFE3 and native TFE3 may activate p21 gene to induce cell-cycle arrest in HEK293T cells, just as PRCC-TFE3 and ASPL-

TFE3.

Discussion

In this study, I characterized transcriptional activation activity of clathrin heavy chain. By

GAL4 dual luciferase assay, the transcriptional activation activity lies within the N terminal region of CHC (1-579), and this activity is not mediated through the interaction with Klf4. The

100

exact transactivation domain was not explicitly mapped, but it appears that two or more regions within 1-579 are required for transcriptional activation. In the context of CHC-TFE3 fusion,

CHC retains the transactivation activity, as CHC-TFE3 exhibits higher luciferase expression than

TFE3 296-575. Similar to the previous findings of other TFE3 fusions (PRCC-TFE3, ASPL-

TFE3) arresting cell cycle, ectopic expression of CHC-TFE3 also appears to arrest the cell cycle in HEK293T cells, possibly via activating p21. These findings provide evidence that the transactivation activity of CHC may contribute to the oncogenic activity of CHC-TFE3.

CHC 1-932 was found to fuse to exact same part of two different genes, TFEB and TFE3, in the same type of cancer (Figure 3-1C). Compared to TFE3, which has been found to fuse to a variety of genes (PRCC, ASPL, NonO, SFPQ, MED15, LUC7L3, and CHC), TFEB has fewer fusion partners (Alpha, CHC, and KHDRBS) (Chahoud et al., 2016). CHC is the only fusion partner shared by both TFE3 and TFEB discovered so far. The high similarity between CHC-

TFE3 and CHC-TFEB suggests that they transform renal cells by the same mechanism. My finding supports the notion that one mechanism comes from transactivation activity of CHC.

It is unclear how CHC activates transcription. CHC may recruit p300, as evidenced by the fact that CHC enhances the interaction between p300 and p53 (Enari et al., 2006). Given the fact that CHC 833-1406 also enhanced p300-mediated p53 transactivation (Ohmori et al., 2008), it is likely that CHC 1-579 activates transcription via a different mechanism, but it’s worthwhile to test whether there is a direct interaction between p300 and CHC, and if so what regions in

CHC mediates this interaction. To further dissect the mechanism of CHC transcriptional activation, affinity purification of CHC-TFE3 followed by mass spectrometry may be a plausible way to identify the potential transcriptional coactivators that interact with CHC.

101

Western blotting of the cell lysates from the luciferase assay of CHC-TFE3 revealed two major bands (Figure 3-S3B) using the TFE3 antibody that recognizes 406-545 of TFE3. The molecular weight of the upper band corresponds to the full length CHC-TFE3 (~135 kDa), and the lower band (~100 kDa) should represent a N-terminal truncated version of CHC-TFE3. This is corroborated by the fact that the CHC NTD-specific antibody only recognized the upper band

(Figure 3-S3C). The signal intensity of the lower band is higher than the upper band in both transfection conditions (Figure 3-S3B) and in a different cell type HEK293T (Figure 3-S3C).

Thus, it is likely that in RCC, truncated CHC-TFE3 also exists as a major form. It’s yet to be determined which form plays a major role in transcriptional activation and tumorigenesis. Based on the evidence that N terminal region of CHC is required for transcriptional activation activity of CHC in GAL4 luciferase assay, it is more likely that the full-length CHC-TFE3 harbors full transactivation activity and thus plays a major role in renal cell transformation.

Another possible mechanism of tumorigenesis by CHC-TFE3 (as well as other TFE3 fusion proteins) is that translocation of TFE3 deprives TFE3 the amino acids that are required for its interaction with Rag GTPases and its cytosolic retention, an important self-regulation mechanism of native TFE3 and MiTF family genes (Martina and Puertollano, 2013; Martina et al., 2014c), thus facilitating the constitutive activation of TFE3 downstream genes. Therefore, it might be meaningful to examine the subcellular localization of CHC-TFE3 (as well as other

TFE3 fusions), compared to native TFE3 and TFE 296-483. I hypothesize that unlike native

TFE3 that is arrested in the cytoplasm in fully-fed cells, CHC-TFE3 is predominantly localized in the nucleus in both nutrient-rich and starvation conditions.

The post-translational modification of TFE3 also seems to play an important role in regulation of TFE3 transcriptional activation activity. In addition to phosphorylation of S321 that

102

controls the subcellular localization of TFE3 (Martina et al., 2014c), one undetermined modification of TFE3, which was enhanced by inactivation of FLCN and caused an increase of about 17kDa in apparent molecular weight, seems to play a positive role in TFE3 transactivation activity (Hong et al., 2010). This modification is less likely to be SUMOylation, as

SUMOylation of MiTF negatively regulates the transcriptional activation activity of MiTF

(Miller et al., 2004; Murakami and Arnheiter, 2005). Identifying this post-translational modification may elucidate the functionality of this modification and its relationship with the transactivation activity of TFE3.

Materials and Methods

Plasmids and Antibodies

For GAL4-CHC construction, clathrin heavy chain was cloned from mouse embryonic stem cell cDNA. Fragments were amplified and cloned into pBXG1 vector (gift from Michael

Carey) with EcoRI and BamHI sites. G5E4T-luc, which contains 5 tandem GAL4 binding sites, a minimal E4 promoter, and firefly luciferase gene, is also a gift from Michael Carey. 3 tandem

µE3 sites (GGTCATGTGGCAAGGCTATTTG (Beckmann et al., 1990)) were inserted into pGL4.27 (Promega) between SacI and XhoI sites. CHC fragment was amplified from GFP-

CHC17KDP (a gift from Stephen Royle (Addgene plasmid # 59799)), and TFE3 was amplified from pEGFP-N1-TFE3 (gift from Shawn Ferguson (Addgene plasmid # 38120)). Segments were assembled into FU-CGW plasmid that contains the promoter of Ubiquitin for protein expression, using NEBuilder DNA assembly kit. GAL4 antibody is a house-made rabbit polyclonal antibody, a gift from Michael Carey. TFE3 C terminal antibody is from Sigma (HPA023881). Clathrin heavy chain N terminal antibody is from BD Sciences (610499).

103

Dual Luciferase assay

For the dual luciferase assay in mouse embryonic stem cells, cells were cultured on feeders in mouse embryonic stem cell media. Reverse transfection was done with freshly trypsinized and feeder-depleted mES using Lipofectamine 2000. 200k cells were plated per well in a gelatin-treated 6-well plate without feeders. 1440 ng G5E4T-luc and 72 ng pGL4.75

(Promega) and 720 ng pBXG1 expression plasmids were transfected into each well. 24-40 hours post transfection, cells were trypsinized. 2.5% of the total cells were used for luciferase assay with Dual-Glo luciferase reagent (Promega). The rest of the cells were lysed for western blot for

GAL4-fusion expression. For the dual luciferase assay in HeLa cells, 40k cells were seeded per well in 24-well plates the day before transfection. 100 ng G5E4T-luc, 2ng pGL4.75, and 4ng pBXG1 expression plasmids were transfected on top the next day using Lipofectamine 3000 reagent. Cells were harvested 2 days post-transfection for luciferase assay with Dual-Luciferase

Reagent (Promega). For µE3 transactivation assay, procedures are the same as GAL4 luciferase in HeLa cells, with transfection of 100 ng µE3 reporter plasmid, 2 ng pGL4.75, and indicated amount of FU-CGW expression plasmids. 10 µL cell lysates were loaded for western blotting by

TFE3 C terminus antibody.

Cell viability assay

10k HEK293T cells were seeded per well in a 24-well plate in the morning of transfection day. 500 ng FU-CGW expression plasmids were transfected into each well 8 hours later using Lipofectamine 3000. 4 days later, CellTiter-Glo Luminescent Cell Viability Assay kit

(Promega G7570) was used to quantify the number of viable cells in the culture. Steps in the manual were followed, except that 100 µL media and 100 µL reagent were used for each well in

24-well plates.

104

Cell cycle analysis

0.4 million HEK293T cells were seeded per well in a 6-well plate the day before transfection. 3000 ng of FU-CGW expression plasmids were transfected into the cells the next morning using Lipofectamine 3000. Media were replaced 8 hours later. Cells were harvested 48 hours post-transfection by trypsinization. Cells were resuspended in 300 µL PBS. While being vortexed, 700 µL ice-cold 100% ethanol was added. Cells were then kept at 4 ℃ overnight.3 mL PBS was added to the fixed cell suspension and cells were pelleted at 2000 rpm for 10 minutes. Cells were washed once with PBS, and resuspended in 500 µL PBS with 50 µg/mL propidium iodide and 0.5 µg/mL RNase. Cell suspension were kept at 37 ℃ for 30 minutes, and then subjected to flow cytometry analysis. FCS Express 6 plus software was used to analyze the cell cycle, by fitting with the appropriate cell cycle model.

CHC knockdown

On day 0, 40k HeLa cells were seeded in each well for 4 wells for each knockdown in a

24-well plate, and 5 hours later siRNA was transfected using Lipofectamine RNAiMAX. On day

1, media was replaced in the morning and cells were 1:2 split into 6-well plates (four 24-well wells to two 6-well wells) in the evening. On day 2 evening, siRNA was transfected again directly on top of the cells. At this time point, cell morphology difference can be observed between siCHC and control. siCHC cells are longer and look more like mouse embryonic fibroblasts. On day 3 afternoon, cells were trypsinized and counted, and there was no apparent difference between siControl and siCHC (~400k cells for both knockdowns). 90% of the cells were split into nine 24-well wells for luciferase assay, and the remaining 10% were split into two

24-well wells for western blot. 6 hours later, transfection for dual luciferase assay was conducted on top of the cells. On day 5 evening, cell were lysed for luciferase reading. siRNA was

105

customized and manufactured by GE Dharmacon. CHC siRNA: sense 19nt core sequence: 5’

AUCCAAUUCGAAGACCAAU 3’ with UA over hang. Antisense loading was enhanced by

ON-TARGET by Dharmacon. Negative control sense siRNA: 5’

UAGCGACUAAACACAUCAA 3’ with UU over hang.

Inhibition of CHC by PITSTOP2

HeLa cells were normally transfected for dual luciferase assay. About 42 hours post- transfection, 30 mM PITSTOP2 dissolved in DMSO was diluted 1000 times in DMEM containing 0.1% FBS, and this mixture was used to replace the cell culture media. After continued culture for 30 minutes, 1 hour, and 2 hours, cells were lysed for dual luciferase assay.

106

Figure 3-1 Diagrams of current understandings of TFE3 regulation network and CHC-

TFE3 fusion proteins.

A) Diagram depicting the TFE3 target genes and the network regulating TFE3 at starvation and rich-nutrient conditions. When cells are starved, TFE3 is located in nucleus and activates catabolism-related genes, p21, MET and RagD. RAG GTPases are in the inactive form (RagA/B bound with GDP and RagC/D bound with GTP) that binds to FLCN but not mTORC1. In fully- fed cells, RAG GTPases are activated by the GEF activity of Ragulator and GAP activity of

FLCN, resulting in the release of FLCN and recruitment of mTORC1 to lysosomes, where mTORC1 is subsequently activated by Rheb. Active Rag GTPases also recruit TFE3, facilitating the phosphorylation of TFE3 by mTORC1. This phosphorylation causes the cytosolic retention of TFE3 by interacting with 14-3-3. B) Feedback loop of TFE3. By upregulating FLCN, RagD, and metabolism-related genes, TFE3 potentiates the mTORC1 signaling pathway, and in turn down-regulates the nuclear localization of TFE3. C) Illustration of transcripts of CHC-

TFE3/TFEB constructs and native CHC, TFE3, and TFEB. Each box represents an exon with exon number below the box, and thicker boxes represent protein coding regions. Functional domains are annotated above the corresponding regions of the transcripts using colored bars.

Transcript and exon lengths are approximate to scale. CHC-TFE3 and CHC-TFEB share a nearly identical protein architecture. Abbreviations: AD, activation domain; bHLH-LZ, basic-helix- loop-helix-leucine-zipper domain; NTD, N terminal domain.

107

A

Amino acids STARVATION lysosomal lumen lysosomal lumen lysosomal lumen

GEF activity Rheb FLCN FLCN GTP RagA/B GDP RagA/B GDP RagA/B activitate Ragulator Ragulator Ragulator mTORC1 V-ATPase

V-ATPase GAP activity RagC/D GTP V-ATPase RagC/D GTP RagC/D GDP

TFE3 P 14-3-3 P TFE3 TFE3 FLCN

nucleus nucleus

p21, MET, RagD, p21, MET, RagD, TFE3 CATABOLISM genes CATABOLISM genes

B TFE3

CATABOLISM mTORC1 RagD FLCN

AD bHLH-LZ C TFE3 1 2 3 4 5 6 7 8 9 10

AD bHLH-LZ TFEB 1 2 3 4 5 6 7 8 9

NTD Clathrin LC binding CLTC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 27 29 31 24 26 28 30 32

CLTC-TFE3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 6 7 8 9 10

CLTC-TFEB 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 5 6 7 8 9

108

Figure 3-2 The interaction between Klf4 and clathrin does not appear to be functional in

reprogramming and transcriptional activation.

A) The 5-amino-acid clathrin-box motif is underlined in the Klf4 transactivation domain sequence (aa90-110). Leucine 101 and isoleucine 106 are colored in red. B) Purified 6xHis- tagged CHC (aa1-363) was incubated with indicated GST-Klf4 90-110 proteins bound to glutathione-Sepharose. Bound proteins were separated by SDS-PAGE and detected by staining with Coomassie blue. Input (IN) represents 18% of 6xHis-tagged CHC used for the samples analyzed in lanes 2-6. C) Western blotting for CHC and Klf4 in immunoprecipitates of Klf4 from nuclear extract of mouse embryonic stem cells. Input (IN) represents 1.4% of nuclear extract used for the samples in lanes 1-2. 12.5% of eluates of samples were loaded. * denotes an

IgG heavy chain band detected by the secondary antibody. D) Klf4 1-349 viral vector was co- delivered to mouse embryonic fibroblasts with OSK (Oct4, Sox2, and Klf4), and Nanog positive colonies were counted after 12 days of culture. Fluorescent protein tdTomato was set as a negative control. Results of two experiments are shown. Rep1 and Rep2 are technical replicates from the same experiment. E) GAL4-Klf4 TAD transactivation activity in HeLa cells was tested in CHC knockdown (left panel) and overexpression (right panel) cells. VP16 TADc (VPC) and

E1A TAD were used as controls. Luciferase expression levels were normalized to renilla luciferase. Error bars represent SD. *P≤ 0.05 **P ≤ 0.01 *** P ≤ 0.001 as determined by unpaired t-test. Data are from one experiment with 3 replicates. Western blotting to detect overexpression and knockdown of CHC are shown in Figure 3-S1A.

109

A clathrin-box motif LφXφ[D/E] 90 ARRETEEFNDLLDLDFILSNS 110

B IN WT L101A I106A L I GST

6xH-CHC NTD

GST-fusion

1 2 3 4 5 6

C αKlf4 IgG IN

CHC

Klf4 *

1 2 3 D Rep1 Rep1 OSK+tomato Rep2 OSK+tomato Rep2

OSK+1-349 OSK+1-349

0 50 0 100 150 200 250 100 200 300 400 Nanog+ colony number Nanog+ colony number

E 1.0 *** 0.15

0.8 *

0.10 0.6 U U L L R 0.4 ** R * 0.07 0.05 0.09 0.2

0.0 0.00

siCtrl A siCHC pcDNA A A Klf4 siCtrl VPC siCtrl E1 GFP-CHC Klf4 siCHC VPC siCHC E1 Klf4 pcDNA VPC pcDNA E1 A Klf4 GFP-CHC VPC GFP-CHC E1

110

Figure 3-3 CHC 1-579 harbors transcriptional activation activity.

A) Dual luciferase assay of GAL4 fusions to various CHC constructs in HeLa cells. Constructs are illustrated by schematic diagrams. Construct and domain sizes are approximate to scale.

Luciferase expression levels were normalized to renilla and then to GAL4 DNA binding domain

(DBD). Data were pooled from two experiments. Error bars represent SD. *P≤ 0.05 when compared to DBD as determined by ratio t-test. Abbreviations: NTD, N terminal domain (1-

330); CHCR, clathrin heavy chain repeat; td, trimerization domain. Also see Figure 3-S2 for luciferase assay in mouse embryonic stem cells. B) Extracts of E. coli expressing the indicated

6xHistidine-tagged CHC 1-579 (WT and K96E K98E double mutant (KK)) were incubated with

GST, GST-Klf4 (aa90-110) or GST-Klf4 (aa1-396) immobilized on glutathione-Sepharose.

Bound proteins were eluted, separated by SDS-PAGE and visualized by staining with Coomassie blue. IN represents 6% of input used for samples in lanes 3-6.

111

A DBD NTD CHCR0 CHCR1 CHCR2 CHCR3 CHCR4 CHCR5 CHCR6 CHCR7 t d 1-1675 NTD CHCR0 CHCR1 CHCR2 CHCR3 CHCR4 CHCR5 CHCR6 1-1406 NTD CHCR0 CHCR1 CHCR2 1-932 * NTD CHCR0 1-579 * NTD CHCR0 1-579KK * NTD 1-479 NTD 1-389 1-100 CHCR3 CHCR4 CHCR5 CHCR6 833-1406

0 10 20 30 40 RLU

DBD

NTD CHCR0 1-579 * CHCR1 CHCR2 472-838 472-579 453-579 453-529 530-579 547-556

0 10 20 30 40 RLU

B IN Eluate WT KK WT KK WT KK

GST-Klf4 1-396 6xH-CHC 1-579 (WT/KK)

GST-Klf4 90-110

1 2 3 4 5 6

112

Figure 3-4 Effects of ectopic expression of CHC-TFE3 on cell growth and cell cycle.

A) Dual luciferase assay of CHC-TFE3 in HeLa cells. 50 ng of FU-CGW expression plasmids were transfected for each sample. Data were pooled from three experiments. Luciferase expression levels were normalized to the vector control. P value was determined by ratio t-test.

B) CHC-TFE3 was transfected into HEK293T cells and relative cell number was determined by

CellTiter-Glo reagent 4 days post-transfection. Raw light units (LU) are presented for each sample. Data is from one experiment with 3 replicates. P value was determined by unpaired t- test. In (A) and (B), error bar represents standard deviation. **P ≤ 0.01 *** P ≤ 0.001. C) Cell cycle distributions of HEK293T cells transfected with indicated constructs. Cells were harvested

48 hours post-transfection, fixed, and stained with PI for FACS analysis. Y axis is cell number count and X axis is the PI fluorescent signal.

113

A B

20 1.4 108 ** 1.2 108 15 1.0 108 8.0 107 *** 1 107 U ** L 10 U 8 106 L R ** 6 ** 6 10 5 6 NS 4 10 2 106 0 0

.L.) F vector TFE3 f.l. TFE3( CHC(1-932) CHC-TFE3 CHC 1-932 CHC-TFE3 TFE3(296-575) TFE3 296-575 mock transfection

C Vector CHC-TFE3 TFE3 296-575 1245 1195 976 G0/G1 38.8% G0/G1 42.4% G0/G1 38.0% S 34.6% S 35.9% S 40.6% 934 896 732 G2/M 26.6% G2/M 21.7% G2/M 21.4% t t t n n n u u 623 598 u 488 o o o C C C

311 299 244

0 0 0 0 16000 32000 48000 64000 0 16000 32000 48000 64000 0 16000 32000 48000 64000 PE-A PE-A PE-A

TFE3 1-575 CHC 1-932 p53DN_shRB 1327 1204 1310 G0/G1 40.5% G0/G1 38.0% G0/G1 36.0% S 39.2% S 37.7% S 40.1% 995 903 983 G2/M 20.4% G2/M 24.4% G2/M 23.9% t t t n n n u u u 664 602 655 o o o C C C

332 301 328

0 0 0 0 16000 32000 48000 64000 0 16000 32000 48000 64000 0 16000 32000 48000 64000 PE-A PE-A PE-A

114

Figure 3-S1 CHC inhibition or overexpression in dual luciferase assay.

A) Immuno-blots of CHC in knockdown and over-expression cells used for the dual luciferase assay. Left panel is CHC knockdown. Both day 4 cell lysates and day 5 cell lysates exhibit significant CHC knockdown. Total protein stain (by Bio-Rad free-stain technology) indicates equivalent protein loading. Right panel is CHC over-expression. Cell lysates from Dual-

Luciferase Reporter assay were directly used for loading. The upper band is the ectopic GFP-

CHC and the lower band is the endogenous CHC. B) Firefly, renilla, and normalized luciferase expression levels for GAL4-Klf4 TAD in the presence of PITSTOP2 are shown in left, middle, and right panel, respectively. PITSTOP2 was administered for 30 minutes, 1 hour, and 2 hours.

In all conditions, both firefly and renilla luciferases were enormously affected.

115

A

day 4 day 5 -CHC -CHC -CHC P P P siCtrl siCHC siCtrl siCHC pcDNA GF pcDNA GF pcDNA GF CHC GFP-CHC CHC

Klf4 VPC E1A VPC ein stain t o r otal p t

B

8 3 107 5 10 a 0.8

4 108 0.6

7 U

U 2 10 3 108 0.4 2 108 7 Firefly L 1 10 Renilla L 0.2 1 108

0 0 Firefly normalized to Renill 0.0

OP2 2hr OP2 1 hr T OP2 1 hr OP2 2hr OP2 2hr DMSO 1hrT DMSO 2hr DMSO 1hrT DMSO 2hrT OP2 1 hr T OP2 30min OP2 30min OP2 30minDMSO 1hrT DMSO 2hr DMSO 30minT T T PITS DMSO 30min PITS DMSO 30min PITS PITS PITS PITS PITS PITS PITS

116

Figure 3-S2 CHC constructs exhibit transactivation activity in mESCs.

A) The same constructs of GAL4-CHC fusions as those tested in HeLa cells were tested in mESCs for transactivation activity by dual luciferase assay. Data were pooled from multiple experiments, all normalized to DBD in each experiment. Some constructs were repeated multiple times and show a significant P value compared to DBD. *** P ≤ 0.001 **** P ≤ 0.001 as determined by ratio t-test. B) Cell lysates from the luciferase assays were immunoblotted for the expression of GAL4 fusion proteins using antibody against GAL4. Arrows point to the specific band in each lane, corresponding to the correct size of each construct.

117

A DBD 1-1675 1-1406 *** 1-932 1-579 **** 1-579 KK 1-479 1-389 1-100 472-838 472-579 453-579 453-529 530-579 547-556

0 100 200 300 RLU

B GFP 1-1675 1-1406 1-479 833-1406 1-579 1-579 KK 1-579 KK 472-838 472-579 453-579 453-529 530-579 547-556 1-100 1-389 1-932 1-579

58 175 25

20 46 175 80

80

58

46

118

Figure 3-S3 CHC harbors transactivation activity in the context of CHC-TFE3.

2 ng and 50 ng of FU-CGW expression plasmids were transfected into HeLa cells and tested for transactivation activity by dual luciferase assay. Cell lysates were immunoblotted against TFE3

C terminal region shown in (B). For quantification of protein expression, intensity of the bands marked by asterisks were summed for CHC-TFE3 and TFE3 296-575, and total area marked by bracket was quantified for native TFE3 (endogenous TFE3 signal subtracted). Protein expression levels were normalized to endogenous TFE3, and plotted with luciferase expression level in (A).

C) HEK293T cell lysates of TFE3 constructs overexpression were immuno-blotted using clathrin

NTD antibody (green channel) and TFE3 C terminal region antibody (red channel). Asterisks mark the band positive for both antibodies.

119

A 15 TFE3 1-575 CHC-TFE3

10 TFE3 296-575 U L R 5

0 0 10 20 30 40 50 protein level relative to endogeneous TFE3

B 50 ng 2 ng

TFE3 TFE3 tor - or - c ct e e V CHC TFE3 296-575TFE3 1-575 V CHC TFE3 296-575TFE3 1-575

245 190 135 ** * 100 ** * 80 58

46 * * * 32 * * 25 * 22

tion tion C c ction c fe fe fe rans rans rans -TFE3 -TFE3 -TFE3 CHC TFE3 296-575TFE3 1-575non-t CHC TFE3 296-575TFE3 1-575non-t CHC TFE3 296-575TFE3 1-575non-t

190 135 ** * ** 100 80 58 46

32

25 22

Merged Green Red

120

References

Argani, P., Lui, M.Y., Couturier, J., Bouvier, R., Fournet, J.-C., and Ladanyi, M. (2003). A novel CLTC-TFE3 gene fusion in pediatric renal adenocarcinoma with t(X;17)(p11.2;q23). Oncogene 22, 5374–5378.

Armstrong, F., Duplantier, M.-M., Trempat, P., Hieblot, C., Lamant, L., Espinos, E., Racaud- Sultan, C., Allouche, M., Campo, E., Delsol, G., et al. (2004). Differential effects of X-ALK fusion proteins on proliferation, transformation, and invasion properties of NIH3T3 cells. Oncogene 23, 6071–6082.

Artandi, S.E., Merrell, K., Avitahl, N., Wong, K.K., and Calame, K. (1995). TFE3 contains two activation domains, one acidic and the other proline-rich, that synergistically activate transcription. Nucleic Acids Res. 23, 3865–3871.

Bar-Peled, L., Schweitzer, L.D., Zoncu, R., and Sabatini, D.M. (2012). Ragulator Is a GEF for the Rag GTPases that Signal Amino Acid Levels to mTORC1. Cell 150, 1196–1208.

Beckmann, H., Su, L.K., and Kadesch, T. (1990). TFE3: a helix-loop-helix protein that activates transcription through the immunoglobulin enhancer muE3 motif. Genes Dev. 4, 167–179.

Blagosklonny, M.V. (2011). Cell cycle arrest is not senescence. Aging 3, 94–101.

Calcagnì, A., Kors, L., Verschuren, E., De Cegli, R., Zampelli, N., Nusco, E., Confalonieri, S., Bertalot, G., Pece, S., Settembre, C., et al. (2016). Modelling TFE renal cell carcinoma in mice reveals a critical role of WNT signaling. Elife 5, e17047.

Chahoud, J., Malouf, G.G., and Tannir, N.M. (2016). Translocation Renal Cell Carcinomas. In Rare Genitourinary Tumors, (Cham: Springer, Cham), pp. 41–52.

Davis, I.J., Hsi, B.-L., Arroyo, J.D., Vargas, S.O., Yeh, Y.A., Motyckova, G., Valencia, P., Perez-Atayde, A.R., Argani, P., Ladanyi, M., et al. (2003). Cloning of an Alpha-TFEB fusion in renal tumors harboring the t(6;11)(p21;q13) translocation. Proceedings of the National Academy of Sciences 100, 6051–6056.

Davis, I.J., Kim, J.J., Ozsolak, F., Widlund, H.R., Rozenblatt-Rosen, O., Granter, S.R., Du, J., Fletcher, J.A., Denny, C.T., Lessnick, S.L., et al. (2006). Oncogenic MITF dysregulation in clear cell sarcoma: defining the MiT family of human cancers. Cancer Cell 9, 473–484.

Di Malta, C., Siciliano, D., Calcagnì, A., Monfregola, J., Punzi, S., Pastore, N., Eastes, A.N., Davis, O., De Cegli, R., Zampelli, A., et al. (2017). Transcriptional activation of RagD GTPase controls mTORC1 and promotes cancer growth. Science 356, 1188–1192.

Du, J., Widlund, H.R., Horstmann, M.A., Ramaswamy, S., Ross, K., Huber, W.E., Nishimura, E.K., Golub, T.R., and Fisher, D.E. (2004). Critical role of CDK2 for melanoma growth linked to its melanocyte-specific transcriptional regulation by MITF. Cancer Cell 6, 565–576.

Durinck, S., Stawiski, E.W., Pavía-Jiménez, A., Modrusan, Z., Kapur, P., Jaiswal, B.S., Zhang,

121

N., Toffessi-Tcheuyap, V., Nguyen, T.T., Pahuja, K.B., et al. (2015). Spectrum of diverse genomic alterations define non-clear cell renal carcinoma subtypes. Nat. Genet. 47, 13–21.

Enari, M., Ohmori, K., Kitabayashi, I., and Taya, Y. (2006). Requirement of clathrin heavy chain for p53-mediated transcription. Genes Dev. 20, 1087–1099.

Geller, J.I., Ehrlich, P.F., Cost, N.G., Khanna, G., Mullen, E.A., Gratias, E.J., Naranjo, A., Dome, J.S., and Perlman, E.J. (2015). Characterization of adolescent and pediatric renal cell carcinoma: A report from the Children's Oncology Group study AREN03B2. Cancer 121, 2457– 2464.

Haq, R., and Fisher, D.E. (2011). Biology and clinical relevance of the micropthalmia family of transcription factors in human cancer. J. Clin. Oncol. 29, 3474–3482.

Hong, S.-B., Oh, H., Valera, V.A., Baba, M., Schmidt, L.S., and Linehan, W.M. (2010). Inactivation of the FLCN tumor suppressor gene induces TFE3 transcriptional activity by increasing its nuclear localization. PLoS ONE 5, e15793.

Ishiguro, N., and Yoshida, H. (2016). ASPL-TFE3 Oncoprotein Regulates Cell Cycle Progression and Induces Cellular Senescence by Up-Regulating p21. Neoplasia 18, 626–635.

Kauffman, E.C., Ricketts, C.J., Rais-Bahrami, S., Yang, Y., Merino, M.J., Bottaro, D.P., Srinivasan, R., and Linehan, W.M. (2014). Molecular genetics and cellular features of TFE3 and TFEB fusion kidney cancers. Nat Rev Urol 11, 465–475.

Kirchhausen, T., Owen, D., and Harrison, S.C. (2014). Molecular structure, function, and dynamics of clathrin-mediated membrane traffic. Cold Spring Harb Perspect Biol 6, a016725– a016725.

Kobos, R., Nagai, M., Tsuda, M., Merl, M.Y., Saito, T., Laé, M., Mo, Q., Olshen, A., Lianoglou, S., Leslie, C., et al. (2013). Combining integrated genomics and functional genomics to dissect the biology of a cancer-associated, aberrant transcription factor, the ASPSCR1–TFE3 fusion oncoprotein. The Journal of Pathology 229, 743–754.

Laplante, M., and Sabatini, D.M. (2012). mTOR Signaling in Growth Control and Disease. Cell 149, 274–293.

Lemmon, S.K., and Traub, L.M. (2012). Getting in Touch with the Clathrin Terminal Domain. Traffic 13, 511–519.

Martina, J.A., and Puertollano, R. (2013). Rag GTPases mediate amino acid-dependent recruitment of TFEB and MITF to lysosomes. J. Cell Biol. 200, 475–491.

Martina, J.A., Chen, Y., Gucek, M., and Puertollano, R. (2014a). MTORC1 functions as a transcriptional regulator of autophagy by preventing nuclear transport of TFEB. Autophagy 8, 903–914.

Martina, J.A., Diab, H.I., Li, H., and Puertollano, R. (2014b). Novel roles for the MiTF/TFE

122

family of transcription factors in organelle biogenesis, nutrient sensing, and energy homeostasis. Cell. Mol. Life Sci. 71, 2483–2497.

Martina, J.A., Diab, H.I., Lishu, L., Jeong-A, L., Patange, S., Raben, N., and Puertollano, R. (2014c). The nutrient-responsive transcription factor TFE3 promotes autophagy, lysosomal biogenesis, and clearance of cellular debris. Sci Signal 7, ra9–ra9.

Mathur, M., and Samuels, H.H. (2007). Role of PSF-TFE3 oncoprotein in the development of papillary renal cell carcinomas. Oncogene 26, 277–283.

Medendorp, K., van Groningen, J.J.M., Vreede, L., Hetterschijt, L., Brugmans, L., van den Hurk, W.H., and Geurts van Kessel, A. (2009). The renal cell carcinoma-associated oncogenic fusion protein PRCCTFE3 provokes p21WAF1/CIP1-mediated cell cycle delay. Experimental Cell Research 315, 2399–2409.

Miller, A.J., Levy, C., Davis, I.J., Razin, E., and Fisher, D.E. (2004). Sumoylation of MITF and Its Related Family Members TFE3 and TFEB. J. Biol. Chem. 280, 146–155.

Murakami, H., and Arnheiter, H. (2005). Sumoylation modulates transcriptional activity of MITF in a promoter-specific manner. Pigment Cell Research 18, 265–277.

Ohata, H., Ota, N., Shirouzu, M., Yokoyama, S., Yokota, J., Taya, Y., and Enari, M. (2009). Identification of a function-specific mutation of clathrin heavy chain (CHC) required for p53 transactivation. J. Mol. Biol. 394, 460–471.

Ohmori, K., Endo, Y., Yoshida, Y., Ohata, H., Taya, Y., and Enari, M. (2008). Monomeric but not trimeric clathrin heavy chain regulates p53-mediated transcription. Oncogene 27, 2215– 2227.

Organ, S.L., and Tsao, M.-S. (2011). An overview of the c-MET signaling pathway. Therapeutic Advances in Medical Oncology 3, S7–S19.

Pastore, N., Vainshtein, A., Klisch, T.J., Armani, A., Huynh, T., Herz, N.J., Polishchuk, E.V., Sandri, M., and Ballabio, A. (2017). TFE3 regulates whole-body energy metabolism in cooperation with TFEB. EMBO Molecular Medicine 9, 605–621.

Perera, R.M., Stoykova, S., Nicolay, B.N., Ross, K.N., Fitamant, J., Boukhali, M., Lengrand, J., Deshpande, V., Selig, M.K., Ferrone, C.R., et al. (2015). Transcriptional control of autophagy- lysosome function drives pancreatic cancer metabolism. Nature 524, 361–365.

Petit, C.S., Roczniak-Ferguson, A., and Ferguson, S.M. (2013). Recruitment of folliculin to lysosomes supports the amino acid-dependent activation of Rag GTPases. J. Cell Biol. 202, 1107–1122.

Piskacek, S., Gregor, M., Nemethova, M., Grabner, M., Kovarik, P., and Piskacek, M. (2007). Nine-amino-acid transactivation domain: establishment and prediction utilities. Genomics 89, 756–768.

123

Qin, H., Diaz, A., Blouin, L., Lebbink, R.J., Patena, W., Tanbun, P., LeProust, E.M., McManus, M.T., Song, J.S., and Ramalho-Santos, M. (2014). Systematic Identification of Barriers to Human iPSC Generation. Cell 158, 449–461.

Roczniak-Ferguson, A., Petit, C.S., Froehlich, F., Qian, S., Ky, J., Angarola, B., Walther, T.C., and Ferguson, S.M. (2012). The Transcription Factor TFEB Links mTORC1 Signaling to Transcriptional Control of Lysosome Homeostasis. Sci Signal 5, ra42–ra42.

Royle, S.J., Bright, N.A., and Lagnado, L. (2005). Clathrin is required for the function of the mitotic spindle. Nature 434, 1152–1157.

Sancak, Y., Bar-Peled, L., Zoncu, R., Markhard, A.L., Nada, S., and Sabatini, D.M. (2010). Ragulator-Rag Complex Targets mTORC1 to the Lysosomal Surface and Is Necessary for Its Activation by Amino Acids. Cell 141, 290–303.

Sancak, Y., Peterson, T.R., Shaul, Y.D., Lindquist, R.A., Thoreen, C.C., Bar-Peled, L., and Sabatini, D.M. (2008). The Rag GTPases bind raptor and mediate amino acid signaling to mTORC1. Science 320, 1496–1501.

Sardiello, M., Palmieri, M., di Ronza, A., Medina, D.L., Valenza, M., Gennarino, V.A., Di Malta, C., Donaudy, F., Embrione, V., Polishchuk, R.S., et al. (2009). A Gene Network Regulating Lysosomal Biogenesis and Function. Science 325, 473–477.

Saucedo, L.J., Gao, X., Chiarelli, D.A., Li, L., Pan, D., and Edgar, B.A. (2003). Rheb promotes cell growth as a component of the insulin/TOR signalling network. Nat. Cell Biol. 5, 566–571.

Settembre, C., Di Malta, C., Polito, V.A., Garcia Arencibia, M., Vetrini, F., Erdin, S., Erdin, S.U., Huynh, T., Medina, D., Colella, P., et al. (2011). TFEB links autophagy to lysosomal biogenesis. Science 332, 1429–1433.

Settembre, C., Zoncu, R., Medina, D.L., Vetrini, F., Erdin, S., Erdin, S., Huynh, T., Ferron, M., Karsenty, G., Vellard, M.C., et al. (2012). A lysosome-to-nucleus signalling mechanism senses and regulates the lysosome via mTOR and TFEB. Embo J. 31, 1095–1108.

Shields, J.M., and Yang, V.W. (1997). Two Potent Nuclear Localization Signals in the Gut- enriched Krüppel-like Factor Define a Subfamily of Closely Related Krüppel Proteins. J. Biol. Chem. 272, 18504–18507.

Skalsky, Y.M., Ajuh, P.M., Parker, C., Lamond, A.I., Goodwin, G., and Cooper, C.S. (2001). PRCC, the commonest TFE3 fusion partner in papillary renal carcinoma is associated with pre- mRNA splicing factors. , Published Online: 17 January 2001; | Doi:10.1038/Sj.Onc.1204056 20, 178–187.

Stocker, H., Radimerski, T., Schindelholz, B., Wittwer, F., Belawat, P., Daram, P., Breuer, S., Thomas, G., and Hafen, E. (2003). Rheb is an essential regulator of S6K in controlling cell growth in Drosophila. Nat. Cell Biol. 5, 559–566.

Tsuda, M., Davis, I.J., Argani, P., Shukla, N., McGill, G.G., Nagai, M., Saito, T., Laé, M.,

124

Fisher, D.E., and Ladanyi, M. (2007). TFE3 fusions activate MET signaling by transcriptional up-regulation, defining another class of tumors as candidates for therapeutic MET inhibition. Cancer Res. 67, 919–929.

Tsun, Z.-Y., Bar-Peled, L., Chantranupong, L., Zoncu, R., Wang, T., Kim, C., Spooner, E., and Sabatini, D.M. (2013). The Folliculin Tumor Suppressor Is a GAP for the RagC/D GTPases That Signal Amino Acid Levels to mTORC1. Mol. Cell 52, 495–505.

Weterman, M.A., van Groningen, J.J., Tertoolen, L., and van Kessel, A.G. (2001a). Impairment of MAD2B-PRCC interaction in mitotic checkpoint defective t(X;1)-positive renal cell carcinomas. Proceedings of the National Academy of Sciences 98, 13808–13813.

Weterman, M.J., van Groningen, J.J., Jansen, A., and van Kessel, A.G. (2000). Nuclear localization and transactivating capacities of the papillary renal cell carcinoma-associated TFE3 and PRCC (fusion) proteins. Oncogene 19, 69–74.

Weterman, M.A.J., van Groningen, J.J.M., Hartog, den, A., and Geurts van Kessel, A. (2001b). Transformation capacities of the papillary renal cell carcinoma-associated PRCCTFE3 and TFE3PRCC fusion genes. Oncogene 20, 1414–1424.

Widlund, H.R., Horstmann, M.A., Price, E.R., Cui, J., Lessnick, S.L., Wu, M., He, X., and Fisher, D.E. (2002). Beta-catenin-induced melanoma growth requires the downstream target Microphthalmia-associated transcription factor. J. Cell Biol. 158, 1079–1087.

Yasumoto, K.-I., Takeda, K., Saito, H., Watanabe, K.-I., Takahashi, K., and Shibahara, S. (2002). Microphthalmia-associated transcription factor interacts with LEF-1, a mediator of Wnt signaling. Embo J. 21, 2703–2714.

Ybe, J.A., Fontaine, S.N., Stone, T., Nix, J., Lin, X., and Mishra, S. (2013). Nuclear localization of clathrin involves a labile helix outside the trimerization domain. FEBS Lett. 587, 142–149.

125

CHAPTER 4

THE CLATHRIN COAT ACCESSORY PROTEIN IRC6 FUNCTIONS THROUGH ITS

CONSERVED C TERMINAL DOMAIN

126

Abstract

Clathrin-coated vesicle formation mediates key protein trafficking events that occur at plasma membrane and between the trans-Golgi network and endosomes, and clathrin coat accessory factors are essential in facilitating this process. Yeast Irc6p is a clathrin coat accessory factor, involved in the vesicle transport between trans-Golgi network and early endosomes. The specific functionality of the N terminal G-protein like domain and the C terminal adaptin-binding domain, however, are not well understood, given an unexpected dominant negative role of the G- protein like domain discovered in the previous study. Here, I show that the dominant negative role observed previously is actually due to the genetic difference between strains. I further reveal that the adaptin-binding domain is important for Irc6p functioning as an accessory protein of clathrin coat. In particular, the last two amino acids of Irc6p are required for the functionality of

Irc6p and its interaction with the clathrin adaptor complex AP-1 and AP-2.

Introduction

Yeast Irc6p is a conserved G protein-like protein involved in clathrin-coated vesicle trafficking (Gorynia et al., 2012). Its mammalian orthologue p34 (AAGAB) was initially discovered to be an interaction partner of the adaptor complexes of clathrin coats, specifically the

⍺ subunit of AP-2 (⍺ adaptin) and γ subunit of AP-1 (γ adaptin), in a yeast two-hybrid screen

(Page et al., 1999). Mutation of p34 has been shown to be a direct cause of one type of

Palmoplantar keratodermas (PPKs) (Dinani et al., 2017; Giehl et al., 2016; Pohler et al., 2012).

Electron microscopy of the affected skin tissue showed abnormal vesicle aggregation near the plasma membrane and prominent distended Golgi apparatus, suggesting a defect in the clathrin- coated vesicle trafficking (Pohler et al., 2012). Irc6 is therefore considered as an accessory protein in clathrin-mediated membrane trafficking.

127

Irc6p is composed of N-terminal G protein-like domain and C-terminal adaptin-binding domain, and is a founding member of a conserved protein family encoded in the genomes of species from fungi, plants, fish, to mammals (Gorynia et al., 2012). The N-terminal G protein- like domain has a G-protein fold, but lacks GTPase activity and high binding affinity to GTP.

Nevertheless, a unique and conserved BC-YY motif (beta-strand connecting YY motif), located in the accessible loop connecting beta 2 and 3 strands, contributes to the functionality of Irc6p and p34, as the mutation of the two conserved tyrosines to alanines (YY-AA) interfered with clathrin-dependent chitin synthase Chs3p cycling between trans-Golgi network and early endosomes (Gorynia et al., 2012). Interestingly, the G protein-like domain seems to play a dominant negative role, as the cells expressing Irc6p G protein-like domain from the endogenous promoter displayed an even more severe phenotype than Irc6 full deletion.

The conserved C terminal adaptin-binding domain appears to play a more important role for Irc6p/p34, as over-expression of this domain alone is sufficient to rescue cycling defect of

Chs3p between trans-Golgi network and early endosomes caused by irc6 deletion (Gorynia et al.,

2012). Of note, the C terminal domain was originally named after the initial discovery of full- length p34 binding to ⍺ and γ adaptin (Page et al., 1999), without evidence that the C terminal region mediates this interaction. Later, C terminal region binding to AP-1 was confirmed in our group’s recent study (Gorynia et al., 2012), though the N terminal G-protein-like domain also interacted with AP-1 (Gorynia et al., 2012). Therefore, adaptin binding is not only limited to the

C terminus. Thus, further characterization of the N terminal G protein-like domain and C terminal adaptin-binding domain are necessary to dissect the mechanism of how Irc6 regulates clathrin-coated vesicle trafficking.

Here, I clarified that the dominant negative effect of G protein-like domain of Irc6p is

128

actually due to different selectable markers, and identified the key residues in the C terminal adaptin-binding domain for adaptin interaction, supporting the accessory role of Irc6p in clathrin-mediated membrane trafficking between trans-Gogi network and endosomes.

Results

Irc6p G protein-like domain does not appear to function dominant-negatively.

One finding from our lab’s previous study on Irc6p is that cells with the absence of the C terminal domain of Irc6p exhibit a more severe Chs3p cycling defect than cells with the complete deletion of Irc6p. As an approach to measure the Chs3p cycling defect, genomic mutations of IRC6 were generated in chs6Δ cells and tested for the sensitivity to calcofluor white, a dye that binds chitin on the cell wall and inhibits cell growth. Chs6p is a transporter of chitin synthase Chs3p. It forms a coat complex termed “exomer” with Chs5p and three paralogues of Chs6p, and is required for delivering Chs3p to the plasma membrane from trans-

Golgi network (Wang et al., 2006). Deletion of CHS6 therefore renders wild-type yeast cells resistant to the chitin-binding dye CCFW, as Chs3p is not transported to the cell surface, but keeps cycling between trans-Golgi network and early endosomes (Valdivia et al., 2002). The wild-type yeast cells cannot grow on media containing CCFW, whereas chs6Δ cells can survive.

However, if AP-1 subunit β1 was deleted in the chs6Δ cells, which blocks the AP-1 mediated trans-Golgi/early endosome cycling pathway, Chs3p was rerouted to the plasma membrane and rendered the cells sensitive to CCFW again (Valdivia et al., 2002).The rerouting mechanism is unclear, but this assay is powerful enough to enable us to investigate the trans-Golgi/early endosome cycling activity indirectly. Deletion of IRC6 in the chs6Δ cells also made the cells sensitive to CCFW, indicating that the trans-Golgi/early endosome cycling was hampered. In our previous study, chs6Δ irc6ΔC (which lacks the C terminal region) strain (GPY4993) was

129

even more sensitive to CCFW than chs6Δ irc6Δ (GPY4042), indicating a dominant negative role of the Irc6 N terminal domain. However, one genomic difference between the two test strains may potentially contribute to the different sensitivity to CCFW: the chs6Δ irc6Δ strain

(GPY4042) was constructed with selectable marker HIS3, while the chs6Δ irc6ΔC strain

(GPY4993) was constructed with selectable marker TRP1. To rule out the possibility of contribution from the different selectable markers, I constructed a new strain of chs6Δ irc6Δ with the TRP1 marker. The resulting chs6Δ irc6Δ::TRP1 strain exhibited much worse growth on

CCFW than chs6Δ irc6Δ::HIS3 (GPY4042), but similar to GPY4993 (Figure 4-1A, S1). This demonstrates that selectable marker difference does have an effect and challenges the basis for a dominant negative effect of the G protein-like domain of Irc6p.

As an alternative approach to test the effect of the TRP1 marker, I independently constructed a strain of irc6ΔC with no marker by introducing irc6ΔC PCR product into a haploid irc6Δ::URA3 strain and clones were selected by 5-fluororotic acid (5-FOA). irc6ΔC was then crossed with GPY4042 (chs6Δ irc6Δ) to obtain the chs6Δ irc6ΔC strains. As shown in Figure 4-

1B, multiple strains of chs6Δ irc6ΔC grew better than chs6Δ irc6ΔC::TRP1 on CCFW, and with similar sensitivity to CCFW as chs6Δ irc6Δ::HIS3.

Another line of evidence that suggests the G protein-like domain does not play a dominant negative role is that, over-expression of irc6ΔC::TRP1 by inserting a strong promoter

[glyceraldehyde- 3-phosphate dehydrogenase promoter (GPD)] into the endogenous locus of irc6ΔC did not further compromise the cell growth on CCFW. In contrast, it rescued the cell growth (Figure 4-1C).

130

Identification of mutations in Irc6p C terminal domain that affect C terminal function.

Since the deletion of the C terminal adaptin-binding domain displays almost the same phenotype as irc6 full deletion, I proposed that the adaptin-binding domain plays a prominent role in facilitating AP-1 function in cycling Chs3p. I aimed to screen for the key amino acids in this region responsible for Irc6 function. The screening strategy was to transform a library of low-copy plasmids harboring the IRC6 native promoter and IRC6 with mutations in the C terminal region into the chs6Δ irc6Δ strain, and to select cells that do not grow on CCFW plates.

The library plasmids were generated by error-prone PCR amplification of IRC6 C terminal region and co-transformation of yeast with a linearized low-copy backbone plasmid harboring

IRC6 native promoter and N terminal 3xFLAG tagged IRC6 lacking the C-terminal region. There are over-laps between the PCR products and the linearized plasmid DNA, so recombination will occur and new circular plasmids will form in the transformed yeast cells. The colonies that fail to grow on the CCFW will become candidates for mutations that hamper C terminal domain function (Illustrated in Figure 4-2A). Full length wild-type IRC6 and irc6ΔC (1-179) served as positive and negative control, respectively.

Out of about 2000 colonies, 40 showed suppressed growth on CCFW. After checking protein size by western blotting, plasmids of 9 colonies that maintained normal protein size and expression level were extracted and subjected to DNA sequencing. Figure 4-2B shows the distribution of mutations in all 9 mutants. The number of mutated amino acids in each mutant ranges from 2 to 5. Notably, there are 5 mutants missing the last two or last three amino acids, and one missing the last amino acid. Mutant #42 only misses the last two amino acids, while keeping the rest of the sequence intact. All the other mutations occur only once. The frequent mutations eliminating the last few amino acids strongly suggest an important role for these

131

amino acids in Irc6 function.

To evaluate the relative conservation of residues at each positions in the C terminal adaptin-binding domain, a Position-Specific Scoring Matrix (PSSM) of Irc6p adaptin-binding domain was generated from NCBI Conserved Domain Database (CDD) (Figure 4-2C). The

PSSM score is calculated as the log (base 2) of the observed substitution frequency at a given position divided by the expected substitution frequency at that position. Thus, a positive score

(ratio > 1) indicates that the observed frequency exceeds the expected frequency, suggesting that this substitution is surprisingly favored in the conserved domain. Among all the residues of adaptin-binding domain, W178 has the highest PSSM score (Figure 4-2C). Mutation of this amino acid was also captured in mutant #55 (W178R), although accompanied by one more mutation G165W, also at a conserved position. The crystal structure of Irc6p (PDB code: 3UC9) reveals that W178 forms a hydrogen bond with Y50 (Figure 4-S2B), which is the second tyrosine in the BC-YY motif. Since mutation of the BC-YY motif partially affects the function of Irc6p in

CCFW assay (Gorynia et al., 2012) and #55 actually expresses lower level of Irc6p compared to wild-type (Figure 4-S2A), W178R single mutant was generated to see whether W178R single mutation is sufficient to disable Irc6p. Y49A, Y50A single mutants were generated in parallel to test the relative importance of the tyrosines.

An L237A single mutant was also constructed to determine whether mutation of last amino acid is sufficient to ablate the function of Irc6p. Those mutants were transformed into both chs6Δ irc6Δ::HIS3 and chs6Δ irc6Δ::TRP1 strains, along with Mutant #42 that lacks the last two amino acids, and tested for growth on CCFW plates. Figure 4-2D shows that W178R or L237A single mutation is not sufficient to diminish Irc6p function. Neither does Y49A or Y50A single mutation (Figure 4-2E). YY-AA mutant only partially disables Irc6, compared to mutant #42

132

(Figure 4-2E). These results suggest that the last two amino acids play a critical role for Irc6p.

The last two amino acids of Irc6p are conserved and mediate interaction with AP-1 and AP-2 complexes.

Even though the last 2 amino acids of Irc6p were not included in the adaptin-binding domain which ends at E232 according to the Conserved Domain Database, the alignment with adaptin-binding domain family members showed great conservation of the last leucine residue

(Figure 4-3A). The PSIPRED secondary structure prediction predicts the second last residue may form ⍺ helix with the upstream 17 amino acids (Figure 4-3B), indicating this tyrosine is incorporated into the adaptin-binding domain.

To test whether the absence of the last two amino acids affects Irc6p binding to AP-1 or

AP-2 complexes, co-immunoprecipitation assays were performed from mutant #42 as well as

YY-AA mutant expressing cell lysates. Interestingly, mutant #42 completely abolished Irc6p’s interaction with AP-1 and AP-2 (Figure 4-3C, compare #42 with GPY4042). YY-AA does not appear to affect interaction with AP-1, but it reduced interaction with AP-2 (Figure 4-3C).

GTPase Ypt31p interaction with Irc6p was reported in our lab’s previous study, but it was not robustly detected in my co-immunoprecipitation (Figure 4-3C), likely due to the requirement of the GTP-bound form of Ypt31p (Gorynia et al., 2012).

Ontology analysis of IRC6 negative genetic interactions reveal major function in Golgi vesicle transport.

In addition to the functionality in vesicle traffic, IRC6 has been found involved in maintenance of genomic integrity, since deletion of IRC6 increased DNA recombination centers

(Alvaro et al., 2007). To gain an insight of the general functionality of IRC6 from an unbiased

133

respect, I took advantage of the available negative genetic interaction data set of IRC6 from a large scale unbiased yeast genetic interaction study (Costanzo et al., 2016)(Gene list in Table 4-

S1). Negative genetic interaction is defined by having a more severe growth defect when two single gene deletions with trivial effects are combined. It usually connects functionally related genes (Costanzo et al., 2016). Gene ontology analysis of the negative interaction genes reveals that ER to Golgi vesicle-mediated transport is most significantly enriched (Figure 4-S3A).

Moreover, top three GO terms are all vesicle-transport related. Similar results were observed when the negative interaction genes found in all the other studies were included (Figure 4-S3B).

This result suggests that IRC6 is more involved in vesicle transport as a multifunctional protein.

Interestingly, BET3, BET5, and TRS23, three subunits of TRAPP I/II complex, came up in the negative genetic interaction list. Recently, TRAPP II complex has been shown as a bona fide G- protein exchange factor for Ypt31/32 (Pinar et al., 2015; Thomas and Fromme, 2016). This result is in line with the negative genetic interaction between YPT31/32 and IRC6, found in our previous study (Gorynia et al., 2012). In sum, the major function of IRC6 is involved in vesicle transport.

Discussion

Irc6p belongs to a new protein family, with a G-protein-like domain in the N terminus and a conserved adaptin-binding domain in the C terminus. Multiple studies have shown mutations of the mammalian orthologue AAGAB (p34) underlie autosomal dominant punctate palmoplantar keratoderma, and ultrastructural analysis of the lesional plantar skin reveals abnormal vesicle aggregation and Golgi apparatus dilatation, consistent with a role of Irc6p in clathrin-mediated membrane trafficking (Pohler et al., 2012). In our study, both the G-protein- like domain and the adaptin-binding domain play a role in the functionality of Irc6p. Over-

134

expression of the G-protein-like domain alone rescued the cell growth defect of chs6Δ irc6Δ on

CCFW plates. Deletion of the adaptin-binding domain completely abolished the functionality of

Irc6p, evidenced by the same growth defect in the chs6Δ background cells on CCFW plates. By screening for mutations in adaptin-binding domain, I discovered that the last two amino acids of

Irc6p are essential for AP-1 and AP-2 complex binding.

It is surprising that lack of the last two amino acids almost completely abolished the interaction between Irc6 and AP-1/AP-2 (Figure 4-3C), given that the G-protein-like domain also interacts with AP-1/AP-2 revealed in our previous study (Gorynia et al., 2012). Two scenarios can explain this. Either the adaptin-binding domain mediates the major binding to AP-1/AP-2, or both domains cooperatively binds to AP-1/AP-2. Lack of either domain can significantly affect the binding affinity in the second scenario. The second scenario seems more likely, since

YY-AA mutation in the G-protein-like domain can affect Irc6 binding to AP-2. One way to test which scenario is true is to determine the binding affinity of G-protein-like domain, adaptin- binding domain, and full-length Irc6 binding to AP-1/AP-2, respectively. An alternative but indirect way is to express the adaptin-binding domain at the endogenous level, and see whether it can rescue the growth of chs6Δ irc6Δ cells on CCFW plates. Our previous study has shown that overexpression of the adaptin-binding domain alone is sufficient to rescue the growth of chs6Δ irc6Δ cells on CCFW plates (Gorynia et al., 2012). If the second scenario is true, endogenous level of adaptin-binding domain should not be sufficient to rescue the cell growth.

The BC-YY motif in the N terminal domain is also important for the function of Irc6p, as mutation of this motif renders the chs6Δ background cells more sensitive to CCFW (Figure 4-

2E) (Gorynia et al., 2012). However, YY-AA mutant Irc6p still robustly interact with AP-1.

Therefore, BC-YY motif may engage in other protein interactions that are involved in trans-

135

Golgi network-early endosome traffic. Ypt31p is a candidate to test. On the other hand, YY-AA mutant Irc6p shows reduced binding to AP-2 complex (Figure 4-3C), raising the question whether inhibition of the endocytosis pathway can also make the yeast cell sensitive to CCFW. It was proposed that cell surface Chs3p may be endocytosed via AP-2 dependent pathway, as a potential (D/E)XXXLL motif that can be recognized by AP-2 is present in the N terminal end of

Chs3p (Weiskoff and Fromme, 2014). It is possible that deletion of AP-2 together with AP-1 will allow more secreted Chs3p to accumulate on the plasma membrane. Thus, Irc6 may act through both AP-1 and AP-2 pathways.

The mechanism how Irc6p facilitates clathrin-mediated membrane trafficking between trans-Golgi and early endosome is still elusive. From our lab’s previous study, Irc6p functions as linking Ypt31p to AP-1. However, Irc6p and p34 are cytosolic and do not show punctate distribution (Gorynia et al., 2012; Pohler et al., 2012), in contrast to the distinct punctate appearance of AP-1 and Ypt31p. It is likely that Irc6 only interacts with cytosolic AP-1/AP-2, and as soon as AP-1 is recruited to the membrane, Irc6p is dissociated from both Ypt31p and

AP-1. GTP-bound Ypt31p was recently shown to stimulate the GTP exchange activity of Sec7p to further activate Arf1p (McDonold and Fromme, 2014), which subsequently recruits and activates AP-1 (Ren et al., 2013) on trans-Golgi network. One possibility is that GTP-bound

Ypt31p simultaneously facilitates recruitment of AP-1 via Irc6p.

The interaction of Irc6p with solely cytosolic AP-1/AP-2 can be explained by two possible but not mutually exclusive scenarios. One is that Irc6p competes with membrane for binding to the lipid-binding domain of ⍺ and ɣ adaptins. It has been shown that p34 interacts with N terminal domain of ⍺ and ɣ adaptins (Page et al., 1999), where the lipid-binding domain is located. The other scenario is the conformational change that occurs in AP-1 and AP-2

136

complexes upon their recruitment to the membrane (Jackson et al., 2010; Ren et al., 2013). It is likely that Irc6p interacts only with the cytosolic conformation of ⍺ and ɣ adaptins. To distinguish between these scenarios, further mapping of the interaction sites within the N terminal domain of ⍺ and ɣ adaptins will be necessary.

Materials and Methods

Plasmids and yeast strains

For construction of pRS315-3XFLAG-IRC6 mutants library, IRC6 coding sequence with

500bp flanking genomic sequences were amplified from GPY404.2 (Mating type a of SEY6210) genomic DNA with primers containing SpeI and XmaI sites. 3XFLAG tagged was inserted into the N terminus of IRC6 by recombination PCR. The PCR product was then cut by SpeI and

XmaI endonuclease and ligated into pRS315 plasmid (Sikorski and Hieter, 1989). The resulting

IRC6 containing plasmid retains ~300 bp promoter sequence, due to an endogenous genomic

SpeI site within the 500 bp IRC6 promoter sequence. IRC6 was used as a template for error- prone PCR. Error-prone PCR mix recipe: 1x standard buffer (NEB), dATP/dGTP 0.2 mM

(each), dCTP 1 mM, dTTP 1 mM, Irc6 435F primer 2 µM, Irc6 814R primer 2 µM, template

DNA (Irc6 -300bp to 500bp) 4.9 ng, MnCl2 0.5 mM, MgCl2 7 mM, Taq 5U in 100 µL system.

PCR program: 94℃ 1 min, 12 cycles of 94℃ 1 min, 54 ℃ 1 min, 72 ℃ 3 min. The PCR product was purified and co-tranformed into the chs6Δ irc6Δ::TRP1 with pRS315-IRC6 (digested with

BamH I (at ~aa169)and Nde I (at ~30bp downstream of Irc6 stop codon). 20 µl PCR product

(~10ng/µl) + 100 ng purified cut pRS315-3FLAG-IRC6 were co-transformed into 12.5 OD600 chs6Δ irc6Δ::TRP1 (GPY4042), with pRS315-3FLAG-IRC6 as a positive control, and pRS315-

IRC6 (1-179) as a negative control. After 2-3 days of incubation at 30 degrees, cells were made replica to YPD +200ug/ml CCFW and YPD sequentially. One day later, the colonies that grew

137

less well on CCFW were picked for further validation on CCFW and YPD. About 2000 colonies were screened. Colonies were also subjected to immuno-blot against FLAG tag to check the mutant Irc6 protein size and expression level. The confirmed colonies were extracted for plasmids and sent to Sanger sequencing. The purified plasmids were transformed into chs6Δ irc6ΔC::TRP1 again to confirm the sensitivity to CCFW.

For introduction of GPD promoter into the endogenous irc6ΔC locus, GPD promoter with KAN marker was amplified from pYM-N14 plasmid (Janke et al., 2004). The resulting products were transformed into GPY4993 (chs6Δ irc6ΔC::TRP1) and clones were selected on

G418 plates. For construction of chs6Δ irc6Δ::TRP1, deletion cassette was amplified from pFA6a-TRP1(Longtine et al., 1998) and products were transformed into SEY6210 diploid cells.

TRP1+ colonies were sporulated and dissected for haploid cells. Haploids were crossed with

GPY3102 (chs6Δ) / GPY4042 (chs6Δ irc6Δ::HIS3) and to get chs6Δ irc6Δ::TRP1 haploids.

MAT⍺ haploids chs6Δ irc6Δ::TRP1 from both GPY3102 and GPY4042 parental strains (same mating type as GPY4993, 4042) were collected and subjected to CCFW assay. For construction of irc6ΔC with no marker, irc6ΔC (aa 1–179) were amplified and co-transformed with pRS315(LEU) into irc6Δ::URA3 (GPY4986), clones were selected on SD-LEU and were replicated onto 5-FOA to select for irc6ΔC cells. Clones were crossed to GPY4042 to obtain chs6Δ irc6ΔC cells.

Yeast media, growth assays

Strains were grown in YPD (1% Bacto yeast extract [Difco, Detroit, MI], 2% Bacto peptone [Difco], 2% dextrose plus additional 20 µg/mL Alanine, 20 µg/mL Uracil, and 224

µg/mL Tryptophan) or SD media (0.67% yeast nitrogen base without amino acids [Difco], 2% dextrose) with the appropriate supplements. CCFW (Sigma-Aldrich, St. Louis, MO) was added

138

to YPD agar plates at the concentrations indicated in the figures. For growth tests, cells were diluted to 1 x 106 cells/ml and then serially diluted 5-fold before dilutions were spotted onto appropriate agar plates.

Protein expression immunoblotting and Co-immunoprecipitation

Yeast cells that transformed with pRS315-3xFLAG-IRC6 WT/mutants were cultured in

SD-LEU liquid media, and 1x107 cells were harvested at exponential growth stage and were spun down, and resuspended with 50 µl 1x SDS sample buffer and boiled for 5 minutes. After

20000g centrifugation for 20 minutes, 5 µl supernatant were loaded onto SDS-PAGE gel for immuno-blotting. Yeast cells carrying pRS315-3xFLAG-IRC6 WT/mutants were inoculated from SD-LEU into YPD and cultured for about 12 cell-cycles at 30 degrees. 50 OD600 cells were harvested at exponential growth stage and were converted to spheroplasts by DTT and subsequent Zymoliase treatment. Spheroplasts were resuspended in binding buffer (20 mM

HEPES, pH 7.0, 125 mM KAc, 0.4M Sorbitol, 0.1% Triton-X 100, 2 mM MgCl2, 0.5 mM

EGTA, Proteinase Inhibitor Cocktail (Sigma-Aldrich), 0.5 mM PMSF, 0.5 mM DTT) and the membrane was disrupted by vortexing with glass beads. Final volume was brought to 500 µL by adding binding buffer, and collected subjected to 20000g 30 minutes centrifugation at 4 degrees.

380 µL supernatant was collected and incubated with 20 µL packed ANTI-FLAG M2 beads

(Sigma-Aldrich) for 2 hours at 4 degrees. Beads were washed four times with binding buffer, once with elution buffer (250 µg/mL 3xFLAG peptides, 20 mM HEPES pH 7.0, 0.1% Triton-X

100, 0.5 mM PMSF, 125 mM NaCl) without FLAG peptides. Bound proteins were eluted with

20 µL Elution buffer 3 times. Eluates were pooled and mixed with 12 µL 5x SDS sample buffer and boiled for 4 minutes. 15 µL was loaded for immuno blotting analysis. Following antibodies were used: FLAG (Sigma F3165). Irc6p (house made) AP-1 subunit β1 (house made) AP-2

139

subunit β2 (house made?) Ypt31 (from which lab?).

Ontology analysis

Negative genetic interactions were downloaded from www.thebiogrid.org. The majority of the hits were from (Costanzo et al., 2016) and the rest were from (Gorynia et al., 2012;

McClellan et al., 2007; Sharifpoor et al., 2012; Tong et al., 2004; van Pel et al., 2013). Gene lists were submitted to http://pantherdb.org/, with all genes of Saccharomyces cerevisiae in the database as a reference list. PANTHER overrepresentation test (released 20170413) was carried out with the GO Ontology database (released 2017-05-25) (Mi et al., 2010; Thomas et al., 2003).

140

Figure 4-1 G protein-like domain of Irc6p does not appear to function dominant-

negatively.

A) chs6Δ irc6Δ::TRP1 exhibits worse cell growth than chs6Δ irc6Δ::HIS3 on CCFW, but similar to chs6Δ irc6ΔC::TRP1. Left panel is control growth on normal YPD media. Also see Figure 4-

S1A. B) chs6Δ irc6ΔC (generated by irc6Δ::URA negative selection) exhibits better growth on

CCFW than chs6Δ irc6ΔC::TRP1. C) Over-expression of irc6ΔC rescues cell growth defect of chs6Δ irc6ΔC on CCFW. A GPD promoter cassette was introduced directly into the promoter region of IRC6 in GPY4993. 3 strains were tested, and whole cell lysates western blot against

Irc6p were shown. Note that only #14 and #15 successfully over-expressed irc6ΔC, and exhibited CCFW resistance.

141

A YPD YPD + 200 μg/mL CCFW

chs6Δ irc6Δ::HIS3 (GPY4042) chs6Δ irc6ΔC::TRP1 (GPY4993) chs6Δ irc6Δ::TRP1

B YPD YPD + 100 μg/mL CCFW chs6Δ chs6Δ β1Δ chs6Δ irc6Δ::HIS3 chs6Δ irc6ΔC::TRP1 chs6Δ irc6ΔC 1Dα chs6Δ irc6ΔC 2Aa chs6Δ irc6ΔC 3Dα

C SD YPD + CCFW 100 μg/mL

chs6Δ chs6Δ β1Δ chs6Δ irc6Δ chs6Δ GPD: Irc6ΔC #14 chs6Δ GPD: Irc6ΔC #15 chs6Δ GPD: Irc6ΔC #16 Δ

#15 #16 #14 chs6

Irc6 Irc6ΔC

142

Figure 4-2 Screening and testing for mutations that affects the functionality of Irc6 C

terminal region.

A) A schematic illustration of the screening strategy. GPY4042 was transformed with cleaved pRS315-3FLAG-IRC6 and Error-prone PCR products of IRC6 C terminal region. After selection on SD-LEU plates, the sensitivity to CCFW of each clone was further tested on CCFW plates by replica plating. B) Mutation distribution of each mutant from the screen aligned with wild-type

Irc6 C terminal sequence. Left column is the ID number of each mutant. C) PSSM view of Irc6p from aa162-181. First line is the amino acid frequency at each position for adaptin-binding domain. Second line is the consensus sequence of adaptin-binding domain. Last line shows the sequence of Irc6p. W178 has the highest PSSM score. The adaptin-binding domain after aa181 was not shown. D) CCFW assay for GPY4042 expressing #42, W178R, and L237A mutants of

Irc6p. #42 loses CCFW resistance, while W178R and L237A appear normal. Pictures were taken for both day1 and day2 on CCFW. E) CCFW assay for GPY4042 expressing YY-AA, Y49A,

Y50A mutants of Irc6p (Upper and middle panel). YY-AA shows reduced resistance to CCFW, whereas Y49A and Y50A appears normal. By immuno-blotting, #42 and other mutants were expressed at similar level with wild-type Irc6p, except for #9 mutant (Lower panel).

143

A

B

C

D YPD + CCFW 200 μg/mL YPD

- + L237A W178R #42 - + L237A W178R #42

chs6Δirc6Δ::TRP1 day1 chs6Δirc6Δ::HIS3

chs6Δirc6Δ::TRP1 1/10 cell concentration day2 chs6Δirc6Δ::HIS3

144

E YPD YPD + CCFW 200 μg/mL day1 day1 day2 WT 1-179 #42 YY to AA Y49A Y50A

YPD + CCFW 200 μg/mL

#42 1-179

YY to AA WT

Y49A Y50A

WT #9 #42 YYAA Y49A Y50A

3xFLAG-Irc6p

145

Figure 4-3 The last two amino acids of Irc6p are responsible for binding to AP-1 and AP-2.

A) Alignment of C terminal region of Irc6p with other adaptin-binding-domain containing proteins in a wide variety of species. The yellow arrow indicates the range of adaptin-binding domain determined by Conserved Domain Database. The last two amino acids of Irc6p were outlined in a red box. B) PSIPRED prediction of the secondary structure of Irc6p C terminal region. The second last tyrosine residue appears to be integrated in the ⍺-helix. C) Co- immunoprecipitation of FLAG-Irc6p #42/YYAA/WT from transformed GPY4042 yeast lysates, with non-transformed GPY4042 as a negative control. Membrane were blotted with anti-β1, β2,

Ypt31, and FLAG.

146

A

Adaptin_binding L237 (yeast)

B C WT #42 YY 4042

β1

β2

Ypt31

FLAG

147

Figure 4-S1 chs6Δ irc6Δ::TRP1 exhibits worse cell growth than chs6Δ irc6Δ::HIS3 on

CCFW.

Additional strains of chs6Δ irc6Δ::TRP1 on CCFW plates compared to chs6Δ irc6ΔC::TRP1

(4993) and chs6Δ irc6Δ::HIS3 (4042).

148

YPD YPD + 200 μg/mL CCFW chs6Δ irc6Δ::HIS3 MATα (GPY4042) chs6Δ irc6ΔC::TRP1 MATα (GPY4993)

chs6Δ irc6Δ::TRP1 MATα #6 chs6Δ irc6Δ::TRP1 MATα #7

chs6Δ irc6Δ::TRP1 MATα #9

149

Figure 4-S2 Immuno-blots of 3xFLAG-Irc6p and Irc6p structure, related to Figure 4-2.

A) Immuno-blotting of 3XFLAG-Irc6p mutants from the screening. Most mutants either completely lost expression or were truncated at the C terminal end. B) Close-up view of the interaction between Y50 and W178. Distance is annotated in Å. PDB code: 3UC9.

150

A 1 7 9 10 11 16 17 18 19 20 21 23 24 26 27 28 29 30

32 3xFLAG-Irc6 25 22

17

32 33 35 36 39 41 42 44 45 46 47 48 49 50 51 52 53 55 WT

3xFLAG-Irc6

B

151

Figure 4-S3 Gene ontology analysis of the negative genetic interaction partners of IRC6.

Gene ontology analysis of the negative genetic interactions solely detected in (Costanzo et al.,

2016) and pooled from all studies are shown in (A) and (B), respectively. Enriched terms of biological processes are displayed with corrected Bonferroni-corrected p value.

152

A ER to Golgi vesicle-mediated transport (GO:0006888) Golgi vesicle transport (GO:0048193) vesicle-mediated transport (GO:0016192) cellular macromolecule biosynthetic process (GO:0034645) macromolecule biosynthetic process (GO:0009059) cellular process (GO:0009987) intra-Golgi vesicle-mediated transport (GO:0006891) beta-glucan biosynthetic process (GO:0051274) 'de novo' protein folding (GO:0006458) beta-glucan metabolic process (GO:0051273) cellular localization (GO:0051641)

0 -1 -2 -3 -4 -5 -6 10 10 10 10 10 10 10 Bonferroni-corrected p value

B Golgi vesicle transport (GO:0048193) vesicle-mediated transport (GO:0016192) ER to Golgi vesicle-mediated transport (GO:0006888) 'de novo' protein folding (GO:0006458) establishment of localization in cell (GO:0051649) cellular localization (GO:0051641) intracellular transport (GO:0046907) cellular process (GO:0009987) autophagy (GO:0006914) cellular component organization (GO:0016043) organelle organization (GO:0006996) cellular response to stimulus (GO:0051716) cellular macromolecule biosynthetic process (GO:0034645) macromolecule biosynthetic process (GO:0009059) ncRNA transcription (GO:0098781) cellular component organization or biogenesis (GO:0071840) regulation of cellular component organization (GO:0051128) cytoskeleton organization (GO:0007010) intra-Golgi vesicle-mediated transport (GO:0006891) transcription from RNA polymerase I promoter (GO:0006360) response to stimulus (GO:0050896)

0 -1 -2 -3 -4 -5 -6 -7 10 10 10 10 10 10 10 10 Bonferroni-corrected p value

153

Table 4-S1 List of IRC6 negative genetic interaction genes from (Costanzo et al., 2016)

other studies used for gene ontology analysis. (Spreadsheet appended)

Table 4-S2 Strains used in this study. (Spreadsheet appended)

154

References

Alvaro, D., Lisby, M., and Rothstein, R. (2007). Genome-Wide Analysis of Rad52 Foci Reveals Diverse Mechanisms Impacting Recombination. PLoS Genetics 3, e228.

Costanzo, M., VanderSluis, B., Koch, E.N., Baryshnikova, A., Pons, C., Tan, G., Wang, W., Usaj, M., Hanchard, J., Lee, S.D., et al. (2016). A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420–aaf1420.

Dinani, N., Ali, M., Liu, L., McGrath, J., and Mellerio, J. (2017). Mutations in AAGAB underlie autosomal dominant punctate palmoplantar keratoderma. Clin. Exp. Dermatol. 42, 316–319.

Giehl, K.A., Herzinger, T., Wolff, H., Sárdy, M., Braunmühl, von, T., Dekeuleneer, V., Sznajer, Y., Tennstedt, D., Boes, P., Rapprich, S., et al. (2016). Eight Novel Mutations Confirm the Role of AAGAB in Punctate Palmoplantar Keratoderma Type 1 (Buschke-Fischer-Brauer) and Show Broad Phenotypic Variability. Acta Derm. Venereol. 96, 468–472.

Gorynia, S., Lorenz, T.C., Costaguta, G., Daboussi, L., Cascio, D., and Payne, G.S. (2012). Yeast Irc6p is a novel type of conserved clathrin coat accessory factor related to small G proteins. Mol. Biol. Cell 23, 4416–4429.

Jackson, L.P., Kelly, B.T., McCoy, A.J., Gaffry, T., James, L.C., Collins, B.M., Höning, S., Evans, P.R., and Owen, D.J. (2010). A large-scale conformational change couples membrane recruitment to cargo binding in the AP2 clathrin adaptor complex. Cell 141, 1220–1229.

Janke, C., Magiera, M.M., Rathfelder, N., Taxis, C., Reber, S., Maekawa, H., Moreno-Borchart, A., Doenges, G., Schwob, E., Schiebel, E., et al. (2004). A versatile toolbox for PCR-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. Yeast 21, 947–962.

Longtine, M.S., McKenzie, A., Demarini, D.J., Shah, N.G., Wach, A., Brachat, A., Philippsen, P., and Pringle, J.R. (1998). Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 14, 953–961.

McClellan, A.J., Xia, Y., Deutschbauer, A.M., Davis, R.W., Gerstein, M., and Frydman, J. (2007). Diverse cellular functions of the Hsp90 molecular chaperone uncovered using systems approaches. Cell 131, 121–135.

McDonold, C.M., and Fromme, J.C. (2014). Four GTPases differentially regulate the Sec7 Arf- GEF to direct traffic at the trans-golgi network. Dev. Cell 30, 759–767.

Mi, H., Dong, Q., Muruganujan, A., Gaudet, P., Lewis, S., and Thomas, P.D. (2010). PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 38, D204–D210.

Page, L.J., Sowerby, P.J., Lui, W.W., and Robinson, M.S. (1999). Gamma-synergin: an EH domain-containing protein that interacts with gamma-adaptin. J. Cell Biol. 146, 993–1004.

155

Pinar, M., Arst, H.N., Pantazopoulou, A., Tagua, V.G., de los Ríos, V., Rodríguez-Salarichs, J., Díaz, J.F., and Peñalva, M.A. (2015). TRAPPII regulates exocytic Golgi exit by mediating nucleotide exchange on the Ypt31 ortholog RabERAB11. Proc. Natl. Acad. Sci. U.S.a. 112, 4346–4351.

Pohler, E., Mamai, O., Hirst, J., Zamiri, M., Horn, H., Nomura, T., Irvine, A.D., Moran, B., Wilson, N.J., Smith, F.J.D., et al. (2012). Haploinsufficiency for AAGAB causes clinically heterogeneous forms of punctate palmoplantar keratoderma. Nat. Genet. 44, 1272–1276.

Ren, X., Farías, G.G., Canagarajah, B.J., Bonifacino, J.S., and Hurley, J.H. (2013). Structural basis for recruitment and activation of the AP-1 clathrin adaptor complex by Arf1. Cell 152, 755–767.

Sharifpoor, S., van Dyk, D., Costanzo, M., Baryshnikova, A., Friesen, H., Douglas, A.C., Youn, J.-Y., VanderSluis, B., Myers, C.L., Papp, B., et al. (2012). Functional wiring of the yeast kinome revealed by global analysis of genetic network motifs. Genome Res. 22, 791–801.

Sikorski, R.S., and Hieter, P. (1989). A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19–27.

Thomas, L.L., and Fromme, J.C. (2016). GTPase cross talk regulates TRAPPII activation of Rab11 homologues during vesicle biogenesis. J. Cell Biol. 215, 499–513.

Thomas, P.D., Campbell, M.J., Kejariwal, A., Mi, H., Karlak, B., Daverman, R., Diemer, K., Muruganujan, A., and Narechania, A. (2003). PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13, 2129–2141.

Tong, A.H.Y., Lesage, G., Bader, G.D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G.F., Brost, R.L., Chang, M., et al. (2004). Global mapping of the yeast genetic interaction network. Science 303, 808–813.

Valdivia, R.H., Baggott, D., Chuang, J.S., and Schekman, R.W. (2002). The yeast clathrin adaptor protein complex 1 is required for the efficient retention of a subset of late Golgi membrane proteins. Dev. Cell 2, 283–294. van Pel, D.M., Stirling, P.C., Minaker, S.W., Sipahimalani, P., and Hieter, P. (2013). Saccharomyces cerevisiae genetics predicts candidate therapeutic genetic interactions at the mammalian replication fork. G3 (Bethesda) 3, 273–282.

Wang, C.-W., Hamamoto, S., Orci, L., and Schekman, R. (2006). Exomer: A coat complex for transport of select membrane proteins from the trans-Golgi network to the plasma membrane in yeast. J. Cell Biol. 174, 973–983.

Weiskoff, A.M., and Fromme, J.C. (2014). Distinct N-terminal regions of the exomer secretory vesicle cargo Chs3 regulate its trafficking itinerary. Front Cell Dev Biol 2, 47.

156

CHAPTER 5

CONCLUSIONS AND FUTURE DIRECTIONS

157

The work presented in this dissertation reviewed the current understanding of mechanisms of somatic cell reprogramming in Chapter 1, and investigated the role of Klf4 in transcriptional control of reprogramming by focusing on the Klf4 transactivation domain in

Chapter 2. I identified CBP/p300 and Mediator complex as functional interaction partners of

Klf4 in reprogramming. Chapter 3 focused on the transactivation activity of clathrin heavy chain

(CHC), a non-canonical transcription regulator that interacts with Klf4. Chapter 4 further extends the functionality of CHC in membrane trafficking by investigating the functional domains of

Irc6p, a CHC accessory protein. The main focus of my thesis research has been on direct mediators of Klf4 transcriptional activation in somatic cell reprogramming. In sum, we provided mechanistic insights into the transcriptional control elicited by Klf4, yet the functional significance of the interactions with CBP/p300 and Mediator in gene regulation remains to be further elucidated.

Functional significance of CBP/p300 in H3K27ac domain establishment

CBP/p300 are required for pluripotency maintenance (Fang et al., 2014), and are major contributors to acetylation of H3K27 in vivo (Jin et al., 2011; Kasper et al., 2014). Enhancers with acetylated H3K27 correlate with active transcription of associated genes; therefore,

H3K27ac distinguishes active enhancer states from poised or disengaged states (Creyghton et al.,

2010; Rada-Iglesias et al., 2010; Zentner et al., 2011). Since active enhancers are associated with cell-type specific gene expression (Heinz et al., 2015), activation of pluripotency gene enhancers during reprogramming should be critical for pluripotency establishment. Therefore, recruitment of CBP/p300 to these enhancers sites is predicted to be the prerequisite for enhancer activation.

There is some evidence that potential recruitment of CBP/p300 to genes/enhancers targeted by reprogramming factors is beneficial for pluripotency establishment. Fusion of VP16 TADc,

158

which interacts with CBP/p300, to Oct4 and Sox2 enhanced reprogramming efficiency (Wang et al., 2011). Klf4 may therefore bring CBP/p300 to the loci of Oct4 and Sox2 to facilitate reprogramming. More genome-wide localization analysis of CBP/p300, H3K27ac, and Klf4 in reprogramming cells is required to corroborate the model that Klf4 recruits CBP/p300 for enhancer activation.

Structural role of Mediator in 3D organization of genome

Similar to CBP/p300, Mediator is also required for pluripotency maintenance (Kagey et al., 2010). Several studies have suggested an architectural role for Mediator complex in 3D organization of genomes together with cohesin in ES cells (Kagey et al., 2010; Phillips-Cremins et al., 2013). Most of the ES-specific cohesin binding sites are also bound by Mediator, and these sites mediate the enhancer-promoter loops at the genes that tend to be highly expressed in ES cells (Phillips-Cremins et al., 2013). Also, knocking down either Smc1 or Med12 (subunit of cohesin and Mediator, respectively) markedly down-regulated these highly-expressed genes, compared to all genes found in ES-specific 3D interactions (Phillips-Cremins et al., 2013), suggesting that Mediator/cohesion-mediated loops are functionally required for the expression of these highly-expressed genes in ES cells. I wondered whether Oct4, Sox2, and Nanog, the fundamental transcription factors for pluripotency maintenance, shape ES chromatin looping interactions. However, evidence does not seem to support this notion, as Oct4, Sox2, and Nanog were not significantly enriched in overall chromosomal looping interactions in ES cells (Phillips-

Cremins et al., 2013). Interestingly, some studies suggested that Klf4 may play a role in mediating the long-range chromosomal interactions, supported by evidence that Klf4 interacts with cohesin (Wei et al., 2013), and evidence from Hi-C, ChIP-seq, and ES cell imaging experiments (Stevens et al., 2017; Wei et al., 2013). It remains unclear how cohesin and

159

Mediator are recruited to these ES-specific genomic interaction sites, but a reasonable speculation would be that they are recruited by transcription factors that can bind these genomic regions. Klf4 has the properties expected of being one of these transcription factors. Knocking down of Klf4 in ES cells followed by chromatin conformation capture (3C)-based genome-wide sequencing would comprehensively test this possibility.

Recently, super-enhancers gained attention, as they appear to define cell identity as cis- regulatory elements (Hnisz et al., 2013; Whyte et al., 2013). Super-enhancers are characterized by dense clusters of enhancers with high occupation by Mediator complex (Whyte et al., 2013;

Hnisz et al., 2013). Again, the occupancy of Mediator at super-enhancers should be stabilized by some mechanism. The enrichment of Oct4, Sox2, and Nanog at super-enhancers did not show a difference from their enrichment at constituent enhancers, whereas Klf4 and another reprogramming factor Esrrb, which also interacts with Mediator (van den Berg et al., 2010), are both preferentially enriched at super-enhancers rather than constituent enhancers. Klf4 and Esrrb are thus likely to play a role as recruiters and stabilizers of Mediator.

Interwoven relationships of multiple factors in enhancer-promoter interaction and

transcriptional activation

In addition to CBP/p300 and Mediator, emerging evidence suggests that enhancer RNA

(eRNA), which is transcribed by Pol II at enhancer regions, plays an important role in chromatin looping and associated gene activation (Li et al., 2013; Sanyal et al., 2012). Knocking down of eRNA by LNA or siRNA resulted in a loss of enhancer-promoter interactions and a decrease of cohesin recruitment to enhancers (Li et al., 2013). Notably, eRNA also seems to play a role in histone acetylation, as a recent study shows that eRNA interacts with CBP, and this interaction stimulates the HAT activity of CBP, allowing H3K27 acetylation and gene activation (Bose et

160

al., 2017). This appears to provide a plausible explanation for why CBP/p300 binding at enhancers does not correlate well with associated gene expression and H3K27ac (Creyghton et al., 2010; Kasper et al., 2014; Kim et al., 2010). It remains to be answered what is the upstream event that results in the transcription of eRNA, but it’s likely that transcription factors at enhancers drive transcription of eRNA, possibly via transcription coactivators such as Mediator and TAFs. Notably, TAFs were also detected to preferentially bind to wild-type Klf4 TAD in our mass spectrometry experiment (Table 2-S2). It seems that certain transcription factors,

CBP/p300, Mediator, eRNA, and cohesin may all be necessary for enhancer-promoter looping, yet how they communicate with each other, what is the mechanism of establishing this enhancer- promoter physical contact, and how this physical contact facilitates transcriptional activation is far from clear.

Concluding remarks

Gene regulation is fundamental to a cell, the basic structural and functional unit of all living organisms. In the work presented in this dissertation, I approached understanding gene regulation from the perspective of somatic cell reprogramming, focusing on the reprogramming factor Klf4. Transcription factors direct the transcription of genes that specify the functionality of a particular cell, and over-expression of certain “master” transcription factors is sufficient to convert cell identity. The different mode of transcriptional control elicited by Klf4 compared to

Oct4 and Sox2, indicated by others’ and my studies, provides an example of the complexity of gene regulation. How transcription factors control gene expression, in cooperation with transcriptional coactivators, chromatin remodelers, and histone modifiers, remains to be fully understood. Characterizing this process is of particular significance, as it helps us understand the change of transcription program accompanying the developmental process of a cell in the context

161

of tissue, organ, and organism. This benefits our understanding of diseases ranging from congenital disorders to cancers, and would definitely facilitate the discovery of new drugs and methodologies to treat diseases.

References

Bose, D.A., Donahue, G., Reinberg, D., Shiekhattar, R., Bonasio, R., and Berger, S.L. (2017). RNA Binding to CBP Stimulates Histone Acetylation and Transcription. Cell 168, 135–149.e22.

Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. U.S.a. 107, 21931–21936.

Fang, F., Xu, Y., Chew, K.-K., Chen, X., Ng, H.-H., and Matsudaira, P. (2014). Coactivators p300 and CBP maintain the identity of mouse embryonic stem cells by mediating long-range chromatin structure. Stem Cells 32, 1805–1816.

Heinz, S., Romanoski, C.E., Benner, C., and Glass, C.K. (2015). The selection and function of cell type-specific enhancers. Nature Reviews Molecular Cell Biology 16, 144–154.

Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934– 947.

Jin, Q., Yu, L.-R., Wang, L., Zhang, Z., Kasper, L.H., Lee, J.-E., Wang, C., Brindle, P.K., Dent, S.Y.R., and Ge, K. (2011). Distinct roles of GCN5/PCAF-mediated H3K9ac and CBP/p300- mediated H3K18/27ac in nuclear receptor transactivation. Embo J. 30, 249–262.

Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L., Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435.

Kasper, L.H., Qu, C., Obenauer, J.C., McGoldrick, D.J., and Brindle, P.K. (2014). Genome-wide and single-cell analyses reveal a context dependent relationship between CBP recruitment and gene expression. Nucleic Acids Res. 42, 11363–11382.

Kim, J., Woo, A.J., Chu, J., Snow, J.W., Fujiwara, Y., Kim, C.G., Cantor, A.B., and Orkin, S.H. (2010). A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143, 313–324.

Li, W., Notani, D., Ma, Q., Tanasa, B., Nunez, E., Chen, A.Y., Merkurjev, D., Zhang, J., Ohgi, K., Song, X., et al. (2013). Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516–520.

162

Phillips-Cremins, J.E., Sauria, M.E.G., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S.K., Ong, C.-T., Hookway, T.A., Guo, C., Sun, Y., et al. (2013). Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295.

Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., and Wysocka, J. (2010). A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283.

Sanyal, A., Lajoie, B.R., Jain, G., and Dekker, J. (2012). The long-range interaction landscape of gene promoters. Nature 489, 109–113.

Stevens, T.J., Lando, D., Basu, S., Atkinson, L.P., Cao, Y., Lee, S.F., Leeb, M., Wohlfahrt, K.J., Boucher, W., O'Shaughnessy-Kirwan, A., et al. (2017). 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 544, 59–64. van den Berg, D.L.C., Snoek, T., Mullin, N.P., Yates, A., Bezstarosti, K., Demmers, J., Chambers, I., and Poot, R.A. (2010). An Oct4-centered protein interaction network in embryonic stem cells. Cell Stem Cell 6, 369–381.

Wang, Y., Chen, J., Hu, J.-L., Wei, X.-X., Qin, D., Gao, J., Zhang, L., Jiang, J., Li, J.-S., Liu, J., et al. (2011). Reprogramming of mouse and human somatic cells by high-performance engineered factors. EMBO Rep. 12, 373–378.

Wei, Z., Gao, F., Kim, S., Yang, H., Lyu, J., An, W., Wang, K., and Lu, W. (2013). Klf4 organizes long-range chromosomal interactions with the oct4 locus in reprogramming and pluripotency. Cell Stem Cell 13, 36–47.

Zentner, G.E., Tesar, P.J., and Scacheri, P.C. (2011). Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res. 21, 1273–1283.

163