Characterization of CSL Complexes in the Notch Pathway: the Su(H)-NICD Interaction and the RBP-J-DNA Interaction

Ashley N. Contreras B.A., Earlham College, 2007

Committee Chair: Rhett A. Kovall, Ph.D.

A dissertation submitted to the Division of Graduate Studies and Research of the University of Cincinnati

in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Ph.D.) in the Department of Molecular Genetics, Biochemistry, and Microbiology of the College of Medicine

November 2014

Abstract

In metazoans, the Notch pathway is a conserved cell-to-cell signaling mechanism that plays a key role in cellular specification. It is essential for cell fate determination in embryonic development, organogenesis, hematopoiesis, and adult tissue homeostasis. Aberrant Notch signaling has been implicated in various cancers, developmental defects, and cardiovascular disease. Despite these various roles, Notch signaling is relatively simple: extracellular ligand binding triggers proteolytic processing of the Notch receptor. The newly generated receptor fragment, the Notch intracellular domain (NICD), translocates to the nucleus and interacts with the DNA-binding protein CSL (CBF1/RBPJ in mammals; Suppressor of Hairless [Su(H)] in flies; Lag-1 in worms) to ultimately activate transcription of Notch-responsive genes. In the absence of pathway activation, however, CSL also represses transcription of these genes. The ability of CSL to differentially regulate expression is determined by its interaction with co-regulators (co-activators or co-repressors). The research described in the following chapters characterizes the molecular details of two different interactions of CSL: the Su(H)-NICD interaction and the RBP-J-DNA interaction. Chapter 1 reviews Notch signaling. In Chapter 2, the thermodynamic details of the Su(H)-NICD interaction are explained, revealing a domain-domain interaction novel to the complex in Drosophila. The interaction between RBP-J and different sequences of DNA is quantified in Chapter 3, which demonstrates that variations to the CSL binding sequence can positively or negatively affect the binding affinity of CSL. In Chapter 4, the work presented here is summarized and additional studies are proposed to further elucidate the molecular mechanisms of the complexes regulating expression of Notch target genes.


Acknowledgements

Science is a difficult path, full of obstacles and failure and fleeting joy. When I chose to go to graduate school for molecular biology, I could not comprehend this truth. Since then, I have experienced my share of obstacles and failure and joy, but I would not have succeeded in reaching this point in my graduate career without a few tremendously important people. I would like to acknowledge them now.

First and foremost, I am grateful for my mentor, Dr. Rhett Kovall, and the Kovall lab, who have guided and supported me every step of the way. Rhett, with his passion for science and genuine interest in our scientific development and our well-being, has been the best mentor any graduate student could want. I truly appreciate that he allowed me to pursue options outside the lab, like the Preparing Future Faculty program and being a teaching assistant, and how he has supported me during challenging parts of my graduate career. He has, by word and action, shown me what a scientist should be. As for my lab colleagues, thank you for everything and then some. Zhenyu Yuan and Brad VanderWielen answered a few million questions from me alone, teaching me how to use and even fix equipment, how to troubleshoot a problem, and how to multitask like a pro. Nassif Tabaja and Nate Miller have been equally helpful in developing my lab skills, though I am also grateful they gave me the opportunity to teach them when they first started in the lab. They have gone beyond anything I could teach them now, and I know they will go much further. And for all other members of the Kovall lab, past and present, I will miss our morning coffee discussions, where I learned so much about science and life. Without the Kovall lab, I would not be here.


Next, I would like to thank my family and friends, who have been with me through the roller coaster of graduate school. They have all listened to me drone on about how cool science is or about how frustrating repeated failures in the lab are or about how I will never decide what to do with my life. Because they are incredible, patient, supportive people, they remained with me through it all and have continued to encourage my scientific career. In particular, I would like to thank my parents, Bill and Patsy Reyer, who have always believed in me and helped me with everything beyond the lab. I would also be remiss if I did not thank my close friends Maureen and Megan as well as the other ladies in my year of graduate school (Kayleigh, Christine, and Fabiola), who made life fun, encouraged me, and commiserated with me.

Last, I owe endless gratitude to my husband, Eric Contreras. We met during my first year of graduate school, and my life has been infinitely better because of it. I cannot say thank you enough for listening to me prattle on about my research or school issues or searching for a career. Eric listens, then tells me to calm down, points out it is not that big a deal, gives me encouragement, and makes me laugh. Without his strength, calm, and humor, I could never have survived the obstacles of graduate school.


Table of Contents

ABSTRACT

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF ABBREVIATIONS

CHAPTER 1: AN OVERVIEW OF THE NOTCH SIGNALING PATHWAY
    Signaling Through the Notch Pathway
    Figure 1: Overview of the Notch Signaling Mechanism
    Regulation of Notch Signaling
    Biological Role of Notch Signaling and Associated Disease States
    Notch Pathway Components
    Figure 2: Domain Schematics of Notch Signaling Components
    CSL and DNA Binding
    Figure 3: CSL Binds Two Distinct Sites in the Hes1 Target Gene
    Figure 4: Structures of Corepressor Complexes in the Notch Pathway
    Figure 5: The Ternary Complex

CHAPTER 2: THERMODYNAMIC BINDING ANALYSIS OF NOTCH TRANSCRIPTION COMPLEXES FROM D. MELANOGASTER
    Abstract
    Introduction
    Figure 1: Overview of CSL-Mediated Transcription Regulation
    Results
    Analysis of Su(H)-NICD Interactions
    Cross-species Binding Studies of CSL-NICD Interactions
    Table 1: Calorimetric Data for the Binding of Drosophila NICD to Su(H)
    Figure 2: Thermodynamic Binding Analysis of Notch from Drosophila
    Table 2: Calorimetric Data for NICD-CSL Binding Between Mouse and Drosophila Components
    Figure 3: Cross-Species Binding Experiments (RBP-J + dNICD)
    Figure 4: Cross-Species Binding Experiments (Su(H) + mNICD)
    Binding Analysis of Su(H)-dRAMANK Point Mutations
    Characterizing the Effect RAM Binding Has on Su(H)-Hairless Interactions
    Table 3: Calorimetric Data for Su(H) and dNICD Point Mutants
    Table 4: Calorimetric Data for Competition ITC Between Su(H)-RAM and Hairless
    Discussion
    Figure 5: Characterizing the Effect RAM has on Su(H)-Hairless Interactions
    Figure 6: Revised Model of Ternary Complex Assembly for Drosophila Notch Proteins
    Materials and Methods

CHAPTER 3: QUANTITATIVE BINDING ANALYSIS OF THE CSL-DNA INTERACTION IN THE NOTCH PATHWAY
    Figure 1: CSL Binds two Distinct Sites in the Hes1 Gene
    Figure 2: Diagram of the Contacts Made Between CSL and the Hes1 Consensus Sequence DNA
    Figure 3: CSL-Hes1 DNA ITC Binding Assays
    Putative CSL Binding Sites in the Mouse Math5 Gene
    Thermodynamic Characterization of Computationally Derived CSL Binding Sites
    Figure 4: ITC Binding Data for RBP-J and Putative CSL Binding Sites in the Math5 Gene
    Additional DNA Sequences
    Figure 5: ITC Binding Data for RBP-J and Computationally Derived Sites
    Table 1: Additional Sequences of DNA Analyzed for RBP-J Binding by ITC

CHAPTER 4: CONCLUSIONS AND FUTURE DIRECTIONS
    Hairless and NICD Competition for Binding to Su(H)
    The Effect of Sequence Variation on the CSL-DNA Interaction
    Figure 1: Species-Specific Models of Ternary Complex Assembly
    Figure 2: Sequence Alignments of CSL, NICD, and the ANK Domain for Mouse and Fly Orthologs
    Future Directions
    Figure 3: Comparison of CTD Binding Site for Hairless and ANK Domain

BIBLIOGRAPHY

APPENDIX A: A COMBINATION OF COMPUTATIONAL AND EXPERIMENTAL APPROACHES IDENTIFIES DNA SEQUENCE CONSTRAINTS ASSOCIATED WITH TARGET SITE BINDING SPECIFICITY OF THE CSL
    Figure 3: CSL-DNA ITC Binding Experiments
    Table 2: Calorimetric Data for Various DNA Sequences Binding to CSL

APPENDIX B: STRUCTURAL AND BIOPHYSICAL CHARACTERIZATION OF THE Su(H)-HAIRLESS REPRESSION COMPLEX
    Figure 1: Hairless is a Corepressor of Notch Signaling in Drosophila
    Figure 2: Hairless Interacts with Su(H)
    Figure 3: Structure of the DNA-Su(H)-Hairless Repression Complex

APPENDIX C: CLONING, PURIFICATION, AND BINDING STUDIES OF THE NOVEL NOTCH CO-REPRESSOR, DROSOPHILA INSENSITIVE

APPENDIX D: PURIFICATION AND QUANTITATIVE BINDING ANALYSIS OF THE SPOC DOMAIN OF DROSOPHILA SPLIT ENDS PROTEIN
    Figure 1: The SPOC Domain in MINT, Spen, and Nito
    Figure 2: The SPOC Domain Interacts with NCoR Corepressors

APPENDIX E: PURIFICATION AND CRYSTALLIZATION OF THE ANKYRIN REPEATS OF THE HUMAN NOTCH INTRACELLULAR DOMAIN

ix

List of Abbreviations (alphabetical)

ANK        Ankyrin domain
BEN        BanP/SMAR1, E5R, NAC1 domain
bHLH       basic helix-loop-helix
BTD        β-trefoil domain
CBF-1      C promoter binding factor 1
ChIP       Chromatin immunoprecipitation
CSL        CBF-1, Su(H), Lag-1
CtBP       C-terminal binding protein
CTD        C-terminal domain
DLL        Delta-like ligands
DNA        Deoxyribonucleic acid
DOS        Delta and OSM11-like proteins
DSL        Delta, Serrate, Lag-2
DTT        Dithiothreitol
EGF        Epidermal growth factor
EMSA       Electrophoretic mobility shift assay
E(spl)     Enhancer of split
GST        Glutathione S-Transferase
HAT        Histone acetyltransferase
HDAC       Histone deacetylase complex
HEPES      4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
Hes        Hairy and Enhancer of Split
Hey        Hes-related with YRPW motif
IPTG       Isopropyl β-D-1-thiogalactopyranoside
ITC        Isothermal titration calorimetry
kD         kilodalton
Kd         dissociation constant
LNR        Lin12/Notch repeats
MAM        Mastermind, also abbreviated Mm or MAML
MBP        Maltose binding protein
MES        2-(N-morpholino)ethanesulfonic acid
MINT       Msx2 interacting nuclear target
NaPi       Sodium phosphate buffer
NCoR       Nuclear receptor co-repressor 1
NICD       Notch intracellular domain
NLS        Nuclear localization signal
NRR        Negative regulatory region
NTD        N-terminal domain
PEST       Proline, glutamic acid (E), serine, threonine-rich domain
PDB        Protein Data Bank
RAM        RBP-J associated molecule
RBP-J      Recombining binding protein Suppressor of Hairless
RHR        Rel homology region
RITA       RBP-J interacting and tubulin associated protein
SDS-PAGE   Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SHARP      SMRT/HDAC associated repressor protein
SMRT       Silencing mediator of retinoid and thyroid hormone receptors
SMT        Suppressor of mif two 3, yeast homolog of SUMO, fusion tag
SPOC       SPEN Paralog and Ortholog C-terminal
SPS        Sequence paired site or Su(H) paired site
Su(H)      Suppressor of Hairless
T-ALL      T-cell acute lymphoblastic leukemia
TACE       ADAM family metalloprotease
TCEP       Tris(2-carboxyethyl)phosphine
Trr        Trithorax-related
Trx        Trithorax
WT         Wild type

Δ          delta
g          gram
μ          micro
m          milli
n          nano
Å          angstrom
°C         degrees Celsius
M          Molar
kcal       kilocalories
mol        mole, 6.02 × 10^23
T          temperature
ΔG         change in free energy
ΔH         change in enthalpy
ΔS         change in entropy
ΔCp        change in heat capacity


Chapter 1: An Overview of the Notch Signaling Pathway


Overview of the Notch Signaling Pathway

The Notch signaling pathway is a cell-to-cell signaling mechanism conserved in metazoans that regulates the expression of specific genes critical for cell fate determination. The pathway is activated when a DSL ligand on the surface of one cell interacts with the extracellular domain of the Notch receptor present on a neighboring cell. Ligand-receptor interaction triggers proteolytic processing of the receptor, resulting in release of the Notch intracellular domain (NICD) from the cell membrane. The NICD translocates to the nucleus and interacts with the DNA-binding protein CSL (CBF1/RBPJ in mammals, Su(H) in flies, Lag-1 in worms) to ultimately activate transcription of Notch responsive genes. In the absence of pathway activation, however, CSL also functions to repress transcription of these genes. The ability of CSL to differentially regulate gene expression is determined by its interaction with coregulatory proteins (coactivators or corepressors), placing CSL at the center of a transcriptional switch. Despite the relatively simple linear nature of signaling in the Notch pathway, there are numerous levels of complexity added from the highly variable and tissue-specific nature of the pathway’s transcriptional output.

Signaling Through the Notch Pathway

Canonical Notch signaling proceeds through a relatively linear pathway, converting an extracellular signal into transcriptional activation of target genes. Regulation of the pathway occurs at multiple levels, enabling the pathway to function in a highly specific manner.

The Canonical Pathway

Pathway activation occurs when a DSL ligand on the signal-sending cell interacts with the Notch receptor on an adjacent signal-receiving cell (Figure 1). This trans interaction initiates endocytosis of the ligand-receptor complex into the signal-sending cell, and the mechanical force generated by this process causes a conformational change in the bound receptor, exposing site 2 (S2) for cleavage by the TACE metalloprotease.1-2 S2 cleavage releases the extracellular domain of the Notch receptor from the signal-receiving cell, allowing the signal-sending cell to complete endocytosis of the ligand-receptor (extracellular domain) complex and its subsequent lysosomal degradation.3-4 The remaining portion of the cleaved receptor remains anchored to the cell membrane until the γ-secretase complex progressively cleaves site 3 (S3) in the Notch transmembrane domain.5-7 γ-secretase cleavage results in formation of the NICD (Notch intracellular domain), which translocates to the nucleus through a poorly understood process. Inside the nucleus, NICD associates with the DNA-binding transcription factor CSL present in the regulatory region of Notch target genes.8-10 This complex of DNA-CSL-NICD recruits several coactivators, including Mastermind and the general transcriptional coactivators p300 and GCN5/PCAF, both of which contain histone acetyltransferase domains.11-14 Mastermind binds to the existing complex, forming the ternary complex (CSL-NICD-Mastermind) necessary for transcriptional activation.12-13 In addition to histone acetyltransferases, the ternary complex also recruits other chromatin remodeling factors and the MED8 mediator complex to initiate target gene transcription.15 Mastermind localization to the ternary complex also recruits CDK8 kinase, which phosphorylates the PEST domain of NICD.16 The phosphorylated sites are bound by E3 ubiquitin ligases Sel10/Fbw7, which ubiquitinate NICD, targeting it for proteasomal degradation.17-19 Without NICD, the ternary complex disassembles and signaling ends as the transcriptional machinery leaves the target gene.15 Presumably, CSL is then bound by one of several corepressor proteins, such as Hairless in flies or MINT/SHARP and KyoT2 in mammals. Corepressors can indirectly recruit repression machinery by associating with additional corepressors, as demonstrated by Hairless’ recruitment of two global corepressors, Groucho and CtBP (C-terminal binding protein), that interact directly with HDACs.20-22 Similarly, MINT/SHARP and KyoT2 recruit CtBP, NCoR/SMRT, and other global corepressors to link CSL with repression machinery.23-24 Notch target genes are held in a repressed state by remodeling of the local chromatin environment, by preventing the transcriptional machinery from binding, or through a combination of these activities. When a new round of Notch signaling is activated, corepressors are exchanged for NICD and Mastermind coactivators, though the molecular details of this process are currently unknown. Notably, in worms there are little to no data showing that CSL functions as a transcriptional repressor. Rather, Notch targets in worms are likely held in a state poised for transcription and activated upon pathway stimulation.25

Figure 1: Overview of the Notch Signaling Mechanism. When a DSL ligand present on the surface of the signal-sending cell interacts with a Notch receptor present on the surface of a signal-receiving cell, it induces mechanical stress on the NRR of the Notch receptor, exposing the proteolytic site for the ADAM metalloprotease, TACE. TACE cleavage allows the extracellular portion of the Notch receptor, still in complex with the DSL ligand, to be endocytosed by the signal-sending cell. The portion of the Notch receptor remaining in the signal-receiving cell is proteolytically cleaved by the γ-secretase complex, freeing the Notch intracellular domain (NICD). The NICD translocates to the nucleus and displaces corepressors associated with CSL present on the DNA of Notch target genes, thus disassembling the repression complex. NICD and a second coactivator, Mastermind, then interact with CSL, forming the ternary complex necessary for activation of transcription of Notch target genes. Transcriptional activation is terminated when the PEST domain of NICD is phosphorylated and ubiquitinated and NICD is degraded by the proteasome. Without NICD, the ternary complex disassembles, CSL is presumably bound by a corepressor, and expression of the target gene is repressed.

Non-canonical Notch signaling has been identified through a variety of mechanisms, though most remain poorly defined to date. Two primary forms of non-canonical Notch signaling have been identified: DSL-independent Notch activation and vesicle-mediated transport.26 In DSL-independent activation, proteins that resemble the DSL ligand but lack the DSL motif interact with the Notch receptor to activate signaling.27-28 Such proteins include Delta-like 1 (DLK1), Delta- and Notch-like epidermal growth factor-related receptor (DNER), and Jedi.27-28 Vesicle-mediated transport of the Notch receptor functions in non-canonical signaling by trafficking the receptor through endosomal-sorting machinery. Since the intracellular location of Notch can either downregulate or activate signaling, vesicle-mediated transport of the receptor is an important effector of its activity. However, it is unclear precisely which compartment and which mechanisms are responsible for non-canonical pathway activation at different cellular locations. A recent study implicated the Notch antagonist Numb in regulation of receptor endocytosis, using vesicle-mediated transport of the receptor to prevent pathway activation.29 Additional forms of non-canonical activation have been reported, such as activation without receptor cleavage, indicating that non-canonical Notch signaling remains poorly understood.26

Regulation of Notch Signaling

Similar to other signaling pathways, the Notch pathway is highly regulated at various points in the cell to maintain Notch signaling at levels and intensities (i.e., signal duration) appropriate to the cell type and developmental event. The primary way in which such regulation is accomplished is by controlling the amount of protein components available for signaling. This is done in a variety of ways, such as controlling the expression level of proteins with feedback loops and post-translational modifications. Adding even more complexity, the stability and intracellular location of these proteins are determined by post-translational modifications and intracellular trafficking. By controlling which ligands and receptors are expressed on the cell surface and the length of time they are expressed, the cell can affect the amount and intensity of Notch activation it experiences.

The Notch pathway involves cis and trans interactions among ligands and receptors. In brief, cis interactions occur between ligands and receptors on the same cell, preventing either protein from participating in Notch signal transduction. Trans interactions occur between ligands and receptors on adjacent cells, resulting in activation of the pathway in the cell expressing the participating receptor. Both interactions will be discussed in greater detail later in this chapter. While the precise details of these cis-trans relationships are not fully understood, it is apparent that their ratio is an essential regulatory mechanism in cells within an equivalence group, which is a group of undifferentiated cells with the potential to adopt various fates.26 A recent study in a mammalian cell culture system demonstrated that the ratio of cis and trans interactions between neighboring cells dictates which cell becomes the signal-sending cell (a high Delta/Notch ratio) and which cell becomes the signal-receiving cell (a low Delta/Notch ratio) upon activation of the Notch pathway.30

Another significant regulatory mechanism of the Notch pathway is gene dosage sensitivity, in which an organism “counts” Notch gene dosage, such that either too much or too little Notch will result in altered function of that cell type.31-32 Sensitivity of the pathway to gene dosage was originally identified in flies, but additional studies have shown mammals also exhibit abnormal effects when gene dosage of pathway components is altered.26,32-33 Both the Notch receptor and the ligand Delta are haploinsufficient, meaning a single functional copy of the gene does not produce enough gene product in a diploid organism to function normally, resulting in an abnormal or disease state.26 Haploinsufficiency is uncommon in diploid organisms, demonstrating the Notch pathway is extremely sensitive to gene dosage for proper function.26 Similarly, Notch is one of only two genes in flies that are triplomutant, meaning three copies of the Notch gene, rather than the standard two copies, display the characteristic mutant phenotype of notched wings.26,34 One possible reason the Notch pathway is so sensitive to gene dosage is the stoichiometric nature of the interactions between pathway components, which suggests even a small stoichiometric difference in receptor and ligand levels may inappropriately restrict signaling in a cell population.26,33 Other signaling pathways lack this stoichiometric interaction, instead relying on enzymatic amplification step(s) to increase the amount of signal.26
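The dosage argument above can be made concrete with a small numerical sketch. Both functions below are illustrative assumptions rather than models taken from the cited studies: `stoichiometric_output` assumes one unit of signal per receptor with no amplification, while `amplified_output` assumes a hypothetical enzymatic gain of 50 followed by hyperbolic saturation. Gene dose is expressed relative to the normal two copies (1.0).

```python
def stoichiometric_output(relative_dose):
    """Signal with no amplification: proportional to receptor dose (assumption)."""
    return relative_dose


def amplified_output(relative_dose, gain=50.0):
    """Signal after a hypothetical amplification step that saturates.

    The gain value and the hyperbolic saturation are illustrative assumptions.
    """
    amplified = gain * relative_dose
    return amplified / (1.0 + amplified)


def relative_to_normal(model, relative_dose):
    """Output at a given gene dose, normalized to the output at normal dose (1.0)."""
    return model(relative_dose) / model(1.0)


if __name__ == "__main__":
    # 0.5 = one functional copy (haploinsufficiency), 1.5 = three copies (triplomutant)
    for dose in (0.5, 1.0, 1.5):
        print(dose,
              round(relative_to_normal(stoichiometric_output, dose), 3),
              round(relative_to_normal(amplified_output, dose), 3))
```

Under these assumptions, a one-copy or three-copy genotype shifts the unamplified signal by 50%, whereas the saturating amplified signal changes by roughly 2% or less.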

Target Genes Regulated by the Notch Pathway

CSL binds to DNA in the regulatory region of Notch target genes and controls gene expression by differentially interacting with coactivators or corepressors. These coregulatory proteins recruit and interact with larger chromatin remodeling complexes, such as histone deacetylases or histone acetyltransferases, or with general repression or activation machinery.

The best-characterized target genes of the Notch pathway are the Hes (hairy and enhancer of split) family of genes. The Hes gene family encodes basic helix-loop-helix (bHLH) proteins that repress the transcription of other genes, many of which play a critical role in cell differentiation and fate choice.35 Flies possess one Hairy gene and seven clustered Enhancer of Split [E(spl)] genes that control key developmental processes, like segmentation, myogenesis or neurogenesis, by inhibiting the transcription of proneural bHLH activators like Atonal, Daughterless, and the Achaete-Scute complex.36-37 In mammals, there are seven Hes genes (Hes1-7) and three Hey (Hes-related with YRPW motif; Hey1-2, HeyL) genes.38-40 Mammalian Hes genes function in organ development by maintaining populations of progenitor cells and by regulating binary cell fate decisions.35 For example, Hes1, Hes3, and Hes5 play critical roles in repressing neurogenesis in the developing nervous system.41-43 Hes1 alone is essential for development of the nervous system, sensory organs, pancreas, endocrine cells, and lymphocytes.42,44-47 Direct targets identified for repression by Hes1 include the bHLH activators Mash1 and E47, which can heterodimerize and activate neuron-specific gene expression.35,48 Similarly, Hes1 inhibits pancreatic bHLH activators Ptf1a and neurogenin 3, which specify exocrine and endocrine cell fates, respectively.49-50 Additionally, mammalian Hes genes can function as biological clocks to measure time during developmental events, like somite segregation. In many cell types, oscillatory expression of Hes1 through a negative autoregulatory feedback loop may serve as a biological clock.51 Hes1, Hes5 and Hes7 are expressed and function in an oscillatory manner during somite formation, a developmental process forming the segmental units (somites) that give rise to the vertebrae, ribs, skeletal muscles and dermis.52-53 However, Hes7 has been identified as the most critical of these three for somite formation, as a lack of Hes7 precludes somite formation, resulting in fused vertebrae and ribs or fused somites.53-54 Together, the Notch pathway and its effectors, the Hes genes, coordinate gene expression spatially and temporally in adjacent cells, promoting proper tissue and organ development during embryogenesis.35


All Hes proteins contain three conserved domains: the bHLH, Orange, and WRPW domains. The bHLH domain consists of a basic region for DNA binding and the helix-loop-helix region for dimerization.35,55 Unlike other bHLH factors, Hes proteins possess an invariant proline residue in the middle of the basic region that is proposed to confer binding to non-consensus sequences.37 While the consensus sequence for bHLH factors is an E box (CANNTG), Hes proteins can bind with higher affinity to the class C site (CACG[C/A]G) or the N box (CACNAG).55-57 In Hey proteins, the invariant proline is replaced by a glycine residue, allowing Hey proteins to bind only the E box sequence.58 In addition to forming homodimers, Hes proteins can heterodimerize through the helix-loop-helix region with Hey proteins, forming a complex that binds class C sites with higher affinity to repress transcription more effectively.58 Hes proteins can heterodimerize with other bHLH factors, though these complexes cannot bind DNA and therefore have a dominant-negative effect on E-box binding bHLH activators.35 This dominant-negative effect is the mechanism of passive repression by Hes proteins.35 Selection of bHLH heterodimer partners is regulated by the two amphipathic helices of the Orange domain.59-60 The WRPW domain of the Hes protein family, consisting of the tetrapeptide Trp-Arg-Pro-Trp, plays a key role in active repression by Hes homodimers by recruiting and interacting with an additional corepressor: Groucho in flies or the mammalian Groucho orthologs, TLE1-4 (Transducin-like E(spl)).37,61 Groucho or TLE1-4 recruit histone deacetylase to the site, inhibiting transcription of the Hes target gene by remodeling the local chromatin environment to an inactive state.35 Since Hey proteins possess a YRPW (or YXXW in HeyL) domain instead of a WRPW domain, they are unable to recruit and bind TLE1-4 corepressors, though they are able to repress transcription as a heterodimer with Hes proteins.35,58,62 Additionally, the WRPW domain is important in regulating the very short half-life (~20 minutes) of Hes proteins, as it is polyubiquitylated and targeted for degradation by the proteasome.51
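The three binding motifs quoted above are degenerate sequences, and a short sketch can show how such motifs are matched against DNA. The code below is a minimal illustration, not a method used in this work: the helper names `motif_to_regex` and `find_sites`, the use of the IUPAC code M for "A or C", and the example sequence are all assumptions introduced here. Only the given strand is scanned; reverse-complement matches are not reported.

```python
import re

# IUPAC-style codes needed for the three motifs quoted in the text.
# M stands for "A or C", so the class C site CACG[C/A]G is written CACGMG.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "M": "[AC]"}

MOTIFS = {
    "E box": "CANNTG",
    "class C site": "CACGMG",
    "N box": "CACNAG",
}


def motif_to_regex(motif):
    """Translate a degenerate motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif)


def find_sites(sequence, motifs=MOTIFS):
    """Return 0-based start positions of each motif on the given strand only."""
    sequence = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
        hits[name] = [m.start() for m in pattern.finditer(sequence)]
    return hits


if __name__ == "__main__":
    example = "GGCACGTGTTCACGCGAACACAAGCC"  # made-up sequence, not from a real gene
    print(find_sites(example))
```

In the made-up sequence, CACGTG matches only the E box pattern, CACGCG only the class C pattern, and CACAAG only the N box pattern, which shows how a single base difference moves a site from one class to another.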

Though Hes genes represent the primary targets of the Notch pathway, recent studies have shown other genes are transcribed in response to pathway activation.63 In cell types where Notch promotes proliferation, direct target genes include , cyclinD, CDK5, and string/CDC25.64-67 Similarly, in cell types where Notch promotes exit from the cell cycle and differentiation, p21 has been shown to be a target gene.68

Biological Role of Notch Signaling

The Notch pathway plays a fundamental role in the development of multicellular organisms by controlling cell fates and, thus, morphogenesis. Signaling through the pathway yields highly pleiotropic outcomes influencing differentiation in a cell-specific manner. The Notch pathway affects a very broad, if not complete, spectrum of developing tissues and organs and is intimately associated with the maintenance and fate of stem cells. To determine cell fate, the pathway operates through two primary modes of signaling: binary cell fate decisions, in which cells choose between two possible fates, and inductive signaling, in which a novel fate is specified.69 With metazoa frequently generating different cell types through successive rounds of mitosis, iterative Notch signaling at each step enables a single multipotent cell to give rise to a diverse array of terminally differentiated cells.70

The Notch pathway enables the cell-to-cell communication necessary for binary cell fate decisions, preventing two adjacent equipotent cells from adopting the same fate.71 There are two subtypes of binary cell fate decisions: lateral inhibition and asymmetric cell divisions. In lateral inhibition, two roughly equivalent cells signal to each other via inhibitory reciprocal Notch signaling, which amplifies an initially small difference between the two cells.69 While both cells initially have similar levels of ligand and receptor, activation of the Notch pathway in one cell prevents it from activating the pathway in adjacent cells.71 The Notch-activated cell accomplishes this by downregulating the expression and activity of DSL ligand on its cell surface.69 This downregulation occurs through a transcriptional feedback loop mediated by the Hes gene family of Notch targets.69,71 An example of lateral inhibition executed by the Notch pathway occurs in developing fly ommatidia and neural preclusters.69
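The way reciprocal inhibition amplifies a small initial difference can be illustrated with a toy two-cell simulation. This is only a sketch under stated assumptions: Notch activity in each cell is driven by the neighbor's ligand, ligand production is repressed by the cell's own Notch activity, both relationships are modeled with Hill-type functions, and every parameter value (`a`, `b`, the exponents, the time step) is chosen for illustration rather than taken from data.

```python
def notch_activation(neighbor_delta, a=0.01, k=2):
    """Notch activity driven by ligand on the neighboring cell (assumed Hill form)."""
    return neighbor_delta ** k / (a + neighbor_delta ** k)


def delta_production(own_notch, b=100.0, h=2):
    """Ligand production repressed by the cell's own Notch activity (assumed form)."""
    return 1.0 / (1.0 + b * own_notch ** h)


def simulate_two_cells(notch, delta, dt=0.01, steps=20000):
    """Forward-Euler integration of two mutually inhibiting cells.

    notch and delta are (cell 1, cell 2) pairs of starting levels in arbitrary units.
    """
    n1, n2 = notch
    d1, d2 = delta
    for _ in range(steps):
        dn1 = notch_activation(d2) - n1
        dn2 = notch_activation(d1) - n2
        dd1 = delta_production(n1) - d1
        dd2 = delta_production(n2) - d2
        n1 += dt * dn1
        n2 += dt * dn2
        d1 += dt * dd1
        d2 += dt * dd2
    return (n1, n2), (d1, d2)


if __name__ == "__main__":
    # Cell 1 starts with only slightly more ligand than cell 2.
    (n1, n2), (d1, d2) = simulate_two_cells(notch=(0.5, 0.5), delta=(0.51, 0.50))
    print("cell 1: Notch %.2f, ligand %.2f" % (n1, d1))
    print("cell 2: Notch %.2f, ligand %.2f" % (n2, d2))
```

With these illustrative parameters, the cell that starts with marginally more ligand ends with high ligand and low Notch activity (the signal-sending state), while its neighbor ends with high Notch activity and low ligand.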

In addition, recent work suggests a role for cis-inhibition during lateral inhibition.

During cis-inhibition, ligand and receptor expressed on the same cell interact with each other, rendering the Notch receptor refractory to trans interactions with a ligand on an adjacent cell and leading to receptor degradation.30,72 Ultimately, cis-inhibition causes

Notch signaling to transition from bidirectional to unidirectional, directing each cell to adopt a different fate. Since cis-inhibition occurs independently of transcriptional feedback, it has been proposed to function at the early stages of lateral inhibition.30

When cis-inhibition occurs, both the ligands and receptors on a cell are unable to interact with their partners on neighboring cells.73-74 With all of its receptors occupied, the cis-inhibited cell is unable to respond to ligands on adjacent cells, maintaining its inactive signaling state.73-74 Eventually, the cell produces more ligand than receptor, so the excess ligand is able to interact with a receptor on a neighboring cell.73-74 Coupling these events—inhibition and initiation of signaling—together through cis-inhibition


allows lateral inhibition to select exactly one precursor cell from the initial cluster with very little error.73-74

In a different subtype of binary cell fate decisions—asymmetric cell divisions— intrinsic signals bias equivalent sister cells to adopt different cell fates.71 Essentially,

Notch regulators are asymmetrically inherited in daughter cells. A classic example of this occurs during fly neurogenesis, in which a Notch inhibitor, Numb, is asymmetrically segregated during successive rounds of cell division.75 Numb affects Notch receptor endocytosis, thereby lowering the level of Notch available for activation at the cell membrane and biasing that cell to become the signal-sending cell.75 Cells with higher levels of Numb, therefore, have much lower levels of Notch activation than their sister cell, resulting in each daughter cell adopting a distinct fate.71,75 Mammalian Numb has been implicated in inhibiting Notch signaling, but it is unclear whether it uses asymmetric sorting or additional mechanisms to do so.71

During inductive signaling, Notch signaling occurs between two non-equivalent populations of cells and can establish an organizer and/or segregate the two groups.69

This allows a new cell type to be specified at the boundary of two fields of distinct cell types.71 Inductive signaling is known to occur in both stromal and progenitor cells, as well as in the fly wing primordium.69

The Notch pathway has only recently been associated with proliferation, self-renewal, and differentiation of adult stem cell populations. Stem cells reside in nearly all mammalian tissues, serving as the basis of development and maintenance of the tissue.71 Adult stem cells can self-renew or produce daughter cells that either terminally differentiate or continue to proliferate as stem cells with more restricted potential.71


Though most of the data to date comes from genetic experiments rather than biochemical experiments, the Notch pathway has been shown to function in several stem cell-maintained tissues, including the fly and mammalian intestine and the mammalian skin, respiratory system, hematopoietic system, and muscle.76-82

Notch Signaling and Disease

With such prominent and highly pleiotropic effects in metazoan development, it is not surprising that aberrant Notch signaling is involved in a diverse array of human disease states. The pathway can potentially be modulated at various points, but the sheer number of cell types it affects and the increasingly complex regulatory mechanisms governing its activity pose a host of challenges. Despite these obstacles, the Notch pathway remains a highly attractive therapeutic target for researchers and pharmaceutical companies alike.

Notch as a classic oncogene

Human T-cell acute lymphoblastic leukemias/lymphomas (T-ALL) were the first diseases associated with altered Notch function.83 Preferentially affecting children and adolescents, T-ALL is an aggressive cancer commonly associated with acquired chromosomal translocations and other genetic or epigenetic abnormalities that lead to aberrant expression of transcription factors.31 In 1991, when T-ALL was first linked to the Notch pathway, a case study showed that a chromosomal translocation had occurred between the Notch1 locus on chromosome 9 and the T-cell receptor β locus on chromosome 7.83 This rearrangement resulted in the expression of a truncated, constitutively active Notch1 receptor.83 Subsequent research has shown the Notch1 receptor is essential for the normal development of T-cell progenitors, for commitment of progenitors to the T-cell fate, and for the assembly of pre-T-cell receptor complexes in immature thymocytes.84 More than half of all T-ALL cases involve an activating mutation of Notch1, with these mutations typically affecting the

NRR (negative regulatory region) or PEST domains of the receptor.85 In T-ALL cell lines lacking a chromosomal translocation, Notch pathway signals are necessary for growth, emphasizing the central role of Notch1 in T-ALL pathogenesis.85

Similarly, a subtype of mature B-cell lymphoma, diffuse large B-cell lymphoma, has been linked to activating mutations in the Notch2 receptor.86 One mutation results in either partial or complete deletion of the PEST domain from the receptor while another mutation is a single amino acid substitution at the C-terminus of the protein.86 Some cases possess increased copies of the mutated Notch2 allele.86

Mouse tumor models have linked abnormal activity of all four Notch receptor paralogs with the formation of solid tumors and leukemias.87-88

However, only correlative evidence exists for human cancers other than T-ALL.31 It has been proposed that Notch signaling, presumably in synergy with other factors, may promote dramatic proliferation of cell populations prone to accumulating oncogenic mutations, making Notch activation important for oncogenesis but not necessarily as a classic oncogene.31,89

Notch in hereditary pleiotropic disease


In 1997, the Jagged1 ligand of the Notch pathway was first linked to Alagille syndrome, a multisystem disorder with developmental abnormalities that may affect numerous tissues and organs, including liver, skeleton, kidney, heart, and face.31,90-91

The disease typically manifests as a paucity of intrahepatic bile ducts in combination with a variety of additional clinical features, the most common of which are chronic liver disease, cardiac disease, and skeletal abnormalities.92 With such highly pleiotropic symptoms, Alagille syndrome reflects the broad developmental action of Notch signaling in humans. The disorder is caused by frameshift or splice site mutations of the Jagged1 ligand, resulting in Jagged1 haploinsufficiency.90-91 Though dosage of genes in the Notch pathway was known to be important in flies, the role of Jagged1 haploinsufficiency as the basis of Alagille syndrome marks the first time dosage was associated with human disease.90 Further research revealed premature termination and missense mutations in the Notch2 receptor can also result in the disorder, making Alagille syndrome a heterogeneous disorder of Notch signaling.93 The role of Notch2 in disease development can be explained by mouse models, where the Notch2 receptor is a genetic modifier of Jagged1 haploinsufficiency.94

Notch and the skeleton

Notch signaling has long been associated with patterning the mammalian axial skeleton and, thus, with skeletal disorders such as spondylocostal dysostosis (SCD).31

A group of vertebral malsegmentation syndromes, SCD manifests in patients as reduced stature due to axial skeletal defects.31 The role of the Notch pathway in SCD was first identified in a classic mouse mutant, pudgy, whose severe vertebral and rib


deformities were caused by a mutation in the Notch pathway ligand Delta-like 3 (Dll3).95

Autosomal recessive forms of SCD in humans are caused by truncation and missense mutations of the human ortholog of the ligand, DLL3.96-97 Mutant forms of DLL3 target newly synthesized Notch1 receptor for lysosomal degradation prior to its post-translational processing and presentation on the cell surface.98 Three additional proteins in the Notch pathway have been linked to SCD. Inactivating mutations in the glycosyltransferase Lunatic Fringe prevent the enzyme from adding regulatory modifications to the Notch receptors.99 Missense mutations in Hes7, a direct target gene of Notch signaling, disrupt its ability to attenuate Notch signaling through a negative feedback mechanism.100-101 Mutations in another direct target gene of the Notch pathway, MESP2, can cause not only SCD, but a related disorder, spondylothoracic dysostosis.102-103

Notch in metabolic bone disease

Aberrant Notch signaling is associated with additional skeletal disorders in the form of metabolic bone disease. Two such diseases share analogous mutations in the

Notch2 receptor, with the mutations occurring primarily in the last exon.104-106 These mutations are predicted to result in premature termination, disruption, or elimination of the PEST domain of the receptor, thus affecting its proteasomal degradation and allowing pathway activation to persist.104-106 One of these diseases, Hajdu-Cheney syndrome, is a rare, mostly sporadic disease, though it has autosomal dominant inheritance in certain families.31 It is a multisystem disorder of connective tissues, characterized by severe and progressive focal bone loss, generalized osteoporosis and


variable craniofacial abnormalities and renal cysts.31 The other rare metabolic bone disease linked to Notch2 mutations is serpentine fibula polycystic kidney syndrome, which manifests similarly to Hajdu-Cheney syndrome except for a characteristic S-shaped fibula.31

Mouse studies have shown a link between Notch signaling and regulation of bone density in adults. Mice lacking Notch1 and Notch2 receptors initially showed increased bone mass during adolescence, but later developed osteoporosis (reduction in bone density) with age, implicating Notch signaling in bone homeostasis.107 Further support comes from research showing constitutive activation of Notch1 signaling in osteoblasts caused increased proliferation of immature osteoblasts, leading to severe osteosclerosis (increase in bone density).108 Additionally, loss of Notch signaling in osteoblasts through loss of γ-secretase activity resulted in late-onset osteoporosis.108 In both disorders, the root cause appears to be misregulation of Notch target genes, including osteoblastic transcription factors like Runx2, as well as cell cycle proteins.31,108

Notch in cardiovascular disease

Genetic analysis of numerous mouse models demonstrated the important role of

Notch signaling in various aspects of cardiovascular development.109 Most of these mouse models exhibited lethal cardiovascular defects or vascular abnormalities.109 In humans, one of the most common features of Alagille syndrome, caused by mutations in the Notch pathway ligand Jagged1, is cardiovascular anomalies.110 Jagged1 mutations are also present in patients with isolated congenital heart defects, such as tetralogy of Fallot or pulmonic stenosis, implicating the pathway in a broader variety of


cardiovascular diseases.111-112 Cases of familial and sporadic forms of aortic valve disease have been linked to nonsense and frameshift mutations in the Notch1 receptor, manifested as structural defects in the aortic valve and an increased rate of valve calcification in adulthood.113 Similar to the metabolic bone diseases affected by aberrant Notch signaling, the Notch1 mutations linked with cardiovascular diseases result in deregulation of Notch-mediated repression of Runx2, a transcription factor required for osteoblast cell fate.113

Notch in cerebrovascular disease

CADASIL (Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy) is the most common monogenic form of ischemic stroke, presenting in a varied manner with symptoms including subcortical ischemic events, cognitive impairment and dementia, migraine with aura, mood disturbances, and apathy.31,114 Affecting primarily the small penetrating cerebral and leptomeningeal arteries, the arteriopathy is characterized by thickening of the arterial wall and morphological changes in vascular smooth muscle cells and pericytes.114-115 Another notable feature of the disease is the presence of granular osmiophilic material close to the surface of smooth muscle cells and pericytes in the brain and skin arteries.115-116

Mutations of the Notch3 receptor are the causative agent of CADASIL.31,117 Strikingly, the mutations occur almost exclusively in the extracellular domain of the Notch3 receptor, typically within exons 3 and 4 of the EGF-like repeats.117 Over 95% of the mutations are missense, though some small in-frame deletions or splice site mutations have been identified in patients.117 All mutations lead to an odd number of cysteine


residues within the affected EGF-like repeat, though it remains unclear precisely how this perturbs receptor function at a molecular level.31,117 Current studies suggest these mutations may be neomorphic, enabling mutant receptors to form higher order multimers.118 Certain point mutations also appear to be resistant to ERAD-mediated degradation, accumulating in the endoplasmic reticulum and potentially leading to cytotoxicity.31,119 Most studies of Notch3 function have been performed with mouse models. Notch3 knockout mouse models were highly susceptible to ischemic stroke upon challenge.120 Since this phenotype could be rescued by directing expression of wild type Notch3 in smooth muscle cells, a direct link was established between Notch3 function in blood vessels and susceptibility to ischemia.120 When brain-derived smooth muscle cells from Notch3 knockout mice were profiled, they possessed downregulation of genes implicated in muscle contraction and varying misregulation of genes involved in cell structure and mobility as well as genes involved in muscle and mesoderm development.120
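The observation that CADASIL mutations leave an odd number of cysteine residues in the affected EGF-like repeat reduces to a simple parity check. The sketch below assumes that a classical EGF-like repeat typically carries six cysteines paired into three disulfide bonds, so that any odd count leaves at least one cysteine without a partner. The sequences are made-up toy strings used only for illustration; they are not Notch3 sequences.

```python
def cysteine_count(repeat_sequence: str) -> int:
    """Count cysteine (C) residues in a one-letter amino acid sequence."""
    return repeat_sequence.upper().count("C")


def has_unpaired_cysteine(repeat_sequence: str) -> bool:
    """Return True when the cysteine count is odd.

    An odd count means at least one cysteine cannot be paired
    in a disulfide bond within the repeat.
    """
    return cysteine_count(repeat_sequence) % 2 == 1


if __name__ == "__main__":
    # Hypothetical toy repeat with six cysteines (not a real Notch3 repeat).
    wild_type = "ACPSGGTCQDGVNSYFCTCPEGFSGRLCEQDVDECA"
    # Hypothetical missense change that converts an arginine to a cysteine.
    mutant = wild_type.replace("GRL", "GCL")
    for name, seq in (("wild type", wild_type), ("mutant", mutant)):
        print(name, cysteine_count(seq), has_unpaired_cysteine(seq))
```

The same check would flag a mutation that removes a cysteine, since both gain and loss of a single cysteine make the count odd.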

Notch Pathway Components

DSL Ligands

Orthologs and Domain Organization

Most ligands in the Notch pathway are DSL (Delta and Serrate in flies, Lag-2 in worms) ligands, though additional proteins have been shown to function as non-canonical ligands for Notch receptors. In flies, DSL ligands consist of the two prototypes, Delta and Serrate.15 Mammals possess orthologs in the guise of three


Delta-like ligands (Dll 1, 3, 4) and two Serrate orthologs called Jagged 1-2.15,69 Worms are unique in that they have five DSL orthologs, including APX-1, LAG-2, ARG-2, and DSL1-2 as well as three co-ligands: DOS1-3, OSM7, and OSM11.15 These worm co-ligands contain a specific domain absent from the four ligand orthologs and are predicted to function with the ligands to activate the pathway.15

DSL ligands in higher eukaryotes share a similar modular domain organization: the MNNL domain, the DSL domain, the EGF-like repeats, the transmembrane domain, and a cytoplasmic tail. (Figure 2A) The N-terminal MNNL domain is not present in worm orthologs.121 The DSL (Delta, Serrate, Lag-2) domain is involved in binding to the Notch receptor while the EGF-like repeats function in regulation of ligand trafficking and turnover when ubiquitinated.69 Serrate orthologs as well as mammalian Dll1 are marked by the presence of a DOS (Delta and OSM11-like proteins) domain, which encompasses the first two EGF-like repeats.121 These repeats have a unique secondary structure but still maintain the characteristic disulfide bond pattern found in classical EGF repeats.15 In concert with the DSL domain, the DOS domain is involved in binding to the Notch receptor.72 Serrate orthologs are further distinguished by an additional module, a cysteine-rich domain homologous to von Willebrand Factor C, located between the EGF-like repeats and the transmembrane domain.121 Based on their domain composition, DSL ligands can be classified into three categories:

DSL/DOS domain containing, DSL domain only, and DOS domain only (also referred to as DOS coligands).15 Ligands falling in the DSL/DOS domain category are Dll1 and

Jagged 1-2 in mammals as well as Delta and Serrate in flies.15 DSL domain only ligands consist of mammalian Dll3 and Dll4 in addition to worm APX-1, LAG-2, ARG-2, and DSL1-7.15 DOS coligands, which possess only a DOS domain, encompass DOS1-3, OSM7, and OSM11 in worms as well as DLK-1 and DLK-2/EGFL9 in mammals.15

Details on the regulation of ligand populations and on the interaction between DSL ligands and Notch receptors will be discussed below.

Notch Receptors

Orthologs and Domain Organization

The Notch pathway was named for a notched wing phenotype observed in flies lacking the Notch receptor. While there is a single ortholog of the receptor in flies

(Notch), there are two orthologs in worms (LIN-12, GLP-1) and four in mammals

(Notch1-4).15 Despite evolutionary duplication, Notch receptors share a similar modular domain organization. (Figure 2B) In the extracellular domain of the receptor, there are three domains: the EGF-like repeats, the LNR repeats, and the heterodimerization domain. Near the N-terminus are 29-36 epidermal growth factor (EGF)-like repeats responsible for ligand binding, with repeats 11 and 12 being essential for ligand binding.56,69,122 The number of these EGF repeats varies by species and receptor subtype, with fly and mammalian receptors possessing several more repeats than worm versions.121 Additionally, the EGF-like repeats are differentially modified by O-linked glycosylation, altering their ability to interact with ligand subtypes in flies and mammals.15,123 Three cysteine-rich LIN-12/Notch (LNR) repeats follow the EGF repeats and are unique to Notch receptors.124 The LNR repeats, along with the adjacent heterodimerization (HD) domain, constitute the negative regulatory region (NRR).2 The

NRR functions to protect the S2 cleavage site critical for activation of the Notch receptor, thereby preventing premature receptor activation in the absence of ligand.85,125 To achieve this, the HD domain adopts a globular fold that makes extensive contacts with the three LNR repeats.2 The interactions formed by this structure sterically occlude the S2 site and provide global stabilization of the HD-LNR region.1-2 When mammalian Notch receptors are cleaved at S1 during intracellular maturation, the HD domain is split so that the N-terminal portion ends the extracellular domain of the receptor while the C-terminal portion begins the transmembrane domain of the receptor.121

Figure 2: Domain Schematics of Notch Signaling Components A) Domain schematic of the DSL ligand Jagged1, from N to C terminal, depicting domains shared among eukaryotic DSL ligands: the MNNL domain (yellow), the DSL domain (green), the EGF-like repeats (blue), the transmembrane domain (TM, purple), and a cytoplasmic tail (black bar with red stars). As the mammalian ortholog of Serrate, Jagged1 also possesses a DOS domain (dark blue) between the DSL domain and EGF-like repeats as well as a cysteine-rich domain (pink) between the EGF-like repeats and the transmembrane domain. B) Domain schematic of the mammalian Notch1 receptor, from N to C terminal. The extracellular domain of the receptor consists of the EGF-like repeats (blue), the LNR repeats (purple), and the heterodimerization domain (green). The number of EGF-like repeats varies by species and receptor subtype. Together, the LNR and heterodimerization domain are the negative regulatory region (NRR), which protects the S2 cleavage site from premature TACE metalloprotease cleavage. The three sites of cleavage are denoted as S1, S2, and S3. Also shown are the transmembrane domain (purple) and the domains of the intracellular domain of the receptor, which consists of the RAM domain (red), the ankyrin repeats domain (ANK, blue), two nuclear localization signals (NLS, yellow), and PEST domain (brown). C) CSL consists of three domains: the N-terminal domain (NTD, blue), the beta-trefoil domain (BTD, green), and the C-terminal domain (CTD, peach). The NTD and CTD are Rel homology regions, or RHR. D) Mastermind consists of three domains required for interaction with: the transcription factor CSL (dnMAM, red), the transcriptional coactivators CBP/p300 (yellow), and CDK8 kinase (CycC/CDK8, yellow).

The transmembrane and intracellular portion of the Notch receptor encompasses six domains: the transmembrane segment, the RAM domain, the ANK repeats, two nuclear localization signals, and a PEST domain. The single-pass transmembrane domain is followed by the RBPJ-association module (RAM) domain and seven ankyrin

(ANK) repeats.121 The RAM and ANK domains are necessary and sufficient for interacting with CSL and, together with the three remaining domains, form the Notch intracellular domain (NICD).126-127 The NICD is released from the cell membrane following S3 cleavage by γ-secretase and translocates to the nucleus, where it interacts with CSL and an additional coactivator, Mastermind, to form the ternary complex.

Ultimately, the ternary complex initiates transcription of Notch target genes. The two

NLS domains direct the NICD to the nucleus while the PEST (proline, glutamic acid [E], serine, threonine-rich) domain is important for receptor turnover and, consequently, termination of pathway activation.16 NICD and the ternary complex will be discussed in further detail later in this chapter.

Regulating Expression of DSL Ligands and Notch Receptors


As Notch receptors are expressed on the cell surface, regulating their availability is a critical method of controlling pathway activation. This can be done by restricting the expression of receptors spatially and temporally, which is achieved through a combination of differential expression, proteolytic processing, post-translational modifications, and endocytic trafficking.15,69

While CSL is expressed in all cell types, the expression of DSL ligand and Notch receptor subtypes can vary by cell type. This is particularly true in certain developmental contexts, such as inductive Notch signaling, when Notch signaling occurs between two populations of cells and establishes a boundary between the two groups.69

More frequently, alternative mechanisms affect the spatial and temporal expression of ligands and receptors to control pathway activation. Post-translational modifications of both proteins influence their intracellular processing and their turnover.

DSL ligands must be ubiquitinated by Neuralized or Mind bomb ring finger E3 ligases (Neuralized1-2 and Mind bomb1-2 in mammals, Neuralized and Mind bomb1 in flies, F10D7.5 in worms) to be activated.128-129 Cells lacking Neuralized or Mind bomb exhibit compromised trafficking of DSL ligands, with inactive ligands accumulating at the cell surface.130-131 Together, these data suggest DSL ligands are inactive until modified by Neuralized or Mind bomb. Unmodified ligands are endocytosed and degraded. Following ubiquitination, the ligands are endocytosed with the aid of Epsin, a ubiquitin-binding protein, and auxilin, a J-domain containing protein able to disassemble clathrin coats.132-133 Ligands are now competent to signal, though it remains uncertain whether


endocytosis itself activates the ligand or simply places the ligand into a specific location that renders it active.69

Notch receptors are also ubiquitinated to regulate their trafficking and turnover. Two different classes of E3 ligases modify Notch receptors to affect their trafficking, and it has been suggested their opposing activity serves to limit the amount of time the receptor is expressed at the cell surface.69 HECT domain E3 ligases (Itch and NEDD4 in mammals, Su(dx) and NEDD4 in flies, WWP-1 in worms) modify the RAM and ANK domains of NICD to target the receptor for degradation.134-135 The ring finger E3 ligase Deltex (Deltex in both mammals and flies) modifies the ANK domain of NICD, promoting pathway activation and antagonizing the activity of HECT domain E3 ligase Su(dx) in flies.136-137 Since Deltex also interacts with the β-arrestin Kurz, it may mediate internalization of the Notch receptor-Deltex complex to control the amount of receptor at the cell surface.138 However, in certain cell types, such as mammalian lymphoid cells and neurons, Deltex activity actually antagonizes pathway activation.139 The NICD of Notch receptors is also ubiquitinated by nuclear F-box E3 ligases (FBW7/SEL10 in mammals, Archipelago in flies, SEL-10 in worms) while in the ternary complex to target the NICD for proteasomal degradation and ultimately terminate signaling.16-17

In addition to ubiquitination, Notch receptors are modified with glycosylation and fucosylation of their EGF-like repeats. While in the endoplasmic reticulum, newly synthesized Notch receptors are first modified by O-fucosyltransferase, which is an essential modification for a functional receptor.140-141 Independent of its enzymatic function, O-fucosyltransferase also acts as a chaperone to promote folding and transport of the receptor from the endoplasmic reticulum to the membrane.142 During


trafficking, the fucose moieties can be extended by the Fringe family of glycosyltransferases (Lunatic Fringe, Radical Fringe, and Manic Fringe in mammals;

Fringe in flies), which can modify multiple EGF-like repeats.143 Fringe-mediated modifications produce differentially glycosylated receptors that seem to alter the affinity of the receptor for different DSL ligands, thereby affecting the ability of a ligand to activate the pathway.69,123,143 As such, these glycosylations have been proposed to play a role in the stability and/or duration of ligand-receptor interactions, though the effects are subtle, since data regarding interaction of Notch2 with ligands offered contradicting results.121,144-145

Even without post-translational modifications directing it, trafficking of Notch receptors can determine their availability at the cell membrane. A well-characterized inhibitor of Notch signaling, Numb, acts during asymmetric cell division to reduce the number of Notch receptors on the cell surface and thereby shift the cell towards a signal-sending state. Numb recruits receptors into endocytic vesicles by interacting with both the Notch receptor and α-adaptin, a component of the AP2 complex that links cargos to clathrin coats of transport vesicles.146 Furthermore, mammalian Numb proteins promote ubiquitination of Notch receptors, increasing their endocytosis by the

HECT domain E3 ligases.147 However, fly Numb can also aid pathway activation when associated with the transmembrane protein Sanpodo, though the mechanism of this process remains unknown.148-149

Ligands and receptors of the Notch pathway also undergo proteolytic processing to regulate their activity, though the precise role of such events for ligands is unclear.150

All Notch receptor orthologs undergo three cleavage events during pathway activation.


(Figure 2B) Mammalian receptor orthologs experience an additional cleavage prior to expression on the cell surface. This unique first cleavage event occurs at site 1 (S1) in the trans-Golgi, where PC5/Furin cleaves the receptor into two subunits—the extracellular domain and the transmembrane domain.125,151 The two subunits associate through non-covalent interactions and are subsequently targeted to the cell surface.152

Proteolytic processing continues in all receptor orthologs when ligand binding to the receptor moves the NRR to expose site 2 (S2) for cleavage by the ADAM family of metalloproteases (ADAM10 and TACE/ADAM17 in mammals, Kuzbanian in flies, SUP-17/Kuzbanian and ADM-4/TACE in worms).2,69,153 It has been proposed that receptor trans-endocytosis generates enough mechanical strain to move the NRR of the receptor.154 Others suggest additional events must occur to expose S2 and allow access to the metalloprotease, but more research will be necessary to clarify these events.15,121 Alternatively, in worms and in ligands with shorter EGF-like repeat domains, allostery has been proposed to be the force that reveals the S2 site.155 S2 cleavage also releases the extracellular domain from the receptor, allowing it to be endocytosed by the signal-sending cell. Following S2 cleavage, the remaining portion of the membrane-bound receptor is progressively cut at site 3 (S3), releasing the Notch intracellular domain (NICD).3-5,156 Drugs inhibiting presenilin, a member of the γ-secretase complex, have been employed in research and in clinical use to treat overactive Notch signaling.157 However, these drugs are not limited to affecting Notch signaling, as the γ-secretase complex is involved in a variety of other processes. In addition to off-target effects, these drugs exhibit extreme toxicity in the gut, limiting their clinical application.


DSL Ligand – Notch Receptor Interaction

Interaction between DSL ligands and Notch receptors can occur in cis and in trans, though it is currently unclear whether similar binding surfaces are involved in both types of interactions.158-160 During cis interaction, also known as cis inhibition, a ligand and a receptor on the same cell interact with each other, inhibiting both proteins from participating in active Notch signaling.158 In flies, the Abruptex region of the receptor (EGF-like repeats 24-29) is required for this interaction, but data for mammalian proteins indicate other regions of the receptor are involved.161 For interaction in trans, a DSL ligand interacts with a Notch receptor on an adjacent cell through EGF-like repeats 11-12 of the receptor.159-160 The interaction between EGF-like repeats 11-14 of the Notch1 receptor and two parts of the ligand Dll1 (the DSL domain and EGF-like repeats 1-3) was estimated to have a dissociation constant of 130 μM.56 Precisely which ligand and receptor subtypes interact is influenced by the differential glycosylation and fucosylation pattern of each Notch receptor. For example, Serrate binds with higher affinity to Notch receptors that have been fucosylated and with lower affinity to receptors bearing additional glycosylations added by Fringe.162 Studies have also indicated receptor glycosylation patterns may determine pathway activation in a cell type specific manner, as Lunatic Fringe, a mammalian homolog of Fringe, potentiates Delta binding in in vitro studies but inhibits binding within the Delta-driven oscillatory somite clock in some species.163-164 A structure (PDB: 2VJ3) was recently solved of the human Notch1 EGF-like repeats 11-13 in complex with a portion of human Jagged 1, consisting of the DSL domain and EGF-like repeats 1-3, which contain the DOS domain (EGF-like repeats 1-2).72 In this structure, the proteins directly interact in a calcium-dependent manner as extended rods.72 The surface of Jagged1 involved in binding is associated with numerous diseases in humans, such as tetralogy of Fallot and Alagille syndrome, as well as with the autosomal dominant inner ear malformation phenotypes in mice, including headturner, slalom, and Nodder.15,165-166 Future work will be needed to elucidate the structure and binding energies of additional ligand-receptor pairs.
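To put the 130 μM constant quoted above for the Notch1-Dll1 fragments in perspective, the fraction of receptor occupied at a given ligand concentration can be estimated. The sketch below assumes simple 1:1 equilibrium binding with the free ligand concentration approximately equal to the total ligand concentration. That assumption is made here for illustration and is not a claim about how the cited measurement was analyzed. The function name and the example concentrations are hypothetical.

```python
def fraction_bound(free_ligand_uM: float, kd_uM: float = 130.0) -> float:
    """Fraction of receptor occupied for simple 1:1 equilibrium binding.

    Uses fraction = [L] / (Kd + [L]), with both values in micromolar.
    The default Kd is the 130 micromolar value quoted in the text.
    """
    if free_ligand_uM < 0:
        raise ValueError("ligand concentration must be non-negative")
    if kd_uM <= 0:
        raise ValueError("Kd must be positive")
    return free_ligand_uM / (kd_uM + free_ligand_uM)


if __name__ == "__main__":
    # Hypothetical ligand concentrations spanning two orders of magnitude.
    for concentration in (13.0, 130.0, 1300.0):
        print(f"{concentration:7.1f} uM -> {fraction_bound(concentration):.3f}")
```

Under this model, half of the receptor is occupied when the free ligand concentration equals the dissociation constant, which illustrates how weak a 130 μM interaction is for isolated fragments in solution.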

CSL

The transcription factor generically referred to as CSL sits at the center of a switch mechanism regulating transcription of Notch target genes. CSL proteins are highly conserved in metazoans, with all orthologs consisting of a conserved 420 amino acid core sequence and flanking N- and C-terminal sequences of varying lengths.121

These terminal extensions are predicted to have relatively little secondary structure and low compositional complexity and are not essential for interaction with Notch pathway coregulatory proteins.121 While most metazoans express a single CSL paralog (Su(H) in flies, Lag-1 in worms), vertebrates express two—RBPJ, which functions in canonical

Notch signaling in all tissues, and RBP-L (RBPJ-like), which contains the conserved

CSL core sequence but functions as a constitutive transcriptional activator independent of Notch signaling in the pancreas.167-168 When comparing primary sequences of CSL orthologs, the most divergent members are from nematodes (C. elegans and C. briggsae) with approximately 54% sequence identity between worm and mammalian

CSL orthologs.121 There is approximately 75% sequence identity between fly and mammalian orthologs and >90% identity between zebrafish (Danio rerio) or frog


(Xenopus laevis) and mammalian orthologs.121 Only two amino acids are different between mouse and human CSL orthologs, demonstrating mammalian orthologs are nearly identical in primary sequence. Despite some differences in sequence, CSL orthologs have highly similar structures, indicating conservation of structural folds.8-

10,121,169
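For readers unfamiliar with how a figure such as "54% sequence identity" is obtained, the following minimal sketch computes percent identity from two pre-aligned sequences. The convention used here (gapped columns are skipped, and the denominator is the number of residue pairs compared) is an assumption; the cited work may have used a different convention. The two sequence fragments are made up for illustration and are not CSL sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumed convention: columns containing a gap ('-') in either
    sequence are skipped, and identity is reported relative to the
    number of residue pairs actually compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "MKVLT-GQAR"
toy_b = "MKILTSGQ-R"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Seven of the eight ungapped columns match in this toy pair, so the function reports 87.5% identity.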

The core sequence of CSL consists of three distinct domains: the N-terminal domain (NTD), the beta-trefoil domain (BTD), and the C-terminal domain (CTD).8 (Figure 2C) Both the NTD and CTD are similar in structure to the Rel Homology Region (RHR) family of transcription factors, earning their alternate names of RHR-N and RHR-C, respectively.8,126 In other Rel family members, such as NFκB and NFAT, the N-terminal RHR is responsible for DNA binding specificity while the C-terminal RHR mediates interactions with other proteins, whether in heterodimerization or with an inhibitory protein.8 However, the insertion of the BTD between the two domains makes CSL unique among the Rel family, bestowing on it the ability to bind DNA as a monomer.8 The BTD is a 12-stranded capped β-barrel structure, a fold initially observed in cytokine structures like fibroblast growth factor and interleukin-1.170 Overall, the structure of CSL is primarily β-strands with very little α-helical content.121 A long β-strand spans all three domains, serving to integrate them into a single fold.121

CSL & DNA Binding

CSL is a DNA binding protein and the sole transcription factor utilized by the Notch pathway, so it is essential to understand its molecular mechanisms for DNA recognition and interaction. Though CSL binding to an eight base pair consensus sequence (C/tGTGGGAA) has been well-characterized, CSL also interacts with a degenerate range of eight base pair sequences in vivo.171-174 Multiple studies in flies and mammals determined the consensus sequence, which is present in the promoter of the Hes1 gene, a Notch target.171-172,174 (Figure 3A)

Identifying non-consensus binding sites in vivo, however, has presented a greater challenge since traditional methods possess a bias toward the sequence used as a reference point, in this case the Hes1 consensus sequence. Recent approaches have reduced that bias, employing techniques such as genome wide ChIP-seq and custom protein binding microarrays to reveal a variety of DNA sequences that can be bound by CSL.175-176 Some of these sequences differ greatly in which base(s) are preferred at each position of the binding site compared to the Hes1 consensus sequence and its believed preferences.
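To make the consensus notation concrete, the sketch below scans both strands of a DNA string for the eight base pair consensus (C/tGTGGGAA), written as the regular expression `[CT]GTGGGAA`. The function names and the test sequence are hypothetical and chosen only for illustration. An exact-match scan of this kind carries exactly the reference-sequence bias described above, so it should be read as an illustration of the notation rather than a way to find degenerate sites.

```python
import re

# Consensus (C/t)GTGGGAA from the text, written as a regular expression.
CONSENSUS = "[CT]GTGGGAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, pattern: str = CONSENSUS):
    """Return sorted (start, strand, site) tuples for matches on either strand.

    `start` is the 0-based position of the site's leftmost base on the
    forward strand; `site` is always written 5'->3' on the matching strand.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping matches are also reported.
    for m in re.finditer(f"(?=({pattern}))", seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    for m in re.finditer(f"(?=({pattern}))", rc):
        site = m.group(1)
        start = len(seq) - m.start() - len(site)
        hits.append((start, "-", site))
    return sorted(hits)


# Hypothetical test sequence containing one forward-strand site.
print(find_sites("AAACGTGGGAATTT"))
```

A different pattern can be passed as the second argument, for example a single non-consensus eight base pair sequence, to check where that specific variant occurs.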

Structures have been solved for the worm, mouse, and human orthologs of

CSL bound to consensus DNA, including structures of the ternary complex (CSL-NICD-

Mastermind) or of repressor complexes (CSL-MINT, CSL-KyoT2, CSL-Hairless, CSL-

RITA).10,169,177 (Figure 3B, left) Overall, these structures possess a highly similar binding interface between CSL and DNA, whether co-regulators are bound or not. CSL induces two slight bends in the DNA it binds, though the bends compensate for each other to preserve an overall straight alignment.8 The DNA remains B-form, though there is local distortion as a positive propeller twist and significant buckling of the DNA due to perturbation of base steps between the guanines in positions 4-6 and their complementary base pair cytosines.8 Portions of the BTD, NTD, and the interdomain linker between them form a large electropositive surface on CSL that provides both


specific and non-specific contacts with DNA.8,121 The BTD inserts a loop into the minor groove of DNA to recognize the first half of the binding site sequence (C/tG in positions

1 and 2) while the NTD inserts a beta hairpin loop into the major groove of DNA to recognize the second half of the binding site sequence (GGGA in positions 4-7).121 The

CTD of CSL does not function in DNA binding.121 Two structures exist of CSL bound to a non-consensus site. One structure shows the mouse ortholog of CSL bound to a variation of the consensus sequence, CGTGTGAA, where position 5 is a thymine instead of a guanine.178 (Figure 3B, right) Overall, the structure appears similar to the

CSL-consensus DNA structures, since all specific and non-specific contacts between

CSL and DNA are maintained.178 The other structure, by Arnett et al., depicts two ternary complexes (CSL-NICD-Mastermind) in a head to head orientation, with the

Ankyrin repeat domains of NICD interacting with each other.179 The DNA in this structure is the SPS, or sequence paired site (a.k.a. Su(H) paired site), present in the promoter of the Hes1 gene in flies and mammals. The SPS architecture consists of a consensus binding site approximately 15-17 base pairs from a non-consensus binding site in a head-to-head orientation.180 (Figure 3A) When both sites are occupied, the

DNA bends and untwists to bring the two ternary complexes close enough for them to interact.179 Biochemical and reporter gene assays determined the ANK domain contacts are important for cooperative loading of the sites.179

The original high affinity reported (Kd ~ 1 nM) for the interaction between CSL and consensus DNA was calculated using the relatively qualitative method of Scatchard plot analysis of an EMSA.181 This led to a model depicting CSL statically bound to DNA


Figure 3: CSL Binds Two Distinct Sites in the Hes1 Target Gene A) The Hes1 gene possesses two CSL binding sites (orange) in its promoter in a head-to-head orientation. The first site, also called the consensus site, is the most thoroughly studied CSL binding site to date. A single nucleotide distinguishes the second site from the consensus site. In the second site, position five is a thymine (T, cyan) instead of a guanine (G). Together, these two sites form the SPS (sequence or Su(H) paired site) architecture. When ternary complexes occupy both sites in the SPS, the two complexes interact through contacts in their ANK domains and the DNA bends and untwists to facilitate this interaction. B) Structures of CSL in complex with each binding site have been solved and reveal highly similar structures, as all specific and non-specific contacts between NTD and BTD of CSL and DNA are maintained, even with the different nucleotide in position five. PDB: 3BRG adapted from Friedmann et al. 2008 and PDB: 3IAG from Friedmann et al. 2010.169,178


with co-regulatory proteins associating or dissociating depending on the pathway activation state. Later studies employed more quantitative methods, such as isothermal titration calorimetry, to determine the actual binding affinity was roughly a hundred-fold weaker, at 150 nM.169 The affinity of CSL for consensus DNA, whether alone or in complex with NICD, remains the same.169 As for CSL binding to the consensus variant site in the Hes1 SPS, the affinity is roughly two-fold weaker compared to the consensus site, with a Kd of 250 nM.178 With only a moderate affinity for DNA, it is likely the interaction between CSL and DNA is more dynamic than originally thought and may rely on cooperative mechanisms to assemble and stabilize CSL complexes on the DNA.121
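The dissociation constants quoted above can be put on a free energy scale with the standard relation ΔG° = RT ln(Kd). The sketch below does this for the three Kd values in the text. The temperature of 25 °C (298.15 K) and the 1 M standard state are assumptions made for illustration; the experimental temperature is not stated in this section.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd) with a 1 M standard state. The default
    temperature (25 C) is an assumption, not a value from the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# Kd values quoted in the text, converted to molar units.
reported = [
    ("EMSA/Scatchard estimate, consensus site", 1e-9),
    ("ITC, consensus site", 150e-9),
    ("Hes1 SPS variant site", 250e-9),
]
for label, kd in reported:
    print(f"{label}: Kd = {kd * 1e9:.0f} nM, dG = {delta_g_from_kd(kd):.2f} kcal/mol")
```

On this scale the revision from 1 nM to 150 nM corresponds to roughly 3 kcal/mol less favorable binding, whereas the consensus and variant sites differ by only about 0.3 kcal/mol under the same assumptions.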

Additional research into the CSL-DNA interaction encompasses several topics.

The SPS architecture is present in the promoters of a few other Notch target genes, suggesting it may be a key point for regulating pathway activity.121 Other target genes possess multiple CSL binding sites in a variety of arrangements and it is unclear what the precise role may be for these differing architectures.180 Some CSL binding sites are adjacent to binding sites for other transcription factors, like GATA and PTF1a, which may facilitate proper loading and function through cooperative or synergistic mechanisms.167,180,182-183 There is also great interest in understanding the effect chromatin structure, particularly with respect to epigenetic marks, has on the ability of CSL to recognize and bind DNA.184

Coactivators of Notch Signaling

NICD


Upon pathway activation, the NICD is generated by multiple cleavage events to the Notch receptor. The NICD consists of several modular domains, including the RAM domain, the ANK domain, the transactivation domain, and the PEST sequence. Only the RAM and ANK domains are necessary and sufficient for interacting with CSL, while the PEST sequence functions in NICD turnover.126-127 Prior to its interaction with CSL, the RAM domain is an unstructured random coil.126,185 It binds with high affinity (ranging from 1 µM to 30 nM, depending on ortholog and detection method) to the BTD, becoming structured upon interaction.169,186-187 For all four mammalian NICD paralogs, the affinity of the RAM domain for BTD is similar, but differences in affinity have been reported between orthologs.187 For example, the mammalian RBPJ-RAM interaction has a binding affinity 50-fold higher than the worm Lag-1-RAM interaction.169 The difference in affinity is due entirely to the CSL ortholog, as evidenced by cross-species experiments with mouse and worm proteins.169 The RAM domain binds in a relatively linear fashion across the BTD and quantitative binding studies revealed four distinct motifs in the RAM domain contribute to its interaction.188 A hydrophobic tetrapeptide motif (ΦWΦP, where Φ is a hydrophobic residue) in the RAM domain interacts with a hydrophobic pocket on the BTD surface and is responsible for most of the binding energetics.188 Substantial energetic contributions also come from the five residue N-terminal basic region (~2 kcal/mol), an –HG- dipeptide motif (~1.6 kcal/mol), and a –GF- dipeptide motif (~0.6 kcal/mol).188 Current models in the field suggest binding of the RAM domain displaces any corepressors bound to CSL, though several studies have made it apparent that this model cannot be true in all cases.
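The per-motif energies and the 50-fold ortholog difference quoted above are two views of the same quantity, related by fold change = exp(ΔΔG / RT). The sketch below converts in both directions. It assumes 25 °C (298.15 K), which is not stated in the text, and the function names are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change_from_ddg(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Fold change in Kd corresponding to a free energy difference (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def ddg_from_fold_change(fold: float, temp_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) corresponding to a fold change in Kd."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)


# Approximate motif contributions quoted in the text (kcal/mol).
motifs = {"N-terminal basic region": 2.0, "-HG- dipeptide": 1.6, "-GF- dipeptide": 0.6}
for name, ddg in motifs.items():
    print(f"{name}: ~{ddg} kcal/mol is about a {fold_change_from_ddg(ddg):.0f}-fold effect on Kd")

print(f"A 50-fold affinity difference is about {ddg_from_fold_change(50):.1f} kcal/mol")
```

Under these assumptions, a contribution of about 2 kcal/mol corresponds to a roughly 30-fold effect on Kd, while about 0.6 kcal/mol corresponds to a roughly 3-fold effect.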


The ANK domain of NICD consists of seven iterative ankyrin repeats. Each ankyrin repeat is 33 residues in length, forming two α-helices connected by a short loop.189 Repeats are linked with a long loop terminating in a tight β-turn.189 Among the seven ankyrin repeats in fly NICD, there is modest sequence conservation (17% pairwise identity) whereas the conservation between ANK domains in NICD orthologs is much higher (70% pairwise identity).189 In the ANK domain of NICD, repeats one through six closely match the ankyrin repeat consensus sequence and the seventh C-terminal repeat does not.189 The seventh ankyrin repeat, however, adopts a regular ankyrin fold and was shown to be critical for the stability of the entire ANK domain.189

Surprisingly, structural analysis of the fly ANK domain revealed the first N-terminal ankyrin repeat was largely disordered, likely due to a 15 residue insert it possesses between its two α-helical segments.189 While the exact function of this insert remains unclear, it has been proposed to play a role in a functionally important disorder to order transition upon binding of effector proteins.189 The ANK domain primarily interacts with the CTD of CSL, though there are relatively minor contacts between the ANK domain and the NTD.10,190 As the interaction between the isolated ANK domain and CSL was barely detectable by isothermal titration calorimetry and a quantitative FRET assay, the interaction is considered to be extremely weak and therefore unlikely to contribute to

NICD binding to CSL.169,186 Due to the weak nature of this interaction, a model arose in the field suggesting high affinity binding of the RAM domain increased the local concentration of the ANK domain, facilitating its binding to the CTD.169,185 Recent work by Johnson et al. proved this model to be true, though it also revealed the bivalent interaction of the RAM and ANK domains effectively suppresses formation of


undesirable intermediates of Notch signaling (i.e. intermediates in the transition between repression and activation complexes), thus promoting a sharp switch from the repressed to the active state.191

At the C-terminus of NICD lies the transactivation domain containing two nuclear localization signals and a conserved PEST sequence important for NICD turnover. In fly NICD, an additional stretch of 31 glutamine residues, called the OPA motif, is present between the two nuclear localization signals and the PEST domain.192 Kelly et al. speculated the OPA motif may play a role in NICD self-association, as a known inhibitor of glutamine-mediated self-association prevented the formation of NICD filaments over time.193 However, the OPA motif does not appear to be critical for ternary complex formation or transcriptional activation, rendering it less relevant to current studies. During the transcriptional activation process, CDK8 kinase phosphorylates residues in the PEST domain, targeting NICD for ubiquitination by E3 ligases Sel10/Fbw7 and subsequent proteasomal degradation.16-19 Once NICD is removed, the activation complex disassembles and the cell resets for another round of signaling.

Mastermind

A second coactivator, Mastermind, interacts with the dimeric CSL-NICD complex to form the ternary complex necessary for transcriptional activation of Notch target genes. Flies and worms possess a single Mastermind protein (MAM in flies, LAG-3 in worms) whereas mammals have three homologs, Mastermind-like 1-3 (MAML1-3).69

Mastermind proteins are approximately 1000 residues in length, though only the first 60


amino acids are required for binding to CSL-NICD.13,194 Apart from this region, there is little sequence homology between Mastermind orthologs.9-10

In the Notch pathway, Mastermind binds as a bent, elongated α-helix to the dimeric CSL-NICD complex, with the N-terminal region of the helix binding in a continuous groove formed by the CTD-ANK domain interface and the C-terminal region of the helix associating with a β-sheet in the NTD of CSL.9-10 The long C-terminal tail of Mastermind is rich in glutamines, allowing it to recruit and bind histone acetyltransferases, chromatin remodeling factors, and the Mediator complex.14,17 Also included in the list of Mastermind binding partners are the coactivators p300 and CBP (CREB binding protein), which possess histone acetyltransferase activity and recruit basal transcription machinery as well as the CDK8 kinase involved in NICD degradation.14,16 (Figure 2D) Dominant-negative forms of Mastermind, in which the C-terminal tail is missing, are unable to recruit these factors and thus unable to promote transcription of Notch target genes.195 Mastermind also functions as a transcriptional coactivator in other signaling pathways, offering a potential hub for cross-talk.196-198

Corepressors of Notch Signaling

A number of corepressors have been identified as modulating the Notch pathway. In flies, the primary corepressor is Hairless, which is expressed in all fly tissues and functions as an adaptor to tether the more global repressors Groucho and

CtBP to CSL.20-22 More recently identified in flies is a neuron specific corepressor called

Insensitive, which promotes the neuron cell fate during sensory organ precursor specification in the peripheral nervous system.199 Several corepressors have been


identified in mammals, including SMRT, MINT/SHARP, KyoT2, RITA, SKIP, CIR,

ETO/MTG8, and MTG16.121 Orthologs of many mammalian corepressors have been identified in higher eukaryotes, such as zebrafish, but worms lack any known corepressors.25 While these corepressors are unrelated at the primary sequence level, they are predicted to function in a generally similar manner by linking CSL to HDAC machinery. Recent studies have shown other factors, such as KDM5A/LID histone demethylases as well as ASF1 and NAP1 histone chaperones, interact with CSL and function in target gene repression.200-202 Though the ternary complex that activates transcription has been thoroughly characterized in multiple species, only a few repression complexes have been investigated to date. This allows several questions to persist, including the essential issue of how, at the molecular level, the transition between repression and activation complexes occurs. In fact, it remains unclear whether multiple corepressors are simultaneously present or whether each corepressor forms a distinct complex, offering potential spatial and temporal separation.69 Since some corepressors have been reported to interact with each other, such as SMRT-SKIP and MINT/SHARP-SMRT, it is uncertain which corepressors directly interact with CSL and which corepressors are a component of multiprotein repression complexes assembled around CSL.203-204

Hairless, a protein of approximately 1000 residues, is an antagonist of Notch signaling in flies.205-206 Only a small portion of its multidomain structure directly interacts with the CTD of CSL, with Hairless wedging itself into the CTD and causing major structural perturbations of the domain (Yuan and Kovall, unpublished data).207-208 (Figure 4A)

Other domains in Hairless recruit and interact with two additional repressors,


Groucho and CtBP, which in turn recruit HDACs to mediate repression of the target genes.20-22,209 Biochemical data revealed Hairless is able to compete with NICD for binding to CSL, and binding studies demonstrated Hairless binds with high affinity (1 nM) to CSL.210 However, it remains unclear how Hairless and NICD compete for binding as they do not share binding surfaces on CSL and no allosteric changes in CSL were noted between the structures of the activation and repression complexes with various orthologs (Yuan and Kovall, unpublished data).121,169 With no clear orthologs in mammals or worms, Hairless is a corepressor unique to flies, even in its binding to CSL

(Yuan and Kovall, unpublished data).211 The most studied corepressor in mammals is

MINT/SHARP, which has been structurally, biophysically, and functionally characterized. MINT/SHARP is a 6600 residue multidomain nuclear protein from the

SPEN (split ends) family of proteins.212 A 44 residue domain within MINT/SHARP interacts directly with the BTD and CTD of CSL, with a structure of the CSL-

MINT/SHARP complex revealing the portion of MINT/SHARP that interacts with the

BTD overlaps the binding surface used by the RAM domain of NICD (VanderWielen and

Kovall, unpublished data).213-215 (Figure 4B) While the structure suggests MINT/SHARP may directly compete with the RAM domain for binding to the BTD, biochemical data indicates it is not a simple direct competition, as the coactivator Mastermind is required for MINT/SHARP to displace NICD from CSL (Friedmann and Kovall, unpublished data). Biophysical data from binding experiments supports this, showing the affinities of NICD and MINT/SHARP for CSL are comparable at roughly 10 nM for either interaction.169,215 Functionally, MINT/SHARP antagonizes Notch signaling in B- and T-cells as well as in the kidney by interacting with a host of additional corepressors, such


Figure 4: Structures of Corepressor Complexes in the Notch Pathway Diagrams and cartoon representations of four different corepressors in complex with CSL and DNA. Throughout, the coloring of the diagram and the cartoon are consistent: NTD of CSL (cyan), BTD of CSL (green), and CTD of CSL (orange). A) The fly corepressor Hairless (purple) binds to the CTD of Su(H) (orange) (Yuan and Kovall, unpublished data). B) The mammalian corepressor MINT/SHARP (yellow) binds to both the BTD (green) and CTD (orange) of RBP-J (VanderWielen and Kovall, unpublished data). C) The mammalian corepressor KyoT2 (royal blue) binds to the BTD of RBP-J (green).177 D) The mammalian corepressor RITA (purple) binds to the BTD of RBP-J (green) (Tabaja and Kovall, unpublished data).


as CtIP, CtBP, ETO, SMRT, and NCor, indicating MINT/SHARP serves as a key organizational element of the repression complex.24,203,216
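As a baseline for what a simple direct competition would predict, the sketch below computes the equilibrium occupancy of a single site by two mutually exclusive ligands. It is a textbook model, not the mechanism proposed in this work, and the biochemical data cited above indicate the real system is more involved. The model assumes both ligands are in excess over CSL, so free and total ligand concentrations are treated as equal. The 100 nM concentrations are hypothetical; the 10 nM affinities are the approximate values quoted in the text.

```python
def competitive_occupancy(conc_a: float, kd_a: float, conc_b: float, kd_b: float):
    """Equilibrium fractions (free, bound_to_a, bound_to_b) of one binding site.

    Assumes ligands A and B bind in a mutually exclusive manner and that
    both are in excess, so free ligand is approximately total ligand.
    All concentrations and Kd values must share the same units.
    """
    a = conc_a / kd_a
    b = conc_b / kd_b
    denom = 1.0 + a + b
    return 1.0 / denom, a / denom, b / denom


# Hypothetical 100 nM of each ligand; ~10 nM affinities as quoted in the text.
free, nicd_bound, mint_bound = competitive_occupancy(100e-9, 10e-9, 100e-9, 10e-9)
print(f"free: {free:.2f}, NICD-bound: {nicd_bound:.2f}, MINT/SHARP-bound: {mint_bound:.2f}")
```

With equal affinities and equal concentrations, this model splits the bound population evenly between the two ligands, so neither would be expected to fully displace the other on affinity alone.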

KyoT2 is another well-characterized mammalian corepressor of approximately 280 residues.217 KyoT2 is a splice variant of the KyoT gene and possesses only three domains: two conserved LIM domains and an RBPJ-interacting domain.217 It directly interacts with the BTD of CSL over much of the same binding surface used by the RAM domain of NICD.177 (Figure 4C) Additionally, the binding affinity of KyoT2 for the BTD is similar to that of the RAM domain of NICD (22 nM).169,177 RITA (RBP-J interacting and tubulin associated) is yet another mammalian corepressor.218 A recently solved structure shows RITA also binds to the BTD of CSL over a surface similar to the binding site of the RAM domain, much like KyoT2 does (Tabaja and Kovall, unpublished data). (Figure 4D) Together, the structures and binding data suggest KyoT2 and RITA may directly compete with the RAM domain for binding to the BTD of CSL, demonstrating corepressors may necessitate different mechanisms of displacement from CSL in order to form the ternary complex. Much structural and biophysical work will be needed with other corepressors to address the long-standing question: how does CSL transition between repression and activation complexes?

The Ternary Complex

Following pathway activation, the Notch receptor is sequentially cleaved to generate the NICD, which translocates to the nucleus and binds directly to CSL present on the DNA of Notch target genes. Current models of ternary complex assembly propose the RAM domain of NICD binds with high affinity to the BTD of CSL, possibly


Figure 5: The Ternary Complex A) Model of ternary complex assembly based on structural and binding data of mouse and worm ternary complexes. This model shows that, following pathway activation, the NICD translocates to the nucleus where the RAM domain (red) of NICD makes a high affinity interaction with the BTD of CSL (green). The ANK domain (yellow) of NICD does not appreciably interact with CSL until Mastermind (MAM, grey) is present to hold it in place on the CTD of CSL (orange). B) Diagram (left) and cartoon (right) representation of the structure of the ternary complex in the Notch pathway, with coloring consistent between the two images (PDB: 2FO1).10 The RAM domain (red) of NICD binds as an elongated peptide chain across the BTD of CSL (green) while the ANK domain (yellow) interacts primarily with the CTD of CSL (orange). Mastermind (MAM or MM, grey) binds as a long, kinked helix along a composite surface created by the CTD-ANK domain interface and the NTD of CSL (blue).


promoting the displacement of any corepressors bound to CSL.169 (Figure 5A) The

ANK domain of NICD binds weakly to the CTD of CSL and is locked into place upon binding of Mastermind to the CSL-NICD complex.169 It is these three proteins—CSL,

NICD, and Mastermind—that constitute the ternary complex necessary for transcriptional activation of Notch target genes.

Structures of the worm and human ternary complexes have been solved (PDB:

2FO1 and PDB: 2F8X, respectively), revealing the RAM domain of NICD interacts solely with the BTD of CSL and the ANK domain binds primarily to the CTD of CSL.9-10

(Figure 5B) Mastermind binds as a long, kinked helix along the CTD-ANK domain interface and onto the NTD.9-10 Though the overall architecture of the orthologous complexes is similar, there are notable differences between them.219-220 First, the human structure does not contain the RAM domain of NICD, supporting previous work showing the ANK domain can form a complex with CSL and Mastermind.126,221-223

Complexes of CSL-ANK domain-Mastermind are able to activate transcription, though less efficiently than when both domains of NICD are present.224 While recent research from Johnson et al. confirmed this, they also demonstrated addition of the RAM domain in trans increases transcriptional activation, suggesting the RAM domain may play a role in displacing at least some corepressors from CSL.191 Interestingly, structures of mouse CSL-NICD-Mastermind and CSL-DNA (PDB: 2FO1 and PDB: 3IAG, respectively) showed small structural changes in the BTD and CTD upon binding of the

RAM domain.10,178 These alterations are not believed to be allosteric in nature, indicating there may be species specific differences in ternary complex assembly.169

Second, the worm structure reveals a generally more compact ternary complex, with the


BTD and CTD closer to each other when compared to isolated CSL-DNA complexes or the human structure.121 Third, distinct differences in side chain interactions between the

CTD-ANK domain interface are apparent.220 Structures of the human ternary complex containing the RAM domain (PDB: 2F8X), as well as other CSL-RAM domain structures

(PDB: 3BRD), indicate it is not binding of the RAM domain that accounts for these differential contacts.9,169 It remains unclear whether these differences are functionally relevant, organism specific, or an artifact of crystallization.121 Additional structures, particularly those from other model organisms, are expected to resolve this dilemma.

Currently, no structure of the fly ternary complex exists, though, based on sequence identity and the conservation of tertiary structure between worm and mammalian proteins, it is expected the fly ternary complex would be highly similar.

Rationale for My Research

As the center of a transcriptional switch, CSL interacts with coactivators or corepressors to differentially regulate target gene transcription. Elucidating how this switch operates is central to understanding the complex role of the Notch pathway within the cell. To this end, the characterization of mammalian and worm ternary complexes as well as select mammalian repression complexes offers initial insight into the transcriptional switch. However, it is still unclear how the transition between activation and repression occurs at a molecular level.

My research is focused on characterizing the structural, biochemical, and biophysical interactions of the activation and repression complexes in the fly. Currently, there is no structural data available for fly orthologs, except for a single structure of the


fly ANK domain of NICD (PDB: 1OT8) in isolation.189 There is also no data quantifying the thermodynamic parameters of binding between fly CSL and its coactivators (NICD,

Mastermind) or corepressors (Hairless, Insensitive). Obtaining this structural and biophysical data will provide a more comprehensive model of ternary complex assembly as well as additional insight into the transition between activation and repression states.

Data on the fly complexes will also offer species specific details that may serve to link evolutionary changes in Notch complexes from worms to mammals. Furthermore, elucidating the structure and binding energetics of the fly complexes will facilitate the development of novel small molecule therapeutics that can be used to treat aberrant

Notch signaling implicated in so many human diseases. The fly corepressor complex containing Hairless is of particular interest in this area as Hairless has been shown to interact solely with the CTD of CSL.210 This utterly unique binding mode of Hairless has the potential to be exploited by small molecule therapeutics to repress or at least dampen overactive Notch signaling in a novel manner.


Chapter 2: Thermodynamic Binding Analysis of Notch Transcription Complexes from D. melanogaster

To be published as:

Contreras, A.N., Yuan, Z., and Kovall, R.A. (Expected Publication – 2015) “Thermodynamic Binding Analysis of Notch Transcription Complexes from D. melanogaster”.


Thermodynamic binding analysis of Notch transcription complexes from D. melanogaster

Ashley N. Contreras(a), Zhenyu Yuan(a), and Rhett A. Kovall(a)*

(a) Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati, Cincinnati, OH 45267, USA

*Correspondence: [email protected], 513-558-4631 (phone), 513-558-8474 (fax)

Running Title: Characterization of fly CSL-NICD-MAM ternary complex

Abstract

Notch is an intercellular signaling pathway that is highly conserved in metazoans and is essential for proper cellular specification during development and in the adult organism. Misregulated Notch signaling underlies or contributes to the pathogenesis of many human diseases, most notably cancer. Signaling through the Notch pathway ultimately results in changes in gene expression, which is regulated by the transcription factor CSL. Upon pathway activation, CSL forms a ternary complex with the intracellular domain of the Notch receptor (NICD) and the transcriptional coactivator Mastermind (MAM) that activates transcription from Notch target genes. While detailed in vitro studies have been conducted with mammalian and worm orthologous proteins, less is known regarding the molecular details of the Notch ternary complex in Drosophila. Here we thermodynamically characterize the assembly of the fly ternary complex using isothermal titration calorimetry. Our data reveals striking differences in the way the RAM (RBP-J associated molecule) and ANK (ankyrin) domains of NICD interact with CSL that are specific to the fly. Additional analysis using cross-species experiments suggests that these differences are primarily due to fly CSL, while experiments using point mutants show that the interface between fly CSL and ANK is not identical to the mammalian or worm interface. Finally, we show that the binding of the fly RAM domain to CSL does not affect interactions of the corepressor Hairless with CSL. Taken together, our data suggests species-specific differences in ternary complex assembly that may be significant in understanding how CSL regulates transcription in different organisms.

Introduction

From the model organisms D. melanogaster and C. elegans to more complex metazoans, such as mammals, the highly conserved Notch pathway serves as a cell-to-cell communication mechanism to regulate the transcription of numerous target genes.225 Genes controlled by the Notch pathway play a critical role in cell fate specification, thereby making the pathway essential for a number of developmental and homeostatic processes, including embryogenesis, organogenesis, hematopoiesis, and stem cell maintenance.226-228 Emphasizing its important and highly pleiotropic role in multicellular organisms is the fact that aberrant Notch signaling has been implicated in a wide variety of diseases, including cerebrovascular disease, as well as a diverse array of cancers and developmental disorders.31,226,229

Genetic studies in flies and worms identified the central components of Notch signaling, which consist of the receptor Notch, the ligand DSL (Delta, Serrate, Lag-2), and the nuclear effector CSL (CBF1/RBP-J, Su(H), Lag-1).69,225 Notch pathway activation occurs when a DSL ligand on a signal-sending cell interacts with the Notch receptor on an adjacent signal-receiving cell.15 This interaction triggers proteolytic cleavage of the Notch receptor, generating the NICD (Notch intracellular domain), which translocates to the nucleus and interacts with the DNA binding transcription factor CSL. A third protein, Mastermind (MAM), also binds to the complex, forming the ternary complex (CSL-NICD-MAM) necessary for transcriptional activation of target genes regulated by the pathway. In the absence of an activating signal, the Notch pathway also functions to repress the transcription of some, but not all, target genes.230-231 This is achieved when a corepressor protein, such as Hairless,211 interacts with CSL present on the DNA of a Notch target gene. Corepressors mediate interactions with histone remodeling complexes, e.g. histone deacetylase and methyltransferase, which convert the local chromatin to a repressive environment.230 The ability of CSL to differentially regulate gene expression is determined by its interaction with coregulatory proteins (coactivators or corepressors), placing CSL at the center of a transcriptional switch (Figure 1A).

As shown in Figure 1B, CSL is a DNA binding protein consisting of three domains: the N-terminal domain (NTD), the beta-trefoil domain (BTD), and the C-terminal domain (CTD).8,121 The BTD and NTD make both specific and non-specific contacts to DNA, allowing CSL to bind DNA sequences present in genes regulated by the Notch pathway.8 Two domains of NICD mediate its interaction with CSL: the RAM (RBP-J associated molecule) and ANK (ankyrin) domains.127,232 RAM binds solely to the BTD of CSL, whereas ANK binds the CTD and NTD of CSL.9-10 The third protein of the CSL-NICD-MAM ternary complex, Mastermind, binds as a long α-helix with a distinctive bend, allowing it to make contacts with ANK as well as the CTD and NTD of CSL.9-10

Figure 1. Overview of CSL-mediated transcription regulation. (A) Model of CSL functioning as a transcriptional switch. Left, pathway inactivity allows corepressors (CoR, magenta) to interact with CSL present on DNA in the regulatory regions of target genes, and thereby repress gene transcription. Right, when the pathway is active, the corepressor complex is exchanged for two coactivators, Notch intracellular domain (NICD, red and yellow) and Mastermind (MAM, grey), to activate transcription from Notch target genes. (B) Ribbon diagram (left) and domain schematics (right) of the CSL-NICD-MAM ternary complex bound to DNA.10 Coloring is consistent in both images. CSL consists of three domains: NTD (cyan), BTD (green), and CTD (orange). A beta-strand that bridges all three domains of CSL is colored magenta. The NTD and BTD of CSL make contacts with the DNA (grey). The RAM domain (red) of NICD interacts solely with the BTD of CSL while the ANK domain (yellow) interacts with both the NTD and CTD of CSL. Mastermind (grey) binds as a long helix across a composite surface created by the ANK domain bound to the NTD and CTD of CSL. (C) Model of ternary complex assembly.121 According to this model, the RAM domain (red) of NICD binds to the BTD of CSL (green) in a high affinity interaction. The ANK domain (yellow) of NICD does not bind CSL until the second coactivator, MAM (grey), is present, locking the complex into an active conformation.

Detailed biochemical and biophysical studies have defined a step-wise assembly mechanism for the CSL-NICD-MAM ternary complex (Figure 1C).121,126 These studies showed that RAM forms a high affinity interaction with the BTD of CSL, initiating complex formation between CSL and NICD.169,186-187 These studies also showed that isolated constructs of ANK or MAM do not appreciably interact with CSL; conversely, when ANK and MAM are both present, formation of the CSL-NICD-MAM ternary complex occurs.169,186-187 It should be mentioned that these binding studies were performed with mammalian (human and mouse) and C. elegans proteins, and given the high degree of sequence conservation between orthologous Notch proteins, it has been assumed that the assembly mechanism of the CSL-NICD-MAM ternary complex is conserved for all organisms.

However, previous studies from our group using Notch proteins from D. melanogaster have compelled us to re-examine this assumption. In these studies, we demonstrated that the corepressor Hairless binds exclusively to the CTD of Su(H) (the fly ortholog of CSL).210 We also showed using EMSA that NICD (RAMANK) from Drosophila could efficiently displace Hairless from CSL in the absence of MAM.210 Given that previous studies demonstrated ANK does not interact with the CTD of CSL unless MAM is present, this suggested two possible mechanisms: one, RAM binding to the BTD induces a dramatic long-range conformational change in the CTD, which inhibits Hairless binding; and/or two, unlike the mammalian or worm ANK domain, the fly ANK domain interacts with the CTD of CSL in the absence of MAM and therefore can compete with Hairless for binding Su(H).

In order to address these two possible mechanisms, we used isothermal titration calorimetry to describe the binding interactions between Drosophila NICD and Su(H). Unexpectedly, we show that the ANK domain of Drosophila NICD is able to bind to Su(H) in the absence of MAM, which does not occur with the mammalian or worm orthologous proteins. To determine the molecular basis of this difference, we conducted a series of cross-species binding experiments using Drosophila and mammalian Notch proteins that suggest Su(H) is the primary factor that mediates this phenomenon. Additionally, point mutations were introduced into Su(H) and Drosophila NICD, based on the CSL-NICD-MAM X-ray structures, to disrupt the CTD-ANK interface; however, these mutations do not appreciably affect binding, suggesting that in the absence of MAM the molecular interactions of the Drosophila CSL-NICD complex are distinctly different from those observed in the CSL-NICD-MAM ternary complex structures.9-10 Moreover, EMSA and ITC studies demonstrate that RAM binding does not affect Hairless interactions with the CTD of CSL. Taken together, our data define the assembly mechanism for Notch transcription complexes from D. melanogaster, which suggests that assembly is not strictly conserved in all metazoans and that a novel set of molecular interactions underlie CTD-ANK complex formation in flies.

Results

Analysis of Su(H) – NICD interactions

To define the thermodynamic binding parameters that underlie complexes formed between Su(H) and the Notch intracellular domain from Drosophila, we used ITC with highly purified preparations of recombinant Su(H) and NICD from bacteria. As shown in Table 1 and Figure 2A, a construct corresponding to the RAM and ANK domains of Drosophila NICD (dRAMANK) binds Su(H) with 60 nM affinity. This is slightly weaker than the affinity we previously measured between mouse CSL and NICD proteins (Kd ~ 20 nM) and stronger than the binding we measured between the C. elegans orthologous proteins (Kd ~ 3 μM) under identical conditions.169 For the mouse and worm NICD proteins, RAM accounts entirely for the observed binding to CSL; however, when we examined the individual contributions of the RAM and ANK domains of Drosophila NICD to Su(H) binding, we saw a distinct difference from the mouse and worm orthologs. The binding affinity between Su(H) and dRAM is 345 nM and the binding affinity between Su(H) and dANK is 668 nM (Figure 2C and 2B, respectively). We also analyzed the binding between dANK and a construct that corresponds to the CTD of Su(H) (dCTD, Figure 2D). In this case, we observed weaker binding between dCTD and dANK (Kd 21 μM) than between Su(H) and dANK (Kd 668 nM). Remarkably, these data suggest that the fly Notch proteins behave quite differently from the previously characterized worm and mammalian proteins.169,186-187

Cross-species binding studies of CSL-NICD interactions

To define where the difference(s) in CSL-NICD binding resides in flies, we performed cross-species ITC experiments with Notch components from mouse and Drosophila (Table 2 and Figures 3 and 4). In the initial set of experiments (Figure 3), we assessed the interaction between RBP-J (mouse CSL) and NICD from Drosophila (dRAMANK). The binding affinity between RBP-J and dRAMANK is 50 nM (Figure 3A), which is identical, within error, to the binding observed between Su(H) and dRAMANK (Kd 60 nM). Similar

Table 1: Calorimetric data for the binding of Drosophila NICD to Su(H).

Cell       Syringe    K (M^-1)            Kd (μM)    ΔG° (kcal/mol)    ΔH° (kcal/mol)    -TΔS° (kcal/mol)
dRAMANK    Su(H)      1.9 ± 0.9 x 10^7    0.060      -9.9 ± 0.2        -23.8 ± 0.5       13.9 ± 0.5
Su(H)      dRAM       3.0 ± 0.7 x 10^6    0.345      -8.8 ± 0.1        -17.4 ± 0.7       8.6 ± 0.9
Su(H)      dANK       1.5 ± 0.4 x 10^6    0.668      -8.4 ± 0.1        -6.1 ± 0.4        -2.2 ± 0.5
dCTD       dANK       4.7 ± 0.4 x 10^4    20.9       -6.3 ± 0.05       -9.3 ± 1.0        2.9 ± 1.0

All experiments were performed at 25°C. Values are the mean of at least three independent experiments and errors represent the standard deviation of multiple experiments.
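The columns of Table 1 are linked by standard thermodynamic relations: Kd = 1/K, ΔG° = -RT ln K, and ΔG° = ΔH° - TΔS°. As a minimal sketch, the Python snippet below recomputes Kd, ΔG°, and -TΔS° from the tabulated K and ΔH° values at 25°C. The function and variable names are illustrative and are not part of the original analysis, which was performed in ORIGIN. Because the table reports means over independent experiments and rounded values, the recomputed numbers should be expected to agree with the tabulated ones only to within rounding.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # 25 degrees C, the temperature stated for all experiments


def kd_micromolar(k_assoc):
    """Dissociation constant in uM from an association constant in M^-1."""
    return 1e6 / k_assoc


def delta_g(k_assoc, temp=T_KELVIN):
    """Standard free energy of binding in kcal/mol: dG = -RT ln K."""
    return -R_KCAL * temp * math.log(k_assoc)


def minus_t_delta_s(dg, dh):
    """Entropic term -T*dS in kcal/mol, rearranged from dG = dH - T*dS."""
    return dg - dh


# Association constants (M^-1) and enthalpies (kcal/mol) as listed in Table 1.
table1 = {
    "Su(H) + dRAMANK": (1.9e7, -23.8),
    "Su(H) + dRAM": (3.0e6, -17.4),
    "Su(H) + dANK": (1.5e6, -6.1),
    "dCTD + dANK": (4.7e4, -9.3),
}

for pair, (k, dh) in table1.items():
    dg = delta_g(k)
    print(f"{pair}: Kd = {kd_micromolar(k):.3f} uM, "
          f"dG = {dg:.1f} kcal/mol, -TdS = {minus_t_delta_s(dg, dh):.1f} kcal/mol")
```

A check of this kind is mainly useful for catching transcription errors between columns; it does not replace the nonlinear fit of the raw thermograms.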

Figure 2. Thermodynamic binding analysis of Notch proteins from Drosophila. Figure shows representative thermograms (raw heat signal and nonlinear least squares fit to the integrated data) for Su(H) binding Drosophila NICD. Each experiment was performed at 25°C, with forty 7 μL injections spaced 120 seconds apart. The experimentally determined dissociation constant (Kd) is shown for each experiment. (A) Su(H) binding dRAMANK. (B) Su(H) binding dANK. (C) Su(H) binding dRAM. (D) CTD of Su(H) binding dANK.

Table 2: Calorimetric data for NICD-CSL binding between mouse and Drosophila components.

Cell       Syringe    K (M^-1)            Kd (μM)    ΔG° (kcal/mol)    ΔH° (kcal/mol)    -TΔS° (kcal/mol)

mNICD + Su(H)
mRAMANK    Su(H)      4.9 ± 0.8 x 10^6    0.206      -9.1 ± 0.09       -17.7 ± 0.8       8.6 ± 0.8
mRAM       Su(H)      2.5 ± 1.1 x 10^6    0.437      -8.7 ± 0.2        -14.6 ± 0.7       5.9 ± 0.9
Su(H)      mANK       NBD                 NBD        NBD               NBD               NBD
Su(H)      mANK       NBD                 NBD        NBD               NBD               NBD
dCTD       mANK       NBD                 NBD        NBD               NBD               NBD

dNICD + RBP-J
dRAMANK    RBP-J      2.8 ± 1.6 x 10^7    0.050      -10.0 ± 0.4       -16.6 ± 0.5       6.5 ± 0.9
RBP-J      dRAM       2.9 ± 0.5 x 10^7    0.040      -10.1 ± 0.1       -14.9 ± 0.2       4.7 ± 0.3
dANK       RBP-J      NBD                 NBD        NBD               NBD               NBD
dANK       RBP-J      NBD                 NBD        NBD               NBD               NBD
mANK       mCTD       NBD                 NBD        NBD               NBD               NBD
mCTD       dANK       NBD                 NBD        NBD               NBD               NBD

All experiments were performed at 25°C. Values are the mean of at least three independent experiments and errors represent the standard deviation of multiple experiments. NBD, no binding detected.

Figure 3. Cross-species binding experiments (RBP-J + dNICD). Figure shows representative thermograms (raw heat signal and nonlinear least squares fit to the integrated data) for RBP-J binding Drosophila NICD. Each experiment was performed at 25°C, with forty 7 μL injections spaced 120 seconds apart. The experimentally determined dissociation constant (Kd) is shown for each experiment. (A) RBP-J binding dRAMANK. (B) RBP-J binding dRAM. (C) RBP-J does not bind dANK. (D) CTD of RBP-J does not bind dANK. NBD, no binding detected.

Figure 4. Cross-species binding experiments (Su(H) + mNICD). Figure shows representative thermograms (raw heat signal and nonlinear least squares fit to the integrated data) for Su(H) binding mouse NICD. Each experiment was performed at 25°C, with forty 7 μL injections spaced 120 seconds apart. The experimentally determined dissociation constant (Kd) is shown for each experiment. (A) Su(H) binding mRAMANK. (B) Su(H) binding mRAM. (C) Su(H) does not bind mANK. (D) CTD of Su(H) does not bind mANK. NBD, no binding detected.

to previous binding studies of the mammalian Notch proteins,169,186-187 RBP-J bound dRAM with 40 nM affinity (Figure 3B), suggesting that dANK does not interact with the CTD of RBP-J.169 We confirmed this by measuring the binding between (1) RBP-J and dANK and (2) the CTD of RBP-J (mCTD) and dANK, and in both cases we could not detect any binding by ITC (Table 2 and Figure 3C,D). We also measured the binding between mCTD and the ANK domain of mouse NICD (mANK), since it had not been measured previously, and as expected, we saw no interaction (Table 2). Taken together, these data suggest that the binding profile of RBP-J with dRAMANK resembles the binding profile for the mouse orthologous proteins.

The second set of cross-species ITC experiments assessed the interaction between Su(H) and the mouse NICD (mRAMANK) (Figure 4). With an affinity of 206 nM (Figure 4A), the interaction between Su(H) and mRAMANK is approximately three-fold weaker than the Su(H)-dRAMANK complex and approximately ten-fold weaker than the RBP-J-mRAMANK complex.169 The interaction of Su(H) and mRAM yielded an affinity of 437 nM (Figure 4B), which is similar to the affinity for Su(H)-mRAMANK (Kd 206 nM), suggesting that the mouse ANK domain (mANK) does not interact with Su(H). We confirmed this by binding experiments with Su(H) and mANK as well as dCTD and mANK, which in both cases displayed no observable binding by ITC (Figure 4C,D). From these experiments, we conclude that the difference in CSL-NICD binding between mouse and fly proteins likely lies primarily with Su(H) and not dNICD.

Binding analysis of Su(H) – dRAMANK point mutations


Using the CSL-NICD-MAM ternary complex structures as a guide,9-10 point mutations were made to Su(H) and to dRAMANK targeting the CTD-ANK interface.10 These mutations focused on a conserved Glu-Arg ion pair buried at the CTD-ANK interface, for which mutations that produce like charges were previously shown to have a deleterious effect on complex formation and transcription.186,220,233 We first tested each point mutant with a wild-type partner. As shown in Table 3, the combinations of Su(H) with dRAMANK^R1985E or dRAMANK^R2027E showed only small differences in affinity, which were not statistically significant when compared to wild-type Su(H) with wild-type dRAMANK. Similarly, the combination of Su(H)^E446R with dRAMANK, dRAMANK^R1985E, or dRAMANK^R2027E also showed no statistically significant difference in binding affinity when compared to wild-type Su(H) with wild-type dRAMANK (Table 3). Altogether, analysis of the point mutant binding experiments suggests that the contacts at the interface between the CTD of Su(H) and the ANK domain of fly NICD are different from the ones observed in the mammalian and worm CSL-NICD-MAM ternary complex structures.

Characterizing the effect RAM binding has on Su(H)-Hairless interactions

Previous work from our lab using Notch proteins from worm and mammals demonstrated that RAM binding to the BTD of CSL promotes ternary complex formation by inducing a distal conformational change in the NTD of CSL, thereby creating a binding site for MAM.169 In other work, we showed that the corepressor Hairless binds exclusively to the CTD of Su(H); however, in EMSA experiments dRAMANK was able to efficiently displace Hairless from Su(H).210 To determine whether RAM binding affects

Table 3: Calorimetric data for Su(H) and dNICD point mutants.

Cell              Syringe        K (M^-1)            Kd (μM)    ΔG° (kcal/mol)    ΔH° (kcal/mol)    -TΔS° (kcal/mol)    ΔΔG° (kcal/mol)
dRAMANK^R1985E    Su(H)          5.8 ± 1.8 x 10^6    0.180      -9.2 ± 0.1        -18.3 ± 1.5       9.1 ± 1.6           0.7
dRAMANK^R2027E    Su(H)          1.5 ± 0.2 x 10^7    0.068      -9.7 ± 0.1        -28.3 ± 0.7       18.5 ± 0.6          0.2
Su(H)^E446R       dRAMANK        6.4 ± 2.1 x 10^6    0.170      -9.2 ± 0.2        -18.5 ± 1.4       9.2 ± 1.4           0.7
dRAMANK^R1985E    Su(H)^E446R    6.3 ± 1.9 x 10^6    0.167      -9.2 ± 0.1        -21.6 ± 0.6       12.3 ± 0.8          0.7
dRAMANK^R2027E    Su(H)^E446R    8.9 ± 2.4 x 10^6    0.118      -9.4 ± 0.1        -20.0 ± 0.7       10.6 ± 0.9          0.5

All experiments were performed at 25°C. Values are the mean of at least three independent experiments and errors represent the standard deviation of multiple experiments.
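The final column of Table 3 appears to report the change in binding free energy relative to wild type, i.e. ΔΔG° = ΔG°(mutant) - ΔG°(wild type), with the wild-type value taken from Table 1 (-9.9 kcal/mol). The following Python sketch reproduces that column under this assumption and also converts each ΔΔG° into the corresponding fold change in Kd via exp(ΔΔG°/RT). The dictionary keys and function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # 25 degrees C
WT_DG = -9.9        # kcal/mol, wild-type Su(H) + dRAMANK (Table 1)

# Free energies of binding (kcal/mol) for the mutant pairs listed in Table 3.
mutant_dg = {
    "Su(H) + dRAMANK(R1985E)": -9.2,
    "Su(H) + dRAMANK(R2027E)": -9.7,
    "Su(H)(E446R) + dRAMANK": -9.2,
    "Su(H)(E446R) + dRAMANK(R1985E)": -9.2,
    "Su(H)(E446R) + dRAMANK(R2027E)": -9.4,
}


def ddg(dg_mut, dg_wt=WT_DG):
    """Change in binding free energy relative to wild type, in kcal/mol."""
    return round(dg_mut - dg_wt, 1)


def fold_change_in_kd(ddg_value, temp=T_KELVIN):
    """Fold weakening of Kd implied by a given ddG (values > 1 mean weaker binding)."""
    return math.exp(ddg_value / (R_KCAL * temp))


for pair, dg in mutant_dg.items():
    d = ddg(dg)
    print(f"{pair}: ddG = {d:+.1f} kcal/mol, ~{fold_change_in_kd(d):.1f}-fold change in Kd")
```

On this scale, every mutant sits within about 1 kcal/mol of wild type, which is the quantitative basis for describing the effects as small.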

Table 4: Calorimetric data for competition ITC between Su(H)-RAM and Hairless.

Cell         Syringe     K (M^-1)            Kd (μM)    ΔG° (kcal/mol)    ΔH° (kcal/mol)    -TΔS° (kcal/mol)
Su(H)        Hairless    9.1 ± 2.4 x 10^8    0.001      -12.2 ± 0.2       -16.1 ± 0.7       3.9 ± 0.8
Su(H)-RAM    Hairless    7.8 ± 4.3 x 10^8    0.001      -12.0 ± 0.3       -9.5 ± 0.7        -2.5 ± 1.0

All experiments were performed at 25°C. Values are the mean of at least three independent experiments and errors represent the standard deviation of multiple experiments. Values for Su(H)-Hairless were taken from our publication Maier et al. 2011.

Hairless interactions with the CTD of Su(H), we employed two methods: competition ITC and EMSA. For our competition ITC binding experiments, we pre-formed complexes of Su(H) and RAM before titrating with Hairless. The experiment yielded an affinity of 1 nM, which is essentially identical to the affinity we previously measured for Su(H) and Hairless (Table 4).210 For the EMSAs, we pre-formed complexes of Su(H) and Hairless before adding either ANK and MAM or RAM, ANK, and MAM (Figure 5). Both gels showed nearly identical results: the Su(H)-Hairless complexes persisted, as ANK and MAM (Figure 5A) or RAM, ANK, and MAM (Figure 5B) were largely ineffective at displacing Hairless from Su(H). Taken together, these results suggest that RAM binding to Su(H) does not affect Su(H)-Hairless interactions.
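One way to see why an unchanged Hairless affinity argues against RAM interfering with Hairless binding is to ask what would be expected in the opposite, purely hypothetical case in which RAM and Hairless were mutually exclusive. For simple competition, the apparent association constant of the titrant is K_app = K_A / (1 + K_B[B]), where [B] is the free concentration of the pre-bound competitor. The Python sketch below applies this relation with K values from Tables 1 and 4. The 10 µM concentrations are an assumption taken from the "typical" conditions in the Methods, and treating the free RAM concentration as constant at its starting value is a simplification, so the result is only a rough estimate.

```python
import math


def complex_1to1(p_total, l_total, kd):
    """Equilibrium concentration of a 1:1 complex (all arguments in M)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def apparent_k_competitive(k_a, k_b, b_free):
    """Apparent association constant of ligand A if competitor B shared its site."""
    return k_a / (1.0 + k_b * b_free)


K_HAIRLESS = 9.1e8   # M^-1, Su(H) + Hairless (Table 4)
K_DRAM = 3.0e6       # M^-1, Su(H) + dRAM (Table 1)
SUH_TOTAL = 10e-6    # M, assumed "typical" cell concentration
DRAM_TOTAL = 10e-6   # M, 1:1 with Su(H) as described in the Methods

# Free dRAM remaining after the Su(H)-dRAM complex is pre-formed.
bound = complex_1to1(SUH_TOTAL, DRAM_TOTAL, 1.0 / K_DRAM)
dram_free = DRAM_TOTAL - bound

k_app = apparent_k_competitive(K_HAIRLESS, K_DRAM, dram_free)
kd_app_nm = 1e9 / k_app
kd_direct_nm = 1e9 / K_HAIRLESS

print(f"Kd without competitor: {kd_direct_nm:.1f} nM")
print(f"Kd expected if RAM competed: {kd_app_nm:.1f} nM")
```

Under these assumptions, direct competition would be expected to weaken the apparent Hairless affinity several-fold, whereas the measured values with and without RAM are the same within error (Table 4).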

Discussion

Canonical Notch signaling ultimately results in changes in gene expression, which is regulated by the DNA binding transcription factor CSL.15,69,234 Upon pathway activation, CSL forms a ternary complex with the intracellular domain of the Notch receptor (NICD) and the transcriptional coactivator Mastermind (MAM) to activate gene expression from Notch targets.121 CSL also interacts with corepressors, such as Hairless, to repress transcription from some, but not all, Notch responsive genes.211,231 Both the mechanism of signal transduction and the individual components of the Notch pathway are highly conserved, e.g., fly and mouse CSL proteins share ~78% sequence identity within their structural core.121 Previously, extensive structural, biophysical, and biochemical/cellular studies were performed on Notch proteins, primarily from mammals and worms, resulting in a detailed model of CSL-NICD-MAM ternary complex formation.121 Given

Figure 5. Characterizing the effect RAM has on Su(H)-Hairless interactions. Figure shows representative EMSAs in which ANK + MAM (A) or RAM + ANK + MAM (B) compete for binding to the preformed Su(H)-Hairless-DNA complex. The control lanes (1-5) for both EMSAs contain Su(H)-DNA, Su(H)-dRAMANK-DNA, Su(H)-dRAMANK-MAM-DNA, Su(H)-Hairless-DNA, and Su(H)-dANK-MAM-DNA, respectively. ANK was added in increasing amounts (lanes 6-10) either with (B) or without (A) RAM. In both cases, ANK or RAM + ANK compete poorly for the Su(H)-Hairless complex.

the high degree of conservation between orthologous components, it has been widely assumed that the assembly mechanism of the ternary complex would also be strictly conserved between organisms. However, previous studies from our group prompted us to reassess whether this assumption held true for Notch proteins from Drosophila.210

A hallmark of the assembly mechanism is that RAM forms a high affinity interaction with the BTD of CSL (for the mouse proteins Kd ~ 20 nM).169,186-187 This serves to tether ANK to CSL, greatly increasing its local concentration for subsequent interactions with the CTD of CSL and MAM (Figure 1C).185 Despite this dramatic increase in local concentration, in the absence of MAM, ANK has no appreciable interactions with CSL.169,186-187,215 Here, we show that the fly proteins behave quite differently. In this case, both RAM and ANK bind to Su(H) (fly CSL) with sub-micromolar affinity (Table 1 and Figure 2). Interestingly, yeast two-hybrid studies performed 20 years ago also observed significant interactions between Su(H) and the isolated ANK domain of NICD.127,232 Additionally, two other points are worth mentioning. First, while ANK also binds the isolated CTD of Su(H), it does so with thirty-fold less affinity. This may reflect the loss of the interactions ANK makes with the NTD of CSL, as observed in the CSL-NICD-MAM-DNA X-ray structures,9-10 as well as an entropic penalty that may result from folding coupled to binding for the isolated CTD construct. Second, due to the chelate effect, the Gibbs free energy of binding (ΔG°) for RAMANK interacting with Su(H) is greater than it is for the isolated constructs of RAM or ANK, but the free energies are not strictly additive, which is commonly seen for small molecules binding to macromolecules.235 This may be due to the ~55 Å distance between where RAM binds the BTD and ANK binds the CTD of CSL.
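The degree of non-additivity can be made explicit by comparing the free energy measured for the linked construct with the sum of the free energies measured for the isolated domains. The short Python sketch below does this with the values from Table 1; the "coupling" term it reports is simply the difference between the two and is introduced here for illustration, not taken from the original analysis.

```python
# Free energies of binding to Su(H) in kcal/mol, as listed in Table 1.
DG_RAM = -8.8      # isolated dRAM
DG_ANK = -8.4      # isolated dANK
DG_RAMANK = -9.9   # linked dRAMANK construct


def coupling_free_energy(dg_linked, dg_parts):
    """Observed free energy of the linked construct minus the sum of its parts.

    Zero would indicate strict additivity; a positive value means the linked
    construct binds less tightly than the summed free energies would predict.
    """
    return dg_linked - sum(dg_parts)


additive_prediction = DG_RAM + DG_ANK
coupling = coupling_free_energy(DG_RAMANK, [DG_RAM, DG_ANK])

print(f"Sum of isolated domains: {additive_prediction:.1f} kcal/mol")
print(f"Observed for dRAMANK:    {DG_RAMANK:.1f} kcal/mol")
print(f"Difference (coupling):   {coupling:+.1f} kcal/mol")
```

The linked construct is therefore only about 1 kcal/mol more favorable than RAM alone, far short of the roughly 17 kcal/mol that strict additivity would give.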


Given this striking difference in the binding interactions between fly and mammalian Notch proteins, we sought to identify the molecular basis for this observation. As there are no major sequence differences between mammalian and fly orthologs of CSL and NICD, in particular at the CTD-ANK interface, there is no obvious reason as to why dANK binds CTD, whereas mANK does not. In an effort to discern which component, dANK or Su(H), is largely responsible for this effect, we performed cross-species ITC experiments using mouse and fly Notch proteins. These studies convincingly showed that RBP-J interacts with dRAMANK in a manner very similar to how it interacts with mRAMANK, i.e., both mouse and fly RAM form a high affinity interaction with the BTD of RBP-J, and neither dANK nor mANK interacts with the CTD of RBP-J. However, the results of the cross-species experiments with Su(H) and mRAMANK were not as clear-cut. In this case, Su(H) bound both mRAM and mRAMANK with roughly similar affinities, as the two-fold difference in binding was not statistically significant. Consistent with this, mANK did not bind Su(H). However, mRAMANK bound Su(H) with three-fold less affinity than dRAMANK, which was statistically significant and comparable to the affinity between dRAM and Su(H). Taken together, these data seem to suggest that Su(H) is the factor playing the largest role in the difference between mammalian and fly Notch proteins. Future binding studies will focus on the approximately 30 residues that differ between the CTDs of mouse and fly CSL to better understand how these changes allow Su(H) to bind ANK.

To further scrutinize Su(H)-dRAMANK interactions, we designed point mutations based on the CSL-NICD-MAM-DNA X-ray structures that focused on a Glu-Arg salt bridge buried at the CTD-ANK interface.9-10,220 We inferred that reversing the charge of one of these residues should have a dramatic effect on Su(H)-dRAMANK complex formation, as it had in previous studies.186,233 Accordingly, we tested the binding of both dRAMANK^R1985E and dRAMANK^R2027E, which correspond to the arginines observed in the worm and human X-ray structures, respectively, that would pair with Glu446 on Su(H), as well as the Su(H)^E446R mutant.9-10,220 Interestingly, none of these mutants had a dramatic effect on binding (Table 3), which suggests that R1985 and R2027 on dANK and Glu446 on the CTD of Su(H) are only playing minor roles in dCTD-dANK complex formation. This also suggests that the molecular contacts at the dCTD-dANK interface must differ in some manner from what was observed in the human and worm structures when MAM was bound to the complex. Potentially, the dCTD-dANK interaction may involve residues in NICD and Su(H) that were previously unappreciated for their role in complex formation. Certainly, future mutagenesis studies of dCTD-dANK will help to identify the residues important for this complex to form, providing additional insights into ternary complex assembly.

Previously, we showed that the corepressor Hairless binds solely to the CTD of Su(H); however, we also showed in competitive binding assays that dRAMANK could efficiently displace Hairless from Su(H) in the absence of MAM.210 The binding experiments described herein, in which dANK was shown to bind Su(H), provide a molecular explanation for why dRAMANK is an effective competitor for Su(H)-Hairless complexes. Consistent with this reasoning, we demonstrated via ITC and EMSA that RAM does not affect Hairless binding to Su(H) (Table 4 and Figure 5). Together, these results indicate RAM binding to the BTD does not cause a long-range conformational change in the CTD of Su(H), but is important for tethering ANK to Su(H).


Figure 6. Revised model of ternary complex assembly for Drosophila Notch proteins. Our binding data suggest that the binding of Drosophila NICD to Su(H) is partitioned between its RAM and ANK domains. However, the conformation of the Su(H)-ANK interface without MAM (middle) is different than what was observed in the CSL-NICD-MAM ternary complex structures (right).


Finally, we present a revised model of CSL-NICD-MAM ternary complex formation that is fly-specific (Figure 6). In this case, when NICD binds Su(H) both RAM and ANK have appreciable interactions with Su(H). Furthermore, the inability of the fly point mutants to significantly disrupt binding suggests that there are different contacts involved in the Su(H)-ANK interaction in the absence of MAM. We suspect that when MAM binds Su(H)-dRAMANK, it forms a ternary complex very similar to what was observed in the human and worm X-ray structures. While the biological significance of a fly-specific model is not immediately obvious, it is interesting to speculate that perhaps the difference in dRAMANK binding to Su(H) is necessary for displacement of Hairless from Su(H), but this will require further study. Nonetheless, it will be important for future studies to take into consideration possible species-specific differences in Notch signaling, which may impact interpretation of results and phenotypes.

Materials and Methods

Cloning, Expression, and Protein Purification

The cloning, expression, and purification of constructs that correspond to Mus musculus RBP-J (53-474), as well as the RAMANK (1744-2113), RAM (1744-1771), and ANK (1827-2133) constructs from mouse Notch1 were described previously.169 Additionally, the cloning, expression, and purification of Drosophila melanogaster Su(H) (98-523) and the CTD (101-119 + 415-523) domain of Su(H), as well as the RAMANK (1762-2142) and ANK (1858-2142) domains of fly Notch were previously described.210 The construct corresponding to the RAM (1762-1790) domain of fly Notch was cloned, expressed, and purified similar to the RAM domain from mouse Notch1.

Isothermal Titration Calorimetry

Proteins for use in isothermal titration calorimetry (ITC) experiments were degassed and buffer-matched using either size exclusion chromatography or dialysis. Protein concentrations were determined by UV absorbance at 280 nm. ITC experiments were performed with a MicroCal VP-ITC microcalorimeter. All experiments were conducted at 25°C in a buffer of 50 mM sodium phosphate, pH 6.5, and 150 mM sodium chloride. A typical experiment consisted of 10 µM macromolecule in the cell and 100 µM ligand in the syringe. Data were analyzed with the ORIGIN software package and fit to a one-site binding model. The reported binding data are the average of at least three individual experiments (n ≥ 3). For the competition ITC experiment, proteins were prepared separately as described above. The Hairless construct (232-358) retained an N-terminal SMT3 fusion tag from purification; however, no binding was detected between SMT3 and Su(H) (data not shown). Purified Su(H) and dRAM were combined in a 1:1 ratio and placed in the microcalorimeter cell and then titrated with Hairless.
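The titration conditions above can be put in rough numerical context with a few standard quantities: the dimensionless c value (c = n·K·[macromolecule in cell]), which governs the sharpness of the binding isotherm, the final ligand-to-macromolecule molar ratio, and the equilibrium concentration of a 1:1 complex. The Python sketch below is illustrative only. The cell volume of about 1.4 mL is an assumption about the instrument rather than a value given in the text, the injection scheme is taken from the figure legends, dilution and volume displacement during the titration are ignored, and all names are hypothetical.

```python
import math


def complex_conc(p_total, l_total, kd):
    """Equilibrium concentration of a 1:1 complex (inputs share the same units)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def c_value(k_assoc, cell_conc_molar, n_sites=1.0):
    """Dimensionless c parameter: c = n * K * [macromolecule in cell]."""
    return n_sites * k_assoc * cell_conc_molar


CELL_CONC = 10e-6        # M, typical macromolecule concentration (from the text)
SYRINGE_CONC = 100e-6    # M, typical ligand concentration (from the text)
INJECTIONS = 40          # from the figure legends
INJ_VOL_UL = 7.0         # uL per injection, from the figure legends
CELL_VOLUME_UL = 1400.0  # uL, ASSUMED cell volume; not stated in the text

# Total ligand delivered relative to macromolecule in the cell.
final_molar_ratio = (INJECTIONS * INJ_VOL_UL * SYRINGE_CONC) / (CELL_VOLUME_UL * CELL_CONC)
print(f"Final ligand:macromolecule ratio ~ {final_molar_ratio:.1f}")

# c values for selected interactions, using K (M^-1) from Table 1.
for pair, k in {"Su(H) + dRAMANK": 1.9e7, "Su(H) + dANK": 1.5e6}.items():
    print(f"{pair}: c ~ {c_value(k, CELL_CONC):.0f}")

# Complex formed at a 1:1 mixture of 10 uM each with Kd = 0.060 uM (values in uM).
print(f"Complex at 1:1: {complex_conc(10.0, 10.0, 0.060):.2f} uM")
```

Larger c values give a sharper transition around the equivalence point, while smaller values give a shallower curve, so the same nominal concentrations produce quite different isotherm shapes across the affinities reported here.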

Electrophoretic Mobility Shift Assays

EMSAs were performed as described previously.169,210 Briefly, purified constructs of Su(H) and Hairless (232-269) were incubated for 15 minutes at room temperature with a 19-mer duplex DNA (-GTTACTGTGGGAAAGAAAG-) containing a single CSL-binding site (in bold type) from the Hes-1 gene. Various combinations of purified Drosophila RAMANK, RAM, ANK, and MAM proteins were added to the preformed DNA-Su(H)-Hairless complexes and incubated for an additional 15 minutes at room temperature. The complexes were separated on a 7% polyacrylamide gel containing 0.5x Tris-borate buffer, pH 7.0, for three hours at 4°C and visualized using SYBR-GOLD stain (Invitrogen).
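Because the bold formatting that marked the CSL-binding site in the probe is not preserved here, the site can be located computationally. The following Python sketch scans both strands of the 19-mer for the eight base pair consensus C/tGTGGGAA described in Chapter 3. The regular expression and helper names are illustrative and are not part of the original protocol.

```python
import re

# Eight base pair CSL consensus C/tGTGGGAA, written as a regular expression.
CSL_CONSENSUS = re.compile(r"[CT]GTGGGAA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_csl_sites(seq):
    """List (strand, start, site) for consensus matches on both strands.

    Start positions are 0-based and given in the coordinates of the strand
    that was searched.
    """
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in CSL_CONSENSUS.finditer(strand_seq):
            hits.append((strand, match.start(), match.group()))
    return hits


probe = "GTTACTGTGGGAAAGAAAG"  # 19-mer EMSA probe from the Hes-1 gene
for strand, start, site in find_csl_sites(probe):
    print(f"strand {strand}, position {start}: {site}")
```

For this probe the search finds a single match on the written strand, TGTGGGAA, consistent with the description of one CSL-binding site.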

Acknowledgements

We thank members of the Kovall lab for their support and helpful comments for the manuscript. This work was funded in whole or in part by NIH grants CA178974 (to RAK) and ES007250 (to ANC). The authors declare no conflicts of interest.


Chapter 3: Quantitative Binding Analysis of the CSL-DNA Interaction in the Notch Pathway

Data presented in this chapter has been published in:

Torella, R.; Li, J.; Kinrade, E.; Cerda-Moya, G.; Contreras, A.N.; Foy, R.; Stojnic, R.; Glen, R.C.; Kovall, R.A.; Adryan, B.; and Bray, S.J. (2014) A Combination of Computational and Experimental Approaches Identifies DNA Sequence Constraints Associated with Target Site Binding Specificity of the Transcription Factor CSL. Nucleic Acids Research 42 (16): 10550-10563.

This publication can be found in Appendix A.


DNA recognition by a protein is not as simple as having a unique hydrogen bonding pattern for each amino acid-base pair. Rather, numerous factors affect how proteins recognize and interact with DNA, including flexible protein-DNA interactions, DNA structure, chromatin structure, and combinatorial interactions with other proteins.180,236 Due to the complex interplay between protein side chains, water molecules, and DNA bases in the binding interface, the interaction between protein and DNA is highly flexible and able to adopt new conformations in response to changes in protein or DNA sequence.236 Some DNA binding proteins may possess multiple DNA binding domains or may require homo- or heterodimerization to bind DNA, requiring certain site arrangements.236 The architectural arrangement of multiple binding sites, whether for a single DNA binding protein or for multiple DNA binding proteins synergistically loading onto DNA, can also be a factor in determining when and where a protein can bind and function.237 Proteins can directly interact with DNA bases to discriminate specific DNA sequences; this process is called direct or base readout.236 However, indirect or shape readout is equally important to how proteins recognize and interact with a specific DNA sequence, since the structural differences between DNA molecules can serve as a distinguishing feature.236 DNA shape can dramatically affect whether a protein can physically interact with it or vice versa. At a local level, shape readout can entail any deviations from B-form DNA, such as sequence dependent kinks or variation in groove width, that are preferentially recognized by a protein.236 Sequences flanking the binding site of a protein can influence the three-dimensional structure of the binding site and therefore affect protein recognition and binding.236 On a global level, shape readout refers to deformation or bending of much of the DNA molecule, which involves multiple interactions distributed between the protein and DNA as well as within the DNA.236 Related to shape readout is the docking geometry of the protein-DNA complex, which is critical for making the contacts necessary for interaction. For example, a protein can use different docking geometries when binding different DNA sequences.236 Alternatively, structurally homologous proteins can have different docking geometries when binding the same sequence of DNA.236 How DNA is packaged into chromatin can also affect whether a protein can interact with a binding site. Highly condensed heterochromatin is not easily accessed by proteins, whereas loosely packed euchromatin is readily accessed by proteins. This means cell cycle stage and epigenetic modifications on histones, which direct chromatin packaging, are influential in regulating protein-DNA interactions. Lastly, the proteins that bind DNA often do so with other proteins, either interacting as neighbors or assembling into multi-protein complexes. These combinatorial interactions between proteins provide specificity in terms of which sites in the genome the proteins can bind, or they can regulate the activity of the proteins, limiting activity temporally, spatially, or contextually.180

Transcription factors are a particular class of proteins that bind DNA to regulate the expression of one or more genes. Their choice of binding sites is typically degenerate when compared to high specificity DNA binding proteins like restriction enzymes, which bind only a single specific site.236 In contrast, transcription factors bind a diverse array of sites with variable affinity, ranging from low to high.236 Low affinity sites are commonly bound in vivo, but are not often identified by traditional computational methods since the low affinity sites can deviate greatly from a consensus binding sequence.236 Additionally, transcription factors in the same family can have different preferences for lower affinity sites even if they bind a similar high affinity site.236 While transcription factors can bind a variety of sites, they may not have similar function at every one due to combinatorial interactions with other proteins.236 Cooperative interactions typically occur between adjacent transcription factors through protein-protein interactions and can produce several effects.236 They can stabilize the proteins on DNA, enhance each protein's individual contributions to transcriptional regulation, or alter the binding site specificity of one protein in a negative or positive fashion.236 Transcription factors can also recruit non-DNA binding cofactors to the DNA, and often these cofactors can interact with multiple DNA bound transcription factors to integrate their input for maximal gene expression.236 In some situations, the DNA sequence can cause an allosteric effect in the transcription factor so that it recruits one cofactor versus another to that site.236

CSL is the sole transcription factor utilized by the Notch pathway and interacts with various cofactors to either activate or repress the transcription of target genes.

Since CSL is a DNA binding protein, determining the molecular mechanisms for DNA recognition and interaction is essential for understanding how it achieves differential transcriptional regulation. Furthermore, elucidating these mechanisms will allow other important questions to be addressed, such as the connection between binding site sequence and functional output or the role of complex binding site architecture in transcriptional regulation.

CSL was first identified as a DNA binding protein in 1989 and a few years later, in 1994, as a component of the Notch pathway.181,232,238 Multiple studies determined the consensus sequence for CSL binding in flies and mammals was the eight base pair

Figure 1: CSL Binds Two Distinct Sites in the Hes1 Gene A) The Hes1 gene possesses two distinct CSL binding sites (orange) of eight base pairs each. The two sites are arranged in a head-to-head orientation approximately 15-17 base pairs apart, forming the SPS (sequence or Su(H) paired site). When both sites of the SPS are occupied by ternary complexes, the complexes interact through contacts between their ANK domains, causing the DNA linker to bend and untwist. The nucleotide in position five differs between the two binding sites. In the consensus site, position five is a guanine (G) whereas in the second site, position five is a thymine (T, blue). B) Structure of RBP-J bound to the consensus site of Hes1 (PDB: 3BRG).169 C) Structure of RBP-J bound to the second site of Hes1 (PDB: 3IAG).178 In both structures, the NTD (cyan) and BTD (green) of CSL make specific and non-specific contacts to the DNA sequence. Despite the different nucleotide in position five, all of these contacts are maintained in both structures.

sequence C/tGTGGGAA, where the first position could be either cytosine (C) or thymine (T).171-172,174 This sequence is present in the promoter of a well-studied Notch target gene in flies and mammals, Hes1. (Figure 1A) A recent study estimated approximately 40% of fly genes have a consensus binding site in their regulatory regions, though it is unclear whether all the identified sites are functional and responsive to Notch signaling.239 It is unlikely to be the case, as a different study demonstrated only a subset of genes (~260 or less than 2% of genes) were responsive to Notch signaling in a particular fly cell type.65 However, many genes do not possess a consensus binding site for CSL, indicating other non-consensus sequences can be functionally bound by CSL. This is, in fact, the case, as will be discussed shortly.
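
The consensus C/tGTGGGAA can be expressed as a simple search pattern. The following is a minimal sketch, not a method used in this work, of how one might scan both strands of a DNA sequence for exact consensus matches. The function names and the example sequences are hypothetical, and exact matching will by design miss the non-consensus sites discussed later.

```python
import re

# Consensus C/tGTGGGAA: position 1 is C or T. The lookahead lets
# overlapping matches be reported as well.
CONSENSUS = re.compile(r"(?=([CT]GTGGGAA))")
SITE_LENGTH = 8

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_consensus_sites(seq):
    """Return (start, strand, site) for each exact consensus match.

    start is the 0-based index, on the given (forward) strand, of the
    leftmost base pair covered by the site.
    """
    seq = seq.upper()
    hits = []
    for m in CONSENSUS.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    for m in CONSENSUS.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical example sequences, one site on each strand.
forward_hits = find_consensus_sites("AACGTGGGAATT")
reverse_hits = find_consensus_sites("GGTTCCCACAGG")
```

Reporting the strand alongside the position is what would allow a head-to-head arrangement, such as the SPS, to be distinguished from two sites in the same orientation.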

Several structures of CSL bound to the consensus DNA sequence have been solved, including structures of CSL bound to DNA alone (worm, mammalian), of the ternary complex (CSL-NICD-Mam), and of some repressor complexes (CSL-MINT, CSL-KyoT2, CSL-Hairless, CSL-RITA).8,10,126,177 Overall, these structures of CSL bound to consensus DNA, regardless of function, reveal a highly similar binding interface between CSL and DNA. In the structures showing CSL bound to consensus DNA, the DNA remains B-form, though CSL interaction induces two slight bends in the DNA.8 Together, the pair of bends compensate for each other to preserve an overall straight alignment.8 The pair of bends, however, generates base pair inclines with respect to the local helical axis and widens the major and minor grooves.8 Perturbation of base steps between the guanines in positions 4-6 and their complementary cytosines produces undertwisting of the helix with significant buckling and positive propeller twist that can locally distort B-form DNA.8

All structures show CSL binding DNA as a monomer, unlike other proteins in the Rel family that interact with DNA as a homodimer or heterodimer.121 Portions of the NTD, BTD, and the interdomain linker between them form a large electropositive surface on CSL that provides both specific and non-specific contacts with DNA.8,121 (Figure 1B) Unlike other Rel proteins, the CTD of CSL does not function in DNA binding.121 Also unlike other Rel proteins, the BTD of CSL inserts a loop into the minor groove of DNA to recognize the first half of the binding site sequence (C/tG in positions 1 and 2).121 The NTD, however, acts similarly to other Rel proteins as it inserts a beta hairpin loop into the major groove of DNA to recognize the second half of the binding site sequence (GGGA in positions 4-7).121 No specific contacts are made between CSL and the thymine in position 3 or the adenine in position 8.8

For the structures of CSL bound to the consensus DNA sequence, protein residues involved in making specific DNA contacts are absolutely conserved in all CSL orthologs.178 Similarly, structures of the worm and mammalian CSL orthologs bound to DNA reveal all specific contacts are maintained between CSL and DNA, indicating the interaction is essentially identical between species. Most contacts are made between CSL and the DNA backbone, which position other protein residues for specific base interactions.8 The base specific contacts are made to the core sequence GTGGGAA, which corresponds to positions 2-8 of the consensus sequence.8 The structure of worm CSL with Hes1 consensus sequence DNA (PDB: 1TTU) provides detailed information about specific CSL-DNA contacts. (Figure 2) From the NTD, Arg234 interacts with the guanine in position 4 in a guanine-specific manner while Gln226 interacts with the guanine in position 5 in a purine-specific manner.8 Both of these contacts are in the

major groove of the DNA, as is the interaction between Lys368 in the interdomain linker with the guanine in position 6 and the thymine on the opposite strand (the complementary base to the adenine in position 7).8 Both residues from the BTD that make specific contacts do so in the minor groove of DNA: Gln401 interacts with the adenine on the opposite strand (paired with the thymine in position 1) while Ser400 makes a guanine specific interaction with the guanine in position 2.8 Additionally, there are no direct contacts between CSL and the thymine in position 3 or the adenine in position 8, but they appear to be an important requirement for CSL binding, possibly because they may stabilize the local DNA architecture to facilitate CSL binding.8

Structures of CSL orthologs with DNA or in which CSL is simultaneously interacting with DNA and other co-regulatory proteins (either NICD or a corepressor) show no significant difference in the contacts between CSL and DNA, indicating co-regulator binding does not have an allosteric effect on the DNA binding surface of CSL.10

In addition to structural data, multiple studies have reported a relatively low binding affinity for CSL to the consensus sequence. The original high affinity reported (Kd ~ 1 nM) was calculated using a qualitative approach, as Scatchard plot analyses of CSL-DNA EMSAs using partially purified CSL were not a thoroughly quantitative technique.181 This led to a model depicting CSL statically bound to DNA with co-regulatory proteins associating or dissociating depending on the pathway activation state. Later studies employed quantitative methods, such as isothermal titration calorimetry, to determine the actual affinity of CSL for consensus DNA to be relatively modest, with a Kd of 150 nM, more than a hundred-fold weaker than initially reported.169 (Figure 3A) Whether CSL binds DNA alone or pre-complexed with NICD, its affinity for DNA

remains the same.169 (Figure 3D) However, there is a minor difference in the thermodynamic parameters of NICD binding CSL in the presence or absence of DNA, suggesting regions of CSL may undergo folding coupled to DNA binding to facilitate NICD loading.178 With such a moderate affinity, it is likely the CSL-DNA interaction is more dynamic than previously thought. Therefore, cooperative mechanisms are important for increasing the stability of CSL complexes on DNA and for providing a means of fine-tuning the strength of transcriptional activation.121 It is also possible CSL-co-regulator complexes are forming or exchanging in the nucleoplasm and require combinatorial interactions with other proteins to be recruited to specific binding sites.178

Evidence supporting these hypotheses comes from a recent study demonstrating the number of CSL binding sites occupied by CSL in the fly Enhancer of Split locus transiently increases following activation of the Notch pathway.240 This study also showed a cytoplasmic pool of CSL translocated to the nucleus after the pathway was stimulated.240 Additional research revealed cooperative binding mechanisms, such as the SPS architecture in the Hes1 gene, provide a robust burst of transcription following pathway activation.241 Many CSL binding sites are adjacent to binding sites for other transcription factors, such as GATA and PTF1a, suggesting cooperative or synergistic mechanisms may be important for proper loading and transcriptional activation.167,182-183

Overall, the interaction between CSL and DNA is dynamic and integrates numerous factors to determine the context specific activation of target gene transcription.

CSL, like other transcription factors, binds more than a single DNA sequence. In fact, it binds a broad range of eight base pair sequences, including sites within the well-studied Enhancer of Split locus in flies.173,175 The standard techniques used to

Figure 2: Diagram of the Contacts Made Between CSL and the Hes1 Consensus Sequence DNA Based on contacts observed in the structure of worm CSL in complex with DNA corresponding to the Hes1 consensus binding site (PDB: 1TTU), this diagram represents all the specific and non-specific contacts made between CSL and DNA.8 The eight base pair CSL binding site is bracketed (blue). CSL residues making specific contacts to nucleotides in the binding site include: Arg234, Gln226, Lys368, Gln401, Ser400. There are no specific contacts made to the thymine in position three or the adenine in position eight. All other residues shown make nonspecific contacts to the DNA. Arrows indicate hydrogen bonding or salt bridge interactions while closed circles denote van der Waals interactions.

search for and to represent binding motifs used by transcription factors typically have inherent biases for a particular nucleotide sequence. Recently, less biased techniques, such as genome-wide chromatin immunoprecipitation and protein binding microarrays, have produced several sequences that can be bound by CSL.175,242-243 By genome-wide ChIP-seq analysis, over 10,000 genes were shown to have putative CSL binding sites within 10 kb of the transcription start site.176 The study using protein binding microarrays reported the preferred binding site for CSL, whether alone or in complex with NICD and/or Mastermind, was C/tGTGGGAA, which is identical to the Hes1 consensus sequence.175 However, this study determined the bases in positions 3, 4, 6, and 7 were nearly invariant among sequences bound by CSL whereas the other positions tolerated more variation.175 It is important to note the protein binding microarray technique does not take into account the effect of DNA shape, chromatin packing, or combinatorial interactions with other proteins.

The arrangement of CSL binding sites, whether consensus or not, varies greatly in terms of number and organization. Some enhancers have many CSL binding sites, such as sim, Su(H), and Pax2/sparkling in the fly.180 Some enhancers have only a few CSL binding sites, like vgBE and E(spl).180 Yet other enhancers possess SPS organization, where there is a consensus binding site approximately 15-17 base pairs away from a non-consensus binding site in a head-to-head orientation.180 (Figure 1A) A structure of the human Hes1 SPS shows the DNA bends and untwists to bring the two ternary complexes close enough for them to interact through the Ankyrin repeat domain of their NICDs.179 Biochemical and reporter gene assays determined the Ank domain contacts are important for cooperative binding of both sites; without this interaction, CSL

tended to bind only the consensus site.179 While it is interesting to postulate whether cooperative loading of the Hes1 SPS may function as a key point for regulating gene expression, the relationship between different arrangements and their effect on transcriptional regulation is currently unclear.

There is still relatively little structural or quantitative binding data available for non-consensus CSL binding sites. One structure shows CSL bound to a variation of the consensus sequence, CGTGTGAA, where position 5 is a thymine instead of a guanine (PDB: 3IAG).178 (Figure 1C) This variation is the second CSL binding site of the mammalian Hes1 SPS architecture. Overall, the structure appears similar to the CSL-consensus DNA structures.178 All specific and non-specific contacts are maintained even though the DNA has a slightly greater propeller twist, and there is only a slight shift in the relative arrangement of CSL domains.178 There are three notable features in the consensus variant structure. First, the beta hairpin loop of the BTD assumes an alternate conformation in the two structures and utilizes different residues to make specific and non-specific contacts with the DNA. However, the protein residues in the loop make specific contacts to the same DNA bases (the adenine paired with the thymine in position 1 and the guanine in position 2) in both structures, thus maintaining equivalent interactions despite the rearrangement.178 Second, a large loop in the BTD, which binds the Ram domain of NICD, is ordered when CSL is bound to the consensus variant DNA, but not ordered when CSL is bound to the consensus sequence.178 It was hypothesized this BTD loop may form two distinct structural elements depending on which DNA sequence CSL binds, though the precise role of this switch is not clear. Third, a loop in the NTD is open when CSL is bound to the consensus variant DNA

Figure 3: CSL-Hes1 DNA ITC Binding Assays Representative thermograms for CSL binding to the consensus site (A) or the second, or nonconsensus, site (B) in the Hes1 gene.178 The dissociation constant (Kd) is shown, as is the temperature of the experiment. Each experiment consisted of 40 injections of 7 μL each. C) The binding affinity of each CSL ortholog for the individual sequences in the Hes1 gene was measured by ITC.178 Each experiment was performed at 5°C and consisted of 40 injections of 7 μL each. D) Using ITC, the affinity of CSL for DNA was measured and compared to the affinity of CSL-RAMANK and CSL-RAM, indicating the affinity of CSL for DNA remains the same whether in complex or not.169 Experiments were performed at 25°C and consisted of injections of 7 μL each.

while in a closed conformation when CSL interacts with consensus DNA.178 Based upon additional structures of CSL in complex with only Ram, with NICD, or with NICD and Mastermind, it was proposed that Ram binding to CSL triggers allosteric opening of the NTD loop to facilitate Mastermind binding for ternary complex formation.178

Beyond the structural data, binding measurements show the affinity of CSL for the consensus variant sequence of the Hes1 SPS is approximately two-fold weaker than for the consensus site, with a Kd of 250 nM.178 (Figure 3B) CSL orthologs from worm and fly display a similar reduction in binding affinity for the consensus variant sequence, indicating CSL affinity for DNA has been conserved.178 (Figure 3C) When the position 5 thymine was mutated to an adenine, which has been observed in other non-consensus CSL binding sites, the binding affinity (Kd = 150 nM) was very similar to that of CSL for the consensus sequence.178 When mutated to a cytosine, the binding affinity was reduced to 1 μM, which is seven-fold weaker than consensus and three-fold weaker than the original consensus variant sequence.178 This mutational analysis indicates this base step at position 5 is important for the affinity and specificity of CSL binding, though there is no strong molecular explanation as to why.178 Currently, there is great interest in identifying and determining the function of all non-consensus CSL binding sites in the genome. Research is investigating the role of chromatin structure, particularly with respect to the role epigenetic marks have on how Notch signaling transitions between target gene repression and activation.184 Similarly, studies are ongoing to address how the Notch pathway selectively activates transcription of target genes, whether through interaction with tissue specific factors, through cross-talk with other pathways, or

through other means, such as temporal control.244-246 While great strides have been made in recent years regarding the CSL-DNA interaction, much remains unknown.

My research into the interaction between CSL and DNA aims to determine the effect nucleotide variation has on CSL binding and the effect of sequences flanking the binding site on CSL binding. While my data does not provide a comprehensive answer to each of these questions, it does offer insight into how CSL recognizes and binds DNA sequences other than the consensus site. During my research, I worked with two collaborators to measure the binding affinity of mouse CSL, RBP-J, for different sequences of DNA oligomers. Here I will report the results of my individual research.

One of the publications containing my work can be found as Appendix A; the other publication containing my work is in preparation.

Putative CSL Binding Sites in Mouse Math5 Gene

The manuscript containing the following data is in preparation.

Math5 is a Notch target gene important for normal eye development in the mouse. Our collaborator identified four putative CSL binding sites in the Math5 promoter, each with some variation to the consensus sequence. EMSAs showed RBP-J could bind all four sites, and ChIP analysis revealed RBP-J was present at all four sites in vivo. We performed isothermal titration calorimetry (ITC), using purified RBP-J and a 24 nucleotide duplex containing one putative binding site centered in the oligomer. All experiments were performed at 5°C using 100 μM DNA in the syringe and 10 μM RBP-J in the cell. (Figure 4A) As a control, we tested RBP-J with a non-specific

DNA sequence (GCTACTCATACCTAGAACG) and did not detect binding (data not shown).

The results showed RBP-J bound to each putative binding site of the Math5 gene, though each Math5 site had weaker affinity for CSL when compared to the consensus sequence. (Figure 4B) From these experiments, we concluded variations to the eight base pair binding sequence plus variations to the flanking sequence can weaken the CSL-DNA interaction. While we were unable to separate the effects of nucleotide variation and flanking sequence, we were able to demonstrate that non-consensus sites bound in vivo have a range of binding affinities.

Thermodynamic Characterization of Computationally Derived CSL Binding Sites

Please refer to Appendix A for the complete paper. Note that in Figure 3 of the publication, the sequence listed as S3 (CGTGTGAC) is referred to as S9 in the text. It will be referred to as S3 in this section, so it will correspond to the figure showing ITC thermograms. The S3 sequence listed in the text (CGTGTGAA) was not analyzed by ITC.

Our collaborators employed computational approaches to analyze the energetics of CSL binding to different DNA sequences. To do this, they used FOLDX software to calculate changes in binding energy for every nucleotide variation possible in their model, a structure of RBP-J bound to the Hes1 consensus sequence. Their in silico modeling and subsequent ranking analysis generated 220 DNA motifs that were likely to be bound within 3 kcal/mol of each other. These motifs were then used to create an eight base pair binding logo (Figure 5A) with several interesting features.

First, there is a strong preference for cytosine at position 1 despite the general belief that cytosine and thymine are equally preferred. Second, at positions 2 and 6, which are traditionally thought to have a strong preference for guanine, there is a much greater tolerance for other nucleotides. Third, position 5, which is traditionally believed to have a preference for guanine or adenine, can tolerate considerable nucleotide variability. To determine whether their computational predictions were accurate for in vivo binding sites, our collaborators chose a few sequences to further analyze with experiments. They performed competition EMSAs, luciferase assays in Drosophila S2 cells, and in vivo rescue assays with flies while we performed ITC. All ITC experiments were performed at 10°C using 100 μM DNA in the syringe and 10 μM RBP-J in the cell. Each experiment consisted of 20 injections of 14 μL, spaced 120 seconds apart. A single CSL binding site was centered in each 19 base pair oligomer. Overall, this study determined computational modeling based on structural properties can accurately predict binding site preferences, though it still possesses biases and should be used in conjunction with experimental techniques.
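
The titration parameters above imply a total injected volume and a final DNA-to-protein ratio, which can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the calorimeter cell volume is an assumed value that is not stated in this chapter. The ratio also ignores the small volume displaced from the cell during injections.

```python
def titration_summary(n_injections, injection_ul, spacing_s,
                      syringe_um, cell_um, cell_volume_ul):
    """Summarize an ITC injection schedule.

    Concentrations are in micromolar and volumes in microliters.
    """
    total_ul = n_injections * injection_ul
    # Moles of titrant delivered divided by moles of protein in the cell.
    final_ratio = (syringe_um * total_ul) / (cell_um * cell_volume_ul)
    return {
        "total_injected_ul": total_ul,
        "injection_time_min": n_injections * spacing_s / 60,
        "final_dna_to_protein_ratio": final_ratio,
    }


# ASSUMPTION: a 1400 uL sample cell. This value is hypothetical and is
# not taken from the text; substitute the instrument's real cell volume.
ASSUMED_CELL_VOLUME_UL = 1400

# 20 injections of 14 uL, 120 s apart, 100 uM DNA into 10 uM RBP-J.
summary = titration_summary(20, 14, 120, 100, 10, ASSUMED_CELL_VOLUME_UL)

# The 40 x 7 uL schedule used for the 5 degree C experiments delivers
# the same total volume.
same_total = 40 * 7 == summary["total_injected_ul"]
```

Under the assumed cell volume, the titration ends at roughly a two-fold molar excess of DNA over RBP-J, which is enough to pass the equivalence point of a 1:1 interaction.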

To assess the predicted preference for cytosine at position 1, two different DNA sequences were experimentally tested: CGTGGGAA (S1) and TGTGGGAA (S2). The nucleotide that differs from the Hes1 consensus sequence is bolded and underlined. Oligomers of these two sequences were tested by ITC, EMSA, and luciferase assay. ITC revealed CSL bound the sequence with cytosine at position 1 (S1) eight-fold more tightly than the sequence with thymine at position 1 (S2) (Kd = 0.06 μM for S1 versus Kd = 0.5 μM for S2). (Figure 5B, 5C) Similarly, EMSA and luciferase assay results showed the cytosine variant S1 has stronger binding to CSL and higher functional output than

the thymine variant S2. Together, these assays support the computational prediction that cytosine is preferred at position 1 of the CSL binding site, though thymine can be tolerated.
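
To relate the measured dissociation constants to the energy scale used in the computational ranking, a fold difference in Kd can be converted to a difference in binding free energy with the standard relation ΔΔG = RT ln(Kd,variant / Kd,reference). The sketch below is a minimal illustration using the S1 and S2 values reported above; the function name is hypothetical and this calculation is not taken from the publication.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_delta_g(kd_variant, kd_reference, temp_celsius):
    """Binding free energy difference (kcal/mol) between two sites.

    Positive values mean the variant binds more weakly than the
    reference. The two Kd values must share the same units.
    """
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL_PER_MOL_K * temp_kelvin * math.log(kd_variant / kd_reference)


# Values from the text: S1 (CGTGGGAA) Kd = 0.06 uM, S2 (TGTGGGAA)
# Kd = 0.5 uM, both measured by ITC at 10 degrees C.
fold_weaker = 0.5 / 0.06
ddg_s2_vs_s1 = delta_delta_g(0.5, 0.06, 10.0)
```

With these inputs the roughly eight-fold difference in Kd corresponds to about 1.2 kcal/mol, which gives a sense of scale for the 3 kcal/mol window used to select motifs in the in silico analysis.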

We tested another sequence by ITC for our collaborators. This sequence, CGTGTGAC (S3), was highly ranked in the FOLDX predictions, and previous studies had shown that the variants in positions 5 and 8 could be tolerated due to a reduced number of interactions between side chains and DNA bases. Using ITC, we determined this sequence had a moderate affinity for CSL (Kd = 1.7 μM). (Figure 5B, 5C) By competition EMSA and luciferase reporter assay, this sequence exhibited intermediate levels of binding and activation.

In the Hes1 consensus sequence, positions 4 and 5 are both guanines. The FOLDX predictions suggested position 4 had a strong preference for guanine and position 5 could tolerate adenine, guanine, or thymine. When tested by EMSA and luciferase assays, a sequence in which positions 4 and 5 were changed to adenines (CGTAAGAA, S6) failed to compete in an EMSA competition assay and had no response in luciferase reporter assays. This sequence, however, is bound by CSL, albeit weakly. By ITC, we measured a much weaker affinity (Kd > 50 μM) between S6 and CSL. (Figure 5B, 5C) It must be noted we were unable to completely fit the data to a binding curve, though the experimental data were clearly distinguishable from those for a non-specific control sequence (TCATACCT) that did not have detectable binding by ITC. Our ITC data contradict a previous report that said S6 has higher affinity binding than the position 1 thymine variant, TGTGGGAA (S2, Kd = 0.5 μM).174 Overall, there appears to be tolerance for adenines at positions 4 and 5, since CSL can

Figure 4: ITC Binding Data for RBP-J and Putative CSL Binding Sites in the Math5 Gene A) Representative thermograms of RBP-J binding Sites 1-4 from the Math5 gene, with the dissociation constant (Kd) shown. Each experiment was performed at 5°C and consisted of 40 injections of 7 μL each. The DNA sequence is shown, with any deviations from the Hes1 consensus sequence highlighted in blue. B) Table summarizing the experimental results. Data for the Hes1 consensus and Hes1 second site were obtained under identical experimental parameters in the publication Friedmann et al. 2010.178

still bind and function, though both facets are significantly reduced. Our results suggest moderate binding correlates with moderate functional ability, at least for the binding sites we analyzed.
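
The sequences in this section are described by the positions at which they deviate from the Hes1 consensus. A minimal sketch of that bookkeeping is shown below, assuming the consensus is written with the IUPAC code Y (C or T) at position 1; the function name is hypothetical and the sketch is provided only to make the position numbering explicit.

```python
# IUPAC codes needed for the consensus C/tGTGGGAA (Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}
CONSENSUS = "YGTGGGAA"


def deviations(site, consensus=CONSENSUS):
    """List (position, base) pairs where a site departs from consensus.

    Positions are 1-based to match the numbering used in the text.
    """
    site = site.upper()
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return [
        (position, base)
        for position, (base, code) in enumerate(zip(site, consensus), start=1)
        if base not in IUPAC[code]
    ]


# Sequences analyzed by ITC in this section.
sites = {"S1": "CGTGGGAA", "S2": "TGTGGGAA", "S3": "CGTGTGAC", "S6": "CGTAAGAA"}
site_deviations = {name: deviations(seq) for name, seq in sites.items()}
```

Applied to these four sequences, S1 and S2 show no deviations under the C/t convention, S3 deviates at positions 5 and 8, and S6 deviates at positions 4 and 5, matching the descriptions given above.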

Additional DNA Sequences

In addition to our work with collaborators, we have tested eight other DNA sequences by ITC, most of which examine the effect of variations to the sequences flanking the CSL binding site. (Table 1) A recent study demonstrated nucleotides immediately flanking a binding site can affect transcription factor binding.247 We tested seven sequences, each containing a single Hes1 consensus binding site with either cytosine or thymine in position 1, in which the three nucleotides preceding the binding site (positions -1, -2, -3) vary. We performed ITC at 10°C with 100 μM DNA in the syringe and 10 μM RBP-J in the cell. Each experiment consisted of 20 injections of 14 μL, spaced 120 seconds apart. A single CSL binding site was centered in each 19 or 28 base pair oligomer. Currently, we have completed experiments with four of the seven sequences and all have much lower binding affinity for CSL compared to the Hes1 consensus DNA. Two caveats must be mentioned: first, the DNA sequences analyzed have variations beyond the three nucleotides preceding the CSL binding site, so that it is not currently possible to separate the effect of flanking sequence variation from effects caused by other variations to the sequence and, second, there is currently no standard for comparison for flanking sequences. In other words, without a standard like the Hes1 consensus sequence to use for comparison, we cannot accurately calculate the difference caused by variation in the flanking sequence. Clearly, further

experiments are needed to generate a standard for flanking sequences and for limiting sequence variation to flanking nucleotides.

Another DNA sequence we are analyzing by ITC is a G-rich sequence that was enriched in a protein binding microarray performed by our collaborator Matt Weirauch. (Table 1) Though the sequence, AAAGGGGGAAA, deviates greatly from the Hes1 consensus sequence, G-rich tracts were frequently bound by CSL in the microarrays. We performed ITC at 10°C with 100 μM DNA in the syringe and 10 μM RBP-J in the cell. Each experiment consisted of 20 injections of 14 μL, spaced 120 seconds apart. A single CSL binding site was centered in each 11 base pair oligomer. Unfortunately, we have been unable to successfully complete ITC experiments with this sequence to date due to purification and experimental issues.

Figure 5: ITC Binding Data for RBP-J and Computationally Derived DNA Sequences A) Binding logo derived from in silico modeling and ranking analysis performed by our collaborator, Dr. Rubben Torella. The logo depicts the eight base pair CSL binding site generated from the computational modeling. B) Table summarizing the results of ITC binding assays between CSL and different sequences of DNA. The CSL binding site is bold and any deviations from the Hes1 consensus site are highlighted in blue. C) Representative thermograms of RBP-J binding to different DNA sequences. The dissociation constant (Kd) and the DNA sequence are shown, with the CSL binding site in bold and underlined. Each experiment was performed at 10°C and consisted of 20 injections of 14 μL each.

Table 1: Additional Sequences of DNA Analyzed for RBP-J Binding by ITC Table summarizing the additional DNA sequences tested for RBP-J binding by ITC. The sequences are listed with the CSL binding site in bold and variations of interest to the flanking region highlighted in blue. A dissociation constant (Kd) is listed for sequences whose ITC experiments have been completed. An asterisk indicates the data set for the DNA sequence has not yet been obtained. ITC experiments were performed at 10°C and consisted of 20 injections of 14 μL each.

Chapter 4: Conclusions and Future Directions

CSL is the central component of a transcriptional switch that regulates target gene expression. Its association with co-repressors or co-activators determines whether target genes are repressed or transcribed. While this is executed by large multiprotein complexes built around CSL, which include HDACs and HATs, defining interactions between the central components of these complexes is important for understanding how the Notch pathway regulates gene expression. Much research has been done over decades to obtain information about the Notch pathway and the complexes it employs to regulate target gene transcription.

Hairless and NICD Competition for Binding to Su(H)

There is abundant data for the ternary complex in mammals and in worms in the form of numerous structures as well as binding and cellular assays. Similar data is available for some, but not all, of the repression complexes in mammals, including the co-repressors MINT, KyoT2, and RITA. The same information, for either the ternary or repression complex, is not available for the fly, though there is a significant amount of data with respect to Notch signaling and its genetic interactions and role in cell fate specification in the fly. My work as a graduate student in the Kovall lab has been to obtain data for the fly ternary and repression complexes, with the ultimate goal of elucidating the molecular mechanism for how CSL transitions between the two complexes to regulate target gene expression.

My work on the Hairless project contributed to ultimately determining the structure and binding affinity of Hairless for Su(H), as related in Appendix B. We demonstrated Hairless 232-269 binds solely to the CTD of Su(H) with a Kd of 1 nM,

partially inserting itself into the CTD of Su(H) and disrupting the local domain structure (Yuan and Kovall, unpublished data).210 This mode of interaction is unique among co-repressors, since no other CSL co-repressor binds only to the CTD. Most mammalian co-repressors, such as KyoT2, RITA, and EBNA2, bind to the BTD of CSL as elongated chains in the same area where the Ram domain of NICD binds, making them obvious competitors for binding the BTD. An elegant study by Scott Johnson and Doug Barrick utilizing an artificial construct with the ANK domain fused to the CTD showed the ANK domain-CSL fusion can activate transcription.191 Adding the Ram domain in trans increased levels of transcription, but was not necessary to activate it, suggesting the main role of the Ram domain is to displace co-repressors from CSL.191 Since Hairless does not interact with the BTD at all, this cannot be true in the fly and a novel mechanism for Hairless displacement by NICD must exist. One mammalian co-repressor, MINT, offers perspective on unique co-repressor-CSL interactions. MINT binds both the BTD and CTD of CSL and its binding affinity is bivalently split between these two interactions in a variation of the chelate effect.215 When used in EMSAs, MINT can displace NICD from CSL when Mastermind is absent, but the reverse is not true: NICD cannot displace MINT from CSL without Mastermind (Vanderwielen and Kovall, unpublished data). This suggests Mastermind is highly important for mammalian ternary complex stability and the transition between repression and activation complexes (Vanderwielen and Kovall, unpublished data). Our fly EMSAs, showing NICD can displace Hairless without Mastermind but that Hairless cannot displace NICD as efficiently, indicate the molecular requirements for ternary complex assembly are different between fly and mammalian orthologs.210 Our Hairless data raises several key

questions: Why does Hairless bind the CTD and not the BTD? Mechanistically, how does fly NICD, even in the absence of Mastermind, displace Hairless? What possible role could ANK domain binding play in Notch signaling? Much additional research is necessary to address these issues, but it is clear Hairless is unique among other Notch co-repressors.

My work on the fly ternary complex provided thermodynamic details about the interaction between NICD and Su(H). We showed by ITC and EMSA that both the Ram and ANK domains contribute to NICD binding and that the ANK domain binds, albeit weakly, without the aid of Mastermind, a phenomenon so far observed only in the fly. There may be similar species-specific differences in other Notch orthologs that have not yet been identified. In the future, it will be important to consider species-specific differences when interpreting data from other organisms, and acquiring data from other organisms will be essential to developing a better understanding of the Notch pathway in all model organisms. In mammals, the ANK domain does not bind CSL until Mastermind is present to hold it in place on CSL. Binding of the mammalian ANK domain to CSL cannot be detected by ITC and is very inefficient by EMSA in the presence of Mastermind (DRF 2008). In contrast, we detected binding between the fly ANK domain and Su(H) (Kd 668 nM) and observed a distinct shift indicating interaction between DNA-Su(H)-ANK-Mastermind by EMSA. Our data allowed us to develop a revised model for ternary complex assembly, which reflects species-specific differences in the order of assembly (Figure 1). Upon pathway activation in mammals, the NICD translocates to the nucleus, where the Ram domain forms a high affinity interaction with the BTD of CSL. The ANK domain does not appreciably interact with the CTD of CSL until Mastermind is present to form the ternary complex. In flies, after the NICD translocates to the nucleus, the Ram and ANK domains of NICD interact with the BTD and CTD of CSL, respectively. Mastermind then binds to form the ternary complex. Though the significance of ANK domain binding in flies is not clear, it may play an important role in the molecular transition between the Hairless repression complex and the ternary complex with NICD and Mastermind. Additional studies will be necessary to define what role, if any, the ANK domain plays in the fly.

Through my cross-species ITC experiments, we determined that the reason fly ANK can bind is a feature inherent in Su(H). We concluded this because, when compared to single-species experiments, the combination of mouse NICD with fly Su(H) had similar binding affinities, while fly NICD with mouse RBPJ did not. It is interesting that neither the cross-species ANK domain-CSL experiments nor the ANK domain-CTD of CSL experiments showed binding, but it is possible the specific residues critical for the ANK domain-CTD interaction are not identical between fly and mammalian proteins. This is, in fact, true, as we demonstrated with our mutational analysis. When we mutated two residues in the ANK domain and one residue in the CTD of Su(H) that correspond to residues necessary for binding in the worm and human ternary complex structures (PDB: 2FO1 & 2F8X, respectively), we did not see significant differences in binding affinity by ITC or in complex formation by EMSA, indicating the binding interface is different between mammals and flies (Contreras and Kovall, unpublished data).10,126 We also demonstrated that ANK domain-CSL binding is not likely to be due to an allosteric effect caused by Ram binding. By ITC and EMSA, we showed the Ram domain binds CSL to a similar extent with or without the ANK domain. By EMSA, we also showed that, when present as separate domains, Ram plus ANK does not interact with CSL as efficiently as RAMANK. However, we were unable to determine precisely what inherent feature of Su(H) permits ANK domain binding. It is not an obvious one, as a sequence alignment of RBPJ and Su(H) shows 71.3% sequence identity and no significant areas of difference (Figure 2A). We also aligned fly and mouse NICD (54.5% sequence identity) as well as fly and mouse ANK domains (63.5% sequence identity), once again observing a lack of significantly different regions (Figure 2B, 2C). A high resolution structure of the fly ternary complex or of the CTD-ANK domain interface (as Su(H)-ANK domain or CTD-ANK domain) is necessary to address these and several other questions regarding species-specific features in the fly.
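The percent identity values quoted above depend on how identity is defined. As a minimal sketch, assuming identity is counted only over alignment columns where neither sequence has a gap (one common convention, and not necessarily the one used by the alignment program that produced Figure 2), the calculation and a simple windowed scan for locally divergent regions could look like the following. The function names and the toy sequences are hypothetical and are not RBPJ or Su(H) sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (for example, dividing by the full alignment length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def windowed_identity(aligned_a: str, aligned_b: str, window: int = 20):
    """Percent identity in consecutive, non-overlapping windows.

    Returns a list of (start_column, percent_identity) pairs; a window made
    up entirely of gapped columns is reported with None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a), window):
        chunk_a = aligned_a[start:start + window]
        chunk_b = aligned_b[start:start + window]
        try:
            results.append((start, percent_identity(chunk_a, chunk_b)))
        except ValueError:
            results.append((start, None))
    return results


# Toy aligned fragments, made up for illustration only.
toy_a = "MKT-LLVAGS"
toy_b = "MKSALLVA-S"
print(percent_identity(toy_a, toy_b))
print(windowed_identity(toy_a, toy_b, window=5))
```

A windowed scan of this kind is one way to make the statement "no significant areas of difference" quantitative, since a region of low local identity would stand out even when the overall identity is high.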

Effect of Sequence Variation on the CSL-DNA Interaction

CSL is the center of a transcriptional switch that regulates Notch target gene transcription. To do so, CSL must be bound to DNA in the regulatory region of a target gene. Understanding how CSL recognizes and binds specific DNA sequences is therefore critical to understanding the function of the Notch pathway. Several decades of research have shown CSL can bind a consensus sequence (C/tGTGGGAA) as well as other sequences, but it remains unknown how much variation to the sequence can be tolerated before CSL binding is completely lost. It was also unclear whether the DNA sequence CSL binds could affect its function, namely transcriptional output. My work on this project made some of the initial steps in quantitatively measuring the effect of variations to the DNA sequence on CSL binding. I used ITC to measure the binding affinity of RBPJ for DNA sequences with variations in the eight-nucleotide CSL binding site. In general terms, variations to the CSL binding sequence can positively or negatively affect the binding affinity of CSL for DNA. Most variations we tested weaken binding, but some, like S1 (CGTGGGAA), increase binding affinity above that of RBPJ for the Hes1 Consensus sequence (TGTGGGAA; Kd 209 nM for Hes1 Consensus, Kd 60 nM for S1). Our collaborators performed EMSAs and cellular luciferase assays to assess any relationship between DNA sequence, binding affinity, and functional output. Their data showed sequences with weaker CSL binding affinity have reduced functional output, while sequences with stronger CSL binding affinity have increased functional output. Taken together, our data suggest there is a correlation between DNA sequence and functional output, though more research will be necessary to fully elucidate this link and define the role it plays in Notch signaling.
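To put the difference between the two dissociation constants quoted above on an energy scale, the standard relation ΔG° = RT ln(Kd) can be applied. The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the experimental temperature is not restated in this section), a 1 M standard state is assumed, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd) with a 1 M standard state. The default temperature
    is an assumption, not a value taken from the experiments.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_variant: float, kd_reference: float,
                  temp_k: float = 298.15) -> float:
    """Free energy change of the variant relative to the reference (kcal/mol).

    Negative values mean the variant binds more tightly than the reference.
    """
    return delta_g(kd_variant, temp_k) - delta_g(kd_reference, temp_k)


kd_hes1 = 209e-9  # M, Hes1 Consensus (TGTGGGAA), from the text
kd_s1 = 60e-9     # M, S1 (CGTGGGAA), from the text

print(f"dG Hes1 Consensus: {delta_g(kd_hes1):.2f} kcal/mol")
print(f"dG S1:             {delta_g(kd_s1):.2f} kcal/mol")
print(f"ddG (S1 - Hes1):   {delta_delta_g(kd_s1, kd_hes1):.2f} kcal/mol")
```

Under these assumptions, the roughly 3.5-fold tighter binding of S1 corresponds to a stabilization of about 0.7 kcal/mol, a small energetic difference that is consistent with a single base substitution.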

I also used ITC to analyze the effect that sequences flanking the CSL binding site may have on CSL binding affinity for DNA. To date, I have measured the binding affinity for two different sequences with variation in the three nucleotides immediately preceding the CSL binding site. The binding affinity between CSL and both sequences is weaker compared to the Hes1 Consensus sequence, though it should be noted the DNA sequences used in these experiments are longer and have variation beyond the three preceding nucleotides. Without a benchmark sequence, such as Hes1 Consensus, and a larger sample set to compare against, it is difficult to conclude whether these two sequences are truly weakening CSL binding. Regardless, we predict proximal flanking sequences will have only a moderate effect on CSL affinity, because the sequence of bases affects local helix geometry in the form of propeller twist or helix buckling. While these geometric features can disrupt CSL-DNA interactions, they are unlikely to cause a change severe enough to abrogate all contacts between CSL and DNA.

However, it is possible flanking sequences could affect CSL binding more dramatically if they contained the binding site for another transcription factor, thus involving positive or negative combinatorial interactions to promote or inhibit CSL binding at that site. This would not hold true for all CSL binding sites, but may be important for CSL binding and function at a subset of sites in vivo.

Figure 1: Species-Specific Models of Ternary Complex Assembly. Top: In the mouse-specific model of ternary complex assembly, once NICD has translocated to the nucleus, the RAM domain binds the BTD of RBP-J in a high affinity interaction. The ANK domain does not appreciably interact with CSL until Mastermind is present to hold the ANK domain in place and form the ternary complex. Bottom: In the fly-specific model of ternary complex assembly, the RAM and ANK domains interact with the BTD and CTD of Su(H), respectively. Mastermind then binds to form the ternary complex.

Figure 2: Sequence Alignments of CSL, NICD, and ANK Domain from Mouse and Fly Orthologs. A) Alignment of RBP-J and Su(H). B) Alignment of mouse NICD and fly NICD. C) Alignment of mouse ANK domain and fly ANK domain.

Future Directions

A high resolution structure of the fly ternary complex is necessary to address a multitude of questions. To date, several attempts in the Kovall lab to crystallize the fly ternary complex using native proteins have been unsuccessful. For future attempts, we could make surface entropy reduction mutants of the proteins or use proteins bearing a maltose-binding protein fusion tag to promote crystallization. Both methods have been successfully employed in the Kovall lab for other CSL complexes, and I have already cloned, expressed, and purified a surface entropy reduction mutant of Su(H), Su(H) R155T N281G (Appendix B). This mutant has been used by Zhenyu Yuan to solve the structure of the DNA-Su(H)-Hairless complex, demonstrating its usefulness as a crystallographic tool (Yuan and Kovall, unpublished data). A structure of the fly ternary complex would provide valuable details about the unique ANK domain-CTD interaction of the fly proteins and would offer a useful tool for further examining the transition between repression and activation complexes. Furthermore, a fly ternary complex structure could be compared to those of mammals and worms to determine whether there are subtle differences between them that might explain species-specific phenomena, such as ANK binding in the fly. We do not expect major structural differences in the fly ternary complex compared to those of mammals and worms, due to the high degree of sequence conservation of CSL, NICD, and Mastermind orthologs.

A structure of the DNA-Su(H)-NICD complex without Mastermind, or of the CTD of Su(H)-ANK domain complex, would also be useful for comparison against the fly ternary complex. Such a structure would reveal the residues involved in the ANK domain-CTD interaction, which will be important for further study of this binding event. Comparing it to a structure of the fly ternary complex would show whether the contacts between the ANK domain and CTD are similar or different, which could help further refine the model for ternary complex assembly. Whether or not the contacts differ when Mastermind is present, a structure of DNA-Su(H)-NICD will still be important for addressing questions regarding the ANK domain-CTD interaction. Any residues identified in the ANK domain-CTD interface could be further tested by mutation and ITC to determine whether they are critical for the interaction. The mammalian orthologs could be mutated to possess any key residues in the fly ANK-CTD interface, and their ability to interact could then be assessed by ITC and their ability to activate transcription by a luciferase assay.

Alternatively, we could utilize the statistical coupling analysis program developed by Dr. Rama Ranganathan of UT Southwestern Medical Center to identify key residues in the CTD-ANK domain interaction. To use this approach, we work under the assumption that the CSL and ANK domain of all insect species interact without Mastermind, while those of all mammalian species do not. The reason for this is that statistical coupling analysis extends the concept of conservation to include correlations between spatial positions, which define the structure of functional interactions between amino acids. For the analysis, we would need to submit sequences of the CTD for several species of insects and for several species of mammals. The program will identify any residues with conserved functional interactions that are different between insects and mammals. We could also perform this analysis with the ANK domain from insects and from mammals. Any residues identified by the program could then be experimentally tested by mutating the residue in the CTD or ANK domain and measuring binding by ITC or assessing functional output by a luciferase assay.
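As a much simpler first pass than statistical coupling analysis, and not a substitute for it, candidate positions could be shortlisted by asking which alignment columns are invariant within the insect sequences and invariant within the mammalian sequences, yet differ between the two groups. The sketch below illustrates only this group-specific conservation idea; it does not compute correlations between positions, the function name is hypothetical, and the toy sequences are invented and are not CTD or ANK domain sequence.

```python
def group_specific_positions(group_a, group_b):
    """Find alignment columns conserved within each group but different between them.

    Both groups must contain pre-aligned sequences of equal length. Gap
    characters are treated like any other symbol. Returns a list of
    (column_index, residue_in_group_a, residue_in_group_b) tuples.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    sequences = list(group_a) + list(group_b)
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        # Invariant within each group, but not the same residue in both groups.
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            hits.append((i, next(iter(residues_a)), next(iter(residues_b))))
    return hits


# Invented toy alignments standing in for insect and mammalian orthologs.
insects = ["MKTALLV", "MKTALIV", "MKTALLV"]
mammals = ["MKSALLV", "MKSALMV", "MKSALLV"]
print(group_specific_positions(insects, mammals))
```

Positions flagged this way would still need to be tested experimentally by mutation, ITC, and luciferase assay, as proposed for the residues identified by statistical coupling analysis.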

A highly significant area for future study will be to elucidate the molecular mechanism by which Hairless and NICD compete for binding to Su(H). Hairless has a much higher binding affinity for Su(H) (Kd 1 nM) than NICD does (Kd 60 nM), so on the basis of affinity alone it should not be displaced by NICD. Since NICD in the absence of Mastermind can completely displace Hairless from Su(H), our data demonstrate that factors other than competition between relative binding affinities are involved. Again, a structure of the fly ternary or DNA-Su(H)-NICD complex will help address this issue, as it will enable us to visualize critical differences between the repression and ternary complexes. From the structure of the repression complex, we observe Hairless disrupts the local domain structure of the CTD. It is unlikely, based on sequence conservation and homology to the mammalian and worm structures, that the CTD will be similarly perturbed in the ternary complex. We also know that the Hairless binding site and the ANK domain binding site on the CTD of Su(H) only partially overlap (Figure 3). This suggests a possible explanation for how Hairless and NICD compete. When in complex with Hairless, the CTD adopts a higher energy conformer not often present in solution, whereas when in complex with NICD, the CTD is likely to be in a lower energy conformer often present in solution. Since the lower energy CTD conformer is more readily accessible, NICD is more likely to bind Su(H) and is therefore able to outcompete Hairless for Su(H) binding. Hairless binding requires the CTD to adopt a higher energy conformer, which is less likely to happen in solution. In addition, the RAM domain of NICD faces no competition for binding to the BTD of Su(H), unlike in mammals, where the corepressors MINT, KyoT2, and RITA also bind the BTD. Once the BTD-RAM domain interaction has been established, the short linker between the RAM and ANK domains of NICD serves as a tether to the ANK domain, effectively increasing the local ANK domain concentration around Su(H) to millimolar levels. This puts the ANK domain in proximity to the CTD at a much higher concentration than Hairless is likely to be. These two factors, CTD conformer energy and local ANK domain concentration, increase the likelihood of interaction between the ANK domain and CTD and may offer a partial explanation as to how NICD can compete with Hairless for binding to the CTD.

However, further research is necessary to fully elucidate the mechanism of Su(H)-coregulator competition. Once the structure of the ANK-CTD interface is known, it will be possible to manipulate it with mutations to enhance or reduce binding of the ANK domain and/or the CTD. These mutations, particularly those made to the Hairless binding site in the CTD, will be useful in binding assays to determine which mutations preferentially affect ANK domain or Hairless binding and thus which residues are involved in the binding competition.
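The affinity and tethering arguments above can be put in rough numbers. The sketch below makes two simplifying assumptions that are not results from this work: (1) Hairless and NICD compete for a single, mutually exclusive site on Su(H) at fixed free concentrations, and (2) the tethered ANK domain is modeled as one molecule confined to a sphere whose radius is set by the linker. The 5 nm radius is a hypothetical value chosen for illustration, not a measured linker length, and the function names are likewise hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def occupancy_two_ligands(conc_a, kd_a, conc_b, kd_b):
    """Equilibrium fractions of one site bound by ligand A and by ligand B.

    Assumes strictly competitive (mutually exclusive) binding and that
    conc_a and conc_b are FREE ligand concentrations in the same units
    as the corresponding Kd values.
    """
    term_a = conc_a / kd_a
    term_b = conc_b / kd_b
    denom = 1.0 + term_a + term_b
    return term_a / denom, term_b / denom


def effective_concentration(radius_nm):
    """Molar concentration of a single molecule confined to a sphere.

    A crude stand-in for the local concentration of a tethered domain.
    """
    radius_dm = radius_nm * 1e-8  # 1 nm = 1e-8 dm, so the volume is in liters
    volume_l = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return 1.0 / (AVOGADRO * volume_l)


# Kd values from the text: Hairless 1 nM, NICD 60 nM (concentrations in nM).
frac_hairless, frac_nicd = occupancy_two_ligands(100.0, 1.0, 100.0, 60.0)
print(f"Hairless-bound fraction: {frac_hairless:.3f}")
print(f"NICD-bound fraction:     {frac_nicd:.3f}")

# Hypothetical 5 nm tether radius.
print(f"Effective concentration: {effective_concentration(5.0) * 1e3:.1f} mM")
```

With equal free concentrations, this simple competitive model favors Hairless 60-fold, which is why displacement by NICD is unexpected on affinity grounds alone. A 5 nm confinement radius corresponds to an effective concentration of roughly 3 mM, in line with the millimolar estimate above.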


Figure 3: Comparison of CTD Binding Site for Hairless and the ANK Domain. Left: structural representation of the DNA-Su(H)-Hairless complex (unpublished data), showing Hairless (purple) wedged into the CTD of Su(H) (orange), disrupting the structure of the CTD. Right: the predicted structure of the DNA-Su(H)-ANK domain-Mastermind complex, modeled on the worm ternary complex structure (PDB: 2FO1) and a partial human ternary complex lacking only the RAM domain (PDB: 2F8X).10,126 In this structural model, the ANK domain (blue) binds the CTD of Su(H) (orange) while Mastermind (red) interacts with the ANK domain, the CTD, and the NTD. Note the binding site of the ANK domain on the CTD partially overlaps with the binding site of Hairless.


As for characterizing the CSL-DNA interaction, the most important task will be to analyze numerous variations to the eight-nucleotide CSL binding site by quantitative binding methods and by functional output assays. Currently, only a handful of sequence variants have been tested by both approaches. If we are to gain any understanding of how CSL recognizes and binds specific DNA sequences or how binding affinity relates to function, a much larger collection of data must be generated. While it is possible to analyze every possible nucleotide variation to the eight-nucleotide CSL binding sequence, doing so would not offer the most useful data, since it eliminates the important context of DNA shape and not every sequence would correspond to an actual CSL binding site. Therefore, it will be important to test binding sites identified in vivo, as they are valid CSL binding sites and provide a ready platform for both methodological approaches, binding and output. As a next step, it would be ideal to develop a method to test these DNA sequences in a more realistic in vivo setting, so the effect of other factors, like the combinatorial interactions of other transcription factors or chromatin structure, could be considered. Structures of CSL bound to different DNA sequences could be solved, though, as we observed with the structures of the Hes1 consensus and consensus variant sites, there may be little difference in contacts made between CSL and DNA. To further study the effect of sequences flanking the CSL binding site, it will be necessary to measure a benchmark for comparison, such as the Hes1 Consensus site. It will also be important to test several flanking sequences identified in vivo by ITC and by functional assay to determine if there is any correlation between binding and output.
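To give a sense of the scale involved in testing variations to the eight-nucleotide site, the sketch below enumerates the single-nucleotide variants of the Hes1 Consensus sequence and counts all possible eight-nucleotide sequences. It is purely a counting exercise, with hypothetical function names, and says nothing about which of these sequences occur as CSL binding sites in vivo.

```python
from itertools import product

BASES = "ACGT"


def single_site_variants(site):
    """All sequences that differ from `site` at exactly one position."""
    site = site.upper()
    variants = []
    for i, ref in enumerate(site):
        for base in BASES:
            if base != ref:
                variants.append(site[:i] + base + site[i + 1:])
    return variants


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def all_sites(length):
    """Generator over every DNA sequence of the given length."""
    return ("".join(p) for p in product(BASES, repeat=length))


hes1_consensus = "TGTGGGAA"  # from the text
s1 = "CGTGGGAA"              # from the text

singles = single_site_variants(hes1_consensus)
print(len(singles))                       # 24 single-nucleotide variants
print(s1 in singles)                      # True: S1 differs at one position
print(hamming(hes1_consensus, s1))        # 1
print(sum(1 for _ in all_sites(8)))       # 65536 possible 8-mers
```

The 24 single-nucleotide variants are a tractable set for ITC, whereas the full space of 65,536 eight-nucleotide sequences is not, which supports prioritizing sites identified in vivo.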


It will take many steps before we fully understand the transition between repression and activation complexes at a molecular level, and even more steps before we identify all the factors influencing that transition in vivo, but this work is imperative for understanding Notch signaling well enough to safely and effectively manipulate it with therapeutics.


Bibliography

1. Gordon, W.R., et al. Structure of the Notch1-negative regulatory region: implications for normal activation and pathogenic signaling in T-ALL. Blood 113, 4381-4390 (2009). 2. Gordon, W.R., et al. Structural basis for autoinhibition of Notch. Nat Struct Mol Biol 14, 295-300 (2007). 3. Brou, C., et al. A novel proteolytic cleavage involved in Notch signaling: the role of the disintegrin-metalloprotease TACE. Mol Cell 5, 207-216 (2000). 4. Mumm, J.S., et al. A ligand-induced extracellular cleavage regulates gamma-secretase-like proteolytic activation of Notch1. Mol Cell 5, 197-206 (2000). 5. Fortini, M.E. Gamma-secretase-mediated proteolysis in cell-surface-receptor signalling. Nat Rev Mol Cell Biol 3, 673-684 (2002). 6. Mumm, J.S. & Kopan, R. Notch signaling: from the outside in. Dev Biol 228, 151-165 (2000). 7. Selkoe, D. & Kopan, R. Notch and Presenilin: regulated intramembrane proteolysis links development and degeneration. Annu Rev Neurosci 26, 565-597 (2003). 8. Kovall, R.A. & Hendrickson, W.A. Crystal structure of the nuclear effector of Notch signaling, CSL, bound to DNA. EMBO J 23, 3441-3451 (2004). 9. Nam, Y., Sliz, P., Song, L., Aster, J.C. & Blacklow, S.C. Structural basis for cooperativity in recruitment of MAML coactivators to Notch transcription complexes. Cell 124, 973-983 (2006). 10. Wilson, J.J. & Kovall, R.A. Crystal structure of the CSL-Notch-Mastermind ternary complex bound to DNA. Cell 124, 985-996 (2006). 11. Kurooka, H. & Honjo, T. Functional interaction between the mouse notch1 intracellular region and histone acetyltransferases PCAF and GCN5. J Biol Chem 275, 17211-17220 (2000). 12. Petcherski, A.G. & Kimble, J. Mastermind is a putative activator for Notch. Curr Biol 10, R471-473 (2000). 13. Wu, L., et al. MAML1, a human homologue of Drosophila mastermind, is a transcriptional co- activator for NOTCH receptors. Nat Genet 26, 484-489 (2000). 14. Wallberg, A.E., Pedersen, K., Lendahl, U. & Roeder, R.G. 
p300 and PCAF act cooperatively to mediate transcriptional activation from chromatin templates by notch intracellular domains in vitro. Mol Cell Biol 22, 7812-7819 (2002). 15. Kopan, R. & Ilagan, M.X. The canonical Notch signaling pathway: unfolding the activation mechanism. Cell 137, 216-233 (2009). 16. Fryer, C.J., White, J.B. & Jones, K.A. Mastermind recruits CycC:CDK8 to phosphorylate the Notch ICD and coordinate activation with turnover. Mol Cell 16, 509-520 (2004). 17. Fryer, C.J., Lamar, E., Turbachova, I., Kintner, C. & Jones, K.A. Mastermind mediates chromatin- specific transcription and turnover of the Notch enhancer complex. Genes Dev 16, 1397-1411 (2002). 18. Gupta-Rossi, N., et al. Functional interaction between SEL-10, an F-box protein, and the nuclear form of activated Notch1 receptor. J Biol Chem 276, 34371-34378 (2001). 19. Wu, G., et al. SEL-10 is an inhibitor of notch signaling that targets notch for ubiquitin-mediated protein degradation. Mol Cell Biol 21, 7403-7415 (2001). 20. Castro, B., Barolo, S., Bailey, A.M. & Posakony, J.W. Lateral inhibition in proneural clusters: cis- regulatory logic and default repression by Suppressor of Hairless. Development 132, 3333-3344 (2005). 21. Morel, V., et al. Transcriptional repression by suppressor of hairless involves the binding of a hairless-dCtBP complex in Drosophila. Curr Biol 11, 789-792 (2001). 117

22. Nagel, A.C., et al. Hairless-mediated repression of notch target genes requires the combined activity of Groucho and CtBP corepressors. Mol Cell Biol 25, 10433-10441 (2005). 23. Kao, H.Y., et al. A histone deacetylase corepressor complex regulates the Notch signal transduction pathway. Genes Dev 12, 2269-2277 (1998). 24. Oswald, F., et al. RBP-Jkappa/SHARP recruits CtIP/CtBP corepressors to silence Notch target genes. Mol Cell Biol 25, 10379-10390 (2005). 25. Neves, A. & Priess, J.R. The REF-1 family of bHLH transcription factors pattern C. elegans embryos through Notch-dependent and Notch-independent pathways. Dev Cell 8, 867-879 (2005). 26. Guruharsha, K.G., Kankel, M.W. & Artavanis-Tsakonas, S. The Notch signalling system: recent insights into the complexity of a conserved pathway. Nat Rev Genet 13, 654-666 (2012). 27. D'Souza, B., Meloty-Kapella, L. & Weinmaster, G. Canonical and non-canonical Notch ligands. Curr Top Dev Biol 92, 73-129 (2010). 28. Heitzler, P. Biodiversity and noncanonical Notch signaling. Curr Top Dev Biol 92, 457-481 (2010). 29. Couturier, L., Vodovar, N. & Schweisguth, F. Endocytosis by Numb breaks Notch symmetry at cytokinesis. Nat Cell Biol 14, 131-139 (2012). 30. Sprinzak, D., et al. Cis-interactions between Notch and Delta generate mutually exclusive signalling states. Nature 465, 86-90 (2010). 31. Louvi, A. & Artavanis-Tsakonas, S. Notch and disease: a growing field. Semin Cell Dev Biol 23, 473-480 (2012). 32. Mazzone, M., et al. Dose-dependent induction of distinct phenotypic responses to Notch pathway activation in mammary epithelial cells. Proc Natl Acad Sci U S A 107, 5012-5017 (2010). 33. Artavanis-Tsakonas, S. & Muskavitch, M.A. Notch: the past, the present, and the future. Curr Top Dev Biol 92, 1-29 (2010). 34. Dorer, D.R. & Christensen, A.C. A recombinational hotspot at the triplo-lethal locus of Drosophila melanogaster. Genetics 122, 397-401 (1989). 35. Kageyama, R., Ohtsuka, T. & Kobayashi, T. 
The Hes gene family: repressors and oscillators that orchestrate embryogenesis. Development 134, 1243-1251 (2007). 36. Fischer, A. & Gessler, M. Delta-Notch--and then? Protein interactions and proposed modes of repression by Hes and Hey bHLH factors. Nucleic Acids Res 35, 4583-4596 (2007). 37. Fisher, A. & Caudy, M. The function of hairy-related bHLH repressor proteins in cell fate decisions. Bioessays 20, 298-306 (1998). 38. Chin, M.T., et al. Cardiovascular basic helix loop helix factor 1, a novel transcriptional repressor expressed preferentially in the developing and adult cardiovascular system. J Biol Chem 275, 6381-6387 (2000). 39. Kokubo, H., Lun, Y. & Johnson, R.L. Identification and expression of a novel family of bHLH cDNAs related to Drosophila hairy and enhancer of split. Biochem Biophys Res Commun 260, 459-465 (1999). 40. Leimeister, C., Externbrink, A., Klamt, B. & Gessler, M. Hey genes: a novel subfamily of hairy- and Enhancer of split related genes specifically expressed during mouse embryogenesis. Mech Dev 85, 173-177 (1999). 41. Hirata, H., Tomita, K., Bessho, Y. & Kageyama, R. Hes1 and Hes3 regulate maintenance of the isthmic organizer and development of the mid/hindbrain. EMBO J 20, 4454-4466 (2001). 42. Ishibashi, M., et al. Targeted disruption of mammalian hairy and Enhancer of split homolog-1 (HES-1) leads to up-regulation of neural helix-loop-helix factors, premature neurogenesis, and severe neural tube defects. Genes Dev 9, 3136-3148 (1995).


43. Ohtsuka, T., et al. Hes1 and Hes5 as notch effectors in mammalian neuronal differentiation. EMBO J 18, 2196-2207 (1999). 44. Cau, E., Gradwohl, G., Casarosa, S., Kageyama, R. & Guillemot, F. Hes genes regulate sequential stages of neurogenesis in the olfactory epithelium. Development 127, 2323-2332 (2000). 45. Jensen, J., et al. Control of endodermal endocrine development by Hes-1. Nat Genet 24, 36-44 (2000). 46. Lee, H.Y., et al. Multiple requirements for Hes 1 during early eye formation. Dev Biol 284, 464- 478 (2005). 47. Tomita, K., et al. The bHLH gene Hes1 is essential for expansion of early T cell precursors. Genes Dev 13, 1203-1210 (1999). 48. Johnson, J.E., Birren, S.J., Saito, T. & Anderson, D.J. DNA binding and transcriptional regulatory activity of mammalian achaete-scute homologous (MASH) proteins revealed by interaction with a muscle-specific enhancer. Proc Natl Acad Sci U S A 89, 3596-3600 (1992). 49. Fukuda, A., et al. Ectopic pancreas formation in Hes1 -knockout mice reveals plasticity of endodermal progenitors of the gut, bile duct, and pancreas. J Clin Invest 116, 1484-1493 (2006). 50. Lee, J.C., et al. Regulation of the pancreatic pro-endocrine gene neurogenin3. Diabetes 50, 928- 936 (2001). 51. Hirata, H., et al. Oscillatory expression of the bHLH factor Hes1 regulated by a negative feedback loop. Science 298, 840-843 (2002). 52. Bessho, Y., Hirata, H., Masamizu, Y. & Kageyama, R. Periodic repression by the bHLH factor Hes7 is an essential mechanism for the somite segmentation clock. Genes Dev 17, 1451-1456 (2003). 53. Hirata, H., et al. Instability of Hes7 protein is crucial for the somite segmentation clock. Nat Genet 36, 750-754 (2004). 54. Bessho, Y., et al. Dynamic expression and essential functions of Hes7 in somite segmentation. Genes Dev 15, 2642-2647 (2001). 55. Massari, M.E. & Murre, C. Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms. Mol Cell Biol 20, 429-440 (2000). 56. Cordle, J., et al. 
Localization of the delta-like-1-binding site in human Notch-1 and its modulation by calcium affinity. J Biol Chem 283, 11785-11793 (2008). 57. Sasai, Y., Kageyama, R., Tagawa, Y., Shigemoto, R. & Nakanishi, S. Two mammalian helix-loop- helix factors structurally related to Drosophila hairy and Enhancer of split. Genes Dev 6, 2620- 2634 (1992). 58. Iso, T., et al. HERP, a novel heterodimer partner of HES/E(spl) in Notch signaling. Mol Cell Biol 21, 6080-6089 (2001). 59. Dawson, S.R., Turner, D.L., Weintraub, H. & Parkhurst, S.M. Specificity for the hairy/enhancer of split basic helix-loop-helix (bHLH) proteins maps outside the bHLH domain and suggests two separable modes of transcriptional repression. Mol Cell Biol 15, 6923-6931 (1995). 60. Taelman, V., et al. Sequences downstream of the bHLH domain of the Xenopus hairy-related transcription factor-1 act as an extended dimerization domain that contributes to the selection of the partners. Dev Biol 276, 47-63 (2004). 61. Iso, T., Kedes, L. & Hamamori, Y. HES and HERP families: multiple effectors of the Notch signaling pathway. J Cell Physiol 194, 237-255 (2003). 62. Fischer, A., et al. Hey bHLH factors in cardiovascular development. Cold Spring Harb Symp Quant Biol 67, 63-70 (2002). 63. Andersson, E.R., Sandberg, R. & Lendahl, U. Notch signaling: simplicity in design, versatility in function. Development 138, 3593-3612 (2011).


64. Jeffries, S., Robbins, D.J. & Capobianco, A.J. Characterization of a high-molecular-weight Notch complex in the nucleus of Notch(ic)-transformed RKE cells and in a human T-cell leukemia cell line. Mol Cell Biol 22, 3927-3941 (2002). 65. Krejci, A., Bernard, F., Housden, B.E., Collins, S. & Bray, S.J. Direct response to Notch activation: signaling crosstalk and incoherent logic. Sci Signal 2, ra1 (2009). 66. Palomero, T., et al. NOTCH1 directly regulates c-MYC and activates a feed-forward-loop transcriptional network promoting leukemic cell growth. Proc Natl Acad Sci U S A 103, 18261- 18266 (2006). 67. Ronchini, C. & Capobianco, A.J. Induction of cyclin D1 transcription and CDK2 activity by Notch(ic): implication for cell cycle disruption in transformation by Notch(ic). Mol Cell Biol 21, 5925-5934 (2001). 68. Rangarajan, A., et al. Notch signaling is a direct determinant of keratinocyte growth arrest and entry into differentiation. EMBO J 20, 3427-3436 (2001). 69. Bray, S.J. Notch signalling: a simple pathway becomes complex. Nat Rev Mol Cell Biol 7, 678-689 (2006). 70. Lai, E.C. & Orgogozo, V. A hidden program in Drosophila peripheral neurogenesis revealed: fundamental principles underlying sensory organ diversity. Dev Biol 269, 1-17 (2004). 71. Perdigoto, C.N. & Bardin, A.J. Sending the right signal: Notch and stem cells. Biochim Biophys Acta (2012). 72. Cordle, J., et al. A conserved face of the Jagged/Serrate DSL domain is involved in Notch trans- activation and cis-inhibition. Nat Struct Mol Biol 15, 849-857 (2008). 73. Axelrod, J.D. Delivering the lateral inhibition punchline: it's all about the timing. Sci Signal 3, pe38 (2010). 74. Barad, O., Rosin, D., Hornstein, E. & Barkai, N. Error minimization in lateral inhibition circuits. Sci Signal 3, ra51 (2010). 75. Guo, M., Jan, L.Y. & Jan, Y.N. Control of daughter cell fates during asymmetric division: interaction of Numb and Notch. Neuron 17, 27-41 (1996). 76. Blanpain, C., Lowry, W.E., Pasolli, H.A. 
& Fuchs, E. Canonical notch signaling functions as a commitment switch in the epidermal lineage. Genes Dev 20, 3022-3035 (2006). 77. Burns, C.E., Traver, D., Mayhall, E., Shepard, J.L. & Zon, L.I. Hematopoietic stem cell fate is established by the Notch-Runx pathway. Genes Dev 19, 2331-2342 (2005). 78. Fre, S., et al. Notch signals control the fate of immature progenitor cells in the intestine. Nature 435, 964-968 (2005). 79. Micchelli, C.A. & Perrimon, N. Evidence that stem cells reside in the adult Drosophila midgut epithelium. Nature 439, 475-479 (2006). 80. Mourikis, P., et al. A critical requirement for notch signaling in maintenance of the quiescent skeletal muscle stem cell state. Stem Cells 30, 243-252 (2012). 81. Powell, B.C., Passmore, E.A., Nesci, A. & Dunn, S.M. The Notch signalling pathway in hair growth. Mech Dev 78, 189-192 (1998). 82. Rock, J.R., et al. Notch-dependent differentiation of adult airway basal stem cells. Cell Stem Cell 8, 639-648 (2011). 83. Ellisen, L.W., et al. TAN-1, the human homolog of the Drosophila notch gene, is broken by chromosomal translocations in T lymphoblastic neoplasms. Cell 66, 649-661 (1991). 84. Koch, U. & Radtke, F. Notch in T-ALL: new players in a complex disease. Trends Immunol 32, 434- 442 (2011). 85. Weng, A.P., et al. Activating mutations of NOTCH1 in human T cell acute lymphoblastic leukemia. Science 306, 269-271 (2004). 120

86. Lee, S.Y., et al. Gain-of-function mutations and copy number increases of Notch2 in diffuse large B-cell lymphoma. Cancer Sci 100, 920-926 (2009).
87. Demehri, S., Turkoz, A. & Kopan, R. Epidermal Notch1 loss promotes skin tumorigenesis by impacting the stromal microenvironment. Cancer Cell 16, 55-66 (2009).
88. Koch, U. & Radtke, F. Notch signaling in solid tumors. Curr Top Dev Biol 92, 411-455 (2010).
89. Fre, S., et al. Notch and Wnt signals cooperatively control cell proliferation and tumorigenesis in the intestine. Proc Natl Acad Sci U S A 106, 6309-6314 (2009).
90. Li, L., et al. Alagille syndrome is caused by mutations in human Jagged1, which encodes a ligand for Notch1. Nat Genet 16, 243-251 (1997).
91. Oda, T., et al. Mutations in the human Jagged1 gene are responsible for Alagille syndrome. Nat Genet 16, 235-242 (1997).
92. Turnpenny, P.D. & Ellard, S. Alagille syndrome: pathogenesis, diagnosis and management. Eur J Hum Genet 20, 251-257 (2012).
93. McDaniell, R., et al. NOTCH2 mutations cause Alagille syndrome, a heterogeneous disorder of the notch signaling pathway. Am J Hum Genet 79, 169-173 (2006).
94. McCright, B., Lozier, J. & Gridley, T. A mouse model of Alagille syndrome: Notch2 as a genetic modifier of Jag1 haploinsufficiency. Development 129, 1075-1082 (2002).
95. Kusumi, K., et al. The mouse pudgy mutation disrupts Delta homologue Dll3 and initiation of early somite boundaries. Nat Genet 19, 274-278 (1998).
96. Bulman, M.P., et al. Mutations in the human delta homologue, DLL3, cause axial skeletal defects in spondylocostal dysostosis. Nat Genet 24, 438-441 (2000).
97. Turnpenny, P.D., et al. Novel mutations in DLL3, a somitogenesis gene encoding a ligand for the Notch signalling pathway, cause a consistent pattern of abnormal vertebral segmentation in spondylocostal dysostosis. J Med Genet 40, 333-339 (2003).
98. Chapman, G., Sparrow, D.B., Kremmer, E. & Dunwoodie, S.L. Notch inhibition by the ligand DELTA-LIKE 3 defines the mechanism of abnormal vertebral segmentation in spondylocostal dysostosis. Hum Mol Genet 20, 905-916 (2011).
99. Sparrow, D.B., et al. Mutation of the LUNATIC FRINGE gene in humans causes spondylocostal dysostosis with a severe vertebral phenotype. Am J Hum Genet 78, 28-37 (2006).
100. Sparrow, D.B., Guillen-Navarro, E., Fatkin, D. & Dunwoodie, S.L. Mutation of Hairy-and-Enhancer-of-Split-7 in humans causes spondylocostal dysostosis. Hum Mol Genet 17, 3761-3766 (2008).
101. Sparrow, D.B., Sillence, D., Wouters, M.A., Turnpenny, P.D. & Dunwoodie, S.L. Two novel missense mutations in HAIRY-AND-ENHANCER-OF-SPLIT-7 in a family with spondylocostal dysostosis. Eur J Hum Genet 18, 674-679 (2010).
102. Cornier, A.S., et al. Mutations in the MESP2 gene cause spondylothoracic dysostosis/Jarcho-Levin syndrome. Am J Hum Genet 82, 1334-1341 (2008).
103. Whittock, N.V., et al. Mutated MESP2 causes spondylocostal dysostosis in humans. Am J Hum Genet 74, 1249-1254 (2004).
104. Isidor, B., et al. Serpentine fibula-polycystic kidney syndrome caused by truncating mutations in NOTCH2. Hum Mutat 32, 1239-1242 (2011).
105. Isidor, B., et al. Truncating mutations in the last exon of NOTCH2 cause a rare skeletal disorder with osteoporosis. Nat Genet 43, 306-308 (2011).
106. Gray, M.J., et al. Serpentine fibula polycystic kidney syndrome is part of the phenotypic spectrum of Hajdu-Cheney syndrome. Eur J Hum Genet 20, 122-124 (2012).
107. Hilton, M.J., et al. Notch signaling maintains bone marrow mesenchymal progenitors by suppressing osteoblast differentiation. Nat Med 14, 306-314 (2008).

108. Engin, F., et al. Dimorphic effects of Notch signaling in bone homeostasis. Nat Med 14, 299-305 (2008).
109. MacGrogan, D., Nus, M. & de la Pompa, J.L. Notch signaling in cardiac development and disease. Curr Top Dev Biol 92, 333-365 (2010).
110. McElhinney, D.B., et al. Analysis of cardiovascular phenotype and genotype-phenotype correlation in individuals with a JAG1 mutation and/or Alagille syndrome. Circulation 106, 2567-2574 (2002).
111. Eldadah, Z.A., et al. Familial Tetralogy of Fallot caused by mutation in the jagged1 gene. Hum Mol Genet 10, 163-169 (2001).
112. Krantz, I.D., et al. Jagged1 mutations in patients ascertained with isolated congenital heart defects. Am J Med Genet 84, 56-60 (1999).
113. Garg, V., et al. Mutations in NOTCH1 cause aortic valve disease. Nature 437, 270-274 (2005).
114. Chabriat, H., Joutel, A., Dichgans, M., Tournier-Lasserve, E. & Bousser, M.G. Cadasil. Lancet Neurol 8, 643-653 (2009).
115. Dziewulska, D. & Lewandowska, E. Pericytes as a new target for pathological processes in CADASIL. Neuropathology 32, 515-521 (2012).
116. Lewandowska, E., Dziewulska, D., Parys, M. & Pasennik, E. Ultrastructure of granular osmiophilic material deposits (GOM) in arterioles of CADASIL patients. Folia Neuropathol 49, 174-180 (2011).
117. Joutel, A., et al. Strong clustering and stereotyped nature of Notch3 mutations in CADASIL patients. Lancet 350, 1511-1515 (1997).
118. Opherk, C., et al. CADASIL mutations enhance spontaneous multimerization of NOTCH3. Hum Mol Genet 18, 2761-2767 (2009).
119. Duering, M., et al. Co-aggregate formation of CADASIL-mutant NOTCH3: a single-particle analysis. Hum Mol Genet 20, 3256-3265 (2011).
120. Arboleda-Velasquez, J.F., et al. Linking Notch signaling to ischemic stroke. Proc Natl Acad Sci U S A 105, 4856-4861 (2008).
121. Kovall, R.A. & Blacklow, S.C. Mechanistic insights into Notch receptor signaling from structural and biochemical studies. Curr Top Dev Biol 92, 31-71 (2010).
122. Lei, L., Xu, A., Panin, V.M. & Irvine, K.D. An O-fucose site in the ligand binding domain inhibits Notch activation. Development 130, 6411-6421 (2003).
123. Panin, V.M., Papayannopoulos, V., Wilson, R. & Irvine, K.D. Fringe modulates Notch-ligand interactions. Nature 387, 908-912 (1997).
124. Greenwald, I. & Seydoux, G. Analysis of gain-of-function mutations of the lin-12 gene of Caenorhabditis elegans. Nature 346, 197-199 (1990).
125. Kopan, R., Schroeter, E.H., Weintraub, H. & Nye, J.S. Signal transduction by activated mNotch: importance of proteolytic processing and its regulation by the extracellular domain. Proc Natl Acad Sci U S A 93, 1683-1688 (1996).
126. Nam, Y., Weng, A.P., Aster, J.C. & Blacklow, S.C. Structural requirements for assembly of the CSL.intracellular Notch1.Mastermind-like 1 transcriptional activation complex. The Journal of biological chemistry 278, 21232-21239 (2003).
127. Tamura, K., et al. Physical interaction between a novel domain of the receptor Notch and the transcription factor RBP-J kappa/Su(H). Current biology : CB 5, 1416-1423 (1995).
128. Chitnis, A. Why is delta endocytosis required for effective activation of notch? Dev Dyn 235, 886-894 (2006).
129. Le Borgne, R., Bardin, A. & Schweisguth, F. The roles of receptor and ligand endocytosis in regulating Notch signaling. Development 132, 1751-1762 (2005).

130. Le Borgne, R., Remaud, S., Hamel, S. & Schweisguth, F. Two distinct E3 ubiquitin ligases have complementary functions in the regulation of delta and serrate signaling in Drosophila. PLoS Biol 3, e96 (2005).
131. Pavlopoulos, E., et al. neuralized Encodes a peripheral membrane protein involved in delta signaling and endocytosis. Dev Cell 1, 807-816 (2001).
132. Hagedorn, E.J., et al. Drosophila melanogaster auxilin regulates the internalization of Delta to control activity of the Notch signaling pathway. J Cell Biol 173, 443-452 (2006).
133. Wang, W. & Struhl, G. Distinct roles for Mind bomb, Neuralized and Epsin in mediating DSL endocytosis and signaling in Drosophila. Development 132, 2883-2894 (2005).
134. Lai, E.C. Protein degradation: four E3s for the notch pathway. Curr Biol 12, R74-78 (2002).
135. Qiu, L., et al. Recognition and ubiquitination of Notch by Itch, a hect-type E3 ubiquitin ligase. J Biol Chem 275, 35734-35737 (2000).
136. Hori, K., et al. Drosophila deltex mediates suppressor of Hairless-independent and late-endosomal activation of Notch signaling. Development 131, 5527-5537 (2004).
137. Matsuno, K., Diederich, R.J., Go, M.J., Blaumueller, C.M. & Artavanis-Tsakonas, S. Deltex acts as a positive regulator of Notch signaling through interactions with the Notch ankyrin repeats. Development 121, 2633-2644 (1995).
138. Mukherjee, A., et al. Regulation of Notch signalling by non-visual beta-arrestin. Nat Cell Biol 7, 1191-1201 (2005).
139. Sestan, N., Artavanis-Tsakonas, S. & Rakic, P. Contact-dependent inhibition of cortical neurite growth mediated by notch signaling. Science 286, 741-746 (1999).
140. Sasamura, T., et al. neurotic, a novel maternal neurogenic gene, encodes an O-fucosyltransferase that is essential for Notch-Delta interactions. Development 130, 4785-4795 (2003).
141. Shi, S. & Stanley, P. Protein O-fucosyltransferase 1 is an essential component of Notch signaling pathways. Proc Natl Acad Sci U S A 100, 5234-5239 (2003).
142. Okajima, T., Xu, A., Lei, L. & Irvine, K.D. Chaperone activity of protein O-fucosyltransferase 1 promotes notch receptor folding. Science 307, 1599-1603 (2005).
143. Haines, N. & Irvine, K.D. Glycosylation regulates Notch signalling. Nat Rev Mol Cell Biol 4, 786-797 (2003).
144. Hicks, C., et al. Fringe differentially modulates Jagged1 and Delta1 signalling through Notch1 and Notch2. Nat Cell Biol 2, 515-520 (2000).
145. Shimizu, K., et al. Manic fringe and lunatic fringe modify different sites of the Notch2 extracellular region, resulting in different signaling modulation. J Biol Chem 276, 25753-25758 (2001).
146. Berdnik, D., Torok, T., Gonzalez-Gaitan, M. & Knoblich, J.A. The endocytic protein alpha-Adaptin is required for numb-mediated asymmetric cell division in Drosophila. Dev Cell 3, 221-231 (2002).
147. McGill, M.A. & McGlade, C.J. Mammalian numb proteins promote Notch1 receptor ubiquitination and degradation of the Notch1 intracellular domain. J Biol Chem 278, 23196-23203 (2003).
148. Chien, C.T., Wang, S., Rothenberg, M., Jan, L.Y. & Jan, Y.N. Numb-associated kinase interacts with the phosphotyrosine binding domain of Numb and antagonizes the function of Numb in vivo. Mol Cell Biol 18, 598-607 (1998).
149. O'Connor-Giles, K.M. & Skeath, J.B. Numb inhibits membrane localization of Sanpodo, a four-pass transmembrane protein, to promote asymmetric divisions in Drosophila. Dev Cell 5, 231-243 (2003).

150. D'Souza, B., Miyamoto, A. & Weinmaster, G. The many facets of Notch ligands. Oncogene 27, 5148-5167 (2008).
151. Logeat, F., et al. The Notch1 receptor is cleaved constitutively by a furin-like convertase. Proc Natl Acad Sci U S A 95, 8108-8112 (1998).
152. Blaumueller, C.M., Qi, H., Zagouras, P. & Artavanis-Tsakonas, S. Intracellular cleavage of Notch leads to a heterodimeric receptor on the plasma membrane. Cell 90, 281-291 (1997).
153. Seugnet, L., Simpson, P. & Haenlin, M. Requirement for dynamin during Notch signaling in Drosophila neurogenesis. Dev Biol 192, 585-598 (1997).
154. Parks, A.L., Klueg, K.M., Stout, J.R. & Muskavitch, M.A. Ligand endocytosis drives receptor dissociation and activation in the Notch pathway. Development 127, 1373-1385 (2000).
155. Komatsu, H., et al. OSM-11 facilitates LIN-12 Notch signaling during Caenorhabditis elegans vulval development. PLoS Biol 6, e196 (2008).
156. Okochi, M., et al. Presenilins mediate a dual intramembranous gamma-secretase cleavage of Notch-1. EMBO J 21, 5408-5416 (2002).
157. Aster, J.C. Deregulated NOTCH signaling in acute T-cell lymphoblastic leukemia/lymphoma: new insights, questions, and opportunities. Int J Hematol 82, 295-301 (2005).
158. de Celis, J.F. & Bray, S. Feed-back mechanisms affecting Notch activation at the dorsoventral boundary in the Drosophila wing. Development 124, 3241-3251 (1997).
159. Fehon, R.G., et al. Molecular interactions between the protein products of the neurogenic loci Notch and Delta, two EGF-homologous genes in Drosophila. Cell 61, 523-534 (1990).
160. Rebay, I., et al. Specific EGF repeats of Notch mediate interactions with Delta and Serrate: implications for Notch as a multifunctional receptor. Cell 67, 687-699 (1991).
161. de Celis, J.F. & Bray, S.J. The Abruptex domain of Notch regulates negative interactions between Notch, its ligands and Fringe. Development 127, 1291-1302 (2000).
162. Okajima, T., Xu, A. & Irvine, K.D. Modulation of notch-ligand binding by protein O-fucosyltransferase 1 and fringe. J Biol Chem 278, 42340-42345 (2003).
163. Pourquie, O. The segmentation clock: converting embryonic time into spatial pattern. Science 301, 328-330 (2003).
164. Sato, Y., Yasuda, K. & Takahashi, Y. Morphological boundary forms by a novel inductive event mediated by Lunatic fringe and Notch during somitic segmentation. Development 129, 3633-3644 (2002).
165. Kiernan, A.E., et al. The Notch ligand Jagged1 is required for inner ear sensory development. Proc Natl Acad Sci U S A 98, 3873-3878 (2001).
166. Tsai, H., et al. The mouse slalom mutant demonstrates a role for Jagged1 in neuroepithelial patterning in the organ of Corti. Hum Mol Genet 10, 507-512 (2001).
167. Beres, T.M., et al. PTF1 is an organ-specific and Notch-independent basic helix-loop-helix complex containing the mammalian Suppressor of Hairless (RBP-J) or its paralogue, RBP-L. Mol Cell Biol 26, 117-130 (2006).
168. Minoguchi, S., et al. RBP-L, a transcription factor related to RBP-Jkappa. Mol Cell Biol 17, 2679-2687 (1997).
169. Friedmann, D.R., Wilson, J.J. & Kovall, R.A. RAM-induced allostery facilitates assembly of a notch pathway active transcription complex. The Journal of biological chemistry 283, 14781-14791 (2008).
170. Murzin, A.G., Lesk, A.M. & Chothia, C. beta-Trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins-1 beta and 1 alpha and fibroblast growth factors. J Mol Biol 223, 531-543 (1992).

171. Bailey, A.M. & Posakony, J.W. Suppressor of hairless directly activates transcription of enhancer of split complex genes in response to Notch receptor activity. Genes Dev 9, 2609-2622 (1995).
172. Lecourtois, M. & Schweisguth, F. The neurogenic suppressor of hairless DNA-binding protein mediates the transcriptional activation of the enhancer of split complex genes triggered by Notch signaling. Genes Dev 9, 2598-2608 (1995).
173. Nellesen, D.T., Lai, E.C. & Posakony, J.W. Discrete enhancer elements mediate selective responsiveness of enhancer of split complex genes to common transcriptional activators. Dev Biol 213, 33-53 (1999).
174. Tun, T., et al. Recognition sequence of a highly conserved DNA binding protein RBP-J kappa. Nucleic Acids Res 22, 965-971 (1994).
175. Del Bianco, C., et al. Notch and MAML-1 complexation do not detectably alter the DNA binding specificity of the transcription factor CSL. PLoS One 5, e15034 (2010).
176. Hamidi, H., Gustafason, D., Pellegrini, M. & Gasson, J. Identification of novel targets of CSL-dependent Notch signaling in hematopoiesis. PLoS One 6, e20022 (2011).
177. Collins, K.J., Yuan, Z. & Kovall, R.A. Structure and function of the CSL-KyoT2 corepressor complex: a negative regulator of Notch signaling. Structure 22, 70-81 (2014).
178. Friedmann, D.R. & Kovall, R.A. Thermodynamic and structural insights into CSL-DNA complexes. Protein Sci 19, 34-46 (2010).
179. Arnett, K.L., et al. Structural and mechanistic insights into cooperative assembly of dimeric Notch transcription complexes. Nat Struct Mol Biol 17, 1312-1317 (2010).
180. Bray, S. & Bernard, F. Notch targets and their regulation. Curr Top Dev Biol 92, 253-275 (2010).
181. Hamaguchi, Y., Matsunami, N., Yamamoto, Y. & Honjo, T. Purification and characterization of a protein that binds to the recombination signal sequence of the immunoglobulin J kappa segment. Nucleic Acids Res 17, 9015-9026 (1989).
182. Amsen, D., et al. Direct regulation of Gata3 expression determines the T helper differentiation potential of Notch. Immunity 27, 89-99 (2007).
183. Fang, T.C., et al. Notch directly regulates Gata3 expression during T helper 2 cell differentiation. Immunity 27, 100-110 (2007).
184. Lake, R.J., Tsai, P.F., Choi, I., Won, K.J. & Fan, H.Y. RBPJ, the major transcriptional effector of Notch signaling, remains associated with chromatin throughout mitosis, suggesting a role in mitotic bookmarking. PLoS Genet 10, e1004204 (2014).
185. Bertagna, A., Toptygin, D., Brand, L. & Barrick, D. The effects of conformational heterogeneity on the binding of the Notch intracellular domain to effector proteins: a case of biologically tuned disorder. Biochemical Society transactions 36, 157-166 (2008).
186. Del Bianco, C., Aster, J.C. & Blacklow, S.C. Mutational and energetic studies of Notch 1 transcription complexes. Journal of molecular biology 376, 131-140 (2008).
187. Lubman, O.Y., Ilagan, M.X., Kopan, R. & Barrick, D. Quantitative dissection of the Notch:CSL interaction: insights into the Notch-mediated transcriptional switch. Journal of molecular biology 365, 577-589 (2007).
188. Johnson, S.E., Ilagan, M.X., Kopan, R. & Barrick, D. Thermodynamic analysis of the CSL x Notch interaction: distribution of binding energy of the Notch RAM region to the CSL beta-trefoil domain and the mode of competition with the viral transactivator EBNA2. J Biol Chem 285, 6681-6692 (2010).
189. Zweifel, M.E., Leahy, D.J., Hughson, F.M. & Barrick, D. Structure and stability of the ankyrin domain of the Drosophila Notch receptor. Protein Sci 12, 2622-2632 (2003).
190. Kovall, R.A. More complicated than it looks: assembly of Notch pathway transcription complexes. Oncogene 27, 5099-5109 (2008).

191. Johnson, S.E. & Barrick, D. Dissecting and circumventing the requirement for RAM in CSL-dependent Notch signaling. PLoS One 7, e39093 (2012).
192. Wharton, K.A., Yedvobnick, B., Finnerty, V.G. & Artavanis-Tsakonas, S. opa: a novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40, 55-62 (1985).
193. Kelly, D.F., Lake, R.J., Walz, T. & Artavanis-Tsakonas, S. Conformational variability of the intracellular domain of Drosophila Notch and its interaction with Suppressor of Hairless. Proc Natl Acad Sci U S A 104, 9591-9596 (2007).
194. Petcherski, A.G. & Kimble, J. LAG-3 is a putative transcriptional activator in the C. elegans Notch pathway. Nature 405, 364-368 (2000).
195. Maillard, I., et al. Mastermind critically regulates Notch-mediated lymphoid cell fate decisions. Blood 104, 1696-1702 (2004).
196. Alves-Guerra, M.C., Ronchini, C. & Capobianco, A.J. Mastermind-like 1 Is a specific coactivator of beta-catenin transcription activation and is essential for colon carcinoma cell survival. Cancer Res 67, 8690-8698 (2007).
197. Shen, H., et al. The Notch coactivator, MAML1, functions as a novel coactivator for MEF2C-mediated transcription and is required for normal myogenesis. Genes Dev 20, 675-688 (2006).
198. Zhao, Y., et al. The notch regulator MAML1 interacts with p53 and functions as a coactivator. J Biol Chem 282, 11969-11981 (2007).
199. Duan, H., et al. Insensitive is a corepressor for Suppressor of Hairless and regulates Notch signalling during neural development. EMBO J 30, 3120-3133 (2011).
200. Goodfellow, H., et al. Gene-specific targeting of the histone chaperone asf1 to mediate silencing. Dev Cell 13, 593-600 (2007).
201. Liefke, R., et al. Histone demethylase KDM5A is an integral part of the core Notch-RBP-J repressor complex. Genes Dev 24, 590-601 (2010).
202. Moshkin, Y.M., et al. Histone chaperones ASF1 and NAP1 differentially modulate removal of active histone marks by LID-RPD3 complexes during NOTCH silencing. Mol Cell 35, 782-793 (2009).
203. Shi, Y., et al. Sharp, an inducible cofactor that integrates nuclear receptor repression and activation. Genes Dev 15, 1140-1151 (2001).
204. Zhou, S., et al. SKIP, a CBF1-associated protein, interacts with the ankyrin repeat domain of NotchIC To facilitate NotchIC function. Mol Cell Biol 20, 2400-2410 (2000).
205. Bang, A.G., Hartenstein, V. & Posakony, J.W. Hairless is required for the development of adult sensory organ precursor cells in Drosophila. Development 111, 89-104 (1991).
206. Schweisguth, F. & Posakony, J.W. Antagonistic activities of Suppressor of Hairless and Hairless control alternative cell fates in the Drosophila adult epidermis. Development 120, 1433-1441 (1994).
207. Maier, D., et al. In vivo structure-function analysis of Drosophila Hairless. Mech Dev 67, 97-106 (1997).
208. Brou, C., et al. Inhibition of the DNA-binding activity of Drosophila suppressor of hairless and of its human homolog, KBF2/RBP-J kappa, by direct protein-protein interaction with Drosophila hairless. Genes Dev 8, 2491-2503 (1994).
209. Barolo, S., Stone, T., Bang, A.G. & Posakony, J.W. Default repression and Notch signaling: Hairless acts as an adaptor to recruit the corepressors Groucho and dCtBP to Suppressor of Hairless. Genes Dev 16, 1964-1976 (2002).
210. Maier, D., et al. Structural and functional analysis of the repressor complex in the Notch signaling pathway of Drosophila melanogaster. Mol Biol Cell 22, 3242-3252 (2011).

211. Maier, D. Hairless: the ignored antagonist of the Notch signalling pathway. Hereditas 143, 212-221 (2006).
212. Ariyoshi, M. & Schwabe, J.W. A conserved structural motif reveals the essential transcriptional repression function of Spen proteins and their role in developmental signaling. Genes Dev 17, 1909-1920 (2003).
213. Kuroda, K., et al. Regulation of marginal zone B cell development by MINT, a suppressor of Notch/RBP-J signaling pathway. Immunity 18, 301-312 (2003).
214. Oswald, F., et al. SHARP is a novel component of the Notch/RBP-Jkappa signalling pathway. EMBO J 21, 5417-5426 (2002).
215. VanderWielen, B.D., Yuan, Z., Friedmann, D.R. & Kovall, R.A. Transcriptional repression in the Notch pathway: thermodynamic characterization of CSL-MINT (Msx2-interacting nuclear target protein) complexes. J Biol Chem 286, 14892-14902 (2011).
216. Salat, D., Liefke, R., Wiedenmann, J., Borggrefe, T. & Oswald, F. ETO, but not leukemogenic fusion protein AML1/ETO, augments RBP-Jkappa/SHARP-mediated repression of notch target genes. Mol Cell Biol 28, 3502-3512 (2008).
217. Taniguchi, Y., Furukawa, T., Tun, T., Han, H. & Honjo, T. LIM protein KyoT2 negatively regulates transcription by association with the RBP-J DNA-binding protein. Mol Cell Biol 18, 644-654 (1998).
218. Wacker, S.A., et al. RITA, a novel modulator of Notch signalling, acts via nuclear export of RBP-J. EMBO J 30, 43-56 (2011).
219. Gordon, W.R., Arnett, K.L. & Blacklow, S.C. The molecular logic of Notch signaling--a structural and biochemical perspective. J Cell Sci 121, 3109-3119 (2008).
220. Kovall, R.A. Structures of CSL, Notch and Mastermind proteins: piecing together an active transcription complex. Current opinion in structural biology 17, 117-127 (2007).
221. Aster, J.C., et al. Essential roles for ankyrin repeat and transactivation domains in induction of T-cell leukemia by notch1. Mol Cell Biol 20, 7505-7515 (2000).
222. Jeffries, S. & Capobianco, A.J. Neoplastic transformation by Notch requires nuclear localization. Mol Cell Biol 20, 3928-3941 (2000).
223. Roehl, H., Bosenberg, M., Blelloch, R. & Kimble, J. Roles of the RAM and ANK domains in signaling by the C. elegans GLP-1 receptor. EMBO J 15, 7002-7012 (1996).
224. Jarriault, S., et al. Signalling downstream of activated mammalian Notch. Nature 377, 355-358 (1995).
225. Hori, K., Sen, A. & Artavanis-Tsakonas, S. Notch signaling at a glance. Journal of cell science 126, 2135-2140 (2013).
226. Fortini, M.E. Introduction--Notch in development and disease. Semin Cell Dev Biol 23, 419-420 (2012).
227. Liu, J., Sato, C., Cerletti, M. & Wagers, A. Notch signaling in the regulation of stem cell self-renewal and differentiation. Curr Top Dev Biol 92, 367-409 (2010).
228. Radtke, F., Fasnacht, N. & Macdonald, H.R. Notch signaling in the immune system. Immunity 32, 14-27 (2010).
229. Ntziachristos, P., Lim, J.S., Sage, J. & Aifantis, I. From fly wings to targeted cancer therapies: a centennial for notch signaling. Cancer cell 25, 318-334 (2014).
230. Borggrefe, T. & Oswald, F. The Notch signaling pathway: transcriptional regulation at Notch target genes. Cellular and molecular life sciences : CMLS 66, 1631-1646 (2009).
231. Bray, S. & Furriols, M. Notch pathway: making sense of suppressor of hairless. Current biology : CB 11, R217-221 (2001).

232. Fortini, M.E. & Artavanis-Tsakonas, S. The suppressor of hairless protein participates in notch receptor signaling. Cell 79, 273-282 (1994).
233. Yuan, Z., Friedmann, D.R., VanderWielen, B.D., Collins, K.J. & Kovall, R.A. Characterization of CSL (CBF-1, Su(H), Lag-1) mutants reveals differences in signaling mediated by Notch1 and Notch2. The Journal of biological chemistry 287, 34904-34916 (2012).
234. Artavanis-Tsakonas, S., Rand, M.D. & Lake, R.J. Notch signaling: cell fate control and signal integration in development. Science 284, 770-776 (1999).
235. Jencks, W.P. On the attribution and additivity of binding energies. Proc Natl Acad Sci U S A 78, 4046-4050 (1981).
236. Siggers, T. & Gordan, R. Protein-DNA binding: complexities and multi-protein codes. Nucleic Acids Res 42, 2099-2111 (2014).
237. Liu, F. & Posakony, J.W. Role of architecture in the function and specificity of two Notch-regulated transcriptional enhancer modules. PLoS Genet 8, e1002796 (2012).
238. Jennings, B., Preiss, A., Delidakis, C. & Bray, S. The Notch signalling pathway is required for Enhancer of split bHLH protein expression during neurogenesis in the Drosophila embryo. Development 120, 3537-3548 (1994).
239. Rebeiz, M., Reeves, N.L. & Posakony, J.W. SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc Natl Acad Sci U S A 99, 9888-9893 (2002).
240. Krejci, A. & Bray, S. Notch activation stimulates transient and selective binding of Su(H)/CSL to target enhancers. Genes Dev 21, 1322-1327 (2007).
241. Ong, C.T., et al. Target selectivity of vertebrate notch proteins. Collaboration between discrete domains and CSL-binding site architecture determines activation probability. J Biol Chem 281, 5106-5119 (2006).
242. Castel, D., et al. Dynamic binding of RBPJ is determined by Notch signaling status. Genes Dev 27, 1059-1071 (2013).
243. Wang, H., et al. Genome-wide analysis reveals conserved and divergent features of Notch1/RBPJ binding in human and murine T-lymphoblastic leukemia cells. Proc Natl Acad Sci U S A 108, 14908-14913 (2011).
244. Cave, J.W. Selective repression of Notch pathway target gene transcription. Dev Biol 360, 123-131 (2011).
245. Housden, B.E., et al. Transcriptional dynamics elicited by a short pulse of notch activation involves feed-forward regulation by E(spl)/Hes genes. PLoS Genet 9, e1003162 (2013).
246. Housden, B.E., Terriente-Felix, A. & Bray, S.J. Context-dependent enhancer selection confers alternate modes of notch regulation on argos. Mol Cell Biol 34, 664-672 (2014).
247. Gordan, R., et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep 3, 1093-1104 (2013).
248. Coumailleau, F. & Schweisguth, F. Insensible is a novel nuclear inhibitor of Notch activity in Drosophila. PLoS One 9, e98213 (2014).
249. Abhiman, S., Iyer, L.M. & Aravind, L. BEN: a novel domain in chromatin factors and DNA viral proteins. Bioinformatics 24, 458-461 (2008).
250. Korutla, L., Degnan, R., Wang, P. & Mackler, S.A. NAC1, a cocaine-regulated POZ/BTB protein interacts with CoREST. J Neurochem 101, 611-618 (2007).
251. Korutla, L., Wang, P.J. & Mackler, S.A. The POZ/BTB protein NAC1 interacts with two different histone deacetylases in neuronal-like cultures. J Neurochem 94, 786-793 (2005).

252. Kaul-Ghanekar, R., Jalota, A., Pavithra, L., Tucker, P. & Chattopadhyay, S. SMAR1 and Cux/CDP modulate chromatin and act as negative regulators of the TCRbeta enhancer (Ebeta). Nucleic Acids Res 32, 4862-4875 (2004).
253. Rampalli, S., Pavithra, L., Bhatt, A., Kundu, T.K. & Chattopadhyay, S. Tumor suppressor SMAR1 mediates cyclin D1 repression by recruitment of the SIN3/histone deacetylase 1 complex. Mol Cell Biol 25, 8415-8429 (2005).
254. Reeves, N. & Posakony, J.W. Genetic programs activated by proneural proteins in the developing Drosophila PNS. Dev Cell 8, 413-425 (2005).
255. Roegiers, F., et al. Frequent unanticipated alleles of lethal giant larvae in Drosophila second chromosome stocks. Genetics 182, 407-410 (2009).
256. Dai, Q., et al. The BEN domain is a novel sequence-specific DNA-binding domain conserved in neural transcriptional repressors. Genes Dev 27, 602-614 (2013).
257. Newberry, E.P., Latifi, T. & Towler, D.A. The RRM domain of MINT, a novel Msx2 binding protein, recognizes and regulates the rat osteocalcin promoter. Biochemistry 38, 10678-10690 (1999).
258. Doroquez, D.B., Orr-Weaver, T.L. & Rebay, I. Split ends antagonizes the Notch and potentiates the EGFR signaling pathways during Drosophila eye development. Mech Dev 124, 792-806 (2007).
259. Abdel-Rahman, N., Martinez-Arias, A. & Blundell, T.L. Probing the druggability of protein-protein interactions: targeting the Notch1 receptor ankyrin domain using a fragment-based approach. Biochem Soc Trans 39, 1327-1333 (2011).
260. Ehebauer, M.T., Chirgadze, D.Y., Hayward, P., Martinez Arias, A. & Blundell, T.L. High-resolution crystal structure of the human Notch 1 ankyrin domain. Biochem J 392, 13-20 (2005).

Appendix A: A Combination of Computational and Experimental Approaches Identifies DNA Sequence Constraints Associated with Target Site Binding Specificity of the Transcription Factor CSL

In this work I contributed quantitative binding data discussed in Chapter 3.

Published as:

Torella, R.; Li, J.; Kinrade, E.; Cerda-Moya, G.; Contreras, A.N.; Foy, R.; Stojnic, R.; Glen, R.C.; Kovall, R.A.; Adryan, B.; and Bray, S.J. (2014) A Combination of Computational and Experimental Approaches Identifies DNA Sequence Constraints Associated with Target Site Binding Specificity of the Transcription Factor CSL. Nucleic Acids Research 42 (16): 10550-10563.

Nucleic Acids Research Advance Access published August 11, 2014
Nucleic Acids Research, 2014
doi: 10.1093/nar/gku730

A combination of computational and experimental approaches identifies DNA sequence constraints associated with target site binding specificity of the transcription factor CSL

Rubben Torella1, Jinghua Li2, Eddie Kinrade3,4, Gustavo Cerda-Moya2, Ashley N. Contreras5, Robert Foy3,4, Robert Stojnic2,4, Robert C. Glen1,*, Rhett A. Kovall5, Boris Adryan3,4,* and Sarah J. Bray2,*

1Department of Chemistry, University of Cambridge, Cambridge, UK, 2Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK, 3Department of Genetics, University of Cambridge, Cambridge, UK, 4Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK and 5University of Cincinnati College of Medicine, Department of Molecular Genetics, Biochemistry and Microbiology, 231 Albert Sabin Way CARE4836, Cincinnati, OH 45267-0524, USA

Received April 11, 2014; Revised July 29, 2014; Accepted July 29, 2014

ABSTRACT

Regulation of transcription is fundamental to development and physiology, and occurs through binding of transcription factors to specific DNA sequences in the genome. CSL (CBF1/Suppressor of Hairless/LAG-1), a core component of the Notch signaling pathway, is one such transcription factor that acts in concert with co-activators or co-repressors to control the activity of associated target genes. One fundamental question is how CSL can recognize and select among different DNA sequences available in vivo and whether variations between selected sequences can influence its function. We have therefore investigated CSL–DNA recognition using computational approaches to analyze the energetics of CSL bound to different DNAs and tested the in silico predictions with in vitro and in vivo assays. Our results reveal novel aspects of CSL binding that may help explain the range of binding observed in vivo. In addition, using molecular dynamics simulations, we show that domain–domain correlations within CSL differ significantly depending on the DNA sequence bound, suggesting that different DNA sequences may directly influence CSL function. Taken together, our results, based on computational chemistry approaches, provide valuable insights into transcription factor-DNA binding, in this particular case increasing our understanding of CSL–DNA interactions and how these may impact on its transcriptional control.

INTRODUCTION

Transcription is controlled through a number of mechanisms including the specific interactions between DNA and regulatory proteins. Sequence-specific protein–DNA recognition (1–3) occurs through both direct (base) and indirect (shape) readout of a DNA sequence by transcription factors (TFs). Base readout involves the recognition of base-specific groups (principally through hydrogen bond donor/acceptor and hydrophobic interactions) by complementary groups present on amino acid side chains in the targeting protein (4). Shape readout describes base pairs that are not directly in contact with the protein, but likely influence the DNA structure and shape and hence the protein–DNA interactions (5). This combination results in each TF having a specific repertoire of DNA sequences with which it can form complexes of varying stabilities. While considerable progress has been made in understanding the basic principles of protein–DNA recognition, it is often unclear what spectrum of binding sites in vivo is functionally relevant for a given TF. Furthermore, the consequences of different DNA target sites on the (dynamic) behavior of the bound protein, and hence the functional outcome of binding, are rarely considered.

The TF CSL provides a powerful and important model to investigate the relevance of protein dynamics in DNA

*To whom correspondence should be addressed. Tel:+44 (0)1223 765222 Fax: +44 (0)1223 333840; Email: [email protected] Correspondence may also be addressed to Robert C. Glen. Tel: +44 (0)1223 336472 Fax: +44 (0)1223 763076; Email: [email protected] Correspondence may also be addressed to Boris Adryan. Tel: +44 (0)1223 760209 Fax: +44 (0)1223 760241; Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

bound complexes. CSL is the nuclear effector of the Notch signaling pathway (6). In un-stimulated cells CSL binds DNA in the presence of a co-repressor, blocking transcription. The interaction of the Notch receptor with its ligands initiates a cascade of cleavage reactions that culminate in the release of the intracellular domain of the Notch receptor, NICD (7–9). This fragment binds directly to CSL, recruiting the co-activator Mastermind (MAM) (10–12) to promote active transcription (13,14). Thus, the protein–protein interactions CSL makes with different binding partners are critical in determining the regulatory outcome of DNA binding.

Structural studies have illuminated how CSL interacts with both co-regulators and DNA. Its conserved core consists of three domains: the N-terminal domain (NTD), the β-trefoil domain (BTD) and the C-terminal domain (CTD) (15). NTD and BTD both contribute to DNA recognition, interacting with base pairs in the major and minor grooves of the DNA helix, respectively. The CTD binds the ankyrin-repeat (ANK) domain of NICD and, in the tertiary complex, both CTD and NTD contact MAM through a long α-helix (10,16). High affinity binding of the so-called RAM region (RBP-Jk-associated-molecule) of NICD to BTD is also important for recruiting NICD to CSL and in facilitating the binding of MAM (17). Finally, CTD and BTD also participate in binding to co-repressors (18). This high-resolution picture provides a powerful starting point for molecular simulation studies to explore the binding space and its effects on the molecular dynamics (MD) of CSL.

As for many TFs, there already exists considerable information about DNA motifs that can be recognized by CSL. However, much of this derives from sequences that are evolutionarily related to the Hairy-Enhancer of split (HES) target genes, which may distort the position weight matrix (PWM) generated for CSL. Indeed, such PWMs cannot fully account for all of the sites occupied in vivo by CSL, based on genome-wide chromatin immunoprecipitation studies. Additionally, a recent in vitro analysis (19) has shown that CSL may bind a large repertoire of DNA sequences. Therefore, one question is whether the physicochemical properties evident in the structure of CSL can be used for predicting the range of plausible DNA binding motifs. A second question is whether different DNA sequences bound by CSL can influence its inter-domain dynamics, thereby potentially impacting CSL function.

One approach to investigate TF-target recognition and specificity is to apply computational methods that model the chemical interactions between the protein and DNA, using information from the crystal structure. Here we have applied this strategy to CSL, performing an in silico mutational analysis of the binding motif and modeling the dynamics of the protein when it is bound to different DNA motifs. The results highlight two features of CSL sequence recognition that could not be inferred by sequence-based bioinformatics, and whose relevance we have demonstrated using biochemical and transcriptional assays. Furthermore, the simulations show that CSL responds differently depending on the precise sequence of the DNA: large differences between the domain–domain correlation pathways occur when CSL binds a canonical DNA binding motif CGTGGGAA compared to a lower affinity motif (CGTGTGAC) or to the unbound state. Such correlations could influence the recruitment of other proteins, including NICD, MAM and co-repressors. An important observation is that DNA may influence protein-binding events remote from the DNA-binding domain, by changing the dynamic regime of the complex.

MATERIALS AND METHODS

Preparation of the starting structure

The starting structure for the analysis was mouse CSL bound to DNA (HES-1 (hairy and enhancer of split-1) site 5′-TGTGGGAA-3′; PDB code 3BRG, resolution 2.2 Å), which contains all of the CSL conserved features but lacks the variable N-terminal residues (52 residues in mouse). The structure was checked for missing residues using Protein Preparation Wizard (Schrödinger, MAESTRO package, version 9.0). Two missing gaps (K197-L200, E255-T262, using the crystallographic numbering) were modeled using MODELLER (20,21) (version 9.8), aligning the primary sequence with the missing loops. The pKa values of the residues of the protein were calculated using the PROPKA (22) and H++ (23) software. Residues were assigned a protonation state consistent with physiological pH (7.4). Five hundred iterations of molecular mechanics were performed and the lowest energy structure was taken as a starting point for further calculations.

Changes to the DNA consensus sequences were performed using the FOLDX software (24,25), a tool for energy calculations and macromolecular design. The consensus complex was energy minimized and the selected mutations in the DNA sequence were performed, adapting the conformations of the side chains of the residues interacting with the DNA fragment to a new low energy state in the presence of the base pairs of the new fragment. The relative binding energy result for each DNA sequence has also been normalized, taking into consideration possible deformations that may have occurred on the protein and on the DNA. Logos for MotifMap/TRANSFAC and FOLDX sequences were created using Weblogo (26).

MD simulation protocol

MD simulations were performed with the AMBER11 package (http://ambermd.org/) (27), using the AMBER FF99SB force field (28). All calculations were made with the CUDA enabled version of PMEMD (29), using TESLA GPUs at the High Performance Computing cluster (University of Cambridge). Four TESLA GPUs perform ca. 4 ns/day, when computing a system of ca. 67 500 atoms. A dodecahedral box of water (TIP3P (30)) was built around the complexes and a physiological concentration of 0.15 M of NaCl was added to the box using the following equation: N(ions) = N(water molecules) × 0.15/55.555. A 1 nm non-bonded cutoff was used for the van der Waals interactions, while the Particle Mesh Ewald summation method was used to deal with long-range Coulomb interactions. The Berendsen thermostat was used to control temperature and pressure (31).

The following protocol was used for all the simulations: (i) in vacuo minimization (1000 steps); (ii) minimization,

keeping the complexes fixed, allowing water molecules and ions to equilibrate (2000 steps); (iii) minimization of all the system, without restrictions (2000 steps); (iv) equilibration, 1 ns; (v) production phase of 48 ns. To improve the sampling of the MD simulation, three simulation replicas of 48 ns were performed with different starting velocities, for an overall simulation time of 720 ns. For the analysis, the first 8 ns of each trajectory were eliminated (as the system is equilibrating) and the following 40 ns of simulation from each trajectory were concatenated into a macro-trajectory of 120 ns for each system.

All the analyses were performed with packages from the AMBERTOOLS 1.4 and GROMACS package, after the trajectories were transformed into a suitable format. Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF) of the MD trajectories have been performed as in (32–34). The annotation of protein residues and DNA bases follows the order of residues and DNA specified in the original PDB file; the first residue is the 53rd residue of the original structure. The unbound structure of the protein was created from the 3BRG crystallographic structure of the protein bound to DNA, by stripping DNA from the complex followed by optimization (energy minimization) in implicit solvent. After this step, the unbound structure followed the protocols described previously for the other structures.

Internal coordination and rigidity

In order to provide a simple and sequence-related one-dimensional descriptor of the contribution of each residue toward the connectivity/cooperativity of the motion within the protein, an analysis based upon signal propagation was used (35). In order to describe the correlation between atom pairs undergoing dynamics, a matrix ICRM (Internal Coordination and Rigidity Matrix) was used for protein–ligand (36,37) and protein–DNA interaction (32). A threshold distance of 30 Å was considered, keeping in consideration the distance between the residues that interact with DNA and the residues in the CTD domain.

In vitro binding experiments (electro-mobility shift assay)

GST-Su(H) [110–594] (38) fusion protein was purified from 500 ml transformed Escherichia coli strain BL21 cells using Glutathione-Agarose (Sigma-Aldrich) and concentrated, by centrifuging at room temperature for 10 min, using Amicon® Ultra-0.5 Centrifugal Filter Units (Millipore) to ∼1 mg/ml. The oligonucleotide 5′-ACCGAAACCGTGGGAACTGGTAGAAAG-3′ and its reverse complement, 5′-CTTTCTACCAGTTCCCACGGTTTCGGT-3′, were labeled using the biotin 3′-end labeling kit following the manufacturer's instructions (Pierce). The two single-stranded oligonucleotides were then annealed. The DNA-binding reactions contained 1 µl of GST-Su(H) and 25 fmol of biotin labeled DNA in a 5 µl volume binding reaction (150 mM KCl, 50 mM Tris pH 7.4, 1 mM DTT, 2% polyvinyl alcohol). In total 30 ng/µl of poly(dI·dC) was also included as a non-specific competitor. Different amounts of unlabeled double-stranded oligonucleotides (Supplementary Table S1) were added as specific competitors. Binding reactions were incubated on ice for 30 min. Oligonucleotide–protein complexes were then separated on 5% native polyacrylamide gels at 75 V in 0.5× TBE buffer. The products were then transferred to a Biodyne B membrane (Pierce) at 30 V in 0.5× TBE buffer followed by UV cross-linking. The biotin-labeled reaction products were then visualized by incubation with streptavidin horseradish peroxidase conjugate and subsequent incubation with ECL chemiluminescent reagents.

Luciferase assays

Oligonucleotides containing the indicated CSL motifs (Supplementary Table S2) were used to generate fragments for cloning into a luciferase vector containing a minimal promoter from the hsp70 gene (pGL3::Min (39)) (Supplementary Table S3). Drosophila S2 cell culture and transfections were as described previously (39) and expression levels were compared in the presence and absence of pMT-NICD after 16–20 h induction with 0.5 mM CuSO4. Empty pMT was used as a control and the fold change was calculated as a ratio between values obtained with pMT-NICD and empty pMT. As the reporters also contained binding sites for Grainyhead (Grh) (40), to analyze repression, cells were co-transfected with pMT-Grh in place of pMT-NICD in combination with the indicated luciferase reporter construct. For assays investigating CSL mutants, the transfections contained the indicated pMT-CSL constructs in combination with pMT-NICD. Controls with pMT-NICD or pMT-CSLwt were combined with empty pMT in the appropriate ratios. At least three biological replicates were performed for all experiments.

AAA mutants and in vivo rescue assays

The choice of residues for mutagenesis involved taking into consideration different factors: the number of communication pathways, structural factors and interactions with other macromolecules. Three residues have been selected for mutation to alanine: T365, F366 and Y367, based on the fact that they are involved in domain–domain communication and are not involved in interactions with other proteins/DNA or are structurally relevant for CSL. For the modeling experiments, mutations were introduced into the CSL structure, bound to the consensus TGTGGGAA DNA sequence (PDB ID: 3BRG), using MODELLER 9v8. The same protocol described before was then used for analyzing the 413AAA mutant.

For the in vivo experiments, a CSL genome rescue construct was produced by amplifying the CSL locus delineated by the neighboring genes lethal(2)35Bg (NM_080199.2) and crinkled (NM_165099.2) (see Supplementary Table S3). The ∼6.3 kb of Drosophila CSL locus was then cloned into a pAttB plasmid (41) to generate CSLwt. In order to generate the genomic rescue mutant CSL413AAA, site-directed mutagenesis was performed to replace the residues Q413, F414 and Y415 with AAA (Supplementary Table S4). The same mutagenesis was performed to generate the AAA mutated version in pMT for luciferase assays.
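As a quick consistency check on the MD set-up described earlier in Materials and Methods, the ion-count equation and the trajectory lengths can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' scripts: the function names are hypothetical, the water count of 20 000 is an illustrative value that does not come from the paper, and the figure of five simulated systems is inferred from the stated 720 ns total rather than stated in the text.

```python
def ion_count(n_water, salt_molar=0.15, water_molar=55.555):
    """N(ions) = N(water molecules) x 0.15 / 55.555, rounded to a whole ion."""
    return round(n_water * salt_molar / water_molar)


def macro_trajectory_ns(replicas=3, production_ns=48, discarded_ns=8):
    """Concatenated trajectory length per system once the first 8 ns
    of every replica have been discarded as equilibration."""
    return replicas * (production_ns - discarded_ns)


def total_simulation_ns(n_systems, replicas=3, production_ns=48):
    """Overall production time summed over all systems and replicas."""
    return n_systems * replicas * production_ns


if __name__ == "__main__":
    # 20 000 waters is a made-up example, not the size of the published system.
    print(ion_count(20_000))        # ions needed for 0.15 M in this example box
    print(macro_trajectory_ns())    # 120, matching the 120 ns macro-trajectory
    print(total_simulation_ns(5))   # 720, if five systems were simulated (assumption)
```

Keeping the discarded equilibration window as an explicit parameter makes it easy to see how the 120 ns macro-trajectory follows from three replicas of 48 ns.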


Drosophila transgenic lines were generated for both CSLwt and CSL413AAA by inserting the genomic rescue construct into 86Fb located on the third chromosome using the ΦC31 integrase-mediated system (41,42). Flies carrying the genomic rescue constructs were then crossed into the Su(H)[SF8] (43) lethal allele for Drosophila CSL. Rescue experiments were performed by crossing Su(H)[SF8]/CyO; CSLwt/CSLwt or Su(H)[SF8]/CyO; CSL[413AAA]/CSL[413AAA] to Su(H)[AR9]/CyO (43) and scoring the percentage of viable progeny without the CyO marker.

Isothermal titration calorimetry of CSL–DNA complexes

The cloning, expression and purification of Mus musculus CSL (amino acids 53–474) was described previously (17). Briefly, CSL was overexpressed as a GST-fusion protein in bacteria and isolated from crude lysate by affinity chromatography. After cleaving off the fusion tag, CSL was purified to homogeneity using a combination of ion exchange and size exclusion chromatography.

Oligonucleotides were ordered from Eurofins MWG Operon. Each single-stranded DNA was resuspended in water and purified over a GE Healthcare Life Sciences Resource Q ion exchange column. The resulting peak was pooled, concentrated and exchanged into a buffer of 10 mM Tris (pH 7.6), 500 mM NaCl and 1 mM MgCl2 in an Amicon Ultra centrifugal filter (3000 MWCO). Concentrated single-stranded DNAs were spectroscopically quantified at 260 nm, combined in equimolar amounts, boiled for 10 min and allowed to cool to room temperature to ensure optimal duplex annealing.

Purified components to be used in isothermal titration calorimetry (ITC) experiments were degassed and buffer-matched by running over a size exclusion column into the experimental buffer of 50 mM sodium phosphate (pH 6.5) and 150 mM sodium chloride. Concentrations of the components were determined by UV absorbance at 260 nm (DNA) and 280 nm (CSL protein). All experiments were performed with a MicroCal VP-ITC microcalorimeter at 10 °C using 10 µM macromolecule (CSL protein) in the cell and 100 µM ligand (DNA) in the syringe. Data was analyzed using the ORIGIN software and fitted to a one-site binding model. The reported binding data is the average of at least three individual experiments (n = 3).

RESULTS

Prediction of CSL binding repertoire from structural calculations using FOLDX

While there is substantial evidence that the DNA consensus sequences recognized with highest affinity by CSL are [T/C]GTG[G/T]GAA, these are insufficient to explain the full repertoire of in vivo binding events. For example, in vitro CSL can bind motifs with any nucleotide in position 5 with nanomolar binding affinity (44). Furthermore CSL appears to accept a broader range of nucleotides than the consensus guanine and adenine at positions 2 and 8 (45). In addition, many of the sites occupied in vivo do not conform to this strict consensus. We therefore set out to investigate whether protein structure-based in silico approaches could be used to provide information about the full repertoire of binding sites. To achieve this, the FOLDX software (24) was used to calculate changes in binding energy when varying the nucleotides within the constraints of the CSL–DNA co-crystal structure (Figure 1A). While energy minimization strategies on protein force fields were initially used to calculate the energy function for small molecules bound to proteins (46), FOLDX has been used to study the impact of mutations on protein stability and has subsequently been refined to encompass the possibility of studying protein–protein and protein–DNA interactions (25). The success of such an approach was shown for high-affinity binding sites of Pax6 (47), but has not been explored more widely and in particular its ability to detect lower affinity sites has not been assessed.

Starting from the X-ray structure where CSL binds its highest-affinity DNA motif comprised of eight nucleotides CGTGGGAA (PDB code 3BRG (17), Figure 1B), all 4^8 permutations of an 8-nt motif were tested and the resulting 65 536 relative binding energies calculated (Figure 1A). After ranking these, a threshold of 3 kcal/mol, computed by the difference from the top predicted DNA sequence, was used as an initial cutoff, separating 220 putative 'bound' motifs from the residual 65 316 DNA sequences. These 220 'bound' sequences were used to generate a binding logo, where the sequences were weighted according to their computed energies (Figure 1C). A similar logo was generated when the sequences were compiled without the weighting scheme (Supplementary Figure S1A) and neither logo was substantially altered by extending the number of sequences included (see Supplementary Figure S1B and the explorative tool at http://webtools.sysbiol.cam.ac.uk/MotifTool/ for further assessment of CSL FOLDX predictions and how different free energy cutoffs vary the resulting energy logo). Finally, to assess whether interaction effects between any of the positions in the motif were detected, the conditional probability of finding a nucleotide at a certain position when a given nucleotide was present at another position was calculated (Supplementary Figure S1C). Although some specific interactions were detected, these were only found for nucleotides with a low likelihood of being present in the bound sequences, such that the overall probability for interactions was small.

Several striking features are evident when comparing the FOLDX binding logo with the energy logo obtained from empirical binding analysis (Figure 1C; 48,49). First, FOLDX indicates a strong preference for a cytosine at position 1. Second, while there is thought to be a strong preference for guanine at positions 2 and 6, FOLDX indicated much greater sequence tolerance at these positions with little preference at position 2 and tolerance for G or A at position 6. Finally, FOLDX results also suggest that considerable variability could be accepted at position 5, where there is conventionally thought to be a preference for G or A.

To ascertain whether these features only emerged when motifs with low predicted binding energies were included, the nucleotide frequencies were plotted for different thresholds. This revealed that the characteristics of C preference at position 1 and sequence tolerance at positions 2 and 6 were evident even when only the motifs with the top 15–20 energies were considered (Figure 1D). We therefore further scru-


[Figure 1, panels A–D. (A) Flow chart: starting PDB structure (3BRG complex); minimization; FOLDX mutation (4^8 = 65 536 sequences); FOLDX energy calculation (ΔG binding + deformation energies). (C) Sequence logos (bits versus position 1–8) for FOLDX and MotifMap/TRANSFAC. (D) Per-position plots for positions 1–8.]

Figure 1. Overview of the FOLDX strategy and results. (A) Flow chart summarizing the FOLDX computational strategy. (B) CSL–DNA structure used for the analysis (CSL domains NTD cyan; BTD green, CTD orange). The position of the residues that were mutated to perturb inter-domain communication (see Figure 5) are indicated by orange spheres. (C) Comparison of information logos obtained from the >3 kcal/mol FOLDX predictions, weighted according to a Boltzmann distribution of energies, and from empirical binding analysis (RBPJ M01112 MotifMap/Transfac).

tinized the FOLDX predictions to investigate the strong cytosine bias at position 1 (Table 1). The energetic distribution shows that FOLDX penalizes the presence of a thymine in position 1 by ca. 2 kcal/mol. This difference can be mainly attributed to the H-bond energy between side chains and could be explained by the contacts made by a glutamine residue (222 in murine CSL) within the BTD of CSL. This glutamine residue can make hydrogen bond contacts with the complementary base in position 1; guanine when position 1 is a cytosine, adenine when position 1 is a thymine (Figure 2A). The guanine can make a bidentate interaction (NH2 and aromatic N) with the glutamine, while the adenine can only present one group (aromatic N) to hydrogen bond with the glutamine, providing a physico-chemical explanation for the preference of cytosine over thymine in position 1. Indeed, the single X-ray structure where CSL is bound to a DNA motif with cytosine in position 1 (PDB ID 3IAG, -CGTGTGAA-, (44)) shows such a bidentate interaction between the glutamine and guanine, confirming that this contact occurs.

To exclude the possibility that the preference for cytosine in position 1 was a consequence of specific elements within the 3BRG structure, we similarly investigated two other CSL–DNA structures with FOLDX. Firstly, using the structure of mouse CSL bound to a slightly different DNA sequence CGTGTGAA (PDB ID: 3IAG) (44), we found a similar preference for cytosine in position 1 (Supplementary Table S5). Secondly, the FOLDX calculation using the Lag-1 (the CSL homologue in Caenorhabditis elegans) structure (PDB ID 1TTU) bound to the consensus DNA sequence TGTGGGAA (15) also yielded a preference for cytosine in position 1 (Supplementary Table S5). However, in this case, the difference between cytosine and thymine at position 1 was smaller than for the two mouse CSL–DNA structures analyzed. Nevertheless, the additional calculations suggest that the preference for cytosine over thymine at position 1 is a feature conserved for all CSL orthologs, which is consistent with previous studies demonstrating that CSL from mouse, Drosophila and worm all have similar binding characteristics (44).

Functional relevance of FOLDX predictions: cytosine at position 1

The FOLDX results suggest that CSL motifs with C at position 1 will have higher binding energies. A preference for cytosine in position 1 is not represented in the canonical PWM but was indicated by results from protein binding microar-


[Figure 2, panels A–E. (A) Glutamine (GLN) contacting guanine (GUA) opposite cytosine at position 1 (CYT (P1)), and adenine (ADE) opposite thymine at position 1 (THY (P1)). (B) EMSA lanes labeled P, 0 (Con) and 5×, 25×, 125× excess for the sequences CGTGGGAA (S1), CGTTTTAA, TGTGGGAA (S2) and CGTGTGAA (S3). (C) EMSA lanes labeled P, 0 (Con) and 2.5×, 5×, 10×, 20× excess for CGTGGGAA (S1), TGTGGGAA (S2) and CGTTTTAA. (D, E) y-axis: fold change in luciferase activity with NICD (0–12); reporters S1: CGTGGGAA, Con: CGTTTTAA, S2: TGTGGGAA, S3: CGTGTGAA; in (E) the x-axis is Su(H) (0, 20, 40, 60).]

Figure 2. Preference for cytosine over thymine in position 1 of the CSL DNA consensus sequence. (A) Physico-chemical explanation for the preference of cytosine over thymine in position 1: the complementary guanine can offer two functional groups for making hydrogen-bond contacts (NH2 and aromatic N), while the adenine can only offer one functional group (aromatic N) for an interaction with glutamine (GLN: residue 222 in murine CSL, residue 401 in Lag-1). (B) EMSA measuring CSL binding in the presence of different concentrations of the indicated competitor oligonucleotides. Con: no competitor, P: probe only. The white arrow indicates the position of unbound probe and the black arrow indicates the position of bound probe in CSL complexes. (C) EMSA measuring CSL binding in the presence of lower dilutions of S1 and S2 competitors as indicated, labeled as in B. (D) Response of reporters containing the indicated oligonucleotides to NICD, measured as the fold change in luciferase activity relative to controls with pMT in extracts from transfected cells. Activities were normalized to co-transfected renilla plasmid to control for transfection efficiencies. Error bars depict the standard error of the mean from >3 biological replicate experiments. (E) Response of reporters containing the indicated oligonucleotides to NICD, in the presence of varying concentrations of additional CSL as indicated. Fold change in luciferase activity was measured as in D. Error bars depict the standard error of the mean from >3 biological replicate experiments.
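The fold-change values and error bars described for panels D and E can be illustrated with a short Python sketch. It assumes that each biological replicate is first normalized to the co-transfected Renilla signal and that the fold change is then taken per replicate as the pMT-NICD value over the empty-pMT value. Whether the authors paired replicates in this way is not stated, and the function name, the argument names and the numbers in the usage lines are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def fold_change_with_sem(reporter_nicd, renilla_nicd, reporter_ctrl, renilla_ctrl):
    """Return (mean fold change, standard error of the mean).

    Each argument is a list with one reading per biological replicate;
    replicates are paired by position (an assumption of this sketch).
    """
    folds = []
    for rep_n, ren_n, rep_c, ren_c in zip(
        reporter_nicd, renilla_nicd, reporter_ctrl, renilla_ctrl
    ):
        induced = rep_n / ren_n    # Renilla-normalized signal with pMT-NICD
        baseline = rep_c / ren_c   # Renilla-normalized signal with empty pMT
        folds.append(induced / baseline)
    sem = stdev(folds) / sqrt(len(folds))  # sample SD divided by sqrt(n)
    return mean(folds), sem


if __name__ == "__main__":
    # Made-up readings for three replicates, not data from the paper.
    avg, sem = fold_change_with_sem(
        [1000, 1200, 1100], [100, 100, 100],
        [100, 100, 100], [100, 100, 100],
    )
    print(f"fold change = {avg:.2f} +/- {sem:.2f} (SEM)")
```

Normalizing before taking the ratio keeps differences in transfection efficiency from inflating or masking the apparent response to NICD.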


Table 1. FOLDX analysis of energetics: difference between thymine and cytosine in position 1 of the DNA binding sequence (energy expressed in kcal/mol)

DNA sequence   Clash DNA   Clash protein   Interaction energy   Sidechain H-bond
CGTGGGAA       11.16       37.67           −12.94               −16.75
TGTGGGAA       11.36       37.74           −10.93               −14.76

rays (PBM; (19)) and by bacterial 1-hybrid experiments testing Lag-1 binding-site specificities (50).

To investigate CSL binding characteristics further, we compared the C/T motif variants in two assays. First, we tested the ability of the C (S1) and T (S2) variants to compete for binding using an electro-mobility shift assay (EMSA). In these experiments, binding of Drosophila CSL to labeled S1 sequence in the presence of differing concentrations of unlabeled S1 and S2 was measured. For comparison, we used a variant with a T at position 5 (CGTGTGAA; S3), because this motif is not often represented in sequences contributing to the PWM. While both S1 and S2 were able to out-compete most of the labeled sequence when present in 25× excess, there was a marked (3.67-fold) difference between the two (Figure 2B, Supplementary Figure S2). Similar differences were also evident at lower dilutions (e.g. 2.84-fold difference at 10× excess; Figure 2C). The S3 sequence competed less well than S2, exhibiting a 9-fold difference with S1 and a 2.4-fold difference with S2 when present at 25× excess (Figure 2B).

Second, the activity of luciferase reporters containing four copies of each motif (arranged as two paired sites) was measured (Figure 2D). Again the S2, T variant, was considerably less active than the S1, C variant, exhibiting 36% of the luciferase signal, although it performed significantly better than the S3 sequence. Furthermore, the differences in the activities of the S1 and S2 sequences were not ameliorated by increasing the amount of CSL over the endogenous levels (Figure 2E). All these data therefore support the notion that sequences containing C at position 1 have stronger binding/function than those with T. Nevertheless, it is evident that the S2 motif with T at position 1 still retains significant binding, contrary to the FOLDX predictions.

As the functional studies support the notion that the C at position 1 is more favorable for CSL activity, we subsequently measured the binding affinities of CSL for the C/T variants by ITC (Figure 3). Using purified murine CSL protein with chemically synthesized oligomeric DNA duplexes, the energetics of binding to C and T variant forms of the consensus motif was measured. The results clearly demonstrated that CSL binds more strongly to the sequence with C at position 1, exhibiting a >8-fold difference in the calculated dissociation constant (Kd). Thus, C at position 1, which is highlighted by the FOLDX predictions, does make a significant contribution to the binding preference of CSL.

Functional relevance of FOLDX predictions: tolerance at positions 2 and 6

The FOLDX predictions suggest that, over the range of the top 220 motifs, there is no strong sequence preference at position 2. Canonical CSL sequence logos indicate that guanine is preferred at this position, although variations with high scores are found in both the PBM dataset of CSL bound DNA sequences (19) and are represented in MotifMap/TRANSFAC-derived CSL PWMs. For example the sequence AATGGGAA (S4) scored highly in PBM (19) and is in the top 1% of motifs predicted by the PWM (e.g. using Patser (51)). This motif is also present at quite high frequency in regions occupied by CSL in ChIP (fourth most enriched motif, 5.19% peaks contain this motif) and functional CATG(G/A)GAA motifs are present in several CSL regulated enhancers (52). Another FOLDX variant, CCTGAGAA (S5), was above the threshold of binding in PBM analysis (19) and included as a contributor to some canonical PWM. Comparing the activity of these two sequences in the EMSA competition assays, we found that both had intermediate capability to compete with the canonical sequence (detectable at 125× excess; Figure 4A). Likewise, both demonstrated functional activity in the reporter assays, exhibiting >2× stimulation in the presence of NICD, indicating the potential for variations at this position (Figure 4B). Although the activities were low, they nevertheless were significantly different from control sequences where all critical residues were replaced by T (CGTTTTAA: Con) and from a motif where positions 4/5 were substituted (CGTAAGAA; S6); Figure 4B. Surprisingly, the latter was previously reported to have stronger binding than the S2, TGTGGGAA position 1 variant (53). The ITC analysis confirms the low affinity binding for CGTAAGAA (Kd of ∼50 µM; Figure 3F) although, notably, this is still significantly better than detected for a control 'unbound' DNA sequence (TCATACCT; Figure 3C). Finally, a related sequence (TATAAGAA; S7), which was previously reported to be bound by CSL (Su(H)) (54), failed to compete at 125× in the EMSA experiments and showed no response to NICD in the reporter assays (Figure 4A and B).

FOLDX also predicts that adenine at position 6 would be energetically favourable, although conventionally only guanine is considered at this position, with CGTGAAAA (S8) one of its top ranked motifs. In EMSA experiments the S8 sequence exhibited very weak competition, but nevertheless its activity was similar to the position 2 variant motifs in the luciferase experiments (Figure 4B), and was considerably better than sequences with position 4/5 substitutions. Furthermore, in ITC assays there was measurable, albeit weak, binding to CSL (Kd >50 µM). Such CGTGAAAA motifs may therefore have some functional binding in vivo, although the data do not support their high energetic ranking by FOLDX.

Relevance of other motif variants

In addition, we tested several other sequences that fell within the top 1% of FOLDX predictions, including several that had been identified as binding/functional by previous assays. For example, it has previously been shown how mutations at positions 5 (6) and 8 (45) of the DNA sequence recognized by CSL can be tolerated, mainly due to a reduced number of specific side chain–nitrogen base interac-

137 8 Nucleic Acids Research, 2014 Downloaded from http://nar.oxfordjournals.org/ at University of Cincinnati on August 12, 2014

Figure 3. CSL–DNA ITC binding experiments. (A–E) Figure shows representative thermograms (raw heat signal and nonlinear least-squares fit to the integrated data) for CSL binding to different DNA duplexes. Relative affinities and specific DNA sequences are shown for each experiment. Twenty titrations were performed per experiment, consisting of 14 ␮l injections that were spaced 120 s apart. tions (15). For this reason, the sequence CGTGTGAC (S9) A second sequence in the top 1% of FOLDX motifs, but has been tested using ITC, showing that it is bound by CSL not predicted based on MotifMap/TRANSFAC-derived with a moderate Kd (1.7 ␮M; Figure 3E), which is signif- PWMs nor classified as bound in the PBM analysis (19), icantly better than the CGTAAGAA (S5) motif. In agree- is CATGGGGA (S10). This exhibited similar low levels of ment, this sequence exhibited intermediate levels of binding activity in response to NICD in the luciferase assays, al- in the EMSA competition assay and gave rise to ∼2-fold though the ability to compete for binding in EMSA was levels of activation in response to NICD, similar to other not detectable at 125× excess (Figure 4A and B). Finally, motifs with intermediate binding (Figure 4A and B). CTGGGGAA was ranked 85th on the basis of FOLDX


[Figure 4 image. (A) EMSA competition gels for oligonucleotides S4 (AATGGGAA), S5 (CCTGAGAA), S6 (CGTAAGAA), S7 (TATAAGAA), S8 (CGTGAAAA), S9 (CGTGTGAC) and S10 (CATGGGGA) at 5×, 25× and 125× excess; P: probe only; Con: no competitor. (B, C) Bar charts; y-axis: fold change in luciferase activity with NICD. (D) Bar chart; y-axis: fold change in luciferase activity with Grh.]

Figure 4. Functional relevance of FOLDX predictions. (A) EMSA measuring CSL binding in the presence of different concentrations of the indicated competitor oligonucleotides. Con: no competitor, P: probe only. White arrow indicates the position of unbound probe and black arrow indicates the position of bound probe in CSL complexes. (B) Response of reporters containing the indicated oligonucleotides to NICD, measured as the fold change in luciferase activity in extracts from transfected cells. Activities were normalized to co-transfected renilla plasmid to control for transfection efficiencies. Error bars depict the standard error of the mean from >3 biological replicates. * indicates that the response was significantly different from the control (P < 0.05, paired t-test). (C) Assays were as in B, reporters contained indicated oligonucleotides in palindromic (SPS, black) or parallel (gray) orientations. The difference between the SPS and parallel orientations was not significant (P = 0.053; paired t-test). (D) Response of reporters containing the indicated oligonucleotides to Grh, measured as the fold change in luciferase activity. Lower levels of activity indicate that the oligonucleotide confers repression by CSL. S1, S9 and S2 were all significantly different from the control (P < 0.01, paired t-test); * indicates that repression from S1 was significantly different from S2 (P < 0.01) and from S9 (P < 0.05).

predicted binding energies, but was inactive in the reporter assays and failed to compete even at 500× excess in EMSA assays (data not shown). Thus, it appears that only some of the structure-based predictions are informative about potentially relevant site variants.

Relevance of sequence orientation and contribution to repression

Finally, we tested whether there was a significant difference in the activity of sequence motifs depending on their orientation and on whether motifs behaved similarly in repression assays. Previous experiments have demonstrated that so-called paired sites (SPS), where two CSL motifs are arranged palindromically, often exhibit greater response to NICD because they favor dimerization between NICD molecules in adjacent complexes (55). It is possible therefore that some less favored motifs may exhibit activity when in the SPS configuration, because of the added stabilization from NICD:NICD interactions. Such sequences might function less well when in a head-to-tail orientation and/or in conferring repression.

To determine whether the DNA orientation affects the transcriptional activation from the weak sites, we re-tested different DNA sequences in parallel orientations for their ability to respond to NICD (Figure 4C). Under these conditions, most sequences gave rise to similar activity whether in SPS or in parallel arrangements. Only the strong S1 consensus site with C at position 1 showed a slight difference in activity, conferring 1.5× higher activity in the SPS orientation (Figure 4C; P = 0.053). Strikingly, the T (S2) variant did not exhibit such a difference nor did the others tested (S5, S8, S7; Figure 4C). Thus, the sequence arrangements do not appear to account for the ability of ‘weaker’ sequences

to confer transcriptional activation; under these conditions of high NICD expression the CSL-mediated regulation is largely unaffected by the orientation of the motifs (although it should be noted that these experiments do not address the importance of NICD dimerization per se).

To further explore the activity of different CSL binding motifs, they were tested in a repression assay. The reporters also contain binding sites for the Drosophila transcriptional activator Grainyhead (Grh). The ability of different sequence motifs to confer CSL-mediated inhibition of Grh can therefore be assessed based on the expression levels in the presence of Grh: effective repression is evident as a reduction in the expression levels, comparing the effects of different CSL motifs with a control (CGTTTTAA) sequence (Figure 4D). As for activation, the most robust repression was detected with reporters containing CGTGGGAA (S1). Substitution of T at position 1 (S2) resulted in a decrease in the magnitude of repression, indeed the position 1 variant TGTGGGAA behaved more similarly to the non-canonical sequence CGTGTGAA (S3). Overall, it appears that the motif requirements for CSL-mediated repression are similar to those for activation with NICD, with the position 1 C variant being the most effective.

MD simulations demonstrate effects of different DNA sequences on internal CSL dynamics

The EMSA and reporter assays indicate that, although CSL has a strong preference for the two consensus sequences, its activity is sensitive to the base at position 1, with preference for a C. Furthermore, probing a range of sequences has revealed that a broad repertoire of motifs can be bound by CSL (e.g. Figures 3 and 4), with many having quite similar functional activities. These subtle differences in binding versus functional activities prompted us to investigate the effects of different DNA sequences on CSL dynamical behavior, by performing MD simulations of CSL in the presence of four different DNA sequences.

Simulations of the two DNA ‘consensus’ sequences, CGTGGGAA (S1) and TGTGGGAA (S2), were compared to unbound CSL and to two mutated DNA sequences: CGTGTGAC (S9), which exhibits intermediate binding and transcriptional regulation, and CGTAAGAA (S6), which exhibits little/no binding or activity but has a Kd significantly different from negative controls (Table 2, Figure 3C, E, F).

As the RMSD and RMSF analyses (Supplementary Figures S3 and S4) did not show large differences between the complexes with the four test sequences, the intra-domain correlation of the complexes and CSL in its unbound state was calculated using the ICRM. This is a widely used tool to study dynamical differences between protein–DNA, protein–protein and protein–small molecules interactions (32,37,56). The results reveal that both ‘consensus’ complexes produce a similar intra-domain correlation (Figure 5B and C), as is also confirmed by the calculation (Table 3) and by the direction and the intensity of the principal motions made by the CSL residues during the simulations (shown by the projection of the eigenvalue corresponding to the first eigenvector on the structure; Figure 5B’ and C’). This intra-domain correlation is not seen in the equivalent matrix from unbound CSL, which exhibits different (relative) domain–domain movements (Figure 5A) and strongly reduced domain–domain correlations (Figure 5A’, Table 3). These results imply that the bound DNA configures the system such that the domains within CSL are strongly correlated. Furthermore, the internal correlation of CSL is reduced in proportion to the binding affinity (Table 3). Interestingly, the two mutated sequences analyzed, CGTGTGAC (S9) and CGTAAGAA (S6), gave rise to different behaviors (Figure 5D and E). With the intermediate functional motif, CGTGTGAC, some internal correlations were retained, producing a profile that is intermediate between the consensus sequences and the unbound state (Figure 5D). With the non-functional motif, CGTAAGAA, very limited internal correlation remained, generating a profile similar to CSL in its unbound state (Figure 5E).

The observation that the internal correlation of CSL is reduced in proportion to the binding affinity may be one factor that helps in understanding how CSL selects different DNA sequences. In addition, if the internal dynamics change depending on the sequence, this might impact on subsequent interactions between CSL and other proteins. In order to test possible impacts of such internal dynamics, we assessed the consequences of mutating three residues (413AAA) that contribute to the terminal part of the beta strand, which links the BTD domain with the CTD domain (Figure 1B). MD simulations indicated that such a mutated protein has reduced domain–domain interactions (Table 3). However, the ability of the 413AAA mutant to stimulate transcription in the presence of NICD was similar to that of the native protein in the context of both CGTGGGAA and CGTGTGAC reporter constructs (Figure 5F). Likewise, mutation of a single residue (A415) had no impact on CSL activity in these assays. Finally, the consequence of these mutations on the function of CSL in vivo was assessed in transgenic rescue experiments (Figure 5G). A genomic fragment expressing the wild-type protein is able to fully rescue flies with loss of endogenous CSL function (Su(H)SF8/Su(H)AR9 transheterozygote, Figure 5G). In contrast, a similar fragment carrying the 413AAA mutated version shows reduced function, viability is compromised and the surviving flies have wing venation defects (Figure 5G). Although this suggests that the protein may function less effectively when the inter-domain protein dynamics are perturbed, it is also possible that the 413AAA mutation perturbs other aspects of CSL function.

DISCUSSION

One of the challenges in understanding how TFs regulate genes resides in our limited ability to predict where they will bind in the genome. Even taking into consideration the numerous regulatory layers that influence the accessibility of binding sites in chromatin, TFs are frequently found to occupy different sites from those predicted. One reason for this disparity is the extent of knowledge about the full spectrum of recognition motifs. For example, PWM libraries are often biased due to the historical manner in which many were constructed, based around the first known motifs for a given TF (57). Furthermore, classic PWM-based approaches treat all nucleotides along the sequence


Table 2. Calorimetric data for various DNA sequences binding to CSL

Ligand (syringe)   K (M^-1)            Kd (µM)   ΔG° (kcal/mol)   ΔH° (kcal/mol)   −TΔS° (kcal/mol)
CGTGGGAA           2.2 ± 1.4 × 10^7    0.06      −9.4 ± 0.3       5.1 ± 1.0        −14.5 ± 0.7
TGTGGGAA           2.0 ± 0.5 × 10^6    0.50      −8.2 ± 0.1       8.8 ± 1.2        −17.0 ± 1.0
CGTGTGAC           5.9 ± 1.0 × 10^5    1.72      −7.5 ± 0.1       4.8 ± 1.0        −12.3 ± 1.0
CGTAAGAA           –                   >50       –                –                –
TCATACCT*          NBD                 NBD       NBD              NBD              NBD

*Negative control. All experiments were performed at 10°C. Values are the mean of at least three independent experiments and errors represent the standard deviations of multiple experiments. NBD = no binding detected.
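The columns of Table 2 are linked by the standard relations ΔG° = −RT ln K (equivalently RT ln Kd) and ΔG° = ΔH° − TΔS°. As a rough consistency check, which is not part of the published analysis, the sketch below recomputes ΔG° from the tabulated association constants at 10°C and compares it with the sum of the tabulated ΔH° and −TΔS° terms. The function and variable names are illustrative assumptions, and small deviations from the table are expected because the tabulated values are means over replicate experiments.

```python
import math

R_KCAL = 1.987204e-3   # molar gas constant in kcal/(mol*K)
T_KELVIN = 283.15      # 10 degrees C, the temperature stated below Table 2


def delta_g_from_k(k_assoc, temperature=T_KELVIN):
    """Standard free energy of binding (kcal/mol) from an association constant (M^-1)."""
    return -R_KCAL * temperature * math.log(k_assoc)


def kd_micromolar(k_assoc):
    """Dissociation constant (micromolar) as the reciprocal of the association constant (M^-1)."""
    return 1e6 / k_assoc


# Mean values copied from Table 2: (K in M^-1, dG, dH, -TdS), energies in kcal/mol.
table2 = {
    "CGTGGGAA": (2.2e7, -9.4, 5.1, -14.5),
    "TGTGGGAA": (2.0e6, -8.2, 8.8, -17.0),
    "CGTGTGAC": (5.9e5, -7.5, 4.8, -12.3),
}

for seq, (k, dg, dh, minus_tds) in table2.items():
    dg_from_k = delta_g_from_k(k)   # from dG = -RT ln K
    dg_from_parts = dh + minus_tds  # from dG = dH - TdS
    print(f"{seq}: Kd = {kd_micromolar(k):.2f} uM, "
          f"dG(K) = {dg_from_k:.1f}, dG(dH - TdS) = {dg_from_parts:.1f}, "
          f"tabulated dG = {dg:.1f} kcal/mol")

# Fold difference between the tabulated Kd values of the T and C position-1 variants.
fold_kd = 0.50 / 0.06
print(f"Kd(TGTGGGAA) / Kd(CGTGGGAA) = {fold_kd:.1f}")
```

The ratio of the tabulated Kd values for TGTGGGAA and CGTGGGAA is the quantity behind the >8-fold difference quoted in the text for the position 1 C/T variants.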

Figure 5. Effects of DNA sequences on internal CSL dynamics and functional implications. (A–E) Results from MD simulations showing inter-domain correlations when CSL is bound to different sequences as indicated. In particular, A–C show the comparison between the projection of the first eigenvector on the CSL structure, in the presence or absence of DNA sequences. The spikes of the porcupine plots indicate the principal motions (i.e. the motion described by the first eigenvector) for each C-alpha, while the length indicates the intensity of the motion. A’–C’ and D–E show a comparison of the ICRM matrices between the different systems studied. (F) Consequences of mutations in CSL on activity of reporter genes containing the indicated sequences. Unmutated CSL (WT), single mutant CSL (415A) and triple mutant CSL (413AAA) were co-transfected with NICD (ratio of DNAs was 1:5) and the expression of the indicated reporter measured relative to co-transfected renilla control. Error bars depict the standard error of the mean from >3 biological replicates. (G) Adult wings from flies with either wild-type or mutated CSL rescue plasmids, right panel is a higher magnification of the region with altered venation (blue circles). Tables indicate the proportion of flies that were viable in each case, and the proportion of viable flies whose wings had venation defects. N > 200.

Table 3. Ranked eigenvalues from the ICRM matrices of the complexes studied

Complex      First eigenvalue
TGTGGGAA     23 938
CGTGGGAA     23 716
CGTGTGAC     20 934
CGTAAGAA     19 073
UNBOUND      17 821
413AAA       16 100

motif independently and cannot utilize information that arises from correlation analysis (58). Confronted by these challenges, our strategy was to use computational modeling, based on protein structural properties, to probe the specificity of CSL binding. In doing so we have clarified important features. For example positions that were thought to be biased toward a specific nucleotide (positions 2 and 6), as illustrated by MotifMap/TRANSFAC PWMs, were predicted by the modeling to accommodate a wider spectrum of nucleotides. Some of these differences, notably the variability at position 2, were also detected by PBM analysis (19) and motifs with these variations were demonstrably functional in our in vitro binding and reporter assays. Conversely, the modeling predicted a strong preference at position 1, which was quantified experimentally. Thus, motifs with a C at position 1 performed consistently better than

those with T. Together, the results demonstrate that computational modeling from the crystal structure can expand the knowledge about functional target sites, even in cases of otherwise well-characterized TFs such as CSL. However, it is evident that the computational predictions also have biases. For example variations at position 1 were energetically penalized. As a consequence, the results are best used in combination with other data rather than as a predictive tool on their own. One possible reason may be related to the fact that the FOLDX calculation is based on the assumption that CSL binds with a similar conformation to each and every DNA sequence, while there is evidence to suggest that TFs can slightly change their conformation when bound to high or low affinity DNA sequences (59).

One important question is how TFs select the correct binding site amongst others that are very close energetically. Indeed, there is a notion that many of the lower affinity interactions between TFs and DNA primarily represent a buffering mechanism to retain those molecules close to the DNA, while only a few binding events play an actual regulatory role (60). On the atomic level there must be mechanisms to discern these different forms of binding events. To investigate whether the specific DNA sequence present could have an impact on the way the protein behaves, we used MD simulations, to determine the influence of bound DNA on the ability of CSL to transmit a dynamic signal within its structure. Our results predict a profound effect of DNA binding on the inter-domain correlated motions, with lower affinity sequences demonstrating a reduced correlation compared to high affinity sites. For example, although the structural modeling suggests that compensatory interactions can occur when specific DNA contacts are lost, such as in the CSL complexes with either CGTGTGAC or CGTAAGAA, nevertheless these interactions do not give rise to the same long-range domain–domain communication.

By revealing that different DNA sequences can propagate different dynamic signals through the protein, this approach suggests the possibility of an emergent behavior that transduces a dynamic signal modulating gene expression. The inter-domain signaling within CSL that is elicited by DNA binding could thus be important to distinguish functional from non-functional DNA interaction sites and could in turn affect the recruitment of other factors to the bound TF. Indeed, such allosteric changes in CSL have been proposed to affect the formation of the tertiary complex with NICD based on other modeling strategies (61). Furthermore, DNA-induced allosteric changes in TFs have been proposed to play important roles in transcriptional regulation (62). Thus, depending on the protein–DNA binding event, different communication regimes could be generated and influence the motion and the energy landscape within the protein to modulate its interactions. Despite these intriguing models, mutations in residues that should perturb the inter-domain correlations have at best modest consequences for the function of CSL in the assays used. Thus, no specific differences were detected in CSL’s ability to stimulate transcription from different sequences when the domain–domain communications were compromised under conditions with high levels of expressed proteins. However, such a mutated protein did have reduced function in vivo, which is consistent with the inter-domain communication being important for full activity of CSL under physiological conditions, although there are also other possible explanations.

In summary, in silico approaches to investigate the mechanisms of CSL binding have revealed additional features, increasing our understanding of the repertoire of sequences that may be functional. Furthermore, the results suggest that the specific sequence bound may in turn impact on the outcome of the binding event, although our experiments could not confirm a direct effect on transcriptional outcome. Still, such dynamics may be important for the functional TF binding to be distinguished from non-functional, by yet unidentified factors.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENT

We thank members of the Adryan, Bray, Glen and Kovall groups for helpful discussions. R.T. also thanks Prof. Vendruscolo, Dr Colombo, Dr Bond and Prof. Mancera for helpful discussions.

FUNDING

BBSRC [BB/J008842/1 to S.J.B., B.A., Dr Steve Russell.]; National Institutes of Health (NIH) [CA178974 to R.A.K.] and a Leukemia and Lymphoma Society Scholar Award (to R.A.K.); Unilever (to R.T. and R.G.). China Scholarship Council Cambridge (to J.L.); BBSRC Studentship (to R.A.F.); NIH training [5T32ES007250 to A.N.C.]. Funding for open access charge: University RCUK Open Access Fund.

Conflict of interest statement. None declared.

REFERENCES

1. Rohs,R., Jin,X.S., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.
2. Rohs,R., West,S.M., Liu,P. and Honig,B. (2009) Nuance in the double-helix and its role in protein-DNA recognition. Curr. Opin. Struc. Biol., 19, 171–177.
3. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.
4. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic-acids by proteins. Proc. Natl Acad. Sci. U.S.A., 73, 804–808.
5. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal-structure of Trp repressor operator complex at atomic resolution. Nature, 335, 321–329.
6. Kovall,R.A. and Blacklow,S.C. (2010) Mechanistic insights into Notch receptor signaling from structural and biochemical studies. Curr. Top. Dev. Biol., 92, 31–71.
7. Doyle,T.G., Wen,C.H. and Greenwald,I. (2000) SEL-8, a nuclear protein required for LIN-12 and GLP-1 signaling in Caenorhabditis elegans. Proc. Natl Acad. Sci. U.S.A., 97, 7877–7881.
8. Petcherski,A.G. and Kimble,J. (2000) LAG-3 is a putative transcriptional activator in the C-elegans Notch pathway. Nature, 405, 364–368.
9. Nam,Y., Weng,A.P., Aster,J.C. and Blacklow,S.C. (2003) Structural requirements for assembly of the CSL·Intracellular Notch1·Mastermind-like 1 transcriptional activation complex. J. Biol. Chem., 278, 21232–21239.
10. Wilson,J.J. and Kovall,R.A. (2006) Crystal structure of the CSL-Notch-Mastermind ternary complex bound to DNA. Cell, 124, 985–996.
11. Mathelier,A., Zhao,X., Zhang,A.W., Parcy,F., Worsley-Hunt,R., Arenillas,D.J., Buchman,S., Chen,C.Y., Chou,A., Ienasescu,H. et al. (2013) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.
12. Bryne,J.C., Valen,E., Tang,M.H., Marstrand,T., Winther,O., da Piedade,I., Krogh,A., Lenhard,B. and Sandelin,A. (2008) JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res., 36, D102–D106.
13. Fryer,C.J., Lamar,E., Turbachova,I., Kintner,C. and Jones,K.A. (2002) Mastermind mediates chromatin-specific transcription and turnover of the Notch enhancer complex. Genes Dev., 16, 1397–1411.
14. Wallberg,A.E., Pedersen,K., Lendahl,U. and Roeder,R.G. (2002) p300 and PCAF act cooperatively to mediate transcriptional activation from chromatin templates by Notch intracellular domains in vitro. Mol. Cell. Biol., 22, 7812–7819.
15. Kovall,R.A. and Hendrickson,W.A. (2004) Crystal structure of the nuclear effector of Notch signaling, CSL, bound to DNA. EMBO J., 23, 3441–3451.
16. Nam,Y., Sliz,P., Song,L., Aster,J.C. and Blacklow,S.C. (2006) Structural basis for cooperativity in recruitment of MAML coactivators to Notch transcription complexes. Cell, 124, 973–983.
17. Friedmann,D.R., Wilson,J.J. and Kovall,R.A. (2008) RAM-induced allostery facilitates assembly of a notch pathway active transcription complex. J. Biol. Chem., 283, 14781–14791.
18. Kurth,P., Preiss,A., Kovall,R.A. and Maier,D. (2011) Molecular analysis of the Notch repressor-complex in Drosophila: characterization of potential hairless binding sites on suppressor of hairless. Plos One, 6, e27986.
19. Del Bianco,C., Vedenko,A., Choi,S.H., Berger,M.F., Shokri,L., Bulyk,M.L. and Blacklow,S.C. (2010) Notch and MAML-1 complexation do not detectably alter the DNA binding specificity of the transcription factor CSL. Plos One, 5, e15034.
20. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
21. Fiser,A., Do,R.K. and Sali,A. (2000) Modeling of loops in protein structures. Protein Sci., 9, 1753–1773.
22. Li,H., Robertson,A.D. and Jensen,J.H. (2005) Very fast empirical prediction and rationalization of protein pK(a) values. Proteins, 61, 704–721.
23. Hornak,V., Abel,R., Okur,A., Strockbine,B., Roitberg,A. and Simmerling,C. (2006) Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins, 65, 712–725.
24. Guerois,R., Nielsen,J.E. and Serrano,L. (2002) Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol., 320, 369–387.
25. Nadra,A.D., Serrano,L. and Alibes,A. (2011) DNA-binding specificity prediction with FoldX. Methods Enzymol., 498, 3–18.
26. Crooks,G.E., Hon,G., Chandonia,J.M. and Brenner,S.E. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
27. Case,D.A., Cheatham,T.E.I., Darden,T.A., Gholke,R., Luo,R., Merz,K.M., Onufriev,A., Simmerling,C.L., Wang,J. and Woods,R. (2005) The Amber biomolecular simulation programs. J. Computat. Chem., 26, 1668–1688.
28. Hornak,V., Abel,R., Okur,A., Strockbine,B., Roitberg,A. and Simmerling,C. (2006) Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins, 65, 712–725.
29. Götz,A.W., Williamson,M.J., Xu,D., Poole,D., Le Grand,S. and Walker,R.C. (2012) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized born. J. Chem. Theory Comput., 8, 1542–1555.
30. Jorgensen,W.L., Chandrasekhar,J., Madura,J.D., Impey,R.W. and Klein,M.L. (1983) Comparison of simple potential functions for simulating liquid water. J. Chem. Phys., 79, 926–935.
31. Berendsen,H.J.C., Postma,J.P.M., Vangunsteren,W.F., Dinola,A. and Haak,J.R. (1984) Molecular-dynamics with coupling to an external bath. J. Chem. Phys., 81, 3684–3690.
32. Torella,R., Moroni,E., Caselle,M., Morra,G. and Colombo,G. (2010) Investigating dynamic and energetic determinants of protein nucleic acid recognition: analysis of the zif268-DNA complexes. BMC Struct. Biol., 10, 42.
33. Ragona,L., Colombo,G., Catalano,M. and Molinari,H. (2005) Determinants of protein stability and folding: comparative analysis of beta-lactoglobulins and liver basic fatty acid binding protein. Proteins, 61, 366–376.
34. Perera,R.L., Torella,R., Klinge,S., Kilkenny,M.L., Maman,J.D. and Pellegrini,L. (2013) Mechanism for priming DNA synthesis by yeast DNA polymerase alpha. eLife, 2, e00482.
35. Morra,G., Potestio,R., Micheletti,C. and Colombo,G. (2012) Corresponding functional dynamics across the Hsp90 chaperone family: insights from a multiscale analysis of MD simulations. Plos Comput. Biol., 8, e1002433.
36. Morra,G., Verkhivker,G. and Colombo,G. (2009) Modeling signal propagation mechanisms and ligand-based conformational dynamics of the Hsp90 molecular chaperone full-length dimer. PLoS Comput. Biol., 5, e1000323.
37. Pagano,K., Torella,R., Foglieni,C., Bugatti,A., Tomaselli,S., Zetta,L., Presta,M., Rusnati,M., Taraboletti,G., Colombo,G. et al. (2012) Direct and allosteric inhibition of the FGF2/HSPGs/FGFR1 ternary complex formation by an antiangiogenic, thrombospondin-1-mimic small molecule. Plos One, 7, e36990.
38. Brou,C., Logeat,F., Lecourtois,M., Vandekerckhove,J., Kourilsky,P., Schweisguth,F. and Israel,A. (1994) Inhibition of the DNA-binding activity of Drosophila suppressor of hairless and of its human homolog, Kbf2/Rbp-J-Kappa, by direct protein-protein interaction with Drosophila hairless. Genes Dev., 8, 2491–2503.
39. Krejci,A. and Bray,S. (2007) Notch activation stimulates transient and selective binding of Su(H)/CSL to target enhancers. Genes Dev., 21, 1322–1327.
40. Furriols,M. and Bray,S. (2000) Dissecting the mechanisms of suppressor of hairless function. Dev. Biol., 227, 520–532.
41. Housden,B., Millen,K. and Bray,S.J. (2012) Drosophila reporter vectors compatible with Phi C31 integrase transgenesis techniques and their use to generate new Notch reporter fly lines. G3 (Bethesda), 2, 79–82.
42. Bischof,J., Maeda,R.K., Hediger,M., Karch,F. and Basler,K. (2007) An optimized transgenesis system for Drosophila using germ-line-specific phi C31 integrases. Proc. Natl Acad. Sci. U.S.A., 104, 3312–3317.
43. Schweisguth,F. and Posakony,J.W. (1992) Suppressor of hairless, the Drosophila homolog of the mouse recombination signal binding-protein gene, controls sensory organ cell fates. Cell, 69, 1199–1212.
44. Friedmann,D.R. and Kovall,R.A. (2010) Thermodynamic and structural insights into CSL-DNA complexes. Protein Sci., 19, 34–46.
45. Lai,E.C., Bodner,R. and Posakony,J.W. (2000) The enhancer of split complex of Drosophila includes four Notch-regulated members of the Bearded gene family. Development, 127, 3441–3455.
46. Schymkowitz,J.W.H., Rousseau,F., Martins,I.C., Ferkinghoff-Borg,J., Stricher,F. and Serrano,L. (2005) Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc. Natl Acad. Sci. U.S.A., 102, 10147–10152.
47. Alibes,A., Nadra,A.D., De Masi,F., Bulyk,M.L., Serrano,L. and Stricher,F. (2010) Using protein design algorithms to understand the molecular basis of disease caused by protein-DNA interactions: the Pax6 example. Nucleic Acids Res., 38, 7422–7431.
48. Daily,K., Patel,V.R., Rigor,P., Xie,X. and Baldi,P. (2011) MotifMap: integrative genome-wide maps of regulatory motif sites for model species. BMC Bioinformatics, 12, 495.
49. Xie,X., Rigor,P. and Baldi,P. (2009) MotifMap: a human genome-wide map of candidate regulatory motif sites. Bioinformatics, 25, 167–174.
50. Meng,X., Brodsky,M.H. and Wolfe,S.A. (2005) A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat. Biotechnol., 23, 988–994.
51. Hertz,G.Z. and Stormo,G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 563–577.
52. Nellesen,D.T., Lai,E.C. and Posakony,J.W. (1999) Discrete enhancer elements mediate selective responsiveness of enhancer of split complex genes to common transcriptional activators. Dev. Biol., 213, 33–53.
53. Tun,T., Hamaguchi,Y., Matsunami,N., Furukawa,T., Honjo,T. and Kawaichi,M. (1994) Recognition sequence of a highly conserved DNA binding protein RBP-J kappa. Nucleic Acids Res., 22, 965–971.
54. Tsuda,L., Kaido,M., Lim,Y.M., Kato,K.K., Aigaki,T.S. and Hayashi,S.G. (2006) An NRSF/REST-like repressor downstream of Ebi/SMRTER/Su(H) regulates eye development in Drosophila. EMBO J., 25, 3191–3202.
55. Arnett,K.L., Hass,M., McArthur,D.G., Ilagan,M.X.G., Aster,J.C., Kopan,R. and Blacklow,S.C. (2010) Structural and mechanistic insights into cooperative assembly of dimeric Notch transcription complexes. Nat. Struct. Mol. Biol., 17, 1312–1317.
56. Morra,G., Potestio,R., Micheletti,C. and Colombo,G. (2012) Corresponding functional dynamics across the Hsp90 Chaperone family: insights from a multiscale analysis of MD simulations. PLoS Comput. Biol., 8, e1002433.
57. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.
58. Stormo,G.D. and Zhao,Y. (2010) Determining the specificity of protein-DNA interactions. Nat. Rev. Genet., 11, 751–760.
59. Ma,B.Y., Tsai,C.J., Pan,Y.P. and Nussinov,R. (2010) Why does binding of proteins to DNA or proteins to proteins not necessarily spell function? ACS Chem. Biol., 5, 265–272.
60. Biggin,M.D. (2011) Animal transcription networks as highly connected, quantitative continua. Dev. Cell, 21, 611–626.
61. Weinkam,P., Pons,J. and Sali,A. (2012) Structure-based model of allostery predicts coupling between distant sites. Proc. Natl Acad. Sci. U.S.A., 109, 4875–4880.
62. Siggers,T. and Gordan,R. (2013) Protein-DNA binding: complexities and multi-protein codes. Nucleic Acids Res., 42, 2099–2111.

Appendix B: Structural and Biophysical Characterization of the Su(H)-Hairless Repression Complex

In this work I contributed protein expression, purification, and optimization of experimental conditions.

Data presented in this section has been published as:

Maier, D.; Kurth, P.; Schulz, A.; Russell, A.; Yuan, Z.; Gruber, K.; Kovall, R. A.; and Preiss, A. (2011) Structural and Functional Analysis of the Repressor Complex in the Notch Signaling Pathway of Drosophila melanogaster. Molecular Biology of the Cell 22; 3242-3252.


Hairless is the primary antagonist of Notch signaling in Drosophila. It is expressed throughout the fly and has no known orthologs in mammals. Until the recent identification of Insensitive and Insensible in the fly peripheral nervous system, Hairless was the only known Notch pathway repressor in Drosophila.199,248 Hairless represses Notch target gene transcription by interacting with Su(H) as well as with two additional co-repressors, Groucho and CtBP, which, in turn, associate with histone deacetylases that remodel the local chromatin environment to repress transcription (Figure 1).210 The Su(H)-Hairless complex also recruits histone chaperones to the target gene promoter, resulting in chromatin inactivation.210 Despite extensive studies of Hairless and its role in repressing Notch signaling, the molecular details of the repression complex remained unknown.

However, the Kovall lab, in collaboration with the labs of Dr. Dieter Maier and Dr. Anette Preiss, recently determined the structure and interaction details of the Su(H)-Hairless complex. Our collaborators performed yeast two-hybrid and yeast three-hybrid assays, which revealed that a region in the N-terminal half of Hairless interacts with the CTD of Su(H).210 They also performed several genetic and tissue culture assays, which can be found in our publication, Maier et al. 2011.210 Zhenyu Yuan of the Kovall lab performed isothermal titration calorimetry (ITC) with Su(H) and a SMT fusion-tagged construct of Hairless corresponding to amino acids 232-358 (Figure 2A). He observed a dissociation constant (Kd) of 1 nM, indicating very tight binding between Su(H) and Hairless. When he repeated the experiment with the same Hairless construct and the CTD of Su(H), he recapitulated essentially all of the binding observed with full-length Su(H), with a Kd of 2 nM. This means Hairless binds solely to the CTD of Su(H), confirming the data from our collaborators. Additional analysis by the Kovall lab included far-UV circular dichroism of Hairless 232-358, which had a minimum at 200 nm, characteristic of a random coil structure. Zhenyu Yuan also performed luciferase assays in cultured mammalian cells containing endogenous CSL and observed that increasing concentrations of Hairless 232-269 resulted in strong inhibition of the reporter.

In the same publication, another member of the Kovall lab, Andrew Russell, performed electrophoretic mobility shift assays (EMSAs) to examine whether NICD and Hairless compete for binding to Su(H). In these EMSAs, complexes of Su(H) and Hairless 232-269 were pre-formed before titrating in increasing amounts of fly NICD (Figure 2B). The complexes were then separated on native gels, which revealed that higher concentrations of fly NICD were able to displace Hairless from Su(H) and form a Su(H)-NICD complex. This conversion from Su(H)-Hairless to Su(H)-NICD was complete; no complexes containing Hairless were visible on the gel, suggesting NICD can completely displace Hairless from Su(H). The fact that NICD can do this without Mastermind, the third component of the ternary complex, is remarkable. When similar EMSAs were performed with mammalian orthologs, Mastermind was required for mammalian NICD to displace the mammalian co-repressor MINT from mammalian CSL, RBP-J. Interestingly, the reverse EMSA experiment, in which increasing amounts of Hairless were titrated into pre-formed complexes of Su(H)-NICD, did not show an identical outcome (Figure 2C). Hairless was able to displace NICD at higher concentrations, but it was unable to do so completely. Both complexes, Su(H)-NICD and Su(H)-Hairless, were present on the gel, suggesting Hairless could displace NICD, but less efficiently. This, in light of our binding data, presented a conundrum. How could NICD displace tightly bound Hairless from Su(H), and how could it do so without Mastermind to secure the ternary complex? We knew the binding affinity of the RBP-J-NICD complex in mammals, but we did not yet possess that information for the fly proteins. This prompted my research into the Su(H)-NICD interaction in Drosophila, which was discussed in Chapter 2. While my data revealed that the ANK domain contributes to NICD binding to Su(H), they do not answer the question of how NICD can displace Hairless from Su(H), particularly since Hairless binds with much higher affinity. More studies are necessary to address this issue.
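To put the two Hairless affinities quoted in this appendix (Kd of 1 nM for full-length Su(H) and 2 nM for the CTD alone) on an energy scale, the standard relation ΔG° = RT ln(Kd) can be applied. The following is only an illustrative sketch: the function name is hypothetical, 25°C is assumed because that is the temperature given for the ITC experiments in Figure 2, and a 1 M standard state is assumed.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temp_celsius=25.0):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumes a 1 M standard state; more negative values mean tighter binding.
    """
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


# Kd values quoted in the text for Hairless 232-358
dg_full_length = binding_free_energy(1e-9)  # Su(H), Kd = 1 nM
dg_ctd_only = binding_free_energy(2e-9)     # CTD of Su(H), Kd = 2 nM

print(f"Su(H) full length: {dg_full_length:.2f} kcal/mol")
print(f"Su(H) CTD only:    {dg_ctd_only:.2f} kcal/mol")
print(f"Difference:        {dg_ctd_only - dg_full_length:.2f} kcal/mol")
```

Under these assumptions the two-fold difference in Kd corresponds to roughly 0.4 kcal/mol out of about 12 kcal/mol, which is consistent with the statement that the CTD accounts for essentially all of the Hairless binding energy.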

Additionally, Zhenyu Yuan solved the crystal structure of the DNA-Su(H)-Hairless complex, though these data are currently unpublished (Figure 3). His structure contains a 15-nucleotide DNA oligomer in complex with a surface entropy reduction mutant of Su(H), R155T N281G, and Hairless 232-269. Hairless was fused to maltose binding protein (MBP) from the pMalE vector, which likely aided crystallization of the complex, as numerous attempts had been unsuccessful before use of the MBP fusion. The structure was solved to 2.14 angstroms and shows Hairless wedging into the CTD, noticeably disrupting local CTD structure. While this structure does not provide a clear answer as to how Hairless and NICD compete for binding to Su(H), it does depict the unique binding mode of Hairless when compared to mammalian co-repressors, none of which bind solely to the CTD.

During my graduate career, I was part of the research into Hairless in the Kovall lab. Over several years, I expressed and purified many of the proteins used in these studies, including Su(H) wild-type, Su(H) R155T N281G, the CTD of Su(H) wild-type, and Hairless 232-269. I was responsible for the initial development of Su(H) R155T N281G, performing QuikChange mutagenesis on Su(H) wild-type to make the two point mutations. I also worked to optimize expression and purification of the various Hairless constructs we tested over the years, including two constructs that were not used to obtain data: Hairless 210-273, which had severe issues with expression and purification, and Hairless 210-269, which had issues with protein stability during purification. We eventually determined Hairless 210-273 had a degron sequence at its C-terminus that triggered its degradation in the bacterial cell and during the initial stages of purification, which is why we turned to the slightly shorter construct, Hairless 210-269. This construct could be expressed and purified, but I was unable to do more than test its binding ability with an EMSA before I began my qualifying exam and turned the project over to Zhenyu Yuan. I also performed the initial EMSAs with the fly proteins. These EMSAs were done to optimize pH conditions and component concentrations as well as to demonstrate binding and complex formation by the components. Once the conditions were optimized, I performed the initial iterations of the competition EMSAs, in which I titrated Hairless into pre-formed complexes of Su(H) and NICD or titrated NICD into pre-formed complexes of Su(H) and Hairless. I also used EMSAs to test the binding competency of different Hairless constructs.

Figure 1: Hairless is a Corepressor of Notch Signaling in Drosophila A) Domain schematic of the Hairless protein, showing the CSL interacting domain (CID, purple), the Groucho binding domain (GBD, blue), and the C-terminal binding protein (CtBP, brown) binding domain (CBD). B) Diagram of the Hairless repression complex. When the Notch pathway is not active, Su(H) (shown as three domains in green, blue, and orange) on the DNA of target genes is bound by Hairless (purple). Hairless recruits two additional corepressors, Groucho (pink) and CtBP (brown), which, in turn, recruit histone deacetylase complexes (HDACs) that remodel the local chromatin environment to repress target gene transcription.

Figure 2: Hairless Interacts with Su(H) A) Representative thermograms of Hairless binding to Su(H) (left) or to the CTD of Su(H) (right).210 The Hairless construct used corresponds to amino acids 232-358 and bears a SMT fusion tag that did not affect binding (data not shown). The dissociation constant (Kd) is shown. Experiments were performed at 25°C and consisted of 40 injections of 7 μL each. B) EMSA gel showing NICD can displace Hairless from Su(H) in the absence of Mastermind.210 Pre-formed complexes of DNA-Su(H)-Hairless were titrated with increasing amounts of NICD. When resolved on a native gel, there are no remaining complexes of Su(H)-Hairless at the higher concentrations of NICD. Instead, there are only complexes of Su(H)-NICD. This experiment was repeated with a constant amount of Mastermind present, yielding identical results (data not shown). C) EMSA gel showing Hairless cannot displace NICD from Su(H) as efficiently.210 Pre-formed complexes of DNA-Su(H)-NICD were titrated with increasing amounts of Hairless. When resolved on a native gel, complexes of either Su(H)-NICD or Su(H)-Hairless were present for all concentrations of Hairless.


Figure 3: Structure of the DNA-Su(H)-Hairless Repression Complex Diagram (left) and cartoon representation (right) of the structure of the Hairless repression complex (Yuan and Kovall, unpublished data). Hairless (purple) binds solely to the CTD of Su(H) (orange) while the NTD (blue) and BTD (green) of Su(H) contact DNA. Hairless wedges into the CTD, disrupting its structure.


Appendix C: Cloning, Purification, and Binding Studies of the Novel Notch Co-Repressor, Drosophila Insensitive

In this work I performed cloning, protein expression and purification, and binding studies.


During analysis of domains involved in transcriptional regulation and chromatin function, Abhiman et al. identified an uncharacterized region in the vertebrate POZ-domain protein NAC1.249 Using this region in several database searches, they identified homologous segments in numerous animal and viral proteins.249 The conserved region was present in one or more copies in a protein and often appeared with other domains that have functions related to transcriptional regulation and chromatin structure, such as POZ, C4DM, C2H2 fingers, and MCAF N-terminal domain.249 Abhiman et al. named the conserved region the BEN domain after three experimentally characterized proteins in which it was present: BANP/SMAR1 (human), E5R (virus), and NAC1 (vertebrate).249

A JPRED secondary structure prediction of the BEN domain revealed an entirely α-helical fold of four conserved helices.249 Several residues are conserved, particularly those with hydrophobic side chains that facilitate helix-helix packing and thus overall fold stability.249 Two characteristic features are an LhxxlFs motif in helix 2 (l, aliphatic; s, small; x, any residue) and an aliphatic residue, typically leucine, at the beginning of helix 3.249

With regard to BEN domain function, Abhiman et al. proposed it might function as a DNA binding domain or as an adaptor for chromatin modifying complexes.249 Their prediction is based on sequence analysis and on the functions of other domains present in proteins with the BEN domain. For example, the BEN domain is present in NAC1 in the region required for interaction with histone deacetylases HDAC3 and HDAC4.250-251 Similarly, the region of BANP/SMAR1 containing the BEN domain has been implicated in interactions with the MAR-binding protein Cux/CDP and the SIN3-histone deacetylase complex.252-253 They also noticed the BEN domain was frequently present in proteins with a POZ domain, suggesting there may be a functional association between the two domains.249 POZ domains are protein-protein interaction domains often present with DNA binding domains such as C2H2 and WRKY fingers, bZIP, AT-hooks, and C4DM domains.249 This underlies their prediction that the BEN domain may serve as a DNA binding domain. However, Abhiman et al. did not experimentally test their hypothesis, so BEN domain function remained unknown.

A molecular screen for genes specifically expressed in proneural clusters identified several genes, including the uncharacterized Drosophila gene insensitive.254 Null mutants of insensitive were shown to be lethal and to exhibit Notch gain-of-function phenotypes in notum clones.254 Genetic interaction analysis of these insensitive null mutants revealed the Insensitive protein was localized to the nucleus and restrained Notch signaling during multiple cell fate decisions, including sensory organ precursor specification, pIIA-pIIB decision, and socket-shaft decision.255 Furthermore, ectopic Insensitive resulted in multiple Notch loss-of-function phenotypes, strongly repressed the expression of several Notch target genes in the Enhancer of split-Complex, and suppressed Notch-mediated activation of a reporter construct in tissue culture.255 In 2011, Lai et al. characterized the Drosophila Insensitive protein, whose only domain is a single BEN domain, making it a prototypical BEN-solo protein.199,249 Their paper showed Insensitive (Insv) was a repressor of the Notch pathway in the fly peripheral nervous system, antagonizing Notch signaling at three steps: sensory organ precursor specification, pIIA-pIIB decision, and socket-shaft decision.199 They employed several experimental techniques to demonstrate Insensitive genetically interacts with the Notch pathway.199 They also demonstrated Insensitive represses Notch-mediated transcriptional activation and that it does so independently of Hairless, the only other known co-repressor of the Notch pathway in flies.199 Lai et al. showed by co-immunoprecipitation and by GST pulldown that Insv directly interacted with Su(H) through its C-terminal half containing the BEN domain (amino acids 179-361).199

Based on this evidence, we chose to further characterize the Insensitive-Su(H) interaction in the Kovall lab. I requested and received an Insensitive construct (pGEX-5X-2 Insensitive, amino acids 179-361) from the Lai lab. Initially, I attempted to co-express this construct with pET28b+ Su(H) R155T N281G (amino acids 98-523) in E. coli, but was unable to visualize separate bands after crude affinity column purification. The Su(H) R155T N281G mutant is a surface entropy reduction mutant of Su(H) that we have used in other situations without affecting its interaction with co-activators or co-repressors. I also attempted to express and purify the Lai lab's Insensitive construct from E. coli cells, but was unable to obtain a sufficient yield for experiments.

Lai et al. determined the BEN-solo domain corresponded to amino acids 259-356 in Insensitive.199 Using SABLE secondary structure prediction, I designed primers to clone a region slightly larger than the BEN-solo domain (amino acids 241-361) into the two vectors frequently used in the Kovall lab for protein expression, pGEX-6P-1 and pSMT. I also attempted to clone this region into a pMalE vector for crystallography. After several attempts, I successfully cloned the construct into all three vectors. Genewiz sequencing revealed a missense mutation in the pGEX-5X-2 Insensitive construct I had received from the Lai lab. This mutation changed amino acid 299 from methionine to valine. We did not expect this mutation to affect protein structure or function and permitted it to remain in our new Insensitive constructs.


Next, I reattempted co-expression, this time using pGEX-6P-1 Su(H) wt (amino acids 58-523) with pSMT Insensitive (amino acids 241-361) in E. coli cells, but was unable to clearly determine whether there was any interaction between the two proteins.

I also expressed and purified Insensitive 241-361 from both vectors, obtaining protein from both. However, it was difficult to completely separate the GST fusion tag from Insensitive, prompting us to use the pSMT vector construct for any future expression and purification. In short, purification of pSMT Insensitive 241-361 entailed cell lysis, crude affinity chromatography with nickel beads, proteolytic removal of the SMT fusion tag with ULP protease, ion exchange chromatography over a nickel column, and size exclusion chromatography to put the protein into a specific experimental buffer.

Using the purified Insensitive 241-361 construct, I performed three different types of experiments to analyze Insensitive-Su(H) binding. For a GST pulldown experiment, I incubated GST beads with either GST and Insensitive or GST-Su(H) wt and Insensitive. Though we observed a strong band for GST-Su(H) wt from the pulldown experiment, we saw only an extremely faint band for Insensitive, which, without further confirmation, was not conclusive evidence of a Su(H)-Insensitive interaction. I also performed electrophoretic mobility shift assays (EMSAs) to determine whether Insensitive could bind Su(H). For this, I incubated the purified Insensitive 241-361 construct with a 19-nucleotide DNA oligomer and with Su(H) wt or Su(H) R155T N281G at room temperature for 15 minutes. The DNA contained one CSL binding site, corresponding to the Hes1 consensus sequence CGTGGGAA, in the center of the oligomer. The resulting complexes were separated on a 7% native gel for three hours at 4°C, and bands were visualized by staining with SYBR Gold, a DNA intercalating stain with a fluorescent probe. While optimizing the EMSA conditions, I performed the experiment at pH 6.0, 7.0, and 8.0 with both sodium phosphate buffer and Hepes buffer. I also tested Tris buffer at pH 8.0. Both sodium phosphate buffer, pH 7.0, and Hepes buffer, pH 7.0, showed similar results: an indistinct and incomplete shift of complexes containing higher concentrations of Insensitive and either Su(H) construct. Again, these results are not conclusive proof that Insensitive interacts with Su(H).

For our last technique, we used isothermal titration calorimetry (ITC) to analyze the Insensitive-Su(H) interaction. A typical ITC experiment consisted of 40 injections of 7 μL each of 100 μM ligand in the syringe into 10 μM macromolecule in the cell. Prior to the experiment, the purified Insensitive 241-361 construct and either Su(H) wt or Su(H) R155T N281G were buffer matched by dialysis or sizing. For the combination of Insensitive 241-361 with Su(H) R155T N281G, we saw identical results (low signal of less than 0.02 units and no titration) for all four runs, which included different temperatures (15°C, 25°C, 30°C). All of these experiments were done in our standard ITC buffer, sodium phosphate buffer, pH 6.5 (50 mM sodium phosphate, 150 mM sodium chloride). For the combination of Insensitive 241-361 and Su(H) wt, we performed all the experiments in Hepes buffer, pH 8.0 (20 mM Hepes, 150 mM sodium chloride), at 25°C. The buffer change was prompted by the successful use of this buffer in a lab mate’s ITC experiments and the fact that these proteins did not seem negatively affected by the buffer in EMSA experiments. In our first run, we observed moderate signal strength of 0.2 units and titration during the experiment, both hallmarks of an interaction detected by ITC. However, all subsequent runs, including my control run, did not replicate these data. Rather, they had very low signal and absolutely no titration. Even after we executed a cleaning routine and calibration on the instrument, the results were the same. Either something was wrong with my proteins and the first run was an anomaly, or there were issues with the instrument that would require service.
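As a rough planning check on the titration design described above (40 injections of 7 μL, 100 μM ligand in the syringe, 10 μM macromolecule in the cell), the ligand-to-macromolecule molar ratio reached by the final injection can be estimated. This is a minimal sketch and not the analysis used in this work: the function name is hypothetical, the 1400 μL cell volume is an assumed value because the instrument's cell volume is not stated here, and displacement of cell contents by each injection is ignored.

```python
def final_molar_ratio(n_injections, injection_ul, syringe_um, cell_um, cell_volume_ul):
    """Approximate ligand:macromolecule molar ratio after the last injection.

    Volumes are in microliters and concentrations in micromolar, so each
    product (uL * uM) is an amount in picomoles. Overflow of cell contents
    during the titration is ignored.
    """
    ligand_pmol = n_injections * injection_ul * syringe_um
    macromolecule_pmol = cell_volume_ul * cell_um
    return ligand_pmol / macromolecule_pmol


# Values from the text, except ASSUMED_CELL_VOLUME_UL, which is an assumption.
ASSUMED_CELL_VOLUME_UL = 1400.0

ratio = final_molar_ratio(
    n_injections=40,
    injection_ul=7.0,
    syringe_um=100.0,
    cell_um=10.0,
    cell_volume_ul=ASSUMED_CELL_VOLUME_UL,
)
print(f"Total injected volume: {40 * 7.0:.0f} uL")
print(f"Final molar ratio (ligand:macromolecule): {ratio:.1f}")
```

With the assumed cell volume, the titration ends near a 2:1 molar ratio, which would normally be enough to pass through the equivalence point of a 1:1 interaction; a different cell volume changes the ratio proportionally.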

We did not pursue additional ITC experiments with Insensitive 241-361 and Su(H) wt, however, because the Lai lab published a subsequent paper about Insensitive. In it, they determined the BEN-solo domain of Insensitive binds directly to the DNA sequence TCYAATHRGAA.256 They reported a crystal structure of the BEN-solo domain (amino acids 251-365) of Insensitive bound to DNA, as well as several assays demonstrating the DNA binding and repression activities of the BEN-solo domain of Insensitive and of a mammalian ortholog, BEND5.256 Additionally, they demonstrated Insensitive bound to its cognate DNA sequence in neural genes in the Notch-regulated Enhancer of split-Complex and in thousands of genomic regions that do not contain consensus Su(H) binding sites.256 This suggests Insensitive can function independently of the Notch pathway when necessary, though it is possible these other genes may possess non-consensus Su(H) binding sites. Though the results of their second paper, demonstrating the BEN-solo domain of Insensitive directly binds DNA, contradict the results in their first paper, the data presented in the second paper offer more conclusive evidence. In their first paper, Lai et al. showed interaction between the BEN-solo domain of Insensitive and Su(H) by co-immunoprecipitation and GST pulldown, which can be prone to false positives. Our own inability to clearly detect binding between the BEN-solo domain of Insensitive and Su(H) by various methods (co-expression, GST pulldown, EMSA, and ITC) supports their claim that the BEN-solo domain interacts with DNA, not Su(H).
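The Insensitive recognition sequence TCYAATHRGAA is written with IUPAC degenerate nucleotide codes (Y = C or T; H = A, C, or T; R = A or G). As an illustration of what that notation covers, the sketch below expands the motif into its concrete sequences and scans both strands of a DNA string for matches. The helper names and the toy test sequence are hypothetical and are not taken from the Lai lab's analysis.

```python
import re
from itertools import product

# IUPAC nucleotide codes needed here (a subset of the full table)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in motif
    )


def expand_motif(motif):
    """List every concrete sequence encoded by an IUPAC motif."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in motif))]


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(sequence, motif):
    """Return (strand, start, match) for non-overlapping hits on both strands.

    Start positions are 0-based and refer to the forward strand.
    """
    pattern = re.compile(motif_to_regex(motif))
    seq = sequence.upper()
    hits = [("+", m.start(), m.group()) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [("-", len(seq) - m.end(), m.group()) for m in pattern.finditer(rc)]
    return hits


INSV_MOTIF = "TCYAATHRGAA"
variants = expand_motif(INSV_MOTIF)
toy_sequence = "GGTCTAATCAGAAGG"  # made-up sequence containing one site
sites = find_sites(toy_sequence, INSV_MOTIF)

print(motif_to_regex(INSV_MOTIF))
print(len(variants), "concrete sequences")
print(sites)
```

The three degenerate positions give 2 x 3 x 2 = 12 concrete sequences, and the same scan could be run with the Su(H) site CGTGGGAA to ask whether a given region carries either motif.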


Appendix D: Purification and Quantitative Binding Analysis of the SPOC Domain of Drosophila Split Ends Protein

In this work I performed protein expression, protein purification, and binding studies.


Drosophila Split Ends (Spen) and related proteins play an important role in cell fate specification during development, since Spen mutations in the fly embryo perturb the Notch, EGFR, and Ras/MAP kinase pathways.212 Spen proteins have been identified in C. elegans, Drosophila, and mammals, where they function similarly.212 In mice, the Spen protein MINT (Msx2 interacting nuclear target protein) mediates repression by Msx2 in skeletal and neuronal development.257 As MINT is also a co-repressor in the Notch pathway, delineating the role MINT plays in transcriptional repression is critical to fully understanding Notch target gene regulation. The human ortholog of MINT, SHARP (SMRT/HDAC1-associated repressor protein), is a component of transcriptional repression complexes for both nuclear receptors and the Notch pathway.203,214 In flies, Spen antagonizes the Notch pathway and potentiates the EGFR-signaling pathway during Drosophila eye development.258 Spen proteins are characterized by three N-terminal RNA binding motifs and a C-terminal SPOC domain that is essential for their function (Figure 1A).212

SPOC (Spen paralog and ortholog C-terminal) domains are approximately 165 amino acids long and mediate interactions with other co-repressor proteins, including NCoR, HDAC1, and SMRT (Figure 1B).212 The crystal structure of the SPOC domain from human SHARP was solved to 1.8 angstroms by Ariyoshi and Schwabe, showing the domain consists of a seven-stranded β-barrel framed by six α-helices.212 Two possible sites for protein-protein interactions are visible in the structure: a groove with two proline loops on a distorted side of the β-barrel and a slightly acidic and hydrophobic patch in the α-helices.212 A sequence alignment of the SPOC domain of SHARP with other Spen proteins showed 26-54% sequence identity, with most of the conserved residues mapping to the protein surface, where they are likely to be involved in function.212 Interestingly, seven absolutely conserved residues map to a basic, and thus positively charged, patch, and two more absolutely conserved residues lie very close to the basic patch (Figure 2A). This pattern of conservation suggests the SPOC domain may electrostatically interact with a negatively charged molecule through this conserved basic patch. Ariyoshi and Schwabe demonstrated the SPOC domain of SHARP specifically binds to the LSD motif of NCoR through this basic patch.212 The LSD motif is a conserved acidic, and thus negatively charged, motif at the very C-terminus of SMRT and NCoR co-repressor proteins (Figure 2B).212 Previous studies showed the SPOC domain of SHARP also interacts with HDAC1 and the co-repressor SMRT.203 Together, the structure and interaction data suggest the positively charged basic patch of the SPOC domain mediates interactions with negatively charged proteins. Functionally, the SPOC domain serves as a platform for recruiting and assembling large multiprotein complexes, such as the Class I and II HDACs responsible for transcriptional repression.

Based on the SPOC domain structure and several of their studies on SHARP and MINT, two of our collaborators, Dr. Franz Oswald and Dr. Tilman Borggrefe, predicted SHARP and MINT may help assemble a repression complex on CSL. To isolate different complexes containing the SPOC domain of SHARP, they performed pulldown assays using tagged SPOC domain and identified the binding partners by mass spectrometry (these and the subsequent data are unpublished). Their results listed several different repressor proteins, including SMRT, and several chromatin modifiers, such as methyltransferases and demethylases. One result was particularly surprising, as the H3K4 methyltransferase MLL2 is not a repressor, but rather puts activating marks onto chromatin. Our collaborators confirmed MLL2 and the SPOC domain of SHARP interact by performing the pulldown experiment in reverse with identical results. They chose to further characterize this interaction, narrowing the region of MLL2 that interacts with the SPOC domain. They also demonstrated by microarray that MLL2 knockdown in cultured cells resulted in differences in Notch target gene expression levels, and by ChIP that components of the MLL2 complex are present at Notch target gene promoters. They even performed immunoprecipitation and GST pulldown experiments with fly orthologs, the SPOC domain of Spen and two fly isoforms of MLL2, Trithorax and Trithorax-related, and showed the interaction was conserved in flies.

In the Kovall lab, two former graduate students had characterized the MINT-CSL interaction in terms of structural and binding data. With this new information from our collaborators, we began to study the SPOC domains from murine MINT and fly Spen. My lab mates, Kelly Collins and Nassif Tabaja, had great success with the mouse SPOC domain, determining its binding affinity for several partners as well as solving a crystal structure of it in complex with a binding partner (unpublished data). Their work confirmed the SPOC domain of MINT interacts with negatively charged molecules, namely negatively charged phosphorylated residues on the binding partner.

While my lab mates studied mouse SPOC, I sought to identify one or more binding partners for fly SPOC and characterize any interactions I observed. Initially, I was interested in the fly orthologs shown by our collaborators to interact by immunoprecipitation and GST pulldown: the SPOC domain of fly Spen (dSPOC) with Trithorax (Trx) or Trithorax-related (Trr). Zhenyu Yuan of the Kovall lab had cloned constructs of these three proteins into vectors favored by our lab. First, I co-expressed pGEX-dSPOC 5363-5561 with either pSMT-Trx 3372-3475 or pSMT-Trr 2022-2230, but the results were inconclusive, which we later determined was an inherent feature of this method. Next, I bacterially expressed and purified all three proteins using our standard protocols. I acquired a significant yield of pure dSPOC 5363-5561, which was highly stable in any buffer used. Unfortunately, Trx 3372-3475 was not stable after removal of the SMT fusion tag, as it precipitated out of solution. This problem persisted when I tested different expression approaches, so I ultimately used the fusion protein in binding experiments. As for Trr 2022-2230, I was able to purify it after removing the SMT fusion tag, but did not use it in experiments because of the lack of binding we observed with Trx 3372-3475.

Figure 1: The SPOC Domain in MINT, Spen, and Nito A) These three proteins are members of the Spen protein family. The mouse protein MINT possesses four RNA recognition motifs (RRM1-4, pink), several nuclear localization signals (NLS, black), a domain that interacts with the transcription factor Msx2 (RID, green), a CSL interaction domain (CID, blue), and a SPOC domain (orange). The fly proteins Split Ends, or Spen, and Nito each have three RNA recognition motifs (RRM) and a SPOC domain. B) MINT is a corepressor of Notch signaling. As such, it interacts with CSL present on DNA in target genes. The SPOC domain of MINT interacts with additional corepressors, such as ETO, SMRT/NCoR, and CtIP/CtBP, assembling a repression complex around CSL.

Figure 2: The SPOC Domain Interacts with NCoR Corepressors A) Electrostatic surface representation of the SPOC domain of SHARP, the human ortholog of MINT. Blue represents positively charged residues while red represents negatively charged residues. A positively charged patch is formed by seven absolutely conserved residues of the SPOC domain, which interact with the negatively charged LSD motif of NCoR in a structure solved by Ariyoshi and Schwabe.212 B) Diagram schematic of NCoR, showing three repression domains (RD, green), three nuclear receptor interacting domains (RID, pink) and two SANT domains (yellow). At the C-terminus is a conserved LSD motif (blue) that interacts with the SPOC domain of MINT. C) The LSD motif is conserved between fly SMRTER and mammalian SMRT and NCoR. Absolutely conserved residues are highlighted in grey.

Using isothermal titration calorimetry (ITC), I analyzed binding between dSPOC 5363-5561 and several possible binding partners. Unfortunately, none showed any indication of binding. Experiments consisted of 40 injections of 7 μL each and were conducted at 25°C unless otherwise noted. I ran ITC experiments with SMT-Trx 3372-3475 and dSPOC in both NaPi 6.5 buffer (50 mM NaPi, pH 6.5, 150 mM NaCl) and Hepes 7.5 buffer (20 mM Hepes, pH 7.5, 150 mM NaCl, 0.1 mM TCEP) at 10°C and 25°C. By this time, my lab mates had determined that mouse SPOC bound phosphorylated mouse SMRT. I performed a cross-species ITC experiment with dSPOC and phospho-mouse SMRT in Hepes 7.5 buffer, but again did not detect binding. I also used a phospho-mimetic mouse SMRT construct developed by Zhenyu Yuan (SMT-mSMRT 2472-2507 SSEE) in an ITC experiment with dSPOC in Hepes 7.5 buffer, but did not see any interaction, whether dSPOC was in the syringe or the cell.


For the next set of ITC experiments involving dSPOC, I purchased two synthesized peptides, SMRTER and phospho-SMRTER. My rationale for this was two-fold. First, my lab mates had successfully used synthesized phospho-peptides in their mouse SPOC studies. Second, the LSD motif in mammalian NCoR, which is responsible for its interaction with the SPOC domain of human SHARP, is also present in mammalian SMRT and a Drosophila ortholog, SMRTER (Figure 2C).203 My lab mates’ research had determined the S (serine) of the LSD motif bore the phosphorylation important for the SPOC-NCoR interaction. They also identified a second phospho-serine adjacent to the LSD motif that was important for binding the SPOC domain. Thus, SMRTER could be a binding partner for dSPOC, particularly if phosphorylated in its conserved LSD motif. I purchased a crude preparation of 80-100 mg of the 19 amino acid SMRTER peptide from GenScript, which I purified by HPLC and verified by mass spectrometry. When this peptide was used in ITC experiments with dSPOC in Hepes 7.5 buffer, no binding could be detected.

I purchased 30 mg of 96% HPLC pure phospho-SMRTER peptide from Peptide 2.0 and verified its purity and modification by mass spectrometry. The peptide had the same 19 amino acid sequence as the unmodified SMRTER peptide I had previously analyzed, but this new version possessed a single phosphorylation on the serine in the LSD motif at the C-terminal end. Using Hepes 8.0 buffer (20 mM Hepes, pH 8.0, 150 mM NaCl), I performed ITC with phospho-SMRTER and dSPOC at 25°C, but did not observe binding. In an attempt to determine whether the peptide could bind any SPOC domain, I tested phospho-SMRTER with mouse SPOC 3474-3643, the construct my lab mates had used in their studies, but, once again, did not detect binding. It is entirely possible this cross-species experiment would never work, but it is interesting to learn that a phosphorylated LSD motif is not the only requirement for binding mouse SPOC. Lastly, I ran an ITC experiment with phospho-SMRTER and dSPOC using a slightly lower salt concentration of 100 mM, based upon my lab mates’ research, but failed to observe an interaction.


Appendix E: Purification and Crystallization of the Ankyrin Repeats of the Human Notch Intracellular Domain

In this work I contributed protein expression, protein purification, and crystal screening of the human Ankyrin repeats.


A collaborator, Dr. Anthony Capobianco, contacted our lab and asked if we could crystallize the ANK domain of the human Notch1 receptor with various small molecules.

The domain, which we refer to as hANK (human Ank), had already been crystallized by two other labs. The Blacklow lab crystallized hANK by itself to 1.55 angstroms as well as complexes of DNA-human CSL-hANK-Mastermind1.9 The Blundell lab crystallized hANK by itself to 1.9 angstroms and later crystallized the hANK domain with fragments of two different small molecules.259-260 Both small molecules bound at the interface between hANK and Mastermind: the first bound at the upper helices of hANK repeats six and seven, while the second bound at the upper helices of hANK repeats five and six.259

With the directive of crystallizing hANK, we undertook the tasks of expressing, purifying, and crystallizing hANK from a construct provided by our collaborator. The construct, pGEX-TEV-hANK 1873-2127, corresponds to the construct used by the Blacklow lab when they crystallized hANK. Thus, we followed the protein purification and crystallization methods from their publication. I worked on this project with our highly capable undergraduate, now pharmacy student, Nathan Miller, who eventually took full responsibility for the project. The work I describe was done together, unless otherwise noted.

After receiving the DNA for the construct from our collaborator, we performed a maxi prep to establish a purified DNA stock, which we then transformed into competent cells and expressed in E. coli by IPTG induction. Following the methods of the Blacklow paper, we added 20% w/v sucrose to the cells during our standard lysis protocol and did not perform our typical 65% ammonium sulfate cut after lysis. We then employed affinity chromatography to isolate the GST fusion tagged protein from the crude lysate.

The construct was not in our usual pGEX-6P-1 vector, meaning it did not have a PreScission Protease site between the GST fusion tag and the hANK protein. Instead, the construct had a TEV protease site between them, so, with the guidance of graduate student Catie Shelton from Dr. Andrew Herr’s lab, we put our sample into dialysis tubing in a dialysis buffer of 20 mM Tris, pH 8.0, 100 mM NaCl, 14 mM BME, 1% EG. We dialyzed our sample overnight, with part of the sample at 4°C and the remainder at room temperature. For both samples, we added 1 M DTT and TEV protease, as TEV requires DTT for proper activity. A gel of the dialyzed products showed there was some cleavage but still a significant amount of fusion protein remaining. We returned both samples to their respective dialysis conditions overnight with a new addition of TEV protease and DTT. Again, a gel of the dialyzed products revealed cleavage had occurred, but significant amounts of fusion protein remained. After conferring with the Herr lab, we decided to add a larger amount of TEV to each sample and dialyze for a longer period of time. We combined our two samples and dialyzed for approximately 48 hours at 4°C. A gel showed most of our protein was cleaved, though a small population of fusion protein remained.

Once we had cleaved the GST tag from hANK, we removed much of the free GST using affinity chromatography with GST beads and a gravity column. To remove the remaining free GST, we dialyzed the protein in low salt Q dialysis buffer (20 mM Tris, pH 8.0, 10 mM NaCl, 1 mM EDTA, 1 mM DTT, 1% EG) overnight at 4°C before running it over a Q ion exchange column on the FPLC system. From the Q column, we acquired two populations of protein: pure hANK and hANK plus GST-hANK. We proceeded to concentrate the pure hANK population and run it over a size exclusion column into Tris 8 sizing buffer (20 mM Tris, pH 8.0, 500 mM NaCl, 1% EG, 0.1 mM TCEP). Unfortunately, a gel of the sizing column fractions revealed there were two populations present in the sample: hANK and free GST, which have similar molecular weights (~27 kDa for hANK and ~25 kDa for free GST). A sizing column is not sufficient for resolving two molecules with such similar molecular weights, so we returned our fractions to low salt Q dialysis. We ran the sample over the Q ion exchange column with a very shallow salt gradient, resulting in better separation of hANK from free GST as evidenced by a gel. Some GST-hANK fusion was present in the fractions containing hANK, but we were able to remove it with a size exclusion column in a modified Tris 8 sizing buffer (20 mM Tris, pH 8.0, 150 mM NaCl, 0.1 mM TCEP) we made based on the methods of the Blacklow lab paper.9 The fractions from the sizing column containing pure hANK were concentrated to 25 mg/mL and aliquoted for storage at -80°C.
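For reference, the mass concentration of the final stock can be converted to an approximate molar concentration. The following is a minimal sketch that uses only the two approximate values quoted above (25 mg/mL and ~27 kDa for hANK). The function name is hypothetical, and the result is only as precise as the rounded molecular weight.

```python
def mg_per_ml_to_micromolar(conc_mg_ml, mw_kda):
    """Convert a mass concentration (mg/mL) to micromolar for a given MW (kDa)."""
    grams_per_liter = conc_mg_ml        # 1 mg/mL equals 1 g/L
    grams_per_mole = mw_kda * 1000.0    # 1 kDa corresponds to 1000 g/mol
    return grams_per_liter / grams_per_mole * 1e6  # mol/L -> micromolar

HANK_STOCK_MG_ML = 25.0  # from the text
HANK_MW_KDA = 27.0       # approximate value from the text

stock_um = mg_per_ml_to_micromolar(HANK_STOCK_MG_ML, HANK_MW_KDA)
print(f"hANK stock: about {stock_um:.0f} micromolar")
```

Under these assumptions the stock is a little under 1 mM.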

Using the purified and concentrated hANK, we set up crystallization screens with the Phoenix robot, including JCSG+ (hANK at 14 mg/mL) and Hampton Salt (hANK at 6, 12, and 24 mg/mL). Unfortunately, the robot was damaged and in repair for a few weeks, so we proceeded by making 24-well hanging drop and under oil trays based on the crystallization conditions of 100 mM Tris, pH 8.5, and 1 M ammonium sulfate used for hANK in the Blacklow lab paper.9 For these screens, we used hANK at 10 mg/mL.

When we failed to obtain crystal hits in any of these screens, we dialyzed hANK into the modified Tris buffer identical to the one used by the Blacklow lab, containing 5 mM DTT instead of 0.1 mM TCEP.9 We had initially used TCEP as it persists as a reductant in the buffer for a longer period of time than DTT. We repeated the custom 24-well screens with hANK at 10 mg/mL. At this point, Nathan Miller took over responsibility for the project from me and has performed all the additional work I will mention. He also set up the Pro-Complex screen with hANK at 10 and 20 mg/mL. Still, no crystal hits appeared.
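The screening concentrations mentioned above are all below the 25 mg/mL storage concentration, so each can be reached by simple dilution (C1V1 = C2V2). The sketch below illustrates that arithmetic only. It assumes samples were diluted directly from a 25 mg/mL stock and uses a hypothetical 20 µL final volume, neither of which is stated in the text; the function name is also hypothetical.

```python
def dilution_volumes(stock_mg_ml, target_mg_ml, final_ul):
    """Return (stock_ul, buffer_ul) needed to reach target_mg_ml in final_ul."""
    if target_mg_ml > stock_mg_ml:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = final_ul * target_mg_ml / stock_mg_ml  # C1 * V1 = C2 * V2
    return stock_ul, final_ul - stock_ul

STOCK_MG_ML = 25.0  # storage concentration from the text
FINAL_UL = 20.0     # hypothetical final volume, for illustration only

# Screening concentrations quoted in the text (mg/mL)
for target in (6, 10, 12, 14, 20, 24):
    stock_ul, buffer_ul = dilution_volumes(STOCK_MG_ML, target, FINAL_UL)
    print(f"{target:>2} mg/mL: {stock_ul:.1f} uL stock + {buffer_ul:.1f} uL buffer")
```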

While formulating a new approach to hANK crystallography, Dr. Rhett Kovall noticed the hANK construct (1873-2127) listed in the methods of the Blacklow paper was longer than the construct actually crystallized in their structure (1905-2122).9 It is possible this shorter construct was a minor contaminant in the protein stock used for crystallization by the Blacklow lab. To test whether the longer construct we were using was stable, Nathan Miller performed a trypsin digest of hANK 1873-2127. The digest showed hANK remained stable at significant trypsin concentrations, but an approximately 30 amino acid tail was removed. The identity of these portions was confirmed by mass spectrometry. Currently, Nathan Miller is in the process of cloning the shorter hANK construct (1905-2122) into pGEX-6P-1, with the intention of expressing and purifying it by the standard Kovall lab protocol and repeating crystallization screens with it.
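The size difference between the two constructs follows directly from the residue numbers quoted above. The following minimal sketch works through that arithmetic; the helper name is hypothetical. The text does not state which terminus trypsin trimmed, so the comparison with the approximately 30 amino acid tail is only suggestive.

```python
def length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1

LONG = (1873, 2127)   # construct expressed here, from the text
SHORT = (1905, 2122)  # construct seen in the published structure, from the text

n_term_extra = SHORT[0] - LONG[0]  # residues 1873-1904
c_term_extra = LONG[1] - SHORT[1]  # residues 2123-2127

print(length(*LONG), length(*SHORT), n_term_extra, c_term_extra)
# prints: 255 218 32 5
```

The longer construct therefore carries 32 additional N-terminal residues and 5 additional C-terminal residues, and the N-terminal extension is similar in size to the fragment removed in the digest.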
