Reading the Code: Methyl Mark Recognition by MBT and Royal Family Proteins

by

Nataliya Nady

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Medical Biophysics University of Toronto

© Copyright by Nataliya Nady 2012

Reading the Code: Methyl Mark Recognition by MBT and Royal Family Proteins

Nataliya Nady

Doctor of Philosophy, 2012

Department of Medical Biophysics, University of Toronto

Abstract

The post-translational modifications (PTMs) of histones regulate many cellular processes, including transcription, replication, DNA repair, recombination, and chromosome segregation. A large number of combinations of PTMs are possible, with methylation being one of the most complex, since it is found in three states and is recognized in a sequence-specific context. Methylation of histones at key residues has been shown to work in concert with other modifications to provide a Histone Code that may determine heritable transcriptional conditions in normal and disease states. On the most basic level it is pivotal to understand how and by which proteins the numerous PTMs are recognized, as well as the mechanisms for downstream signal propagation. To address this need we developed a high-throughput method that allows analysis of up to 600 PTMs in a single experiment. This approach was utilized to characterize macromolecules interacting with specific modifications on histone tails and to screen for the marks that bound to Malignant Brain Tumor (MBT) proteins, important regulators implicated in cancer. All MBTs recognized either mono- or dimethyllysine histone marks, and using structure-based mutants we identified a triad of residues that were responsible for this discrimination. These results provide the foundation for the rational design of highly selective MBT inhibitors. Additionally, this thesis describes combinatorial recognition of histone modifications, as proposed in the original Histone Code hypothesis. We demonstrate that the tandem Tudor domain of UHRF1, a protein involved in epigenetic maintenance of DNA methylation, is able to read a dual modification state of histone H3 in which it is trimethylated at lysine 9 and unmodified at lysine 4. This study provides an elegant example of the combinatorial readout of histone modification states by a single domain. Together, our findings offer mechanistic insights into the recognition of methylated histone tails by MBT domains and Royal Family proteins in general.

Acknowledgements

Looking back, I am very grateful for all I have acquired throughout the years as a graduate student. It has shaped me as a person and a scientist and would not have been possible without the many great people surrounding me.

First and foremost I wish to thank my supervisor, Dr. Cheryl Arrowsmith. She accepted me into her lab as a summer student and has continued to support me ever since. I am grateful for her mentorship, guidance, and valuable advice. She has always encouraged my curiosity and ambitious aspirations, which were pivotal to the success of my training as a scientist. I would also like to thank my supervisory committee members Dr. Peter Cheung and Dr. Gilbert Privé for their suggestions and encouragement, which helped me to stay on track. I am also thankful to Dr. Aled Edwards and the SGC for the plentiful resources and fresh perspective on the issue.

I am indebted to past and present members of the Arrowsmith lab for their help, insightful discussions and friendship. I would like to specifically thank my good friend Rob for teaching me how to appreciate good scientific work, as well as for interesting discussions and occasional beers; Lilia and Shili for their assistance with my project and for creating a friendly lab environment; Sasha for teaching me protein NMR; and Liuba for her help over the summers.

The years spent as a graduate student would not have been as enjoyable without my friends Atoosa Mehrfar, Diana Purushotham, Nadiya Koshtura, Jocelyn Stewart, Fernando Amador, Tharan Srikumar, and Alison Aiken.

Finally, I wish to thank my family. I am forever indebted to my parents for their unconditional love and the invaluable opportunities that allowed me to pursue a graduate degree.

Lastly, I cannot thank my husband, Andrew, enough. Not only did he always ensure that my computer was up and running problem-free, but I am above all grateful for his love, endless patience, support, and words of encouragement when they were most needed.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

Chapter 1: Introduction
1.1 Thesis overview
1.2 Epigenetic mechanisms
1.3 Histone post-translational modifications and the Histone Code hypothesis
1.4 Histone methylation
1.5 Methyl-reader modules
1.6 Royal family
1.6.1 Identification
1.6.2 Architecture
1.6.3 Biological function
1.7 Rationale and Intentions of the Thesis
1.8 References

Chapter 2: A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research
2.1 Summary
2.2 Introduction
2.3 Results
2.3.1 SPOT blotting
2.3.2 Application 1: Characterization of reagents for chromatin research
2.3.3 Application 2: Mapping the specificity of histone binding domains
2.3.4 Application 3: Demonstration of non-sequence specific interactions
2.4 Discussion
2.5 Materials and Methods
2.6 References

Chapter 3: Histone Substrate Recognition by Human MBT Domains
3.1 Summary
3.2 Introduction
3.3 Results
3.3.1 MBT domains recognize a variety of methylated lysines on core histones
3.3.2 Recognition mechanism for mono- and dimethyllysine
3.3.3 Lack of sequence specific binding by L3MBTL1 and L3MBTL3
3.3.4 Potential recognition of non-histone proteins
3.3.5 Recognition of specific histone sequences
3.4 Discussion
3.5 Materials and Methods
3.6 References

Chapter 4: Recognition of multivalent histone states associated with heterochromatin by Tandem Tudor Domains of UHRF1
4.1 Summary
4.2 Introduction
4.3 Results
4.3.1 UHRF1 contains a Tandem Tudor Domain that binds in vitro
4.3.2 TTD recognizes hallmarks of heterochromatin
4.3.3 Structural basis for the recognition of the H3K4me0/K9me3 signature
4.3.4 Reorientation of TTD subdomains upon histone H3 binding
4.3.5 Mutational analysis confirms localization of TTD to heterochromatin
4.4 Discussion
4.5 Materials and Methods
4.6 References

Chapter 5: Conclusions and Future Directions
5.1 Conclusions
5.2 Future Directions
5.3 Concluding remarks
5.4 References

List of Tables

Chapter 1
Table 1.1. Examples of the protein domains, the histone marks they recognize and their biological effect.

Chapter 2
Table 2.1. List of peptides with known modifications used in SPOT-blot synthesis.
Table 2.2. List of peptides used in SPOT-blot synthesis.

Chapter 3
Table 3.1. Overview of the highest affinity histone substrates for MBT domains within human proteins.
Table 3.2. Thermal stability of the wild type and mutant proteins as indicated by aggregation temperatures.
Table 3.3. Overview of the ability of MBT mutants to bind lower lysine methylation states.
Table 3.4. Crystallographic data collection and refinement statistics.

Chapter 4
Table 4.1. NMR data and refinement statistics.
Table 4.2. Intra- and intermolecular distance restraints used for TTD-H3K4me0/K9me3 calculations.

List of Figures

Chapter 1
Figure 1.1. Sites of histone post-translational modifications (PTMs).
Figure 1.2. Model representation of lysine and arginine methylation.
Figure 1.3. Recognition of methyllysine by the Royal family protein domains.

Chapter 2
Figure 2.1. Schematic representation of the SPOT binding assay.
Figure 2.2. Confirmation of complete synthesis of modified histone peptides.
Figure 2.3. Characterization of reagents for chromatin research.
Figure 2.4. Mapping the specificity of histone-binding domains.
Figure 2.5. Demonstration of non-sequence-specific interactions.

Chapter 3
Figure 3.1. Domain architecture and analysis of the MBT-containing proteins.
Figure 3.2. Interactions of the MBT domains with histones in SPOT arrays.
Figure 3.3. Influence of the MBT pocket architecture on the recognition of the lysine methylation state.
Figure 3.4. L3MBTL1 and L3MBTL3 are promiscuous binders.
Figure 3.5. Poor recognition of histone substrates by MBTs.
Figure 3.6. Recognition of specific histone sequences.
Figure 3.7. Structural and functional analysis of the SCML2 binding.
Figure 3.8. Binding of SCML2 to .
Figure 3.9. SCML2 binds to DNA with its basic region.
Figure 3.10. SCML2 interacts specifically with the H2AK36me1-containing nucleosomes.

Chapter 4
Figure 4.1. A novel evolutionarily conserved domain within UHRF1 corresponds to TTD.
Figure 4.2. A novel TTD domain within UHRF1 binds H3 histone tail with H3K9me3.
Figure 4.3. UHRF1 recognizes multivalent histone signatures associated with heterochromatin.
Figure 4.4. Minimal interactions between TTD and the short H3K9me3-containing peptide in solution.
Figure 4.5. Recognition of multivalent sites at the interface between the two Tudor subdomains.
Figure 4.6. Tandem tudor domain behaves as a single rigid body.
Figure 4.7. Structural re-adjustment of TTDC in order to accommodate the histone tail.
Figure 4.8. Residual dipolar couplings (RDCs) collected on the TTD/H3K4me0K9me3 complex fit the apo crystal structure poorly.
Figure 4.9. Mutational analysis confirms localization of TTD to heterochromatin in mouse ES cells.

List of Abbreviations

2D Two-dimensional
3D Three-dimensional
ADP Adenosine diphosphate
APS Advanced photon source
BRCT BRCA1 C-terminal domain
Bromo Brahma organization modifier
BSA Bovine serum albumin
ChIP Chromatin immunoprecipitation
Chromo Chromatin organization modifier
DNA Deoxyribonucleic acid
DNMTL DNA methyltransferases
DTT Dithiothreitol
ELISA Enzyme-linked immunosorbent assay
EMSA Electrophoretic mobility shift assays
FP Fluorescence polarization
HPLC High Pressure Liquid Chromatography
HRP Horseradish peroxidase
HSQC Heteronuclear single quantum coherence
IP Immunoprecipitation
IPTG isopropyl-1-thio-D-galactopyranoside
MALDI-TOF Matrix Assisted Laser Desorption/Ionization-Time Of Flight
MBT Malignant Brain Tumor
MLL1 Mixed-Lineage Leukemia-1
Kme1 Monomethylated lysine
Kme2 Dimethylated lysine
Kme3 Trimethylated lysine
NMR Nuclear Magnetic Resonance
O-GlcNAc β-N-acetylglucosamine
PBS Phosphate Buffered Saline
PBS/T PBS + 0.1% Tween20
PcG Polycomb Group
PHD Plant Homeodomain
PhoRc Pleiohomeotic Repressive Complex
PMSF Phenylmethyl sulfonyl fluoride
PRC1 Polycomb Repressive Complex 1
PRC2 Polycomb Repressive Complex 2
PRE Polycomb response elements
PTM Post-translational modification
Rme1 Monomethylated arginine
Rme2(s) Symmetrically dimethylated arginine
Rme2(a) Asymmetrically dimethylated arginine
SAM Sterile Alpha Motif
SAM S-adenosyl-L-methionine
SDS-PAGE Sodium Dodecyl Sulfate PolyAcrylamide Gel Electrophoresis
SET Su(var)3-9, Enhancer-of-zeste, Trithorax
SGC Structural Genomics Consortium
Sph Phosphorylated
SR1/2 Selectivity residue 1/2
SUMO Small Ubiquitin-like MOdifier
tRNA Transfer RiboNucleic Acid
TrxG Trithorax Group
TTD Tandem tudor domain

Chapter 1

Introduction

“The major problem, I think, is chromatin. What determines whether a piece of DNA along the chromosome is functioning, since it’s covered with the histones? You can inherit something beyond the DNA sequence. That’s where the real excitement of genetics is now” (Watson and Crick 2003)

1.1 Thesis overview

Post-translational modifications (PTMs) on histones regulate many fundamental cellular processes. Numerous PTMs have been identified, with methylation being the most complex modification, since mono-, di- and trimethylated states serve as binding sites for distinct protein modules. Moreover, methylation of key lysines works in concert with other modifications, contributing to the combinatorial regulation often referred to as the Histone Code (Strahl and Allis 2000; Turner 2000). Chapter 1 provides an introduction to these basic concepts in epigenetics together with the rationale and aims of this thesis. Chapter 2 describes a high-throughput method that allowed analysis of up to 600 PTMs in a single experiment. This approach was utilized to screen for modifications that bound to Malignant Brain Tumor (MBT) domains, with the molecular determinants of the interaction described in Chapter 3. Chapter 4 describes studies of combinatorial histone PTM recognition by the UHRF1 protein, as proposed in the Histone Code hypothesis. The dissertation closes with a final chapter that presents concluding remarks and describes new questions arising from this work for future investigation.

1.2 Epigenetic mechanisms

The ideas and the scientific data that contribute to our understanding of epigenetics have been accumulating since the early twentieth century. Following Flemming’s discovery of chromosomes in 1879, several genetic studies indicated that all somatic cells contained all of the chromosomes (Allis, Jenuwein et al. 2007). However, the scientific community did not yet recognize that DNA carries the genetic information and is the same in all somatic cells of an organism. Geneticists at that time could not explain many developmental phenomena and turned to look for other mechanisms that dictate the cell type during development. In 1942 the term epigenetics was coined by Conrad Hal Waddington to refer to developmental events that could not be explained by genetic principles (Waddington 1942). Today, epigenetics is defined as “the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence” (Riggs et al. 1996).

Epigenetic phenomena are diverse and have been studied in a variety of organisms. S. cerevisiae and S. pombe are good model systems for studying mating-type switching and RNAi machinery, respectively. Purification of the first histone acetyltransferase from Tetrahymena was possible due to the enrichment of actively transcribed chromatin in its macronuclei (Brownell, Zhou et al. 1996). Numerous studies on imprinting, paramutation and transposon-induced gene silencing were done in maize. With the emergence of multicellular organisms, such as the metazoan C. elegans, polycomb and trithorax genes arose, and Drosophila has provided a great framework to study position-effect variegation (Allis, Jenuwein et al. 2007). Perhaps the most captivating examples of epigenetic phenomena are the development of a human being with over 200 different cell types from a single fertilized egg, and the development of human monozygotic twins who become increasingly different by the time they reach adulthood.

We now know that epigenetic phenomena are linked to the fact that DNA exists as a complex with other macromolecules, which together comprise chromatin. Chromatin is found in different states, which regulate access to the genomic DNA and dictate differential gene expression. The chromatin states are established through a variety of covalent and non-covalent mechanisms. In humans, these include methylation and hydroxymethylation of cytosines on DNA, energy-dependent chromatin remodeling, exchange of histone variants, interactions with small non-coding RNAs, and post-translational histone modifications (Allis, Jenuwein et al. 2007).

1.3 Histone post-translational modifications and the Histone Code hypothesis

The basic repeating unit of chromatin is the nucleosome, which on average consists of 146 base pairs of DNA wrapped around the core histone proteins (H2A, H2B, H3, and H4). The details of the intimate interaction between the histones and the DNA were unraveled after a crystal structure of the nucleosome was solved (Luger, Rechsteiner et al. 1997). Nucleosome density and access to the genomic DNA in eukaryotic cells are tightly regulated in order for processes such as transcription, DNA replication and repair to occur at the right place and time (Wu and Grunstein 2000). One of the primary mechanisms regulating access to the DNA is the post-translational modification (PTM) of histones.

A large number of PTMs on histone proteins have been described, and all known modifications found on the core histones are depicted in Figure 1.1. The majority are found on the N- and C-terminal tails, with smaller numbers found within the central globular part of the histone (Zhang, Eugeni et al. 2003). Residues on the lateral surface of the histone octamer that are expected to interact with the DNA have also been reported to be modified, thereby weakening the histone-DNA interactions and regulating nucleosome mobility (Muthurajan, Bao et al. 2004). All modifications, except proline cis-trans isomerization, are covalently attached to the histone residues lysine, arginine, glutamate, serine, threonine and tyrosine (Nelson, Santos-Rosa et al. 2006). The size of the PTMs varies from the attachment of a smaller (<100 Da) methyl, acetyl, propionyl, butyryl, crotonyl, formyl, or phosphoryl moiety, or conversion to citrulline, to the larger (<1000 Da) biotinylation, ADP ribosylation, and O-GlcNAcylation, and even the attachment of ~10 kDa proteins like ubiquitin and SUMO (Shiio and Eisenman 2003; Hassa, Haenni et al. 2006; Thompson and Fast 2006; Chen, Sprung et al. 2007; Kouzarides 2007; Sakabe, Wang et al. 2010; Tan, Luo et al. 2011).

[Figure 1.1 image: annotated sequences of the human core histones H3 (NCBI accession CAB02546), H4 (P62805), H2A (CAB06037) and H2B (P62807). Modification key: Me, methylation; Ac, acetylation; P, phosphorylation; Ci, citrullination; Ub, ubiquitylation; Su, sumoylation; Pr, propionylation; Bu, butyrylation; Cr, crotonylation; Fo, formylation; B, biotinylation; R, ADP ribosylation; Glc, O-GlcNAcylation; *, histone has been reported to be modified, but the exact attachment site is not known. Proline isomerization is also marked.]

Figure 1.1. Sites of histone post-translational modifications (PTMs). The positions of currently identified PTMs on human histones are indicated with the modification keys. Secondary structure elements in the globular core are indicated below the sequence, with cylinders representing alpha-helices. L1 and L2 correspond to loops 1 and 2.

These histone PTMs are established and removed by the catalytic action of chromatin-associated enzymes. These antagonistic activities are carefully regulated to establish a steady state for each modification. A large number of these enzymes have been identified in a variety of biochemical and genetic studies. Although there are still many marks for which enzymes have not been identified, an even bigger challenge is to understand how these enzymes are regulated.

Histone PTMs have two kinds of effects: cis-effects and trans-effects. Cis-effects refer to the phenomenon where histones and histone PTMs are viewed only as packaging molecules, helping to compact 2 m of DNA into a ~10 μm nucleus. Some covalent modifications on the histones result in altered structure or charge that manifests as a change in chromatin organization. For example, lysine acetylation neutralizes positive charge on the histones, so that the negatively charged DNA backbone phosphates cannot bind as tightly, resulting in a more open chromatin organization. Supporting this observation, Megee et al. showed that the primary sequence of the histone tail did not matter as long as histone H4 retained a net charge similar to that of the wild type histone (Megee, Morgan et al. 1995).

The trans-effects refer to the hypothesis that histone modifications serve as docking sites for chromatin-associated proteins, which cause downstream alterations in chromatin structure. This idea is not new and dates as far back as 1964, when Vincent Allfrey demonstrated that the presence of an acetyl mark on histones led to changes in RNA levels and activated genes (Allfrey, Faulkner et al. 1964; Allfrey and Mirsky 1964). Today, the ‘Histone Code’ is defined as the combinatorial pattern of histone modifications and the ability of various chromatin-interacting proteins to recognize the presence and absence of specific PTMs, thereby inducing distinct downstream effects (Strahl and Allis 2000; Turner 2000). The term Histone Code is controversial and there has been much debate in the literature as well as at scientific meetings. The term code implies that it is strict, like the genetic code, which is predictable and nearly universal across species. The Histone Code, by contrast, is known not to be universal, and the modifications vary between organisms. Often, it is not a single modification that is responsible for the downstream effect, but rather a combination of modifications. These PTMs can be either intra- or internucleosomal and found on the same or different histones (Ruthenburg, Li et al. 2007). The first example of histone crosstalk was demonstrated between the phosphorylation of Ser10 and the acetylation of Lys14 on histone H3 (Cheung, Tanner et al. 2000; Lo, Trievel et al. 2000). In spite of the debate over semantics, many studies have provided evidence for the concept of the Histone Code.
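The combinatorial nature of such a code can be illustrated with a small enumeration. The Python sketch below is only a toy model under stated assumptions: it considers four sites on the histone H3 tail (K4, K9, S10 and K14), allows each site only the states listed in the hypothetical `SITE_STATES` table, and ignores every other modification. It counts the resulting tail states and selects those that match a dual-mark pattern such as H3K4me0/K9me3, the signature examined in Chapter 4.

```python
from itertools import product

# Illustrative site states only (an assumption of this sketch); real histone
# tails carry many more sites and modification types than are listed here.
SITE_STATES = {
    "K4": ["me0", "me1", "me2", "me3"],
    "K9": ["me0", "me1", "me2", "me3"],
    "S10": ["unmod", "ph"],
    "K14": ["unmod", "ac"],
}


def enumerate_tail_states(site_states):
    """Return every combination of per-site states as a dict of site -> state."""
    sites = list(site_states)
    combos = product(*(site_states[site] for site in sites))
    return [dict(zip(sites, combo)) for combo in combos]


def matches(state, pattern):
    """True if the tail state carries every mark required by the pattern."""
    return all(state[site] == required for site, required in pattern.items())


all_states = enumerate_tail_states(SITE_STATES)

# Dual-mark pattern: unmodified K4 together with trimethylated K9.
dual_mark = {"K4": "me0", "K9": "me3"}
hits = [state for state in all_states if matches(state, dual_mark)]

print(len(all_states), "possible states;", len(hits), "match K4me0/K9me3")
```

Even with only four sites, the number of states is the product of the per-site options (4 x 4 x 2 x 2), which is why a reader that checks two sites at once narrows the set far more than a reader of a single mark.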

Another subject of much controversy is the heritability of the Histone Code. Trimethylation of Lys27 on histone H3 was the first example showing that a histone modification can be transmitted throughout the cell cycle from generation to generation and is a true epigenetic phenomenon (Hansen, Bracken et al. 2008). Once established, the mark can be perpetuated by the Polycomb Repressive Complex 2 (PRC2). PRC2 is associated with the replication fork, remains bound to the chromatin during mitosis, and can directly bind to H3K27me3 in order to spread this mark to the neighboring new nucleosomes (Aoto, Saitoh et al. 2008; Hansen, Bracken et al. 2008; Margueron, Justin et al. 2009; Xu, Bian et al. 2010). This suggests that the H3K27me3 mark is subject to epigenetic inheritance. Other histone marks, like H3K9me2, have been shown to be able to spread to the neighboring new nucleosomes (Hosey, Chaturvedi et al. 2010), but whether there are mechanisms in place to perpetuate this and a plethora of other histone PTMs throughout the entire cell cycle remains to be seen.

1.4 Histone methylation

The complexity of the Histone Code is enhanced by the fact that there are three forms of lysine methylation (mono-, di- and tri-) and three forms of arginine methylation (mono- and di-, the latter of which can be symmetrical or asymmetrical) (Fig. 1.2a, b). Unlike acetylation, which eliminates the lysine’s positive charge and generally correlates with transcriptional activation, lysine methylation preserves the positive charge on the lysine side chain and can act as either an activating or a repressing signal, depending on the site of methylation (Zhang and Reinberg 2001).

The first enzyme responsible for the catalytic attachment of the methyl moiety, SUV39H1, was identified in Thomas Jenuwein’s group in 2000 (Rea, Eisenhaber et al. 2000). Since then the field of histone methylation has grown and numerous methyltransferases have been identified that participate in a variety of biological processes. All of these enzymes, except Dot1L, contain an evolutionarily conserved SET (Su(var)3-9, Enhancer-of-zeste, Trithorax) domain and require S-adenosyl-L-methionine (SAM) as a cofactor. The histone methylation mark is not permanent and can be removed by two classes of enzymes with histone lysine demethylase activity. The first class reverses the methylation through a hydroxylation reaction (e.g. JmjC domains) and is capable of demethylating mono-, di- and trimethylated states (Tsukada, Fang et al. 2006). The second class reverses the methylation through an amine oxidase reaction and has a sole member, LSD1/KDM1 (Shi, Lan et al. 2004). An important difference between this reaction mechanism and that of the JmjC-domain-containing proteins is that LSD1/KDM1 requires a protonated nitrogen as a hydrogen donor, and therefore cannot demethylate trimethylated lysine (Anand and Marmorstein 2007) (Fig. 1.2c).
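The substrate ranges of the two demethylase classes follow from simple bookkeeping of the protons on the lysine ε-nitrogen. The Python sketch below is a minimal illustration of that rule and not a model of the enzyme chemistry; it assumes a protonated (charged) side chain, so that each added methyl group replaces one of three N-H protons, and the function names are hypothetical.

```python
from enum import IntEnum


class LysineMethylation(IntEnum):
    """Number of methyl groups on the lysine epsilon-nitrogen."""
    KME0 = 0
    KME1 = 1
    KME2 = 2
    KME3 = 3


def nh_protons(state):
    """N-H protons left on the charged side chain (assumes NH3+ for Kme0)."""
    return 3 - int(state)


def hydroxylase_can_act(state):
    """Hydroxylation class (e.g. JmjC domains): any methylated state."""
    return state > LysineMethylation.KME0


def amine_oxidase_can_act(state):
    """Amine oxidase class (LSD1/KDM1): needs a methyl group to remove and a
    protonated nitrogen, so trimethyllysine is excluded."""
    return state > LysineMethylation.KME0 and nh_protons(state) > 0


for state in LysineMethylation:
    print(state.name, nh_protons(state),
          hydroxylase_can_act(state), amine_oxidase_can_act(state))
```

The same proton count also gives the number of hydrogen bonds the side chain can donate, which falls to zero for trimethyllysine.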

There are many known methylated sites on the histones that have been identified using proteomics approaches, but only a handful have been well characterized to date. Lysine methylation plays a role in heterochromatin formation (e.g. H3K9me3, H3K27me3, H4K20me3), X-chromosome inactivation (e.g. H3K27me3), transcriptional regulation (e.g. H3K4me3, H3K9me3, H3K27me3) and DNA repair (e.g. H4K20me2) (Martin and Zhang 2005). These biological functions are accomplished through the binding of protein domains that recognize a specific methylation site.

[Figure 1.2 image: panels a to c; structures labeled K, Kme1, Kme2, Kme3, R, Rme1, Rme2(s), Rme2(a); trends labeled hydrophobicity and H-bonding.]

Figure 1.2. Model representation of lysine and arginine methylation. A stick model representing the three forms of lysine (a) and arginine (b) methylation that contribute to the complexity and the diversity of the Histone Code. Color coding: green, carbon atoms; blue, nitrogen; grey, hydrogen. Hydrogen atoms that can be substituted for a methyl group are shown. Kme1, mono-; Kme2, di-; Kme3, trimethylated lysine; Rme1, mono-; Rme2(s), symmetrically di-; Rme2(a), asymmetrically dimethylated arginine. (c) General enzymatic reactions depicting establishment of the various lysine methylation states. The methylation reaction is driven by lysine methyltransferases with SAM as a co-factor. Lysines are demethylated either through the hydroxylation reaction (mono-, di- and trimethylated states) or through the amine oxidase reaction (mono- and dimethylated states).

1.5 Methyl-reader modules

A major challenge is to define the molecular events that link histone methylation with specific biological outcomes. In part, this can be achieved by understanding the downstream effects that occur after the methylation. As a first step, lysine methylation acts as a signaling mark to recruit the proteins that are capable of “reading” this modification. Lysine methylation exists in three states. With each incremental addition of a methyl group, the hydrophobicity and the cation radius increase, while the ability of the side chain to donate hydrogen bonds decreases and vanishes in trimethyllysine (Fig. 1.2a). Thus, different lysine methylation states have varying physical and chemical properties, enabling the different reader modules to distinguish between these states.

The common feature of all the methyl-reader recognition modules is the presence of the so-called aromatic cage that harbors the methylammonium moiety. The major driving force for the recognition is the cation-π interaction between the methylammonium group of the lysine and the aromatic π cloud. Hydrogen bonding and steric effects also play a substantial role for mono- and dimethyllysine, while the hydrophobic effect is believed to play an insignificant role in recognition (Hughes, Wiggins et al. 2007; Lu, Lai et al. 2009; Gao, Herold et al. 2011). Four general classes of protein folds have evolved to bind to methyllysine marks: ankyrin repeats, WD-40 domains, PHD-type zinc fingers and Royal family proteins (Table 1.1).

Table 1.1. Examples of the protein domains, the histone marks they recognize and their biological effect.

Domain | Histone mark | Biological role | Selected references
ankyrin | Kme1/2 | unknown | (Collins, Northrop et al. 2008)
WD-40 | Kme0/1/2/3; dual PTMs | transcriptional regulation | (Song, Garlick et al. 2008; Margueron, Justin et al. 2009; Xu, Bian et al. 2010)
Royal Family
MBT | Kme1/2 | transcriptional repression; chromatin compaction; tumor suppressor | (Wismar, Loffler et al. 1995; Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007)
Tudor | Kme1/2/3; Rme2s; dual PTMs | transcriptional regulation; DNA damage | (Huyen, Zgheib et al. 2004; Botuyan, Lee et al. 2006; Yang, Lu et al. 2010)
PWWP | Kme3 | involvement in DNA methylation | (Dhayalan, Rajavelu et al. 2010; Vezzoli, Bonadies et al. 2010)
Chromo | Kme1/2/3; dual PTMs | transcriptional regulation; heterochromatin formation | (Jacobs and Khorasanizadeh 2002; Nielsen, Nietlispach et al. 2002)
PHD-type
PHD | Kme0/2/3; Kac; dual PTMs | transcriptional regulation; tumor suppressor; immunodeficiency and autoimmunity | (Li, Ilin et al. 2006; Pena, Davrazou et al. 2006; Shi, Hong et al. 2006; Matthews, Kuo et al. 2007)
ADD | Kme0; dual PTMs | involvement in mental disorders | (Otani, Nankumo et al. 2009; Dhayalan, Tamas et al. 2011; Eustermann, Yang et al. 2011; Iwase, Xiang et al. 2011)
CW | Kme1/2/3 | regulation of tissue-specific genes | (He, Umehara et al. 2010; Hoppmann, Thorstensen et al. 2011)

The biological significance of the interactions of many reader domains with histone PTMs is not known. All these epigenetic regulators are multidomain proteins and it is not trivial to dissect out the role of a single domain. One way to isolate the biological effect of the reader domain itself is to introduce a point mutation that abolishes its interaction with a histone mark and then determine whether this results in a distinct phenotype. Table 1.1 lists several examples of such studies.

1.6 Royal family

The chromodomain (chromatin organization modifier) of the HP1 protein was the first domain discovered to recognize methylated Lys9 on histone H3 (Bannister, Zegerman et al. 2001). Since then a plethora of chromodomains and other modules with similar structural features that are capable of recognizing methylated histones have been discovered. They include MBT repeats, Tudor domains, and PWWP domains, with all four sharing a similar protein fold and referred to as the Royal Family (Maurer-Stroh, Dickens et al. 2003).

1.6.1 Identification

Maurer-Stroh and colleagues originally looked at the ENT domains within Arabidopsis thaliana and noted that they often co-occurred with another module. This new Agenet domain, named after the Plantagenet English monarchs, is present in 62 copies in plants and is most closely related to the human Tudor domain. Subsequent sequence and structure comparisons of the human proteins revealed similarities between the Tudor, Chromo, MBT and PWWP domains. It was proposed that these domains are divergent members of a larger homologous superfamily (named the Royal Family after the Plantagenets and Tudors), and as such, might possess a methyl-substrate-binding function (Maurer-Stroh, Dickens et al. 2003).

Chromo domains recognize lysines methylated to various degrees, as was first demonstrated for HP1 (Bannister, Zegerman et al. 2001; Lachner, O'Carroll et al. 2001; Jacobs and Khorasanizadeh 2002). The first instance of a Tudor domain binding to dimethylated Lys79 on histone H3 was reported in 2004 (Huyen, Zgheib et al. 2004). In subsequent years a structure of a pair of tandem tudor domains in complex with a histone peptide was reported, providing details of the interaction at the atomic level (Botuyan, Lee et al. 2006). MBT domains were found to recognize the lower methylation states (mono- and dimethyl) of lysines in a protein array screen (Kim, Daniel et al. 2006). The structural basis for this recognition came from several crystal structures of the MBT-histone peptide interactions (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007). The PWWP domain also binds to methylated lysines on histones (Wang, Reddy et al. 2009; Vezzoli, Bonadies et al. 2010). Thus, since the initial proposal that the Royal Family proteins recognize methylated lysines, many experimental studies have provided evidence to support this hypothesis. Currently, a key question remaining for most of these domains is their in vivo binding specificity and their biological function.

1.4.2 Architecture

These domains are evolutionarily conserved in proteins from yeast to mammals. The fold of the Royal Family domains is reminiscent of the SH3 domain fold, which contains an incomplete β-barrel usually composed of four β-strands (Fig. 1.3). These domains have been implicated in protein-protein, protein-RNA and protein-DNA interactions (Yap, Li et al.; Selenko, Sprangers et al. 2001; Lukasik, Cierpicki et al. 2006). In what follows I describe only examples of domain-histone interactions. It is interesting to note that the histone peptides are unstructured in the free state. Upon binding to an effector domain, the bound peptide adopts a β-strand conformation and forms an extensive hydrogen bond network with the protein (Min, Allali-Hassani et al. 2007; Taverna, Li et al. 2007; Kaustov, Quyang et al. 2010). Crystal and NMR structures of numerous Royal Family proteins have revealed the great versatility of this fold.

Figure 1.3. Recognition of methyllysine by the Royal Family protein domains. (a) Cartoon representation of the Royal Family fold. Loops 1 and 3 form the aromatic pocket that recognizes the methyllysine mark. (b-e) Examples of the known histone-effector domain structures, including the Tudor domain of UHRF1, PDB: 3DB3 (b); the chromodomain of Cbx3, PDB: 2L11 (c); the PWWP module of BRPF1, PDB: 2X4Y (d); and the MBT repeat of L3MBTL1, PDB: 2PQW (e). Strands that form the SH3-like β-barrel are in blue, additional secondary structure elements are in grey, and methylated lysine is shown in pink as a stick model.

The Tudor domain is about 80 residues in length and occurs as one to eleven consecutive repeats. These domains have the canonical five-stranded β-barrel fold, often supplemented with additional secondary structure elements. The relative orientation of these secondary structure elements provides unique features and great versatility within the Tudor domain family (Adams-Cioaba and Min 2009) (Fig. 1.3b).

The chromodomain is a ~50-residue protein module found in a number of proteins involved in the assembly of protein complexes on chromatin. This domain appears as a three-stranded anti-parallel β-sheet that folds against an α-helix. Interestingly, the N-terminal end of a chromodomain is unstructured, and only upon complex formation with a histone peptide does it form an extended β-strand (Min, Zhang et al. 2003; Kaustov, Quyang et al. 2010) (Fig. 1.3c).

The PWWP domain is larger, containing nearly 135 residues, and is found in proteins involved in DNA methylation, DNA repair and transcription regulation. The N-terminal half of the PWWP domain forms a five-stranded β-barrel, while the C-terminal portion is made up of a helix bundle (Vezzoli, Bonadies et al. 2010; Wu, Zeng et al. 2011) (Fig. 1.3d).

The conserved MBT module is about 100 amino acids in length and always occurs in repeats of two to four members. Each MBT repeat has a signature MKLE motif and a domain architecture comprising a β-barrel core preceded by an α-helical portion that interacts closely with the β-core of the adjacent repeat. Thus, each structural module contains residues from two MBT repeat sequences and packs tightly against the neighboring domain to form a stable multi-repeat module (Wang, Tereshko et al. 2003; Adams-Cioaba and Min 2009) (Fig. 1.3e).

1.4.3 Biological Function

The experimental evidence obtained using biochemical, genetic and cell biology approaches points to a critical function of the Royal Family proteins in fundamental processes such as maintenance of cell identity during development, regulation of transcription and tumor suppression. There are many Royal Family-containing proteins for which mutations in other regions of the protein, or loss of the entire protein, are associated with disease. The MBT, Tudor, PWWP, and chromodomains of many of these proteins are likely to contribute to appropriate chromatin targeting. Below are several examples linked to aberrations within the domains themselves, rather than within the full-length protein.

The main role of the MBT-containing proteins appears to be transcriptional repression; in cell culture, the MBT domains of L3MBTL1 and SFMBT1 were required for transcriptional repression (Boccuni, MacGrogan et al. 2003; Wu, Trievel et al. 2007; Kalakonda, Fischle et al. 2008). The mechanism of repression remains elusive, but studies suggest that L3MBTL1 may function through chromatin compaction or by acting as an insulator (Trojer, Li et al. 2007; Richter, Oktaba et al. 2011).

On the opposite side of the spectrum are the Tudor domains of TDRD3, which recognize the H3R17me2a and H4R3me2a histone marks and allow TDRD3 to act as a transcriptional activator. Genome-wide chromatin IP showed that localization of TDRD3 to promoters was dependent on these histone PTMs. The authors speculate that transcriptional activation may be facilitated by the interaction between arginine-methylated enhancers and promoters (Yang, Lu et al. 2010). A similar mechanism has been proposed for the function of CHD7, a protein that contains tandem chromodomains. The chromodomains of CHD7 bind to enhancer elements in the presence of trimethylated Lys4 on histone H3 and facilitate looping of chromatin to bring the enhancer into close proximity with transcription start sites. This allows the chromodomains to modulate the transcriptional output of genes in human colorectal carcinoma cells, human neuroblastoma cells, and mouse embryonic stem cells before and after differentiation (Schnetz, Bartels et al. 2009). These data suggest involvement of the CHD7 chromodomains in tissue-specific and developmental stage-specific processes.

The histone-reading ability of the PWWP domain has been linked to another major epigenetic modification, DNA methylation. The PWWP domain of the Dnmt3a methyltransferase is involved in (i) targeting Dnmt3a to chromatin carrying the H3K36me3 mark and (ii) increasing the activity of Dnmt3a for methylation of nucleosomal DNA (Dhayalan, Rajavelu et al. 2010).

Mutations within these domains have been linked to a variety of disorders. The term Malignant Brain Tumor domain was coined because mutations in the L(3)MBT gene cause malignant transformation of optic neuroblasts in flies (Wismar, Loffler et al. 1995). The human L(3)MBT homolog has been implicated in hematopoietic disorders (Gurvich, Perna et al. 2010; Perna, Gurvich et al. 2010). In addition to malignancies, developmental disorders have also been reported. Mutations within the CHD7 protein, including its chromodomains, give rise to various phenotypes of the CHARGE syndrome (coloboma of the eye, heart defects, atresia of the choanae, retardation of growth and/or development, genital and/or urinary abnormalities, and ear abnormalities) (Zentner, Layman et al. 2010).

In summary, mutations within the Royal Family domains have been associated with a wide variety of human diseases. However, questions still remain about the contribution of histone-reader domain interactions to the direct cause-effect relationship of disease pathogenesis. Ultimately, learning the regulatory roles of histone modifications in the context of human disease will greatly broaden our mechanistic appreciation of normal and pathological development.

1.5 Rationale and Intentions of the Thesis

Over 180 modifications have been reported on the four core histones within the nucleosome. This work focuses on the histone lysine methylation mark because of its intricacy and its array of diverse biological functions. Methylation of key lysines works in concert with other modifications, contributing to the combinatorial regulation often referred to as the Histone Code.

At the time of starting my graduate research in 2006, a large number of histone PTMs and an even greater number of protein effector domains had been identified. However, there were large gaps in our understanding of which PTMs are recognized by which proteins and of the molecular determinants that dictate specific recognition. To address this issue, I developed a high-throughput peptide array method that allowed analysis of up to 600 PTMs in a single experiment. This method is now recognized and used not only in our laboratory, but also by other groups in the field. Subsequently, this approach was utilized to characterize macromolecules interacting with specific PTMs on histone tails and to screen for the modifications bound by all human MBT-containing proteins, important chromatin regulators implicated in cancer. By solving the crystal structures of specific MBT-histone complexes we sought to reveal the details of the binding specificity, if any.

Lastly, the Histone Code and multivalency hypotheses propose that two or more modifications contribute to the readout of a biological effect. Using a combination of structural and biochemical techniques, it was demonstrated that the Tudor domains of UHRF1, a protein involved in epigenetic maintenance of DNA methylation, are able to read a dual modification state of histone H3 in which Lys9 is trimethylated and Lys4 is unmodified.

1.6 References

Adams-Cioaba, M. A. and J. Min (2009). "Structure and function of histone methylation binding proteins." Biochem Cell Biol 87(1): 93-105. Allfrey, V. G., R. Faulkner, et al. (1964). "Acetylation and Methylation of Histones and Their Possible Role in the Regulation of Rna Synthesis." Proc Natl Acad Sci U S A 51: 786-94. Allfrey, V. G. and A. E. Mirsky (1964). "Structural Modifications of Histones and their Possible Role in the Regulation of RNA Synthesis." Science 144(3618): 559. Allis, C. D., T. Jenuwein, et al. (2007). Epigenetics. Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press. Anand, R. and R. Marmorstein (2007). "Structure and mechanism of lysine-specific demethylase enzymes." J Biol Chem 282(49): 35425-9. Aoto, T., N. Saitoh, et al. (2008). "Polycomb group protein-associated chromatin is reproduced in post-mitotic G1 phase and is required for S phase progression." J Biol Chem 283(27): 18905-15. Bannister, A. J., P. Zegerman, et al. (2001). "Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain." Nature 410(6824): 120-4. Boccuni, P., D. MacGrogan, et al. (2003). "The human L(3)MBT polycomb group protein is a transcriptional and interacts physically and functionally with TEL (ETV6)." J Biol Chem 278(17): 15412-20. Botuyan, M. V., J. Lee, et al. (2006). "Structural basis for the methylation state-specific recognition of histone H4-K20 by 53BP1 and Crb2 in DNA repair." Cell 127(7): 1361- 73. Brownell, J. E., J. Zhou, et al. (1996). "Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation." Cell 84(6): 843-51. Chen, Y., R. Sprung, et al. (2007). "Lysine propionylation and butyrylation are novel post- translational modifications in histones." Mol Cell Proteomics 6(5): 812-9. Cheung, P., K. G. Tanner, et al. (2000). "Synergistic coupling of histone H3 phosphorylation and acetylation in response to epidermal growth factor stimulation." Mol Cell 5(6): 905-15. 
Collins, R. E., J. P. Northrop, et al. (2008). "The ankyrin repeats of G9a and GLP histone methyltransferases are mono- and dimethyllysine binding modules." Nat Struct Mol Biol 15(3): 245-50. Dhayalan, A., A. Rajavelu, et al. (2010). "The Dnmt3a PWWP domain reads histone 3 lysine 36 trimethylation and guides DNA methylation." J Biol Chem 285(34): 26114-20.

Dhayalan, A., R. Tamas, et al. (2011). "The ATRX ADD domain binds to H3 tail peptides and reads the combined methylation state of K4 and K9." Hum Mol Genet. Eustermann, S., J. C. Yang, et al. (2011). "Combinatorial readout of histone H3 modifications specifies localization of ATRX to heterochromatin." Nat Struct Mol Biol 18(7): 777-82. Gao, C., J. M. Herold, et al. (2011). "Biophysical probes reveal a "compromise" nature of the methyl-lysine binding pocket in L3MBTL1." J Am Chem Soc 133(14): 5357-62. Gurvich, N., F. Perna, et al. (2010). "L3MBTL1 polycomb protein, a candidate tumor suppressor in del(20q12) myeloid disorders, is essential for genome stability." Proc Natl Acad Sci U S A 107(52): 22552-7. Hansen, K. H., A. P. Bracken, et al. (2008). "A model for transmission of the H3K27me3 epigenetic mark." Nat Cell Biol 10(11): 1291-300. Hassa, P. O., S. S. Haenni, et al. (2006). "Nuclear ADP-ribosylation reactions in mammalian cells: where are we today and where are we going?" Microbiol Mol Biol Rev 70(3): 789- 829. He, F., T. Umehara, et al. (2010). "Structural insight into the zinc finger CW domain as a histone modification reader." Structure 18(9): 1127-39. Hoppmann, V., T. Thorstensen, et al. (2011). "The CW domain, a new histone recognition module in chromatin proteins." Embo J 30(10): 1939-52. Hosey, A. M., C. P. Chaturvedi, et al. (2010). "Crosstalk between histone modifications maintains the developmental pattern of gene expression on a tissue-specific locus." Epigenetics 5(4): 273-81. Hughes, R. M., K. R. Wiggins, et al. (2007). "Recognition of trimethyllysine by a chromodomain is not driven by the hydrophobic effect." Proc Natl Acad Sci U S A 104(27): 11184-8. Huyen, Y., O. Zgheib, et al. (2004). "Methylated lysine 79 of histone H3 targets 53BP1 to DNA double-strand breaks." Nature 432(7015): 406-11. Iwase, S., B. Xiang, et al. (2011). "ATRX ADD domain links an atypical histone methylation recognition mechanism to human mental-retardation syndrome." 
Nat Struct Mol Biol 18(7): 769-76. Jacobs, S. A. and S. Khorasanizadeh (2002). "Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail." Science 295(5562): 2080-3. Kalakonda, N., W. Fischle, et al. (2008). "Histone H4 lysine 20 monomethylation promotes transcriptional repression by L3MBTL1." Oncogene 27(31): 4293-304. Kaustov, L., H. Quyang, et al. (2010). "Recognition and specificity determinants of the human Cbx chromodomains." J Biol Chem. Kim, J., J. Daniel, et al. (2006). "Tudor, MBT and chromo domains gauge the degree of lysine methylation." EMBO Rep 7(4): 397-403. Kouzarides, T. (2007). "Chromatin modifications and their function." Cell 128(4): 693-705. Lachner, M., D. O'Carroll, et al. (2001). "Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins." Nature 410(6824): 116-20. Li, H., W. Fischle, et al. (2007). "Structural basis for lower lysine methylation state-specific readout by MBT repeats of L3MBTL1 and an engineered PHD finger." Mol Cell 28(4): 677-91. Li, H., S. Ilin, et al. (2006). "Molecular basis for site-specific read-out of histone H3K4me3 by the BPTF PHD finger of NURF." Nature 442(7098): 91-5.

Lo, W. S., R. C. Trievel, et al. (2000). "Phosphorylation of serine 10 in histone H3 is functionally linked in vitro and in vivo to Gcn5-mediated acetylation at lysine 14." Mol Cell 5(6): 917-26. Lu, Z., J. Lai, et al. (2009). "Importance of charge independent effects in readout of the trimethyllysine mark by HP1 chromodomain." J Am Chem Soc 131(41): 14928-31. Luger, K., T. J. Rechsteiner, et al. (1997). "Characterization of nucleosome core particles containing histone proteins made in bacteria." J Mol Biol 272(3): 301-11. Lukasik, S. M., T. Cierpicki, et al. (2006). "High resolution structure of the HDGF PWWP domain: a potential DNA binding domain." Protein Sci 15(2): 314-23. Margueron, R., N. Justin, et al. (2009). "Role of the polycomb protein EED in the propagation of repressive histone marks." Nature 461(7265): 762-7. Martin, C. and Y. Zhang (2005). "The diverse functions of histone lysine methylation." Nat Rev Mol Cell Biol 6(11): 838-49. Matthews, A. G., A. J. Kuo, et al. (2007). "RAG2 PHD finger couples histone H3 lysine 4 trimethylation with V(D)J recombination." Nature 450(7172): 1106-10. Maurer-Stroh, S., N. J. Dickens, et al. (2003). "The Tudor domain 'Royal Family': Tudor, plant Agenet, Chromo, PWWP and MBT domains." Trends Biochem Sci 28(2): 69-74. Megee, P. C., B. A. Morgan, et al. (1995). "Histone H4 and the maintenance of genome integrity." Genes Dev 9(14): 1716-27. Min, J., A. Allali-Hassani, et al. (2007). "L3MBTL1 recognition of mono- and dimethylated histones." Nat Struct Mol Biol 14(12): 1229-30. Min, J., Y. Zhang, et al. (2003). "Structural basis for specific binding of Polycomb chromodomain to histone H3 methylated at Lys 27." Genes Dev 17(15): 1823-8. Muthurajan, U. M., Y. Bao, et al. (2004). "Crystal structures of histone Sin mutant nucleosomes reveal altered protein-DNA interactions." Embo J 23(2): 260-71. Nelson, C. J., H. Santos-Rosa, et al. (2006). "Proline isomerization of histone H3 regulates lysine methylation and gene expression." 
Cell 126(5): 905-16. Nielsen, P. R., D. Nietlispach, et al. (2002). "Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9." Nature 416(6876): 103-7. Otani, J., T. Nankumo, et al. (2009). "Structural basis for recognition of H3K4 methylation status by the DNA methyltransferase 3A ATRX-DNMT3-DNMT3L domain." EMBO Rep 10(11): 1235-41. Pena, P. V., F. Davrazou, et al. (2006). "Molecular mechanism of histone H3K4me3 recognition by plant homeodomain of ING2." Nature 442(7098): 100-3. Perna, F., N. Gurvich, et al. (2010). "Depletion of L3MBTL1 promotes the erythroid differentiation of human hematopoietic progenitor cells: possible role in 20q- polycythemia vera." Blood 116(15): 2812-21. Rea, S., F. Eisenhaber, et al. (2000). "Regulation of chromatin structure by site-specific histone H3 methyltransferases." Nature 406(6796): 593-9. Richter, C., K. Oktaba, et al. (2011). "The tumour suppressor L(3)mbt inhibits neuroepithelial proliferation and acts on insulator elements." Nat Cell Biol. Ruthenburg, A. J., H. Li, et al. (2007). "Multivalent engagement of chromatin modifications by linked binding modules." Nat Rev Mol Cell Biol 8(12): 983-94. Sakabe, K., Z. Wang, et al. (2010). "Beta-N-acetylglucosamine (O-GlcNAc) is part of the histone code." Proc Natl Acad Sci U S A 107(46): 19915-20.

Schnetz, M. P., C. F. Bartels, et al. (2009). "Genomic distribution of CHD7 on chromatin tracks H3K4 methylation patterns." Genome Res 19(4): 590-601. Selenko, P., R. Sprangers, et al. (2001). "SMN tudor domain structure and its interaction with the Sm proteins." Nat Struct Biol 8(1): 27-31. Shi, X., T. Hong, et al. (2006). "ING2 PHD domain links histone H3 lysine 4 methylation to active gene repression." Nature 442(7098): 96-9. Shi, Y., F. Lan, et al. (2004). "Histone demethylation mediated by the nuclear amine oxidase homolog LSD1." Cell 119(7): 941-53. Shiio, Y. and R. N. Eisenman (2003). "Histone sumoylation is associated with transcriptional repression." Proc Natl Acad Sci U S A 100(23): 13225-30. Song, J. J., J. D. Garlick, et al. (2008). "Structural basis of histone H4 recognition by p55." Genes Dev 22(10): 1313-8. Strahl, B. D. and C. D. Allis (2000). "The language of covalent histone modifications." Nature 403(6765): 41-5. Tan, M., H. Luo, et al. (2011). "Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification." Cell 146(6): 1016-28. Taverna, S. D., H. Li, et al. (2007). "How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers." Nat Struct Mol Biol 14(11): 1025-40. Thompson, P. R. and W. Fast (2006). "Histone citrullination by protein arginine deiminase: is arginine methylation a green light or a roadblock?" ACS Chem Biol 1(7): 433-41. Trojer, P., G. Li, et al. (2007). "L3MBTL1, a histone-methylation-dependent chromatin lock." Cell 129(5): 915-28. Tsukada, Y., J. Fang, et al. (2006). "Histone demethylation by a family of JmjC domain- containing proteins." Nature 439(7078): 811-6. Turner, B. M. (2000). "Histone acetylation and an epigenetic code." Bioessays 22(9): 836-45. Vezzoli, A., N. Bonadies, et al. (2010). "Molecular basis of histone H3K36me3 recognition by the PWWP domain of Brpf1." Nat Struct Mol Biol 17(5): 617-9. Waddington, C. H. (1942). 
"The epigenotype." Endeavour 1(1): 18-20. Wang, W. K., V. Tereshko, et al. (2003). "Malignant brain tumor repeats: a three-leaved propeller architecture with ligand/peptide binding pockets." Structure 11(7): 775-89. Wang, Y., B. Reddy, et al. (2009). "Regulation of Set9-mediated H4K20 methylation by a PWWP domain protein." Mol Cell 33(4): 428-37. Watson, J. D. and F. H. C. Crick (2003). "Celebrating the Genetic Jubilee: A Conversation with James D. Watson." Scientific American(288): 66-69. Wismar, J., T. Loffler, et al. (1995). "The Drosophila melanogaster tumor suppressor gene lethal(3)malignant brain tumor encodes a proline-rich protein with a novel zinc finger." Mech Dev 53(1): 141-54. Wu, H., H. Zeng, et al. (2011). "Structural and histone binding ability characterizations of human PWWP domains." PLoS One 6(6): e18919. Wu, J. and M. Grunstein (2000). "25 years after the nucleosome model: chromatin modifications." Trends Biochem Sci 25(12): 619-23. Wu, S., R. C. Trievel, et al. (2007). "Human SFMBT is a transcriptional repressor protein that selectively binds the N-terminal tail of histone H3." FEBS Lett 581(17): 3289-96. 23

Xu, C., C. Bian, et al. (2010). "Binding of different histone marks differentially regulates the activity and specificity of polycomb repressive complex 2 (PRC2)." Proc Natl Acad Sci U S A 107(45): 19266-71. Yang, Y., Y. Lu, et al. (2010). "TDRD3 is an effector molecule for arginine-methylated histone marks." Mol Cell 40(6): 1016-23. Yap, K. L., S. Li, et al. "Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a." Mol Cell 38(5): 662-74. Zentner, G. E., W. S. Layman, et al. (2010). "Molecular and phenotypic aspects of CHD7 mutation in CHARGE syndrome." Am J Med Genet A 152A(3): 674-86. Zhang, L., E. E. Eugeni, et al. (2003). "Identification of novel histone post-translational modifications by peptide mass fingerprinting." Chromosoma 112(2): 77-86. Zhang, Y. and D. Reinberg (2001). "Transcription regulation by histone methylation: interplay between different covalent modifications of the core histone tails." Genes Dev 15(18): 2343-60.


Chapter 2

A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research

All experiments were designed and performed by the author. The plasmids for Cbx3, WDR5 and L3MBTL1 proteins were provided by Jinrong Min. Recombinant DNMT3L protein was purified by Michael S Kareta from Frederick Chédin’s laboratory.

The work described herein is published in: Nataliya Nady, Jinrong Min, Michael S Kareta, Frederick Chédin, Cheryl H Arrowsmith. A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research. Trends Biochem Sci. 2008 Jul;33(7):305-13.

2.1 Summary

Post-translational modifications of histones serve as docking sites and signals for effector proteins and chromatin remodeling enzymes, thereby influencing many fundamental cellular processes. Nevertheless, there are large gaps in our knowledge of which proteins read and write the "histone code". A number of techniques have been used to decipher complex histone modification patterns. However, none is entirely satisfactory due to the inherent limitations of in vitro studies of histones, such as deficits in our knowledge of the proteins involved and the associated difficulties in the consistent and quantitative generation of histone marks. An alternative technique that could prove to be a useful tool in the study of the histone code is the use of synthetic peptide arrays (SPOT blot analysis) as a screening approach to characterize macromolecules that interact with specific covalent modifications of histone tails.

2.2 Introduction

A long history of correlative studies suggests that certain histone modifications are associated with particular chromatin states and transcription levels; notably, histone acetylation is associated with open chromatin and is generally permissive for transcription (Schubeler, MacAlpine et al. 2004; Dion, Altschuler et al. 2005). For the majority of PTMs, the full biological significance and the mechanism by which the signal is transmitted are poorly understood.

At the most basic level, it is important to know which histone marks are recognized by which chromatin binding proteins. Therefore, a system to quantitatively introduce a specific mark at a desired site on each histone would be a valuable research tool in chromatin biology.

It is difficult to achieve a systematic high-throughput introduction of modifications in the context of a nucleosome or recombinant histones for several reasons. First, there is no efficient chemical ligation technique to introduce a modification at a specific site on a recombinant histone. Such methods are currently limited to the introduction of modifications at the extreme termini (Maurer-Stroh, Dickens et al. 2003; Shogren-Knaak, Fry et al. 2003; Shogren-Knaak 2007). Second, only a few histone marks can be generated biochemically due to a lack of knowledge about the enzymes involved in the catalytic transfer. Although in recent years numerous enzymes have been identified that are responsible for the catalytic transfer of methyl-, acetyl-, phospho-, SUMO-, and ubiquityl- moieties (Nelson, Santos-Rosa et al. 2006), there are many marks for which the enzymes are not known. Third, for the cases in which the modifying enzymes are known, it is often difficult to generate a specific, uniformly modified histone mark in an in vitro system. Many of the enzymes, such as histone acetyltransferases and methyltransferases, inherently function as part of larger multiprotein complexes that may be difficult to reconstitute in vitro. In addition, even when the enzyme is available, full reactivity often depends on a prior mark elsewhere on the histone or on a neighboring histone, which itself may be difficult to generate. Finally, the reaction is often difficult to drive to completion at a single site, and multiple sites may be modified by the same enzyme, resulting in a mixture of modified histones. All these factors complicate the systematic generation of specifically modified histones. In 2007 Simon and coworkers developed an elegant method of chemically introducing site-specific methylated lysines into recombinant histones (Simon, Chu et al. 2007). More recently, a method has been developed using a combination of genetic and chemical manipulations in which a new tRNA synthetase/tRNA pair is used to install lysine acetylation, methylation and ubiquitylation (Neumann, Peak-Chew et al. 2008; Nguyen, Garcia Alai et al. 2009; Nguyen, Garcia Alai et al. 2010; Virdee, Ye et al. 2010). However, these techniques are costly, labour-intensive and difficult to implement in a high-throughput environment.

Due to the difficulties in generating specifically modified recombinant histones and nucleosomes, synthetic histone peptides containing one or more modifications are often used as surrogates, especially for binding, structural and specificity studies. This strategy is appropriate because most marks are found on the natively disordered N-terminal tails of the four core histones. Because the tails contain no appreciable secondary or tertiary structure in the absence of binding partners, short peptides are likely to have conformational properties similar to those of the same sequence in the context of the native protein. In general, the domains of "reader" proteins bind to a short specific histone sequence in a modification-dependent manner. Therefore, synthetic peptides, ten to fifteen residues in length and containing one or more specific covalent modifications, should be a good surrogate to study the interaction of histones with reader proteins, and possibly chromatin remodeling enzymes. There are also a small but growing number of modifications reported within the central "histone fold domain" (Hyland, Cosgrove et al. 2005; Dialynas, Makatsori et al. 2006) and on the linker histone H1b (Kuzmichev, Margueron et al. 2005; Wisniewski, Zougman et al. 2007).
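To illustrate how a set of surrogate peptides of this length might be laid out, the following Python sketch tiles the first 40 residues of human histone H3 into overlapping 12-residue peptides. The window length, the step size, and all function names are assumptions chosen for illustration; they are not taken from the array design used in this work.

```python
# Illustrative sketch only: window length, step and function names are
# hypothetical and do not describe the arrays synthesized in this thesis.

# N-terminal tail of human histone H3, residues 1-40 (initiator Met removed).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"


def tile_peptides(sequence, length=12, step=4):
    """Return (start, end, peptide) tuples with 1-based residue numbering."""
    peptides = []
    for offset in range(0, len(sequence) - length + 1, step):
        peptide = sequence[offset:offset + length]
        peptides.append((offset + 1, offset + length, peptide))
    return peptides


def peptides_covering(peptides, position):
    """Select the peptides that contain a given residue position."""
    return [p for p in peptides if p[0] <= position <= p[1]]


def flanking_context(sequence, position, flank=1):
    """Return the residue at `position` plus `flank` residues on each side."""
    start = max(0, position - 1 - flank)
    return sequence[start:position + flank]


if __name__ == "__main__":
    peptides = tile_peptides(H3_TAIL)
    for start, end, peptide in peptides_covering(peptides, 9):
        print(f"H3({start}-{end}) {peptide}")
    # Compare the immediate sequence context of Lys9 and Lys27.
    print(flanking_context(H3_TAIL, 9), flanking_context(H3_TAIL, 27))
```

With these assumed settings each lysine falls into several overlapping peptides, so a modified residue can be presented in more than one sequence context. The last line shows that Lys9 and Lys27 of histone H3 share the same immediate neighbours (RKS).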

The versatility of synthetic peptides enables one to consider high-throughput and/or parallel strategies to screen for protein-peptide interactions. Kim et al. described a protein array approach that involves pre-selected immobilized GST fusion proteins probed with fluorescently tagged methylated histone peptides (Kim, Daniel et al. 2006). This method allows one to screen for which of the immobilized proteins recognize a given histone mark. However, in practice, the method may be limited by the integrity and number of proteins that can be surface immobilized on a chip. An alternative strategy was recently described by Wysocka, in which a biotinylated histone peptide is used to "pull down" histone reader proteins from a cell lysate (Li, Ilin et al. 2006).

In order to decipher complex histone modification patterns I utilized a technique that has been available for some time (Frank 2002) and that, with several alterations, can serve as a primary screening tool in epigenetics research. Using SPOT peptide arrays, all possible histone modifications (or a subset of them) can readily be screened for characterizing reagents for chromatin research (such as antibodies), mapping the specificity of histone binding domains, and identifying potential new histone marks. I present examples of SPOT blot assays that demonstrate the utility of this strategy as a validated and useful technique in chromatin research.

2.3 Results

2.3.1 SPOT blotting

The strength of the SPOT peptide assay is its unbiased, comprehensive and systematic approach to evaluating a given protein for binding to histone tails carrying up to nine different modification types at all possible residues in both tail and folded regions. The principle of such a screen is illustrated in Figure 2.1. Peptides are synthesized directly on a cellulose membrane with a polyethylene glycol linker. The membrane is then extensively blocked and incubated with a protein of interest. After a washing step, the presence of bound protein is detected via Western blot analysis or by fluorescent detection.

The efficiency and quality of the synthesis of peptides on the membrane using standard

L-amino acids has been validated by mass spectrometry (Frank 2002; Hilpert, Winkler et al.

2007). Peptides up to 50 amino acids in length can be synthesized using SPOT synthesis, however the optimal range is between 6 and 18 amino acids and greatly depends on the sequence

(Toepert, Knaute et al. 2003). In order to evaluate the efficiency of incorporation of bulkier modified amino acids into histone peptides we made use of α-Lys(Ac), a primary antibody that

29

a. Array of peptides immobilized on a b. Protein/domain of interest with cellulose membrane immuno-reactive tag

HIS HIS PHD Bromo

HIS MBT Chromo

+ HIS

HIS Tudor

WD-40 HIS

HIS Tudor

d. Visualization using c. Protein-peptide complexes Western Blot

Figure 2.1. Schematic representation of the SPOT binding assay. (a) The array of histone peptides with desired modifications is synthesized on the membrane. (b) The recombinant protein or domain of interest, tagged with an immunoreactive sequence (in this case a polyhistidine tag) is produced. (c) The membrane is incubated with the protein and (d) visualized exploiting Western blot techniques and a primary antibody against the tag. Positive interactions appear as a dark spot on the blot. One or more spots with a peptide comprising the immunoreactive sequence (indicated by an H for His tag in the figure) are used as internal positive control. 30

recognizes acetylated lysines in a non-sequence specific manner (Fig. 2.2). A membrane was

prepared in which a total of 36 out of 176 peptides contained acetylated lysines at various

positions (Fig. 2.2a). Probing with α-Lys(Ac) revealed 33 reactive peptides in the expected

positions on the membrane (Fig. 2.2b). One unacetylated peptide (dimethylated lysine at position

5 on histone H2B) was also apparently detected (spot A23). It is not clear whether this was due

to a problem in the synthesis of the peptide or an unexpected, non-specific activity of the primary

antibody. Three potentially acetylated peptides, all from the same region of histone H2B

(residues 7-18; spots A27, 28, and 30 in Fig. 2.2), did not react with the α-Lys(Ac). These peptides contained two prolines in proximity to each other which may have reduced the efficiency of coupling during the synthesis, or may induce an unusual peptide conformation that is not recognized by the antibody. Nevertheless, based on the number of correct spots detected

by α-Lys(Ac), the synthesis appears to have worked well and for the most part acetyl-lysines

were incorporated into the desired peptides.
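The counts reported above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the variable names are ours, and the figure of 140 non-acetylated peptides is simply 176 minus 36, under the assumption that the single unexpected signal (spot A23) is the only one among them.

```python
# Counts taken from the text: 36 acetylated peptides among 176,
# 33 of them detected, plus one unexpected signal (spot A23).
total_peptides = 176
acetylated = 36
detected_expected = 33
unexpected = 1

missed = acetylated - detected_expected        # acetylated spots with no signal
non_acetylated = total_peptides - acetylated   # peptides that should stay blank

detection_rate = detected_expected / acetylated
unexpected_rate = unexpected / non_acetylated

print(f"missed acetylated spots: {missed}")
print(f"detection rate: {detection_rate:.1%}")
print(f"unexpected signal rate: {unexpected_rate:.1%}")
```

On these numbers roughly 92% of the acetylated peptides were detected, and fewer than 1% of the remaining peptides gave an unexpected signal.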

2.3.2 Application 1: Characterization of reagents for chromatin research

Antibodies are routinely used in chromatin and epigenetic research for ChIP assays,

ELISA, flow cytometry, Western Blot analysis, immunofluorescence, immunocytochemistry,

immunohistochemistry, and immunoprecipitation. The antibodies against several “popular”

marks (e.g., H3 K4me3, H3 K9me3) are used repeatedly and have reliable specificity, but

those that are newer or utilized less often may not be fully characterized for cross-reactivity,

especially when there are several regions of histones that have identical or similar sequence, such

as around K9 and K27 on histone H3. Knowledge of the binding potential of antibodies for

multiple histone marks will help avoid misinterpretation of data or possibly reveal the need for

[Figure 2.2. Panels: (a) array layout showing the expected acetyl-lysine positions; (b) membrane (rows A-F, columns 1-30) probed with α-Lys(Ac).]

Figure 2.2. Confirmation of complete synthesis of modified histone peptides. An array of histone peptides with all known modifications was probed with the monoclonal antibody against acetylated lysine residues (sourced from Cell Signaling Technology). (a) Yellow squares indicate the 36 theoretical sites with acetylated lysines. (b) Black spots indicate the presence of an antibody obtained experimentally. Although the majority of the resultant spots correlate with the expected modifications, it can be seen that three acetylated sites (A27, A28 and A30) are missing (circled in green), and one position was falsely identified (A23; circled in cyan).

more specific reagents. We have tested a primary antibody against a documented, well-

researched mark - phosphorylated serine at position 10 on histone H3 - against the array of all known core histone modifications (Fig. 2.3a). On the membrane there were two peptides containing H3 S10ph, one with modification closer to the C-terminus, and the other to the N-

terminus. As shown in Figure 2.3b, this antibody was very specific and bound to both peptides,

as expected. On the other hand, an antibody marketed for use against phosphorylated threonine at position 11 on histone H3 was much less specific (Fig. 2.3c).

2.3.3 Application 2: Mapping the specificity of histone binding domains

In order to demonstrate the value of using the histone peptide array to understand the

specificity of histone binding proteins we investigated two proteins for which detailed structural

and specificity data are available.

CBX3, a chromodomain containing protein, is a human homolog of D. melanogaster

HP1γ which has been shown to specifically recognize di- and trimethylated K9 on histone H3

(Jacobs, Taverna et al. 2001; Jacobs and Khorasanizadeh 2002; Bernstein, Mikkelsen et al.

2006). Figure 2.4a demonstrates that indeed, the chromodomain of CBX3 specifically recognized

di- and trimethylated K9. However, interactions with peptides containing acetylated K9, and

phosphorylated S10 and T11 were also observed. The binding to these non-methyl lysine

containing peptides was unexpected and, although potentially very interesting, requires validation

by other means beyond the scope of this thesis. In the case of S10ph, the result is in conflict with

published data for HP1γ (Fischle, Tseng et al. 2005), and therefore is likely to be a false positive result. Nevertheless, the peptide array was still able to identify the correct region of H3 and the two peptides that contained di- and trimethylated K9. Thus, in this case, the array would have

[Figure 2.3. Panels: (a) array layout; (b) membrane (rows A-F, columns 1-30) probed with α-H3 S10ph, with labeled spots H3 residues 1-13, S10ph and H3 residues 7-20, S10ph; (c) membrane probed with α-H3 T11ph, with boxed spots 1: H2B residues 104-114, all modifications; 2: H3 residues 1-13, various modifications; 3: H3 residues 7-20, various modifications; 4: H3 residues 21-34, T32ph.]

Figure 2.3. Characterization of reagents for chromatin research. (a) The peptide array contained a series of ~12-residue sequences from core histones H2A (white), H2B (gray), H3 (cyan) and H4 (green), containing known sites of modification. The asterisk denotes the start of a new peptide series with numbers corresponding to the modified residues; poly-his and poly-ser peptides are highlighted in red. Such array with all known modifications was probed with antibodies (sourced from Abcam) against (b) H3 S10ph and (c) H3 T11ph and the experimental results are shown. α-H3 S10ph recognized two peptides as expected, but α-H3 T11ph had broader reactivity (boxed spots) and was able to associate with the entire N-terminal end of histone H3 in a modification-independent manner (C4–19, C24-D5). In addition, it bound to a H2B peptide spanning the region of residues 104–114 of the histone, regardless of modification state, and recognized H3 T32ph.

served as a good initial screen to identify candidate marks for further investigation out of the hundreds of potential sequences and marks.

To further assess the use of the peptide arrays we tested WDR5, a WD-40 domain protein that binds to the N-terminus of H3 and presents H3 K4me2 for further methylation by the methyltransferase MLL1 (Dou, Milne et al. 2005). The 3D structure and mechanism of binding

for this protein have been extensively studied (Couture, Collazo et al. 2006; Huang, Fang et al.

2006; Ruthenburg, Wang et al. 2006; Schuetz, Allali-Hassani et al. 2006). The WD-40 domain

binds specifically to Arg2 which acts to anchor the H3 peptide across the “top” of the WD-40

domain thereby presenting Lys4 for further interactions with the methyltransferase. Indeed, the

peptide screen showed that WDR5 was able to recognize all peptides from the N-terminus of

histone H3 regardless of modification state, as long as they contained unmodified arginine at

position 2 (Fig. 2.4b). The binding specificity of WDR5 is a compelling example of using the

peptide array as a preliminary screening tool to determine the specificity of a histone mark

“reader” protein. This result is consistent with more extensive structural and biophysical studies

showing that the methylation state of Lys4 has only minor effects on the affinity of the H3

peptide for WDR5, and that Arg2 and other sequence features are major determinants of the

binding affinity (Couture, Collazo et al. 2006; Huang, Fang et al. 2006; Ruthenburg, Wang et al.

2006; Schuetz, Allali-Hassani et al. 2006). Furthermore, this array experiment is consistent with

the discovery that methylation of H3R2 prevents methylation of H3K4 by WDR5-containing

MLL methyltransferase complexes (Couture, Collazo et al. 2006; Guccione, Bassi et al. 2007;

Kirmizis, Santos-Rosa et al. 2007). Similarly, histone peptide array screening of DNMT3L was

consistent with the demonstration that H3 K4me is a repulsive mark, preventing the de novo

DNA methylation coupled with Lys4-methylated histone H3 (Fig. 2.4c) (Ooi, Qiu et al. 2007).
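The readout rules described above for WDR5 and DNMT3L can be written as simple predicates over the H3 (residues 1-13) peptide series. The sketch below is an idealization, assuming binding is lost only when a listed residue carries a mark; the `binds` helper and the peptide encoding are hypothetical, and the output is what the stated rules predict, not measured spot intensities.

```python
# Single-mark H3 (residues 1-13) peptides, spots C4-C17 in Table 2.1.
# None stands for the unmodified peptide.
H3_1_13_PEPTIDES = [
    None,
    ("R2", "monoMe"), ("R2", "sym diMe"), ("R2", "asym diMe"),
    ("T3", "phosph"),
    ("K4", "monoMe"), ("K4", "diMe"), ("K4", "triMe"),
    ("K9", "monoMe"), ("K9", "diMe"), ("K9", "triMe"), ("K9", "ac"),
    ("S10", "phosph"), ("T11", "phosph"),
]

def binds(peptide, blocking_residues):
    """Idealized rule: binding is lost only if a blocking residue is modified."""
    return peptide is None or peptide[0] not in blocking_residues

# WDR5: requires unmodified Arg2. DNMT3L: requires unmodified Thr3 and Lys4.
wdr5_binders = [p for p in H3_1_13_PEPTIDES if binds(p, {"R2"})]
dnmt3l_binders = [p for p in H3_1_13_PEPTIDES if binds(p, {"T3", "K4"})]

print(f"WDR5: {len(wdr5_binders)} of {len(H3_1_13_PEPTIDES)} peptides predicted to bind")
print(f"DNMT3L: {len(dnmt3l_binders)} of {len(H3_1_13_PEPTIDES)} peptides predicted to bind")
```

Comparing such a predicted pattern with the observed spots is one way to spot deviations from the simple rule that merit follow-up.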

[Figure 2.4. Panels: membranes probed with His-tagged (a) CBX3, (b) WDR5 and (c) DNMT3L and detected with α-6xHIS (rows A-F, columns 1-30); each panel includes a callout of the H3 residues 1-13 peptide series.]

Figure 2.4. Mapping the specificity of histone-binding domains. Probing an array of peptides containing known sites of modifications on core histones with proteins of known binding specificity; positive controls (poly-histidine peptides) are circled in red. (a) Binding profile of human homolog of Drosophila HP1γ chromo domain. As highlighted by the callout, strong binding is observed to the N-terminus of histone H3 with preference for di- and trimethylated lysine over monomethylated lysine at Lys9; some other peptides were recognized also. (b) The WD-40 domain of WDR5 shows preferential binding to the N terminus of histone H3. Also highlighted in a callout, strong recognition of peptides containing unmodified arginine at position 2, with additional recognition of methylation state of Lys4 is observed. (c) The DNMT3L protein shows binding to the N-terminal histone H3 tail, residues 1–13, when Thr3 and Lys4 are unmodified. The modification state on the flanking residues, including Lys9, had no effect on the binding.

2.3.4 Application 3: Demonstration of non-sequence specific interactions

The peptide array screening approach is also a valuable tool for characterizing proteins or domains that have broad or unclear binding specificity, in which case use of standard biophysical approaches and soluble peptides would be expensive, time-consuming, and/or labor-intensive.

An example of this situation is demonstrated for the protein L3MBTL1 which contains

Malignant Brain Tumor (MBT) repeats, a domain of approximately 100 amino acids. Previous studies have shown that MBT repeat proteins can recognize mono- and dimethylated lysine marks (Botuyan, Lee et al. 2006; Klymenko, Papp et al. 2006; Trojer, Li et al. 2007) although the degree of specificity is not clear. Also, it was shown that purified MBT domains of L3MBTL1 were capable of compacting purified nucleosomal arrays from a “beads on a string” arrangement, into a more condensed structure with nucleosomes touching one another. This activity was dependent on mono- and dimethylation of histone H4 K20 and histone H1b K26 (Trojer, Li et al.

2007). In addition, several structures of the MBT repeats in a complex with H4 K20me1/2, H1.5

K26me2 and p53 K382me1 peptides have been reported (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; West, Roy et al. 2010). Despite these studies, several questions regarding the molecular basis for transcriptional repression and binding specificity remain. Quantitative binding affinities for several mono- and dimethyllysine peptides were very similar in magnitude, pointing towards low sequence specificity, and binding data for H3 K4me and H3

K9me were not consistent between ITC/fluorescence polarization and pull-down assays (Li,

Fischle et al. 2007; Min, Allali-Hassani et al. 2007; Trojer, Li et al. 2007). We screened the domain of L3MBTL1 containing 3 MBT repeats for binding to histone peptides with all possible

Lys, Arg, Ser, and Thr residues modified, for a total of 554 peptides (Fig. 2.5a).

[Figure 2.5. Panels: (a) array layout; (b) membrane (rows A-T, columns 1-30) probed with HIS-L3MBTL1 and detected with α-6xHIS, with a callout of the H3 residues 7-20 peptide series.]

Figure 2.5. Demonstration of non-sequence-specific interactions. (a) The peptide array contained a series of ~12-residue sequences from core histones H2A (white), H2B (gray), H3 (cyan) and H4 (green), containing all possible lysines, arginines, and threonines modified. The asterisk denotes the start of a new peptide series with numbers corresponding to the modified residues (see Appendix); poly-his and poly-ser peptides are highlighted in red. (b) Such array of peptides containing all possible modifications was probed with the three-MBT-repeat domain of HIS-L3MBTL1; controls (poly-histidine peptides) are circled in red. All mono- and some di-methylated lysines seem to be bound more strongly and are recognized by the HIS-L3MBTL1 protein in a non-sequence-specific manner.

The peptide array shows that the MBT repeats of L3MBTL1 are capable of binding all histone peptides that contain a monomethylated lysine, and some that contain a dimethylated lysine, apparently in a non-sequence-specific manner (Fig. 2.5b). This is consistent with structural studies (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; West, Roy et al. 2010) showing that the architecture of the binding pocket is specific for mono- and dimethylated lysines, but not unmodified or trimethylated lysines. Moreover, the H4 K20 peptides in the crystal structures had minimal contact with the protein outside the methyllysine binding pocket, explaining the lack of sequence specificity of L3MBTL1 (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007). Thus, in this case, the SPOT assay correctly identified the lack of sequence specificity, which would have required costly and labor-intensive experiments to demonstrate using standard biophysical methods and soluble peptides.

2.4 Discussion

We believe that the histone peptide array assay provides a straightforward approach for the characterization of reagents for chromatin research and the discovery of novel interactions between proteins of interest and histone marks, and that it can reveal mechanistic details underlying molecular recognition. As an initial screen to evaluate the binding specificity of a given protein

or domain, the peptide array assay is highly time and labor efficient. The method is cost

effective, especially when compared to generating the equivalent number of soluble peptides.

Furthermore, the membrane can be stripped of interacting proteins and reused for up to two

additional binding experiments without compromising results, further reducing costs (Kramer,

Reineke et al. 1999; Hyland, Cosgrove et al. 2005). The method is very sensitive and is capable

of detecting even weak peptide–antibody interactions with dissociation constants as high as 10⁻³ to 10⁻⁴ M (Reineke, Sabat et al. 1996; Kramer, Reineke et al. 1999). Although the results

obtained using the SPOT technology have been shown to be reliable in our hands, as with any

screening method, it should be used as a preliminary screening tool and positive results need to

be validated by additional biophysical methods to confirm results or eliminate false positives.

Overall, in our experience with over 50 membrane screens for more than 15 different proteins

with subsequent follow-up (or literature data available), the false-negative rate appears to be very

low. The false-positive rate is higher, but appears to be tolerable in that the number of validation

experiments needed is usually experimentally manageable. For example, in the case of CBX3, a

total of six peptide validation experiments would be needed to confirm H3 K9me3 as a

bona fide target mark: two H3 K9 peptide binding experiments (K9me2 and K9me3), plus three

more to either confirm or rule out binding to the three potential false positives, and one

additional negative control peptide that did not bind on the array. Six experiments are very

tractable compared to potentially hundreds, if the array had not been used.
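The bookkeeping behind the CBX3 example can be sketched as follows. The candidate hits follow the text; the choice of an unmodified H3 peptide as the negative control is a hypothetical placeholder, since the text does not specify which non-binding peptide would be used.

```python
# Candidate CBX3 hits from the array (expected marks and potential false positives).
expected_hits = ["H3 K9me2", "H3 K9me3"]
potential_false_positives = ["H3 K9ac", "H3 S10ph", "H3 T11ph"]
negative_controls = ["H3 unmodified"]  # hypothetical choice of non-binding peptide

follow_up = expected_hits + potential_false_positives + negative_controls
array_size = 176  # peptides on the membrane with known modifications

print(f"validation experiments needed: {len(follow_up)}")
print(f"fraction of the array followed up: {len(follow_up) / array_size:.1%}")
```

Under these assumptions only about 3% of the arrayed peptides need to be remade as soluble peptides for validation.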

The membrane can be designed to screen for not only histone residues that are currently

known to be modified, but also those that have yet to be discovered. Peptide arrays are useful for

screening a library of “effector” proteins and enzymes against all histone PTMs, as opposed to

testing only “popular” marks. In addition, modifications on the histone core domain and newly

discovered types of modifications, such as the acetylation of serine and threonine in bacteria

(Mukherjee, Keitany et al. 2006; Mukherjee, Hao et al. 2007), can be easily introduced into the histone peptide arrays for testing of novel protein binding modules.

Furthermore, there is growing appreciation that the histone code is far more complex than originally thought (Esteller 2006; Krebs 2007; Ruthenburg, Li et al. 2007; Fuchs, Krajewski et al. 2011). The specificity determinants of histone reader and writer proteins are not as straightforward as recognition of a single mark, as in the case of HP1γ (Jacobs, Taverna et al. 2001;

Jacobs and Khorasanizadeh 2002; Bernstein, Mikkelsen et al. 2006). There is increasing evidence for the concept of combinatorial recognition of histone modifications (Young,

Dimaggio et al. 2010). Using this approach, dual and multiple modifications on the same peptide can be introduced, to explore the synergistic interplay between marks (Garske, Oliver et al. 2010;

Fuchs, Krajewski et al. 2011; Nady, Lemak et al. 2011). Thus, initial screens such as those discussed here offer a tractable strategy to quickly and efficiently assess potential in vitro histone binding specificities, as a first step in many aspects of chromatin research, including specificity mapping of readers and writers, structural biology, discovery of new histone marks, and the generation and testing of research reagents.

2.5 Materials and Methods

Design and synthesis of the peptides

The library of histone peptides used in this study contained all modifications known to date as recorded in the Abcam comprehensive human histone modification map

(http://www.abcam.com/assets/pdf/chromatin/histone_modification_map_human.pdf, last accessed Dec. 14, 2006). Modifications were positioned no closer than three residues from the

termini of each peptide with the exception of modifications naturally located very close to

histone N- or C- termini, such as H2A S1ph.

The peptides had either no modifications or contained only one of the following marks:

mono-, di-, or trimethylated lysine, monomethylated arginine, symmetrically or asymmetrically

dimethylated arginine, acetylated lysine, phosphorylated serine or threonine. A poly-serine

peptide was used as an internal negative control. A poly-histidine peptide served as internal

positive control to confirm suitable buffer conditions for the binding studies, specificity of

primary poly-His antibody, and straightforward orientation of the membrane when interpreting

the results.
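As an illustration of how such a single-mark library can be enumerated, the Python sketch below lists every variant of one peptide, assuming each lysine, arginine, serine and threonine is modified in turn with the marks named above (as for the membrane that covers all such residues). The function name and data layout are hypothetical, not the software used to design the arrays; the sequence is the N-terminal 13 residues of human histone H3.

```python
# Marks allowed per residue type, following the list given in the text.
MARKS = {
    "K": ["monoMe", "diMe", "triMe", "ac"],
    "R": ["monoMe", "sym diMe", "asym diMe"],
    "S": ["phosph"],
    "T": ["phosph"],
}

def single_mark_variants(sequence, start):
    """Return (residue label, mark) pairs: the unmodified peptide first,
    then one entry per modifiable residue and allowed mark."""
    variants = [("none", None)]
    for offset, amino_acid in enumerate(sequence):
        for mark in MARKS.get(amino_acid, []):
            variants.append((f"{amino_acid}{start + offset}", mark))
    return variants

# Histone H3, residues 1-13.
h3_1_13 = "ARTKQTARKSTGG"
variants = single_mark_variants(h3_1_13, start=1)

for label, mark in variants:
    print(label if mark is None else f"{label} - {mark}")
print(f"{len(variants)} peptides in this series")
```

For this sequence the sketch yields 19 peptides, the same number as the H3 residues 1-13 series listed in Table 2.2 (spots J14 to K2).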

Two types of membranes were synthesized. One type contained all known modifications, for a total of 176 peptides from the core histones H2A, H2B, H3 and H4. The peptides were

arranged on a membrane in a specific manner (Fig. 2.3a, Table 2.1) and tested for complete

synthesis, antibody specificity, and binding with a recombinant human homolog of HP1γ, WDR5 and DNMT3L. A second membrane was used to screen modifications of all lysines, arginines,

serines and threonines within core histones. In total, 554 unique peptides were synthesized (Fig.

2.5a, Table 2.2) and tested for binding to the MBT domains of L3MBTL1. The complete list of

peptides with their corresponding sequences for both types of membranes is provided in

Appendix.
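Spot labels such as A23 or C4 combine a row letter with a column number. A minimal sketch of the conversion between a linear synthesis position and a spot label is given below, assuming 30 spots per row filled row by row, as the array figures and tables suggest; the helper names are hypothetical.

```python
import string

COLUMNS = 30  # spots per membrane row (columns 1-30 in the array figures)

def spot_label(index):
    """Map a zero-based synthesis position to a label such as 'A1' or 'C4'."""
    row, col = divmod(index, COLUMNS)
    return f"{string.ascii_uppercase[row]}{col + 1}"

def spot_index(label):
    """Inverse of spot_label: 'A1' -> 0, 'B1' -> 30."""
    row = string.ascii_uppercase.index(label[0])
    return row * COLUMNS + int(label[1:]) - 1

print(spot_label(0), spot_label(29), spot_label(30))
print(spot_index("A23"), spot_index("F30"))
```

With this convention a six-row membrane (A to F) holds 180 positions and a twenty-row membrane (A to T) holds 600.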

Peptide synthesis and validation

Peptides were synthesized on an Intavis MultiPep SPOT peptide arrayer using

commercially available standard (Intavis) and modified (Bachem) L-amino acid precursors, and

membranes (Intavis). The synthesized peptides were 8 to 14 amino acids in length and contained

Table 2.1. List of peptides with known modifications used in SPOT-blot synthesis.

Controls
Identity   Modifications     Peptide #
1 - 12     poly-HIS          +
1 - 12     poly-Val          -

Histone - H2A
Identity   Modifications     Peptide #
1 - 12     none              A1
1 - 12     S1 - phosph       A2
1 - 12     K5 - ac           A3
1 - 12     K9 - ac           A4
7 - 18     none              A5
7 - 18     K9 - ac           A6
7 - 18     K13 - ac          A7
7 - 18     K15 - ac          A8
31 - 41    none              A9
31 - 41    K36 - ac          A10
92 - 103   none              A11
92 - 103   K95 - monoMe      A12
92 - 103   K95 - diMe        A13
92 - 103   K95 - triMe       A14
92 - 103   K99 - monoMe      A15
92 - 103   K99 - diMe        A16
92 - 103   K99 - triMe       A17
114 - 124  none              A18
114 - 124  K119 - ac         A19
114 - 124  T120 - phosph     A20

Histone - H2B
Identity   Modifications     Peptide #
1 - 9      none              A21
1 - 9      K5 - monoMe       A22
1 - 9      K5 - diMe         A23
1 - 9      K5 - triMe        A24
1 - 9      K5 - ac           A25
7 - 18     none              A26
7 - 18     K11 - ac          A27
7 - 18     K12 - ac          A28
7 - 18     S14 - phosph      A29
7 - 18     K15 - ac          A30
7 - 18     K16 - ac          B1
17 - 26    none              B2
17 - 26    K20 - ac          B3
17 - 26    K23 - monoMe      B4
17 - 26    K23 - diMe        B5
17 - 26    K23 - triMe       B6
17 - 26    K24 - ac          B7
28 - 39    none              B8
28 - 39    S32 - phosph      B9
28 - 39    S36 - phosph      B10
39 - 49    none              B11
39 - 49    K43 - monoMe      B12
39 - 49    K43 - diMe        B13
39 - 49    K43 - triMe       B14
39 - 49    K46 - monoMe      B15
39 - 49    K46 - diMe        B16
39 - 49    K46 - triMe       B17
80 - 89    none              B18
80 - 89    K85 - ac          B19
94 - 104   none              B20
94 - 104   R99 - monoMe      B21
94 - 104   R99 - sym diMe    B22
94 - 104   R99 - asym diMe   B23
104 - 114  none              B24
104 - 114  K108 - monoMe     B25
104 - 114  K108 - diMe       B26
104 - 114  K108 - triMe      B27
104 - 114  K108 - ac         B28
112 - 123  none              B29
112 - 123  K116 - ac         B30
112 - 123  K120 - ac         C1

Histone - H3
Identity   Modifications     Peptide #
1 - 13     none              C4
1 - 13     R2 - monoMe       C5
1 - 13     R2 - sym diMe     C6
1 - 13     R2 - asym diMe    C7
1 - 13     T3 - phosph       C8
1 - 13     K4 - monoMe       C9
1 - 13     K4 - diMe         C10
1 - 13     K4 - triMe        C11
1 - 13     K9 - monoMe       C12
1 - 13     K9 - diMe         C13
1 - 13     K9 - triMe        C14
1 - 13     K9 - ac           C15
1 - 13     S10 - phosph      C16
1 - 13     T11 - phosph      C17
7 - 20     none              C18
7 - 20     K9 - monoMe       C19
7 - 20     K9 - diMe         C20
7 - 20     K9 - triMe        C21
7 - 20     K9 - ac           C22
7 - 20     S10 - phosph      C23
7 - 20     T11 - phosph      C24
7 - 20     K14 - monoMe      C25
7 - 20     K14 - diMe        C26
7 - 20     K14 - triMe       C27
7 - 20     K14 - ac          C28
7 - 20     R17 - monoMe      C29
7 - 20     R17 - sym diMe    C30
7 - 20     R17 - asym diMe   D1
7 - 20     K18 - monoMe      D2
7 - 20     K18 - diMe        D3
7 - 20     K18 - triMe       D4
7 - 20     K18 - ac          D5
15 - 25    none              D6
15 - 25    R17 - monoMe      D7
15 - 25    R17 - sym diMe    D8
15 - 25    R17 - asym diMe   D9
15 - 25    K18 - monoMe      D10
15 - 25    K18 - diMe        D11
15 - 25    K18 - triMe       D12
15 - 25    K18 - ac          D13
15 - 25    K23 - monoMe      D14
15 - 25    K23 - diMe        D15
15 - 25    K23 - triMe       D16
15 - 25    K23 - ac          D17
21 - 34    none              D18
21 - 34    K23 - monoMe      D19
21 - 34    K23 - diMe        D20
21 - 34    K23 - triMe       D21
21 - 34    K23 - ac          D22
21 - 34    R26 - monoMe      D23
21 - 34    R26 - sym diMe    D24
21 - 34    R26 - asym diMe   D25
21 - 34    K27 - monoMe      D26
21 - 34    K27 - diMe        D27
21 - 34    K27 - triMe       D28
21 - 34    K27 - ac          D29
21 - 34    S28 - phosph      D30
21 - 34    T32 - phosph      E1
33 - 41    none              E2
33 - 41    K36 - monoMe      E3
33 - 41    K36 - diMe        E4
33 - 41    K36 - triMe       E5
33 - 41    K37 - monoMe      E6
33 - 41    K37 - diMe        E7
33 - 41    K37 - triMe       E8
51 - 61    none              E9
51 - 61    K56 - monoMe      E10
51 - 61    K56 - diMe        E11
51 - 61    K56 - triMe       E12
51 - 61    K56 - ac          E13
74 - 84    none              E14
74 - 84    K79 - monoMe      E15
74 - 84    K79 - diMe        E16
74 - 84    K79 - triMe       E17
111 - 120  none              E18
111 - 120  K115 - ac         E19
111 - 120  T118 - phosph     E20
119 - 127  none              E21
119 - 127  K122 - monoMe     E22
119 - 127  K122 - diMe       E23
119 - 127  K122 - triMe      E24
119 - 127  K122 - ac         E25
124 - 135  none              E26
124 - 135  R128 - monoMe     E27
124 - 135  R128 - sym diMe   E28
124 - 135  R128 - asym diMe  E29

Histone - H4
Identity   Modifications     Peptide #
1 - 11     none              F2
1 - 11     S1 - phosph       F3
1 - 11     R3 - monoMe       F4
1 - 11     R3 - sym diMe     F5
1 - 11     R3 - asym diMe    F6
1 - 11     K5 - ac           F7
1 - 11     K8 - ac           F8
10 - 22    none              F9
10 - 22    K12 - monoMe      F10
10 - 22    K12 - diMe        F11
10 - 22    K12 - triMe       F12
10 - 22    K12 - ac          F13
10 - 22    K16 - ac          F14
10 - 22    K20 - monoMe      F15
10 - 22    K20 - diMe        F16
10 - 22    K20 - triMe       F17
10 - 22    K20 - ac          F18
42 - 52    none              F19
42 - 52    S47 - phosph      F20
56 - 65    none              F21
56 - 65    K59 - monoMe      F22
56 - 65    K59 - diMe        F23
56 - 65    K59 - triMe       F24
73 - 83    none              F25
73 - 83    K77 - ac          F26
73 - 83    K79 - monoMe      F27
73 - 83    K79 - diMe        F28
73 - 83    K79 - triMe       F29
73 - 83    K79 - ac          F30

Table 2.2. List of peptides used in SPOT-blot synthesis. 44

Controls Histone - H2A cont’d Histone - H2A cont’d Identity Modifications Peptide # Identity Modifications Peptide # Identity Modifications Peptide # 1 - 12 poly-His + 25 - 38 R29 - asym diMe B25 116 - 129 K118 - diMe D24 1 - 12 poly-Val - R32 - monoMe B26 K118 - triMe D25 R32 - sym diMe B27 K118 - ac D26 Histone - H2A R32 - asym diMe B28 K119 - monoMe D27 Identity Modifications Peptide # R35 - monoMe B29 K119 - diMe D28 1 - 12 none A1 R35 - sym diMe B30 K119 - triMe D29 1 - 12 S1 - phosph A2 R35 - asym diMe C1 K119 - ac D30 R3 - monoMe A3 K36 - monoMe C2 T120 - phosph E1 R3 - sym diMe A4 K36 - diMe C3 S122 - phosph E2 R3 - asym diMe A5 K36 - triMe C4 K125 - monoMe E3 K5 - monoMe A6 K36 - ac C5 K125 - diMe E4 K5 - diMe A7 33 - 45 none C6 K125 - triMe E5 K5 - triMe A8 33 - 45 R35 - monoMe C7 K125 - ac E6 K5 - ac A9 R35 - sym diMe C8 K127 - monoMe E7 K9 - monoMe A10 R35 - asym diMe C9 K127 - diMe E8 K9 - diMe A11 K36 - monoMe C10 K127 - triMe E9 K9 - triMe A12 K36 - diMe C11 K127 - ac E10 K9 - ac A13 K36 - triMe C12 K129 - monoMe E11 R11 - monoMe A14 K36 - ac C13 K129 - diMe E12 R11 - sym diMe A15 S40 - phosph C14 K129 - triMe E13 R11 - asym diMe A16 R42 - monoMe C15 K129 - ac E14 10 - 21 none A17 R42 - sym diMe C16 10 - 21 R11 - monoMe A18 R42 - asym diMe C17 Histone - H2B R11 - sym diMe A19 70 - 83 none C18 Identity Modifications Peptide # R11 - asym diMe A20 70 - 83 R71 - monoMe C19 1 - 13 none E17 K13 - monoMe A21 R71 - sym diMe C20 1 - 13 K5 - monoMe E18 K13 - diMe A22 R71 - asym diMe C21 K5 - diMe E19 K13 - triMe A23 K74 - monoMe C22 K5 - triMe E20 K13 - ac A24 K74 - diMe C23 K5 - ac E21 K15 - monoMe A25 K74 - triMe C24 S6 - phosph E22 K15 - diMe A26 K74 - ac C25 K11 - monoMe E23 K15 - triMe A27 K75 - monoMe C26 K11 - diMe E24 K15 - ac A28 K75 - diMe C27 K11 - triMe E25 T16 - phosph A29 K75 - triMe C28 K11 - ac E26 R17 - monoMe A30 K75 - ac C29 K12 - monoMe E27 R17 - sym diMe B1 T76 - phosph C30 K12 - diMe E28 R17 - asym diMe B2 R77 - monoMe D1 K12 - triMe E29 
S18 - phosph B3 R77 - sym diMe D2 K12 - ac E30 S19 - phosph B4 R77 - asym diMe D3 7 - 18 none F1 R20 - monoMe B5 R81 - monoMe D4 7 - 18 K11 - monoMe F2 R20 - sym diMe B6 R81 - sym diMe D5 K11 - diMe F3 R20 - asym diMe B7 R81 - asym diMe D6 K11 - triMe F4 14 - 27 none B8 83 - 93 none D7 K11 - ac F5 14 - 27 K15 - monoMe B9 83 - 93 R88 - monoMe D8 K12 - monoMe F6 K15 - diMe B10 R88 - sym diMe D9 K12 - diMe F7 K15 - triMe B11 R88 - asym diMe D10 K12 - triMe F8 K15 - ac B12 93 - 103 none D11 K12 - ac F9 T16 - phosph B13 93 - 103 K95 - monoMe D12 S14 - phosph F10 R17 - monoMe B14 K95 - diMe D13 K15 - monoMe F11 R17 - sym diMe B15 K95 - triMe D14 K15 - diMe F12 R17 - asym diMe B16 K95 - ac D15 K15 - triMe F13 S18 - phosph B17 K99 - monoMe D16 K15 - ac F14 S19 - phosph B18 K99 - diMe D17 K16 - monoMe F15 R20 - monoMe B19 K99 - triMe D18 K16 - diMe F16 R20 - sym diMe B20 K99 - ac D19 K16 - triMe F17 R20 - asym diMe B21 T101 - phosph D20 K16 - ac F18 25 - 38 none B22 103 - 115 none D21 17 - 26 none F19 25 - 38 R29 - monoMe B23 116 - 129 none D22 17 - 26 T19 - phosph F20 R29 - sym diMe B24 116 - 129 K118 - monoMe D23 K20 - monoMe F21

Table 2.2 – continued 45

Histone - H2B cont’d Histone - H2B cont’d Histone – H3 cont’d Identity Modifications Peptide # Identity Modifications Peptide # Identity Modifications Peptide # 17 - 26 K20 - diMe F22 41 - 53 K46 - ac H21 1 - 13 K4 - monoMe J19 K20 - triMe F23 T52 - phosph H22 K4 - diMe J20 K20 - ac F24 81 - 93 none H23 K4 - triMe J21 K23 - monoMe F25 81 - 93 K85 - monoMe H24 K4 - ac J22 K23 - diMe F26 K85 - diMe H25 T6 - phosph J23 K23 - triMe F27 K85 - triMe H26 R8 - monoMe J24 K23 - ac F28 K85 - ac H27 R8 - sym diMe J25 K24 - monoMe F29 R86 - monoMe H28 R8 - asym diMe J26 K24 - diMe F30 R86 - sym diMe H29 K9 - monoMe J27 K24 - triMe G1 R86 - asym diMe H30 K9 - diMe J28 K24 - ac G2 S87 - phosph I1 K9 - triMe J29 25 - 39 none G3 T88 - phosph I2 K9 - ac J30 25 - 39 K27 - monoMe G4 T90 - phosph I3 S10 - phosph K1 K27 - diMe G5 S91 - phosph I4 T11 - phosph K2 K27 - triMe G6 R92 - monoMe I5 7 - 20 none K3 K27 - ac G7 R92 - sym diMe I6 7 - 20 R8 - monoMe K4 K28 - monoMe G8 R92 - asym diMe I7 R8 - sym diMe K5 K28 - diMe G9 94 - 107 none I8 R8 - asym diMe K6 K28 - triMe G10 94 - 107 T96 - phosph I9 K9 - monoMe K7 K28 - ac G11 R99 - monoMe I10 K9 - diMe K8 R29 - monoMe G12 R99 - sym diMe I11 K9 - triMe K9 R29 - sym diMe G13 R99 - asym diMe I12 K9 - ac K10 R29 - asym diMe G14 106 - 118 none I13 S10 - phosph K11 K30 - monoMe G15 106 - 118 K108 - monoMe I14 T11 - phosph K12 K30 - diMe G16 K108 - diMe I15 K14 - monoMe K13 K30 - triMe G17 K108 - triMe I16 K14 - diMe K14 K30 - ac G18 K108 - ac I17 K14 - triMe K15 R31 - monoMe G19 S112 - phosph I18 K14 - ac K16 R31 - sym diMe G20 T115 - phosph I19 R17 - monoMe K17 R31 - asym diMe G21 K116 - monoMe I20 R17 - sym diMe K18 S32 - phosph G22 K116 - diMe I21 R17 - asym diMe K19 R33 - monoMe G23 K116 - triMe I22 K18 - monoMe K20 R33 - sym diMe G24 K116 - ac I23 K18 - diMe K21 R33 - asym diMe G25 114 - 125 none I24 K18 - triMe K22 K34 - monoMe G26 T115 - phosph I25 K18 - ac K23 K34 - diMe G27 K116 - monoMe I26 15 - 25 none K24 K34 - triMe G28 K116 - 
diMe I27 15 - 25 R17 - monoMe K25 K34 - ac G29 K116 - triMe I28 R17 - sym diMe K26 S36 - phosph G30 K116 - ac I29 R17 - asym diMe K27 S38 - phosph H1 T119 - phosph I30 K18 - monoMe K28 35 - 47 none H2 K120 - monoMe J1 K18 - diMe K29 35 - 47 S36 - phosph H3 K120 - diMe J2 K18 - triMe K30 S38 - phosph H4 K120 - triMe J3 K18 - ac L1 K43 - monoMe H5 K120 - ac J4 T22 - phosph L2 K43 - diMe H6 T122 - phosph J5 K23 - monoMe L3 K43 - triMe H7 S123 - phosph J6 K23 - diMe L4 K43 - ac H8 S124 - phosph J7 K23 - triMe L5 K46 - monoMe H9 K125 - monoMe J8 K23 - ac L6 K46 - diMe H10 K125 - diMe J9 21 - 33 none L7 K46 - triMe H11 K125 - triMe J10 21 - 33 T22 - phosph L8 K46 - ac H12 K125 - ac J11 K23 - monoMe L9 41 - 53 none H13 K23 - diMe L10 41 - 53 K43 - monoMe H14 Histone - H3 K23 - triMe L11 K43 - diMe H15 Identity Modifications Peptide # K23 - ac L12 K43 - triMe H16 1 - 13 none J14 R26 - monoMe L13 K43 - ac H17 1 - 13 R2 - monoMe J15 R26 - sym diMe L14 K46 - monoMe H18 R2 - sym diMe J16 R26 - asym diMe L15 K46 - diMe H19 R2 - asym diMe J17 K27 - monoMe L16 K46 - triMe H20 T3 - phosph J18 K27 - diMe L17

Table 2.2 – continued 46

Histone – H3 cont’d Histone – H3 cont’d Histone – H4 cont’d Identity Modifications Peptide # Identity Modifications Peptide # Identity Modifications Peptide # 21 - 33 K27 - triMe L18 75 - 88 K79 - diMe N17 10 - 22 K12 - ac P15 K27 - ac L19 K79 - triMe N18 K16 - monoMe P16 S28 - phosph L20 K79 - ac N19 K16 - diMe P17 T32 - phosph L21 T80 - phosph N20 K16 - triMe P18 31 - 44 none L22 R83 - monoMe N21 K16 - ac P19 31 - 44 T32 - phosph L23 R83 - sym diMe N22 R17 - monoMe P20 K36 - monoMe L24 R83 - asym diMe N23 R17 - sym diMe P21 K36 - diMe L25 S86 - phosph N24 R17 - asym diMe P22 K36 - triMe L26 S87 - phosph N25 R19 - monoMe P23 K36 - ac L27 88 - 101 none N26 R19 - sym diMe P24 K37 - monoMe L28 88 - 101 T99 - phosph N27 R19 - asym diMe P25 K37 - diMe L29 101 - 114 none N28 K20 - monoMe P26 K37 - triMe L30 101 - 114 T107 - phosph N29 K20 - diMe P27 K37 - ac M1 114 - 126 none N30 K20 - triMe P28 R40 - monoMe M2 114 - 126 K115 - monoMe O1 K20 - ac P29 R40 - sym diMe M3 K115 - diMe O2 21 - 33 none P30 R40 - asym diMe M4 K115 - triMe O3 21 - 33 R23 - monoMe Q1 R42 - monoMe M5 K115 - ac O4 R23 - sym diMe Q2 R42 - sym diMe M6 R116 - monoMe O5 R23 - asym diMe Q3 R42 - asym diMe M7 R116 - sym diMe O6 T30 - phosph Q4 44 - 57 none M8 R116 - asym diMe O7 K31 - monoMe Q5 44 - 57 T45 - phosph M9 T118 - phosph O8 K31 - diMe Q6 R49 - monoMe M10 K122 - monoMe O9 K31 - triMe Q7 R49 - sym diMe M11 K122 - diMe O10 K31 - ac Q8 R49 - asym diMe M12 K122 - triMe O11 33 - 43 none Q9 R52 - monoMe M13 K122 - ac O12 33 - 43 R35 - monoMe Q10 R52 - sym diMe M14 124 - 135 none O13 R35 - sym diMe Q11 R52 - asym diMe M15 124 - 135 R128 - monoMe O14 R35 - asym diMe Q12 R53 - monoMe M16 R128 - sym diMe O15 R36 - monoMe Q13 R53 - sym diMe M17 R128 - asym diMe O16 R36 - sym diMe Q14 R53 - asym diMe M18 R129 - monoMe O17 R36 - asym diMe Q15 K56 - monoMe M19 R129 - sym diMe O18 R39 - monoMe Q16 K56 - diMe M20 R129 - asym diMe O19 R39 - sym diMe Q17 K56 - triMe M21 R131 - monoMe O20 R39 - asym diMe Q18 K56 
- ac M22 R131 - sym diMe O21 R40 - monoMe Q19 S57 - phosph M23 R131 - asym diMe O22 R40 - sym diMe Q20 54 - 67 none M24 R134 - monoMe O23 R40 - asym diMe Q21 54 - 67 K56 - monoMe M25 R134 - sym diMe O24 41 - 50 none Q22 K56 - diMe M26 R134 - asym diMe O25 41 - 50 K44 - monoMe Q23 K56 - triMe M27 K44 - diMe Q24 K56 - ac M28 Histone - H4 K18 - diMe S57 - phosph M29 Identity Modifications Peptide # K44 - ac Q26 T58 - phosph M30 1 - 11 none O28 R45 - monoMe Q27 R63 - monoMe N1 1 - 11 S1 - phosph O29 R45 - sym diMe Q28 R63 - sym diMe N2 R3 - monoMe O30 R45 - asym diMe Q29 R63 - asym diMe N3 R3 - sym diMe P1 S47 - phosph Q30 K64 - monoMe N4 R3 - asym diMe P2 51 - 60 none R1 K64 - diMe N5 K5 - monoMe P3 51 - 60 T54 - phosph R2 K64 - triMe N6 K5 - diMe P4 R55 - monoMe R3 K64 - ac N7 K5 - triMe P5 R55 - sym diMe R4 65 - 75 none N8 K5 - ac P6 R55 - asym diMe R5 65 - 75 R69 - monoMe N9 K8 - monoMe P7 K59 - monoMe R6 R69 - sym diMe N10 K8 - diMe P8 K59 - diMe R7 R69 - asym diMe N11 K8 - triMe P9 K59 - triMe R8 R72 - monoMe N12 K8 - ac P10 K59 - ac R9 R72 - sym diMe N13 10 - 22 none P11 56 - 66 none R10 R72 - asym diMe N14 10 - 22 K12 - monoMe P12 56 - 66 K59 - monoMe R11 75 - 88 none N15 K12 - diMe P13 K59 - diMe R12 75 - 88 K79 - monoMe N16 K12 - triMe P14 K59 - triMe R13

Table 2.2 – continued

Histone – H4 cont’d Proteins – p53 Identity Modifications Peptide # Identity Modifications Peptide # 56 - 66 K59 - ac R14 364 - 376 none S17 65 - 76 none R15 364 - 376 K372 - monoMe S18 65 - 76 R67 - monoMe R16 K372 - diMe S19 R67 - sym diMe R17 K372 - triMe S20 R67 - asym diMe R18 T71 - phosph R19 Synthesis controls T73 - phosph R20 Identity Modifications Peptide # 76 - 86 none R21 1 - 12 poly-Ala T1 76 - 86 K77 - monoMe R22 poly-Arg T2 K77 - diMe R23 poly-Asn T3 K77 - triMe R24 poly-Asp T4 K77 - ac R25 poly-Cys T5 R78 - monoMe R26 poly-Gln T6 R78 - sym diMe R27 poly-Glu T7 R78 - asym diMe R28 poly-Gly T8 K79 - monoMe R29 poly-His T9 K79 - diMe R30 poly-Ile T10 K79 - triMe S1 poly-Leu T11 K79 - ac S2 poly-Lys T12 T80 - phosph S3 poly-Met T13 T82 - phosph S4 poly-Phe T14 87 - 97 none S5 poly-Pro T15 87 - 97 K91 - monoMe S6 poly-Ser T16 K91 - diMe S7 poly-Thr T17 K91 - triMe S8 poly-Trp T18 K91 - ac S9 poly-Tyr T19 R92 - monoMe S10 poly-Val T20 R92 - sym diMe S11 R92 - asym diMe S12 R95 - monoMe S13 R95 - sym diMe S14 R95 - asym diMe S15 T96 - phosph S16


a single modified residue. Synthesis was carried out in automated fashion utilizing standard Fmoc chemistry, building the peptide from the C- to the N-terminus as described previously (Frank 2002; Hilpert, Winkler et al. 2007). The synthetic peptides were immobilized on the membrane by covalent linkage between the first C-terminal residue of the peptide and ethylglycol on the modified cellulose membrane. A double coupling process was used for the first seven amino acids, and triple coupling for each subsequent amino acid, to ensure complete reaction. After synthesis, the side-chain protection groups were cleaved using the manufacturer’s recommended protocol, and the membrane was stained with Fast Green FCF dye (Sigma) to confirm uniform peptide synthesis and to ease visual interpretation.
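The peptide numbers in Table 2.2 combine a row letter with a position number, and the numbering rolls over after 30 (for example, L30 is followed by M1). A minimal sketch of that addressing scheme is given below. It assumes that rows start at the letter A and that every row holds 30 spots; the function names are hypothetical and are not part of the synthesizer software.

```python
import string

# Assumption: 30 spots per row, inferred from Table 2.2 (L30 is followed by M1).
SPOTS_PER_ROW = 30


def spot_label(index: int) -> str:
    """Convert a 0-based spot index into a label such as 'L18'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    row, col = divmod(index, SPOTS_PER_ROW)
    if row >= len(string.ascii_uppercase):
        raise ValueError("index exceeds the 26 lettered rows")
    return f"{string.ascii_uppercase[row]}{col + 1}"


def spot_index(label: str) -> int:
    """Convert a label such as 'M1' back into a 0-based spot index."""
    row = string.ascii_uppercase.index(label[0].upper())
    col = int(label[1:])
    if not 1 <= col <= SPOTS_PER_ROW:
        raise ValueError("column must be between 1 and 30")
    return row * SPOTS_PER_ROW + (col - 1)


if __name__ == "__main__":
    # The spot that follows L30 on the membrane.
    print(spot_label(spot_index("L30") + 1))
```

Such a mapping is only a bookkeeping aid for relating a spot position on the developed membrane to its entry in Table 2.2.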

Analysis of synthesis and specificity

A general monoclonal primary antibody against acetylated lysines (#9681, Cell Signaling Technology Inc.) was used to probe a membrane containing all known modified histone peptides for proper incorporation of acetylated lysine at various positions in the peptide. The membrane was incubated with Blocking Buffer (5% milk in 1xPBS+0.1%Tween20) for one hour at room temperature and afterwards washed three times for 5 minutes each with PBS/T. Next, the primary antibody, at 1:1000 dilution in 5% BSA, 1xPBS+0.1%Tween20, was added and incubated at room temperature for one hour with gentle shaking. Following three 5 minute washes with PBS/T, HRP-conjugated anti-mouse fragment secondary antibody (GE Healthcare, 1:2000 dilution) in Blocking Buffer was added for one hour at room temperature. The membrane was then washed three times for 5 minutes each and developed using an ECL detection kit (Perkin Elmer).

In order to test the specificity of the antibody, primary antibodies against phosphorylated serine 10 (ab47297, Abcam Inc.) and phosphorylated threonine 11 (ab5168, Abcam Inc.) on histone H3 were used. The protocol, including washes, dilutions, and incubation times, was the same as for the anti-acetyllysine antibody, but instead of the anti-mouse HRP fragment secondary antibody we used an anti-rabbit HRP fragment secondary antibody (Amersham Biosciences).

Stripping from the membrane

The membrane was incubated with Stripping Buffer A (6M Guanidinium HCl, 1% Triton X-100) for 45-60 minutes, followed by an overnight incubation with Stripping Buffer A + Talon Beads at room temperature. Afterwards, it was washed twice with Stripping Buffer B (500mM Imidazole, 500mM NaCl, 20mM Tris, pH 7.5) for 30 minutes each, followed by a series of washes with ddH2O at room temperature and at 60°C. Finally, the membrane was taken through a series of washes, 10 minutes each, with ddH2O; 10% TFA (trifluoroacetic acid); ddH2O; 20% EtOH; 50% EtOH; 95% EtOH. The membrane was dried and stored for extended periods of time at -20°C.

Protein-peptide interaction assay

The membrane was washed three times for 5 minutes each with PBS/T at room temperature. Next, the membrane was blocked with Blocking Buffer (5% milk in 1xPBS+0.1%Tween20) overnight at 4°C to minimize non-specific binding of the protein to the membrane. The volume varied depending on the size of the membrane. The next day the membrane was washed twice for 5 minutes each with PBS/T, followed by a single wash with PBS for 5 minutes at room temperature (unless otherwise specified, incubations were done at room temperature). The protein of interest (recombinant protein carrying a tag, e.g. 6xHIS) was added at 1μM, diluted in 1xPBS in a volume sufficient to completely cover the membrane, usually 15ml. The membrane was sealed in a tight container and incubated overnight at 4°C. The next day, after three 5 minute washes with PBS/T, the membrane was blocked with 40ml of Blocking Buffer B (10% milk in PBS/T) for one hour. After 2-3 washes, primary antibody against the tag was added and incubated in Blocking Buffer B for an additional hour. The dilution for the primary monoclonal anti-his antibody from Qiagen was 1:1000. After the washes, secondary antibody was added and incubated in Blocking Buffer A for another hour. The dilution for the secondary anti-mouse fragment antibody from GE Healthcare was 1:5000. The membrane was washed three times for 15-20 minutes each with PBS/T and developed using an ECL detection kit (Perkin Elmer), with a 3 minute reaction time and a 1-5 minute exposure time.
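The quantities in this protocol reduce to simple arithmetic, sketched below as a convenience. The 1 μM protein concentration, the 15 ml volume and the 1:1000 and 1:5000 dilutions come from the text; the 25 kDa molecular weight and the use of a 15 ml volume for the antibody steps are illustrative assumptions, and the function names are hypothetical.

```python
def stock_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Volume of antibody stock (in microlitres) for a 1:dilution_factor dilution."""
    if final_volume_ml <= 0 or dilution_factor <= 0:
        raise ValueError("volume and dilution factor must be positive")
    return final_volume_ml * 1000.0 / dilution_factor


def protein_mass_ug(conc_um: float, volume_ml: float, mw_kda: float) -> float:
    """Mass of protein (in micrograms) for a given molar concentration.

    micromol/L x mL gives nmol, and 1 kDa corresponds to 1 microgram per nmol.
    """
    if conc_um < 0 or volume_ml < 0 or mw_kda <= 0:
        raise ValueError("inputs must be non-negative, with a positive molecular weight")
    return conc_um * volume_ml * mw_kda


if __name__ == "__main__":
    # Primary antibody at 1:1000 and secondary at 1:5000, assuming 15 ml of buffer.
    print(stock_volume_ul(15, 1000), "ul primary antibody stock")
    print(stock_volume_ul(15, 5000), "ul secondary antibody stock")
    # 1 uM protein in 15 ml, assuming a hypothetical 25 kDa construct.
    print(protein_mass_ug(1.0, 15, 25), "ug protein")
```

Under these assumptions a single membrane incubation consumes a few hundred micrograms of recombinant protein, which is worth checking against the available yield before starting.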

2.6 References

Bernstein, B. E., T. S. Mikkelsen, et al. (2006). "A bivalent chromatin structure marks key developmental genes in embryonic stem cells." Cell 125(2): 315-26.
Botuyan, M. V., J. Lee, et al. (2006). "Structural basis for the methylation state-specific recognition of histone H4-K20 by 53BP1 and Crb2 in DNA repair." Cell 127(7): 1361-73.
Couture, J. F., E. Collazo, et al. (2006). "Molecular recognition of histone H3 by the WD40 protein WDR5." Nat Struct Mol Biol 13(8): 698-703.
Dialynas, G. K., D. Makatsori, et al. (2006). "Methylation-independent binding to histone H3 and cell cycle-dependent incorporation of HP1beta into heterochromatin." J Biol Chem 281(20): 14350-60.
Dion, M. F., S. J. Altschuler, et al. (2005). "Genomic characterization reveals a simple histone H4 acetylation code." Proc Natl Acad Sci U S A 102(15): 5501-6.
Dou, Y., T. A. Milne, et al. (2005). "Physical association and coordinate function of the H3 K4 methyltransferase MLL1 and the H4 K16 acetyltransferase MOF." Cell 121(6): 873-85.
Esteller, M. (2006). "The necessity of a human epigenome project." 27(6): 1121-5.
Fischle, W., B. S. Tseng, et al. (2005). "Regulation of HP1-chromatin binding by histone H3 methylation and phosphorylation." Nature 438(7071): 1116-22.
Frank, R. (2002). "The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports--principles and applications." J Immunol Methods 267(1): 13-26.
Fuchs, S. M., K. Krajewski, et al. (2011). "Influence of combinatorial histone modifications on antibody and effector protein recognition." Curr Biol 21(1): 53-8.
Garske, A. L., S. S. Oliver, et al. (2010). "Combinatorial profiling of chromatin binding modules reveals multisite discrimination." Nat Chem Biol 6(4): 283-90.
Guccione, E., C. Bassi, et al. (2007). "Methylation of histone H3R2 by PRMT6 and H3K4 by an MLL complex are mutually exclusive." Nature 449(7164): 933-7.
Hilpert, K., D. F. Winkler, et al. (2007). "Peptide arrays on cellulose support: SPOT synthesis, a time and cost efficient method for synthesis of large numbers of peptides in a parallel and addressable fashion." Nat Protoc 2(6): 1333-49.
Huang, Y., J. Fang, et al. (2006). "Recognition of histone H3 lysine-4 methylation by the double tudor domain of JMJD2A." Science 312(5774): 748-51.
Hyland, E. M., M. S. Cosgrove, et al. (2005). "Insights into the role of histone H3 and histone H4 core modifiable residues in Saccharomyces cerevisiae." Mol Cell Biol 25(22): 10060-70.
Jacobs, S. A. and S. Khorasanizadeh (2002). "Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail." Science 295(5562): 2080-3.
Jacobs, S. A., S. D. Taverna, et al. (2001). "Specificity of the HP1 chromo domain for the methylated N-terminus of histone H3." Embo J 20(18): 5232-41.
Kirmizis, A., H. Santos-Rosa, et al. (2007). "Arginine methylation at histone H3R2 controls deposition of H3K4 trimethylation." Nature 449(7164): 928-32.
Klymenko, T., B. Papp, et al. (2006). "A Polycomb group protein complex with sequence-specific DNA-binding and selective methyl-lysine-binding activities." Genes Dev 20(9): 1110-22.
Kramer, A., U. Reineke, et al. (1999). "Spot synthesis: observations and optimizations." J Pept Res 54(4): 319-27.
Krebs, J. E. (2007). "Moving marks: dynamic histone modifications in yeast." Mol Biosyst 3(9): 590-7.
Kuzmichev, A., R. Margueron, et al. (2005). "Composition and histone substrates of polycomb repressive group complexes change during cellular differentiation." Proc Natl Acad Sci U S A 102(6): 1859-64.
Li, H., W. Fischle, et al. (2007). "Structural basis for lower lysine methylation state-specific readout by MBT repeats of L3MBTL1 and an engineered PHD finger." Mol Cell 28(4): 677-91.
Li, H., S. Ilin, et al. (2006). "Molecular basis for site-specific read-out of histone H3K4me3 by the BPTF PHD finger of NURF." Nature 442(7098): 91-5.
Maurer-Stroh, S., N. J. Dickens, et al. (2003). "The Tudor domain 'Royal Family': Tudor, plant Agenet, Chromo, PWWP and MBT domains." Trends Biochem Sci 28(2): 69-74.
Min, J., A. Allali-Hassani, et al. (2007). "L3MBTL1 recognition of mono- and dimethylated histones." Nat Struct Mol Biol 14(12): 1229-30.
Mukherjee, S., Y. H. Hao, et al. (2007). "A newly discovered post-translational modification--the acetylation of serine and threonine residues." Trends Biochem Sci 32(5): 210-6.
Mukherjee, S., G. Keitany, et al. (2006). "Yersinia YopJ acetylates and inhibits kinase activation by blocking phosphorylation." Science 312(5777): 1211-4.
Nady, N., A. Lemak, et al. (2011). "Recognition of multivalent histone states associated with heterochromatin by UHRF1." J Biol Chem.
Nelson, C. J., H. Santos-Rosa, et al. (2006). "Proline isomerization of histone H3 regulates lysine methylation and gene expression." Cell 126(5): 905-16.
Neumann, H., S. Y. Peak-Chew, et al. (2008). "Genetically encoding N(epsilon)-acetyllysine in recombinant proteins." Nat Chem Biol 4(4): 232-4.
Nguyen, D. P., M. M. Garcia Alai, et al. (2009). "Genetically encoding N(epsilon)-methyl-L-lysine in recombinant histones." J Am Chem Soc 131(40): 14194-5.
Nguyen, D. P., M. M. Garcia Alai, et al. (2010). "Genetically directing ε-N,N-dimethyl-L-lysine in recombinant histones." Chem Biol 17(10): 1072-6.
Ooi, S. K., C. Qiu, et al. (2007). "DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA." Nature 448(7154): 714-7.
Reineke, U., R. Sabat, et al. (1996). "Mapping protein-protein contact sites using cellulose-bound peptide scans." Mol Divers 1(3): 141-8.
Ruthenburg, A. J., H. Li, et al. (2007). "Multivalent engagement of chromatin modifications by linked binding modules." Nat Rev Mol Cell Biol 8(12): 983-94.
Ruthenburg, A. J., W. Wang, et al. (2006). "Histone H3 recognition and presentation by the WDR5 module of the MLL1 complex." Nat Struct Mol Biol 13(8): 704-12.
Schubeler, D., D. M. MacAlpine, et al. (2004). "The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote." Genes Dev 18(11): 1263-71.
Schuetz, A., A. Allali-Hassani, et al. (2006). "Structural basis for molecular recognition and presentation of histone H3 by WDR5." Embo J 25(18): 4245-52.
Shogren-Knaak, M. A. (2007). "Mimicking methylated histones." ACS Chem Biol 2(4): 225-7.
Shogren-Knaak, M. A., C. J. Fry, et al. (2003). "A native peptide ligation strategy for deciphering nucleosomal histone modifications." J Biol Chem 278(18): 15744-8.
Simon, M. D., F. Chu, et al. (2007). "The site-specific installation of methyl-lysine analogs into recombinant histones." Cell 128(5): 1003-12.
Toepert, F., T. Knaute, et al. (2003). "Combining SPOT synthesis and native peptide ligation to create large arrays of WW protein domains." Angew Chem Int Ed Engl 42(10): 1136-40.
Trojer, P., G. Li, et al. (2007). "L3MBTL1, a histone-methylation-dependent chromatin lock." Cell 129(5): 915-28.
Virdee, S., Y. Ye, et al. (2010). "Engineered diubiquitin synthesis reveals Lys29-isopeptide specificity of an OTU deubiquitinase." Nat Chem Biol 6(10): 750-7.
West, L. E., S. Roy, et al. (2010). "The MBT repeats of L3MBTL1 link SET8-mediated p53 methylation at lysine 382 to target gene repression." J Biol Chem 285(48): 37725-32.
Wisniewski, J. R., A. Zougman, et al. (2007). "Mass spectrometric mapping of linker histone H1 variants reveals multiple , , and phosphorylation as well as differences between cell culture and tissue." Mol Cell Proteomics 6(1): 72-87.
Young, N. L., P. A. Dimaggio, et al. (2010). "The significance, development and progress of high-throughput combinatorial histone code analysis." Cell Mol Life Sci 67(23): 3983-4000.


Chapter 3

Histone Substrate Recognition by Human MBT Domains

The cDNA plasmids for Xenopus laevis histones H3 and H4 were provided by Karolin Luger, for Widom 601 DNA by Hugo van Ingen, and for wild type MBT proteins by Jinrong Min. The plasmids for mutant proteins and mg quantities of 601 DNA used in this study were prepared by Shili Duan of the Arrowsmith laboratory. Maria F. Amaya assisted with the crystal structure calculation. Protein purification, crystal preparation, structure determination, protein-protein interactions, preparation of the recombinant nucleosomes and polyacrylamide gel-shift assays were performed by the author.

3.1 Summary

Histone lysine methylation is found in three distinct states (mono-, di- and trimethylation), which are recognized by specific protein modules. The Malignant Brain Tumor (MBT) domain is one such module, and it is found in several chromatin regulatory complexes such as polycomb repressive complex 1 (PRC1). This chapter describes a comprehensive characterization of the human MBT domain family with emphasis on the histone peptide binding specificity. SPOT-blot peptide arrays were used to screen for the methyllysine-containing histone peptides that bind to 29 MBT domains in 9 diverse human proteins. Selected interactions were quantified using fluorescence polarization assays. The study shows that all MBT proteins recognize only mono- and/or dimethyllysine marks, and provides biochemical and structural evidence that some MBT domains have a defined consensus sequence for recognition, while others bind in a promiscuous, non-sequence specific manner. Furthermore, using structure-based mutants we identify a triad of residues in the methyllysine binding pocket that are responsible for the discrimination between the mono- and dimethyllysine states, which may provide a foundation for the rational design of potent and highly selective MBT inhibitors. Overall, this research represents a comprehensive analysis of the MBT substrate specificity, establishing a powerful resource for future functional studies of the MBT family and their role in health and disease.

3.2 Introduction

A class of chromatin regulators known as MBT domains is evolutionarily conserved and first appears with the emergence of multicellularity. The early MBTs are found in Caenorhabditis elegans and Drosophila melanogaster, with each encoding three MBT-containing proteins. Using the SMART, UniProt and Pfam databases and manual searches, 29 MBT domains were identified in the human proteome, distributed across 9 proteins (Fig. 3.1a). The human proteins contain from two to four MBT tandem repeats, with the exception of a short SFMBT2 isoform that contains a single MBT repeat. Available structural studies and mutagenesis data show that the number of domains present within the protein does not contribute to the complexity of the PTM recognition. In each protein only one domain has a functional aromatic cage capable of binding to the methylated lysine. Phylogenetic tree analyses of the individual repeats show clustering of the domains (Fig. 3.1b). The repeats that have an experimentally confirmed functional aromatic cage cluster within one branch of the tree, for example the second MBT repeats of SCML2 and L3MBTL1, and the fourth MBT repeats of MBTD1 and L3MBTL2. This suggests that the other domains clustering within the “active” sub-branches may also contain functional aromatic binding pockets. These include the second MBT domains of SCMH1, L3MBTL3 and L3MBTL4. The MBT repeats of SFMBT1 and SFMBT2 are less conserved relative to the other MBT domains, and the repeat closest to the functional cluster is the fourth domain within the SFMBT1 and SFMBT2 proteins.

In flies, proteins containing the MBT domains are products of Polycomb Group (PcG) genes and have been implicated in transcriptional repression of numerous developmental genes.

Misregulation of the MBT-containing proteins contributes to various disease phenotypes. In humans, SCML2, L3MBTL2 and L3MBTL3 were found to be homozygously deleted in patients with medulloblastomas (Northcott, Nakahara et al. 2009), L3MBTL4 was misregulated in breast cancer patients (Addou-Klouche, Adelaide et al. 2010), MBTD1, L3MBTL1 and L3MBTL3 were implicated in hematopoiesis (Arai and Miyazaki 2005; Gurvich, Perna et al. 2010; Perna, Gurvich et al. 2010; Honda, Takubo et al. 2011), and SFMBT1 deletion caused ventriculomegaly, which may lead to dementia (Kato, Sato et al. 2011). Such broad involvement with various diseases suggests that these proteins do not function in the same pathway or as a part of the same macromolecular complex and, possibly, their association is tissue dependent. Some of the human MBT-containing proteins are known to be a part of larger multiprotein chromatin-remodeling complexes. For example, SCMH1 is a non-essential component of the Polycomb repressive complex 1 (PRC1) in spermatocytes (Takada, Isono et al. 2007), L3MBTL2 is part of the newly identified PRC1-like 4 (Trojer, Cao et al. 2011), and L3MBTL1 associates with the E2F6.com-1 complex (Ogawa, Ishiguro et al. 2002).

[Figure 3.1 image. (a) Domain schematics of the human MBT-containing proteins, with protein lengths in residues: L3MBTL1 (772), L3MBTL4 (623), L3MBTL3 (780), L3MBTL2 (705), MBTD1 (628), SFMBT1 (866), SFMBT2 (894), SCMH1 (660), SCML2 (700); legend: MBT domain, basic region, Zn finger domain, SAM domain. (b) Phylogenetic tree of the individual MBT domains; labelled domains include SCML2_2, L3MBTL1_2, MBTD1_4 and L3MBTL2_4.]

Figure 3.1. Domain architecture and analysis of the MBT-containing proteins. (a). Evolutionary expansion of MBT-containing proteins from worms to humans. On the right is the schematic domain organization of human proteins that contain MBT domains. The most common, alternative names and the length of the proteins are shown. Multiprotein chromatin complexes that human MBT-containing proteins have been shown to be a part of experimentally are indicated. The positions of the different domains are highlighted as shown by the legend. MBT domains are coloured in orange or in red if they have been shown experimentally to contain a functional aromatic cage. (b). Phylogenetic tree of the individual human MBT domains generated using the SGC database (thesgc.org). The name of the protein is followed by the underscore and the sequential MBT domain numbering starting from the N-terminus. Indicated in red are MBT domains that have functional aromatic cages.

Structural and biochemical analyses of the MBT domains have shown their specific recognition of the lower methylation states of lysines on histones through a conserved aromatic cage binding pocket (Taverna, Li et al. 2007; Adams-Cioaba and Min 2009). Numerous structures of MBT domains with methylated lysine peptides have been solved and binding specificities for several MBT-containing proteins have been investigated (Grimm, de Ayala Alonso et al. 2007; Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; Santiveri, Lechtenberg et al. 2008; Eryilmaz, Pan et al. 2009; Grimm, Matos et al. 2009; Guo, Nady et al. 2009). However, those analyses were limited and based only on a few histone peptides with a millimolar range of binding affinities for some complexes. Therefore, a comprehensive systematic analysis of the MBT binding specificities is needed in order to better understand and interpret biological implications of MBT binding partners.

In this study, we present a comprehensive substrate characterization of the human MBT family. The SPOT-blot peptide arrays were used to test the binding specificities of 29 MBT domains from 9 proteins in the human proteome. The binding affinities for selected protein-peptide complexes were determined using fluorescence polarization (FP). Furthermore, the molecular determinants responsible for the differences in recognition between the Kme1 and Kme2 states are described. This study provides a comprehensive analysis of MBT substrate specificity, establishing a powerful resource for future functional studies of the MBT family and other chromatin reader domains.

3.3 Results

3.3.1 MBT domains recognize a variety of methylated lysines on core histones

Substrates for the MBT domains of L3MBTL1, L3MBTL2, SCML2 and MBTD1 have been previously reported (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; Santiveri, Lechtenberg et al. 2008; Eryilmaz, Pan et al. 2009; Guo, Nady et al. 2009). We used the SPOT-blot technology described in Chapter 2 to systematically screen for interactions between the histone peptide substrates and the recombinant poly-histidine-tagged MBT domains from human proteins. The synthesized cellulose membranes contained 550 distinct, 10-15 residue long peptides spanning histones H2A, H2B, H3, and H4 (Chapter 2, Fig. 2.5a). These peptides contained unmodified controls, as well as modifications at a single position including all possible methylation, acetylation, and phosphorylation sites (Nady, Min et al. 2008). Additionally, each membrane had 4 poly-histidine peptide spots that served as positive controls, and 22 negative controls (peptides containing all poly-amino acid sequences, with poly-ser in triplicate and excluding poly-his).

Using these arrays, the 6xHIS-tagged MBT domains from 9 human proteins were tested (summarized in Fig. 3.2). Interaction with the histone peptides was visualized by the Western Blot procedure using a primary antibody against the 6xHIS-tag.

[Figure 3.2 image: grid of SPOT-array spot intensities. Columns are the nine MBT-containing proteins (SCMH1, SCML2, MBTD1, SFMBT1, SFMBT2, L3MBTL1, L3MBTL2, L3MBTL3, L3MBTL4); rows are the mono- and dimethyllysine peptides (Kme1, Kme2) of histones H2A, H2B, H3 and H4 and of p53, grouped by peptide residue range.]

Figure 3.2. Interactions of the MBT domains with histones in SPOT arrays. Nine proteins containing MBT domains were screened against an array of singly modified peptides that cover all possible modification sites in histones H2A, H2B, H3 and H4. Spots are shaded by different spot intensities from light blue (weak spot) to dark blue (very intense spot). Peptides indicated with asterisk encompass H2A (*aa83-93; **aa103-115), H2B (*aa94-107), H3 (*aa88-101; **aa101-114), p53 (*aa364-376).

Overall, the MBT family bound to all four core histones, with the monomethylation on lysine 36 of histone H2A (H2AK36me1) recognized by several MBT domains, and the rest of the recognized modifications distributed approximately equally across histones H2B, H3 and H4. The binding was confirmed and quantified using fluorescence polarization (FP). In most cases the FP results agreed with those obtained from the SPOT-blot arrays, producing Kd values within the low to mid-micromolar range for the highest affinity binding peptides (SCML2, L3MBTL1-4 and MBTD1; Table 3.1). However, some MBT proteins (SCMH1 and SFMBT1&2) showed binding to only a few histone peptides and with very low affinities (Table 3.1). This suggests that either their true substrates may be non-histone proteins, or that the short histone peptides and the MBT domain constructs used in the assay are missing additional components required for higher affinity interaction. As evidenced from Figure 3.2, there are two main features of the MBT-histone PTM interaction: (1) the state of lysine methylation that is recognized and (2) the sequence specificity of the peptide surrounding the lysine.
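To illustrate how a Kd can be extracted from an FP titration of the kind described above, the sketch below fits a one-site binding isotherm, FP = FP0 + A·[P]/(Kd + [P]). This is a minimal illustration and not necessarily the fitting procedure used in this work: it assumes that the protein is in large excess over the labelled peptide, it replaces a non-linear optimizer with a log-spaced grid search over Kd, and the function names and the synthetic data are hypothetical.

```python
def one_site(p_um: float, kd_um: float, fp0: float, amp: float) -> float:
    """One-site binding isotherm: FP signal at protein concentration p_um."""
    return fp0 + amp * p_um / (kd_um + p_um)


def _linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_kd(conc_um, fp, kd_min=1.0, kd_max=5000.0, steps=2000):
    """Grid search over Kd; for each Kd the baseline and amplitude are solved linearly."""
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)  # log-spaced candidates
        bound = [p / (kd + p) for p in conc_um]         # fraction bound at each point
        fp0, amp, sse = _linear_fit(bound, fp)
        if best is None or sse < best[3]:
            best = (kd, fp0, amp, sse)
    return best[0], best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with Kd = 100 uM (illustrative only).
    conc = [0, 10, 25, 50, 100, 200, 400, 800, 1000]
    signal = [one_site(p, 100.0, 50.0, 150.0) for p in conc]
    print(fit_kd(conc, signal))
```

For the weakest binders in Table 3.1 the titration does not approach saturation within the accessible protein concentration range, so a fit of this type can only place a lower bound on Kd.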

3.3.2 Recognition mechanism for mono- and dimethyllysine

Previous work has established that several MBT proteins (dScm, dSfmbt, L3MBTL1, L3MBTL2, MBTD1, SCML2) recognize only the lower methylation lysine states, and never trimethylated lysine (Adams-Cioaba and Min 2009; Bonasio, Lecona et al. 2010; Gao, Herold et al. 2011). In the previously reported MBT studies, the methyllysine binding aromatic cage was described to contain six residues: three aromatic ring “walls” (Phe, Trp, and Phe or Tyr), a hydrophilic “wall” (Asp and Asn) and a small residue like Cys on the “floor” (Grimm, de Ayala Alonso et al. 2007; Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; Santiveri, Lechtenberg et al. 2008; Eryilmaz, Pan et al. 2009; Grimm, Matos et al. 2009; Guo, Nady et al. 2009).

Table 3.1. Overview of the highest affinity histone substrates for MBT domains within human proteins.

Protein   Histone                                   Methyl status        Preferred modifications                                                    Strength of interaction
L3MBTL1   H2A, H2B, H3, H4; non-histone target p53  Kme1/2               no sequence specificity; promiscuous binder on SPOT arrays (H4K20me1/2)    Kd ranges between 258-924 μM
L3MBTL2   H4                                        Kme1/2               H4K20me1/2                                                                 Kd ~ 10 μM
L3MBTL3   H2A, H2B, H3, H4                          Kme2                 no sequence specificity; promiscuous binder on SPOT arrays (H4K20me2)      Kd ranges between 79-356 μM
L3MBTL4   H4                                        Kme2                 H4K20me2                                                                   Kd ~ 63 μM
MBTD1     H4                                        Kme1                 H4K20me1                                                                   Kd ~ 31 μM
SCMH1     none                                                                                                                                      Kd >> 2000 μM
SCML2     H2A                                       Kme1                 H2AK36me1                                                                  Kd ~ 102 μM
SCML2     H3                                        Kme1                 H3K36me1                                                                   Kd ~ 159 μM
SFMBT1    H4                                        dual modifications   H4T30ph/K31me1                                                             Kd ~ 974 μM
SFMBT2    H2B                                       Kme1                 H2BK43me1, H2BK46me1                                                       Not tested

Our structural and biochemical analyses of the binding specificity revealed three groups of the recombinant MBT domains. One group (e.g. SCML2) preferentially bound to monomethylated lysines (Kme1), another (e.g. L3MBTL3) to dimethylated lysines (Kme2), and the last group had comparable affinities for Kme1 and Kme2, as exemplified by L3MBTL1 and consistent with the previous reports (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007; Gao, Herold et al. 2011). While the six aromatic cage residues are highly conserved among the MBT domains, our data revealed substantial differences in the specificity for the lysine methylation states among the MBT domains. Five of the six canonical aromatic cage residues mentioned above are identical in SCML2, L3MBTL1 and L3MBTL3, even though these MBTs show binding preferences for monomethyl, mono- or dimethyl, and dimethyllysine, respectively. The sixth aromatic cage residue, Tyr386 in L3MBTL1, is also conserved in L3MBTL3 (Tyr412). Therefore, the origins of methylation state selectivity should be due to residues other than these canonical six aromatic cage residues.
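The three groups described above can be expressed as a comparison of the two dissociation constants measured for the same protein. The sketch below labels the preferred methylation state from the ratio Kd(Kme1)/Kd(Kme2). The two-fold cutoff for calling a preference is an arbitrary assumption made for illustration, the function name is hypothetical, and the fitting errors on the Kd values are ignored; the example values are the wild type Kd values as they appear in the Figure 3.3 panel legends.

```python
def methyl_preference(kd_me1_um: float, kd_me2_um: float, fold: float = 2.0) -> str:
    """Label the preferred lysine methylation state from two Kd values.

    A lower Kd means tighter binding. The fold cutoff is an illustrative
    assumption, not a threshold taken from the text.
    """
    if kd_me1_um <= 0 or kd_me2_um <= 0 or fold <= 1:
        raise ValueError("Kd values must be positive and fold must exceed 1")
    ratio = kd_me1_um / kd_me2_um
    if ratio >= fold:
        return "Kme2"          # Kme2 binds at least `fold` times tighter
    if ratio <= 1.0 / fold:
        return "Kme1"          # Kme1 binds at least `fold` times tighter
    return "Kme1 ~ Kme2"       # comparable affinities


if __name__ == "__main__":
    # Wild type values from the Figure 3.3 legends (uM), errors omitted.
    print("L3MBTL3:", methyl_preference(305, 95))
    print("L3MBTL1:", methyl_preference(107, 77))
```

With this cutoff, wild type L3MBTL3 is labelled as preferring Kme2 and wild type L3MBTL1 as binding both states comparably, in line with the grouping in the text; a different cutoff would move borderline cases between groups.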

This phenomenon was further investigated by detailed analysis of sequence alignments and the available crystal structures. Two additional residues were identified as part of the binding pocket, and they might also contribute to the methylation state selectivity (Fig. 3.3a). These two new binding pocket residues were designated as selectivity residues 1 and 2 (SR1 and SR2). In the case of L3MBTL1, which recognizes both Kme1 and Kme2 with comparable affinity, these residues are Thr411 (SR1) and Leu361 (SR2). SCML2 is selective for Kme1 (smaller in size and with a greater point charge), and while it contains Leu at SR2, it has a large hydrophilic Gln residue at the SR1 position. Although there is no reported crystal structure for L3MBTL3, the sequence and structure-based alignments suggest that L3MBTL3 has a Phe at SR2 and an Ile at SR1 (the latter is a result of a single residue insertion), and therefore a more hydrophobic pocket than SCML2. This may contribute to the preference of L3MBTL3 for Kme2.

[Figure 3.3 image. (a) Binding pocket residues compared across SCML2 / L3MBTL1 / L3MBTL3: SR2 = L188 / L361 / F387; SR1 = Q238 / T411 / I437; AC6 = F213 / Y386 / Y412; the remaining cage residues (N, D, C, F, W) are identical in the three proteins. (b-d) FP binding curves, relative fluorescence polarization units versus protein concentration (μM). Panel titles and the Kd values from the panel legends, in order of appearance:
(b) L3MBTL3: wt (Kme2), I437Q/F387L (Kme1 > Kme2). Kme1: 305 ± 48 μM, 45 ± 4 μM. Kme2: 95 ± 7 μM, 112 ± 11 μM.
(c) L3MBTL1: wt (Kme1 ~ Kme2), T411Q (Kme1 > Kme2), T411I/L361F (Kme1 < Kme2). Kme1: 107 ± 15 μM, 70 ± 14 μM, 89 ± 7 μM. Kme2: 77 ± 21 μM, 569 ± 79 μM, 38 ± 4 μM.
(d) SCML2: wt (Kme1), Q238I/F213Y (Kme1 < Kme2), Q238I/F213Y/L188F (Kme1 < Kme2). Kme1: 211 ± 37 μM, 1312 ± 508 μM, 1105 ± 218 μM. Kme2: > 2000 μM, 572 ± 160 μM, 870 ± 243 μM.]

Figure 3.3. Influence of the MBT pocket architecture on the recognition of the lysine methylation state. (a). Structure of the L3MBTL1 pocket bound to monomethylated lysine (PDB: 2RHY). For comparison, residues within L3MBTL3 and SCML2 pockets are indicated. Residues in grey are identical, those shown in black (AC6, SR1, SR2) are not identical and have been mutated. (b,c,d). FP assays (t = 28.8 °C) looking at the interaction between L3MBTL3 (b), L3MBTL1 (c), and SCML2 (d) wild type and mutant proteins.

In order to test whether SR1 and SR2 contribute directly to the methylation state selectivity, a panel of mutants was designed to switch the methylation state selectivity of SCML2, L3MBTL1 and L3MBTL3 (Fig. 3.3b-d). It was first ensured that each mutant MBT protein had no discernible structural perturbations by measuring its thermal stability relative to the wild type protein (Table 3.2). The wild type and mutant proteins were then tested for binding to the respective highest affinity histone peptide (listed in Table 3.1) using the FP assay (Table 3.3).
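As an illustration of how a dissociation constant relates to an FP titration curve, the following minimal Python sketch assumes a simple 1:1 binding model in which the protein is in large excess over the labelled peptide. The model, the function names and the baseline and plateau polarization values (fp_free, fp_bound) are illustrative assumptions and are not taken from the fitting procedure used in this work.

```python
def fraction_bound(protein_uM, kd_uM):
    """Fraction of labelled peptide bound at a given protein concentration.

    Assumes 1:1 binding with protein in large excess over peptide,
    so that free protein is approximately equal to total protein.
    """
    if protein_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return protein_uM / (kd_uM + protein_uM)


def fp_signal(protein_uM, kd_uM, fp_free, fp_bound):
    """Expected polarization: baseline plus bound fraction times the span.

    fp_free and fp_bound are hypothetical readings for fully free and
    fully bound peptide; they are not values from this study.
    """
    return fp_free + (fp_bound - fp_free) * fraction_bound(protein_uM, kd_uM)


if __name__ == "__main__":
    # Hypothetical titration for a peptide with Kd = 95 uM.
    for conc in (0.0, 95.0, 400.0, 1000.0):
        print(conc, round(fp_signal(conc, 95.0, 50.0, 250.0), 1))
```

Under this model the signal lies half-way between baseline and plateau when the protein concentration equals Kd, so a peptide with Kd = 95 μM would be 50% bound at 95 μM protein.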

Wild type SCML2 (2xMBT domains) binds to the H3K36 peptide with a clear preference for Kme1. The objective was to design mutants that would drive specificity towards Kme2. First, we mutated the SR1 (Gln238) or the SR2 (Leu188) residues within SCML2 to the equivalent residues within the binding pockets of L3MBTL1 (no preference between Kme1 and Kme2) or

L3MBTL3 (preference for Kme2). The binding data showed that neither of those single point mutants could switch the selectivity from Kme1 to Kme2. Next, a mutation of Phe213 to Tyr in the sixth residue of the canonical aromatic cage (designated here as the AC6 position) was made.

This mutation dramatically increased the affinity for both Kme1 and Kme2 (Table 3.3). The double SCML2 mutant (Q238T at SR1 and F213Y at AC6), which resembles L3MBTL1 pocket, did not affect Kme1 binding of SCML2 and increased its affinity for Kme2. On the other hand, the double mutant that resembles L3MBTL3 binding pocket (Q238I at SR1 and F213Y at AC6) resulted in the affinity swap for Kme2 vs Kme1. Finally, the triple SCML2 mutant

(F213Y/Q238I/L188F) mimicking the pocket of L3MBTL3 had weak affinities for both Kme1 and Kme2, although, relative to the wild type SCML2 its affinity for Kme1 was greatly decreased, and affinity for Kme2 was much improved (Table 3.3 and Fig. 3.3d). These data support a role of the SR1 and the AC6 residues within MBT domains in contributing to the methylation state selectivity. 65
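One way to gauge how firmly a Kme1/Kme2 preference is supported is to express it as a Kd ratio with a propagated uncertainty. The Python sketch below does this for the two SCML2 mutants that have measurable Kd values for both marks (Table 3.3). It assumes that the ± values can be treated as independent standard errors and uses first-order error propagation; both are assumptions made for illustration, and the helper name kd_ratio is hypothetical.

```python
import math


def kd_ratio(kd_kme1, err_kme1, kd_kme2, err_kme2):
    """Return Kd(Kme1)/Kd(Kme2) and its first-order propagated error.

    A ratio above 1 means tighter binding to Kme2 than to Kme1.
    Errors are assumed independent (an assumption, not stated in the source).
    """
    ratio = kd_kme1 / kd_kme2
    relative_error = math.sqrt((err_kme1 / kd_kme1) ** 2 +
                               (err_kme2 / kd_kme2) ** 2)
    return ratio, ratio * relative_error


# Kd values in uM copied from Table 3.3: (Kme1, error, Kme2, error).
SCML2_MUTANTS = {
    "Q238I/F213Y": (1312.0, 508.0, 572.0, 160.0),
    "Q238I/L188F/F213Y": (1105.0, 218.0, 870.0, 243.0),
}

if __name__ == "__main__":
    for name, values in SCML2_MUTANTS.items():
        ratio, error = kd_ratio(*values)
        print(f"{name}: Kd(Kme1)/Kd(Kme2) = {ratio:.2f} +/- {error:.2f}")
```

With these inputs the ratios are roughly 2.3 ± 1.1 for Q238I/F213Y and 1.3 ± 0.4 for Q238I/L188F/F213Y.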

Table 3.2. Thermal stability of the wild type and mutant proteins as indicated by aggregation temperatures.

Protein                   Tagg (°C)
SCML2(2xMBT)wt            42.1 ± 0.08
  F213Y                   42.76 ± 0.04
  Q238I                   40.88 ± 0.04
  L188F                   41.94 ± 0.03
  Q238T/F213Y             42.04 ± 0.09
  Q238I/F213Y             41.76 ± 0.01
  Q238I/L188F/F213Y       42.58 ± 0.17

SCML2(1-325)wt            41.02 ± 0.37
  N137A                   41.24 ± 0.14
  D182N                   42.8 ± 0.56

L3MBTL1(3xMBT)wt          42.12 ± 0.51
  T411Q                   41.7 ± 0.05
  T411I                   42.34 ± 0.08
  L361F                   43.52 ± 0.01
  T411Q/Y386F             42.81 ± 0.18
  T411I/L361F             42.62 ± 0.22

L3MBTL3(3xMBT)wt          40.68 ± 0.24
  I437Q                   41.2 ± 0.07
  I437T                   40.97 ± 0.03
  F387L                   39.49 ± 0.02
  I437Q/F387L             41.23 ± 0.19
  I437T/F387L             41.01 ± 0.15
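The comparison of mutant and wild type stability can be made explicit by computing the shift in aggregation temperature for each mutant, as in the short Python sketch below. The values are copied from Table 3.2; the units are assumed to be °C, and the dictionary layout and function names are illustrative only.

```python
# Aggregation temperatures from Table 3.2 (units assumed to be degrees C).
TAGG = {
    "SCML2(2xMBT)": {"wt": 42.1, "F213Y": 42.76, "Q238I": 40.88,
                     "L188F": 41.94, "Q238T/F213Y": 42.04,
                     "Q238I/F213Y": 41.76, "Q238I/L188F/F213Y": 42.58},
    "SCML2(1-325)": {"wt": 41.02, "N137A": 41.24, "D182N": 42.8},
    "L3MBTL1(3xMBT)": {"wt": 42.12, "T411Q": 41.7, "T411I": 42.34,
                       "L361F": 43.52, "T411Q/Y386F": 42.81,
                       "T411I/L361F": 42.62},
    "L3MBTL3(3xMBT)": {"wt": 40.68, "I437Q": 41.2, "I437T": 40.97,
                       "F387L": 39.49, "I437Q/F387L": 41.23,
                       "I437T/F387L": 41.01},
}


def tagg_shifts(family):
    """Shift of each mutant's Tagg relative to the wild type of its family."""
    wild_type = family["wt"]
    return {name: round(value - wild_type, 2)
            for name, value in family.items() if name != "wt"}


def largest_shift(table):
    """Return (protein, mutant, shift) for the largest absolute shift."""
    candidates = [(protein, mutant, shift)
                  for protein, family in table.items()
                  for mutant, shift in tagg_shifts(family).items()]
    return max(candidates, key=lambda item: abs(item[2]))


if __name__ == "__main__":
    for protein, family in TAGG.items():
        print(protein, tagg_shifts(family))
    print("largest shift:", largest_shift(TAGG))
```

With these values every mutant lies within about 2 °C of its wild type, the largest shift being +1.78 for SCML2(1-325) D182N.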

Table 3.3. Overview of the ability of MBT mutants to bind lower lysine methylation states.

                                              Kd, μM
Protein                 Position        Kme1            Kme2
SCML2(2xMBT)wt                          211 ± 37        >2000
  F213Y                 AC6             58 ± 6          461 ± 111
  Q238T                 SR1             457 ± 103       1230 ± 469
  Q238I                 SR1             >2000           >2000
  L188F                 SR2             396 ± 138       >2000
  Q238T/F213Y           SR1/AC6         285 ± 42        657 ± 216
  Q238I/F213Y           SR1/AC6         1312 ± 508      572 ± 160
  Q238I/L188F/F213Y     SR1/SR2/AC6     1105 ± 218      870 ± 243

L3MBTL1(3xMBT)wt                        107 ± 15        77 ± 21
  T411Q                 SR1             70 ± 14         569 ± 79
  T411I                 SR1             115 ± 16        75 ± 9
  L361F                 SR2             412 ± 76        163 ± 22
  T411Q/Y386F           SR1/AC6         31 ± 5          212 ± 12
  T411I/L361F           SR1/SR2         89 ± 7          38 ± 4

L3MBTL3(3xMBT)wt                        305 ± 48        95 ± 7
  I437Q                 SR1             136 ± 10        109 ± 13
  I437T                 SR1             127 ± 15        22 ± 2
  F387L                 SR2             110 ± 10        57 ± 6
  I437Q/F387L           SR1/SR2         45 ± 4          112 ± 11
  I437T/F387L           SR1/SR2         133 ± 19        71 ± 7

In a similar manner we sought to alter the lack of Kme1/Kme2 selectivity of L3MBTL1 with SR1 and SR2 mutagenesis, generating a protein with preference for either Kme1 or Kme2. In this protein, a single mutation to Gln at the SR1 position (T411Q) resulted in a preference for Kme1, while a Thr to Ile substitution at the same position had no effect. A single mutation within L3MBTL1 at the SR2 residue, which mimics the L3MBTL3 pocket (L361F), was able to tilt the preference toward Kme2 and significantly decrease the affinity for Kme1 (relative to WT L3MBTL1). The double L3MBTL1 mutants combining the SR1 mutation with AC6 or SR2 residue swaps resulted in significant selectivity for either Kme1 or Kme2 (Table 3.3 and Fig. 3.3c). Wild type L3MBTL3 (3xMBT) shows a three-fold preference for Kme2 over Kme1. In this protein a single substitution to hydrophilic Gln at the SR1 residue (I437Q) virtually eliminated methylation state preference. Introduction of a second mutation at the SR2 position (F387L) resulted in a 2.5-fold selectivity for Kme1 vs Kme2, and 7-fold improved affinity for Kme1 compared to the wild type protein (Table 3.3 and Fig. 3.3b).
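The fold changes quoted above can be recomputed from the Kd values in Table 3.3. The following Python sketch does so for L3MBTL3; the dictionary and the helper name fold are illustrative and are not part of the original analysis.

```python
# Kd values in uM from Table 3.3, stored as (Kme1, Kme2).
L3MBTL3_KD = {
    "wt": (305.0, 95.0),
    "I437Q": (136.0, 109.0),
    "I437Q/F387L": (45.0, 112.0),
}


def fold(weaker_kd, tighter_kd):
    """Fold difference in affinity: ratio of the larger Kd to the smaller Kd."""
    return weaker_kd / tighter_kd


if __name__ == "__main__":
    wt_kme1, wt_kme2 = L3MBTL3_KD["wt"]
    sm_kme1, sm_kme2 = L3MBTL3_KD["I437Q"]
    dm_kme1, dm_kme2 = L3MBTL3_KD["I437Q/F387L"]

    # Preference of the wild type for Kme2 over Kme1.
    print("wt, Kme2 over Kme1:", round(fold(wt_kme1, wt_kme2), 1))
    # Residual preference of the single SR1 mutant.
    print("I437Q, Kme2 over Kme1:", round(fold(sm_kme1, sm_kme2), 1))
    # Preference of the double mutant for Kme1 over Kme2.
    print("I437Q/F387L, Kme1 over Kme2:", round(fold(dm_kme2, dm_kme1), 1))
    # Gain in Kme1 affinity of the double mutant relative to the wild type.
    print("Kme1 gain vs wt:", round(fold(wt_kme1, dm_kme1), 1))
```

This gives about a 3.2-fold preference for Kme2 in the wild type, about 1.2-fold for I437Q, about a 2.5-fold preference for Kme1 in I437Q/F387L, and about 6.8-fold tighter Kme1 binding for the double mutant than for the wild type.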

Overall, a larger polar residue such as Gln at the SR1 position improves binding to Kme1, while the presence of hydrophobic Ile at the same position contributes to Kme2 recognition. The presence of an aromatic residue at SR2 also favors binding to Kme2. These data also highlight synergy among the SR1, SR2 and AC6 sites of the methyllysine binding pocket, which may have important consequences for the design of selective peptide competitive inhibitors of the MBT domains.

3.3.3 Lack of sequence specific binding by L3MBTL1 and L3MBTL3

The sequence surrounding the methylated lysine may play an important role in dictating the specificity. However, the biochemical analysis of binding specificity within the MBT family revealed that two proteins, L3MBTL1 (3xMBT) and L3MBTL3 (3xMBT), showed a promiscuous binding pattern. In SPOT-blot arrays L3MBTL3 recognized almost all peptide sequences on core histones, with the exception of short acidic peptide stretches (pI < 6.0) on histones H3, H2B and H4. In order to find a high-affinity binder we performed a detailed FP analysis. In these FP assays, L3MBTL3 (3xMBT) bound very weakly to a variety of peptides, all producing high FP readings, which correlates with the results of the SPOT-blot assays. However, L3MBTL3 (3xMBT) had the highest affinity toward dimethylated marks, especially the H4K20me2-containing peptides, which bound 2-fold better than H3K4me2 and 8-fold better than H3K9me2 (Fig. 3.4a,b). Interestingly, Northcott et al., using ChIP assays followed by end-point PCR, showed increased levels of H3K9me2 upon L3MBTL3 overexpression in vivo (Northcott, Nakahara et al. 2009). These results, taken together with our data, suggest that L3MBTL3 does not have strong selectivity for only one dimethyl histone mark.

In the SPOT-blot assays L3MBTL1 bound to virtually all the monomethyllysine-containing peptides (Fig. 3.4c). To test the L3MBTL1 sequence selectivity, FP experiments were performed with eight different Kme1-containing peptides covering multiple regions within core histones. Although the Kd values for monomethyllysine-containing peptides bound to L3MBTL1 covered a broad range, visual inspection of the binding curves revealed that several curves overlapped greatly and it was difficult to distinguish between some of them (Fig. 3.4d).

Further evidence for the lack of sequence preference comes from the analysis of the available 3D structures of L3MBTL1. A small number of contacts was observed between residues flanking the methyllysine and the protein. The majority of the protein-peptide contacts occurred within the methyllysine binding pocket suggesting that flanking residues contribute little to the recognition and the binding affinity (Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007).

[Figure 3.4. Panels a, c: SPOT-blot peptide arrays (rows A-T, columns 1-30). Panels b, d: FP binding curves (relative fluorescence polarization vs. protein concentration, μM) with the following legend Kd values.
b. L3MBTL3. H4K20 peptide: Kme0 385 ± 68 μM, Kme1 305 ± 48 μM, Kme2 95 ± 7 μM, Kme3 > 2000 μM. Dimethylated peptides: H3 K4me2 193 ± 15 μM, H3 K9me2 356 ± 23 μM, H4 K20me2 79 ± 8 μM. Monomethylated peptides (H2A K36me1, H3 K4me1, H3 K9me1, H3 K27me1, H3 K36me1, H3 K37me1, H3 K79me1, H4 K20me1): all > 1000 μM.
d. L3MBTL1. H2A K36me1 543 ± 151 μM, H3 K4me1 > 1000 μM, H3 K9me1 364 ± 17 μM, H3 K27me1 > 1000 μM, H3 K36me1 > 1000 μM, H3 K37me1 924 ± 375 μM, H3 K79me1 > 1000 μM, H4 K20me1 258 ± 37 μM.]

Figure 3.4. L3MBTL1 and L3MBTL3 are promiscuous binders. (a). SPOT-blot assay showing L3MBTL3 binding to most of the peptides on the membrane. Highlighted in green are poly-his peptides that served as positive controls. (b). FP assays with the indicated peptides. (c). SPOT-blot assay showing L3MBTL1 binding to mono- and some dimethylated peptides on the membrane. (d). FP assay (t = 26.0 °C) measuring the interaction between L3MBTL1 and various Kme1 histone peptides.

3.3.4 Potential recognition of non-histone proteins

At the other extreme, the peptide arrays show that MBT domains from the SCMH1, SFMBT1 and SFMBT2 proteins bound to very few histone peptides. The SCMH1 (2xMBT) protein reproducibly bound to two peptides containing distinct sequences, H2AK36me1 and H3K79me1 (Fig. 3.5a,b). However, in FP assays, the interactions were too weak for a dissociation constant to be measured. Similar findings apply to SFMBT2 (4xMBT), for which neither SPOT-blot peptide arrays nor FP assays could identify a substrate (Fig. 3.5c,d). Interestingly, SFMBT1 (4xMBT) bound not only to the methylated lysines, but also to several non-methyl modifications. These findings prompted us to design selected peptides with dual modifications on histone H4, aa21-33. FP binding assays confirmed that a histone H4 peptide with the dual modifications T30ph and K31me1 was the best binder among those tested (Fig. 3.5e,f). However, due to the weak affinity of this interaction, it is possible that the biological substrate for SFMBT1 may be a dual phosphorylated and methylated region on a non-histone protein with a similar sequence.

3.3.5 Recognition of specific histone sequences

SPOT-blot peptide array data provided evidence that not all MBTs were promiscuous binders. MBTD1 (4xMBT), L3MBTL2 (4xMBT), L3MBTL4 (3xMBT), and SCML2 (2xMBT) proteins recognized only a few specific histone sequences. The binding preference of these proteins was also tested using FP assays. MBTD1, L3MBTL2 and L3MBTL4 bound to the methylated H4K20-containing sequences with the highest affinity in FP assays (Fig. 3.6). We note that in initial SPOT-blot assays, the peptide spots corresponding to H4 residues 10-22 with K20 methylated repeatedly failed to bind these MBTs. This false negative result on peptide arrays can be attributed to the position of Lys20 toward the end of the peptide sequence (perhaps C-terminal residues are required for recognition), and/or failure in the proper synthesis of the peptide due to the nature of the sequence.

[Figure 3.5. Panels a, c, e: SPOT-blot peptide arrays (rows A-T, columns 1-30). Labelled spots: H2A K36me1 and H3 K79me1 (a); H2B aa41-53, H2B K43me1, H2B K46me1 and H3 T107ph (c). Panels b, d, f: FP binding curves (relative fluorescence polarization vs. protein concentration, μM) with the following legend Kd values.
b. SCMH1 (t = 26.0 °C). H4K20 peptide Kme0, Kme1, Kme2 and Kme3: all > 2000 μM. Monomethylated peptides (H2A K36me1, H3 K4me1, H3 K9me1, H3 K27me1, H3 K36me1, H3 K37me1, H3 K79me1, H4 K20me1): all > 1000 μM.
d. SFMBT2 (t = 26.0 °C). Monomethylated peptides (H2A K36me1, H3 K4me1, H3 K9me1, H3 K27me1, H3 K36me1, H3 K37me1, H3 K79me1): all > 1000 μM.
f. SFMBT1 (t = 29.0 °C). H4K20 peptide Kme0, Kme1, Kme2 and Kme3: all > 2000 μM. H4 T30ph/K31me1: 973 ± 83 μM. Other H4 peptides shown: unmodified, T30ph, K31me1, R23me2(a).]

Figure 3.5. Poor recognition of histone substrates by MBTs. SPOT-blot assay showing SCMH1 (a), SFMBT2 (c), and SFMBT1 (e) binding to two or three peptides on the membrane. Highlighted in green are poly-his peptides that served as positive controls. FP assays with the indicated peptides and SCMH1 (b), SFMBT2 (d), SFMBT1 (f).

[Figure 3.6. Panels a, c, e: SPOT-blot peptide arrays (rows A-T, columns 1-30). Panels b, d, f: FP binding curves (relative fluorescence polarization vs. protein concentration, μM) with the following legend Kd values.
b. MBTD1 (t = 29.0 °C). H4K20 peptide: Kme0 > 2000 μM, Kme1 31 ± 5 μM, Kme2 48 ± 12 μM, Kme3 > 2000 μM. Monomethylated peptides: H4 K20me1 31 ± 5 μM; H2A K36me1, H3 K4me1, H3 K9me1, H3 K27me1, H3 K36me1, H3 K37me1 and H3 K79me1 all > 1000 μM.
d. L3MBTL2 (t = 29.0 °C). H4K20 peptide: Kme0 332 ± 213 μM, Kme1 10 ± 3 μM, Kme2 10 ± 2 μM, Kme3 > 2000 μM. Monomethylated peptides: H2A K36me1 57 ± 27 μM, H3 K4me1 43 ± 14 μM, H3 K9me1 154 ± 34 μM, H3 K27me1 302 ± 84 μM, H3 K36me1 102 ± 21 μM, H3 K37me1 31 ± 1 μM, H3 K79me1 > 1000 μM, H4 K20me1 10 ± 3 μM.
f. L3MBTL4. H4K20 peptide: Kme0 > 2000 μM, Kme1 1373 ± 539 μM, Kme2 131 ± 29 μM, Kme3 > 2000 μM. Dimethylated peptides: H4 K20me2 63.4 ± 12 μM, H3 K36me2 332.2 ± 63 μM, H3 K4me2 797.9 ± 351 μM, H3 K9me2 345.6 ± 158 μM. Monomethylated peptides (H2A K36me1, H3 K4me1, H3 K9me1, H3 K27me1, H3 K36me1, H3 K37me1, H3 K79me1, H4 K20me1): all > 1000 μM.]

Figure 3.6. Recognition of specific histone sequences. SPOT-blot assay showing MBTD1 (a), L3MBTL2 (c), and L3MBTL4 (e) binding to only several selected peptides on the membrane. Highlighted in green are poly-his peptides that served as positive controls. FP assays with the indicated peptides and MBTD1 (b), L3MBTL2 (d), L3MBTL4 (f).

In both peptide arrays and FP assays SCML2 (2xMBT) recognized H2AK36me1 and H3K36me1 with the highest affinity (Fig. 3.7a,c). In order to determine the mode of interaction between the H2AK36me1 peptide and the SCML2 (2xMBT) protein, we determined the SCML2/H2AK36me1 complex structure at 2.5 Å resolution (Table 3.4). As expected, the monomethyl-ammonium group of the modified lysine is positioned at the center of a conserved aromatic cage (comprising the residues defined in Figure 3.3a) within the second MBT domain of SCML2. The differences between the apo (PDB: 1OI1) and the bound structures are minor, with an r.m.s. difference of 0.41 Å for 661 aligned backbone atoms. As a result of the ligand binding, slight adjustments occurred within the backbone of the loop spanning residues 208-213. This loop forms a part of the binding pocket where the side chain of Phe213 (the AC6 residue of the aromatic cage) rotates to accommodate the mono-methylated lysine. Unlike in many MBT-peptide complexes, electron density for the nine amino acids within the H2AK36me1 peptide is clearly visible, allowing accurate modeling of the positions of the mono-methylated lysine, six N-terminal and two C-terminal residues (Fig. 3.7b).
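For reference, an r.m.s. difference of this kind is the square root of the mean squared distance between matched atom pairs. The Python sketch below computes it for two coordinate lists that are assumed to be already superposed and listed in the same atom order; the function name and the toy coordinates are hypothetical and are not taken from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Each list holds (x, y, z) tuples in Angstrom. The structures are
    assumed to be superposed already; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical atoms displaced by 0.3 and 0.4 Angstrom respectively.
    apo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    bound = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
    print(round(rmsd(apo, bound), 3))
```

In practice the two structures would first be superposed by a least-squares fit of the aligned backbone atoms, which this sketch deliberately leaves out.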

An interesting feature of this complex structure is the presence of substantial contacts between the peptide and the symmetry related molecule (e.g. H2AR32-SCML2N137) in the crystal lattice (Fig. 3.7b). In order to investigate whether the binding to the symmetry related molecule contributes to binding affinity, we made a mutant that disrupts this contact (N137A) and measured the binding affinity by FP. Although the structure of the mutant SCML2 (N137A) protein was not perturbed, since it had the same stability as the wild type protein, it showed reduced binding to the H2AK36me1 histone peptide and the Kd values could not be calculated accurately (Fig. 3.7c and Table 3.2). On the nucleosome, the R32LLRK36 sequences from both H2A molecules are found at the interface with the DNA and within ~20 Å of each other, creating a possibility for SCML2 (2xMBT) to recognize H2AK36me1 from one histone molecule and H2AR32 from the other.

[Figure 3.7. Panel a: SPOT-blot peptide array (rows A-T, columns 1-30) with labelled spots H2AK36me1, H3K9me1, H3K36me1, H3K64me1 and H4K59me1. Panel b: crystal structure with the bound peptide residues labelled. Panel c: FP binding curves (relative fluorescence polarization vs. protein concentration, μM).
c, top. SCML2 with Kme1 peptides: H2A K36me1 102 ± 10 μM, H3 K36me1 159 ± 2 μM, H3 K4me1 369 ± 91 μM, H3 K9me1 367 ± 78 μM, H3 K27me1 265 ± 23 μM, H3 K37me1 307 ± 17 μM, H3 K79me1 498 ± 60 μM, H4 K20me1 240 ± 8 μM.
c, bottom. wt, N137A and D182N proteins, each with the H2A K36me1 peptide and with the unmodified peptide.]

Figure 3.7. Structural and functional analysis of the SCML2 binding. (a). SPOT-blot assay showing binding of SCML2 (2xMBT) only to selected monomethylated histone peptides. (b). Ribbon representation of the crystal structure of the SCML2/H2AK36me1 complex with electron density for the peptide shown as grey mesh. The peptide binds to two molecules: Kme1 binds to the aromatic cage of one molecule and the rest of the peptide makes interactions with the symmetry-related molecule, shown in light blue. (c). FP assay (t = 29.2 °C) looking at the interaction between SCML2 and various Kme1 histone peptides (top) and depicting lack of interaction between the H2AK36me1 peptide and the aromatic cage mutant (D182N) or protein with the mutation within a symmetry-related molecule (N137A) (bottom).

Table 3.4. Crystallographic data collection and refinement statistics.

                              SCML2/H2AK36me1
Data collection
  Space group                 P4₃2₁2
  Cell dimensions
    a, b, c (Å)               70.03, 70.03, 169.83
    α, β, γ (°)               90.0, 90.0, 90.0
  Unique reflections          15321
  Resolution (Å)              169 (2.5)*
  Rsym or Rmerge              0.14 (0.62)
  I/σI                        21.7 (1.5)
  Completeness (%)            98.9 (91.4)
  Redundancy                  12.4 (6.9)

Refinement
  Resolution (Å)              2.5
  No. reflections             14545
  Rwork/Rfree                 0.22/0.28
  No. atoms
    Protein                   1660
    Ligand/ion                77
    Water                     80
  B-factors (Å²)
    Protein                   50.7
    Ligand/ion                72.8
    Water                     50.8
  R.m.s. deviations
    Bond lengths (Å)          0.017
    Bond angles (°)           1.820

*Highest resolution shell is shown in parentheses.

Additionally, we wanted to test the binding of SCML2 to more biologically relevant substrates, such as unmodified or modified nucleosomes, instead of histone peptides. The reconstituted recombinant nucleosome core particles (NCPs), either unmodified or with the methylated lysine analogs (Simon, Chu et al. 2007), were probed for the interaction with SCML2 on a native gel using Electrophoretic Mobility Shift Assays (EMSAs). Unexpectedly, SCML2 (2xMBT) interacted only weakly with the H2AK36me1-containing nucleosomes under the conditions tested. Full length SCML2 interacted with the unmodified NCP; however, a considerably stronger interaction was observed with the H2AK36me1 modified nucleosomes (Fig. 3.8). This suggests that the full length SCML2 is involved in bimodal nucleosome recognition, with one domain binding to the DNA and another module recognizing the H2AK36me1 mark.

To date there has been no reported DNA-binding activity of SCML2. Bioinformatic analysis showed that SCML2 and SCMH1 contain a basic stretch of residues adjacent to the MBT domains. We therefore used gel-shift assays to test the ability of this region within the SCML2 protein to bind to the Widom 601 DNA sequence used in our NCPs. As expected, there was no detectable binding between the 2xMBT domains and the Widom 601 DNA oligonucleotide. On the other hand, full length SCML2 and a construct encompassing 2xMBT with the basic stretch of residues showed strong binding to the 601 DNA sequence (Fig. 3.9a-c). Thus, it appears that the basic region adjacent to the MBT domains mediates DNA binding.

[Figure 3.8. Gel images with lanes labelled by the amount of protein (μmol) added to a constant amount of NCP; the free NCP and complex bands and the MW markers (bp) are indicated.
a. WT nucleosome + SCML2 (2xMBT) and H2AK36Cme1 nucleosome + SCML2 (2xMBT): 0, 5, 50, 100, 150, 250, 500, 1000, 1500, 2500 μmol of protein.
b. WT nucleosome + SCML2 (FL) and H2AK36Cme1 nucleosome + SCML2 (FL): 0, 5, 10, 20, 35, 50, 75, 100, 150 μmol of protein.]

Figure 3.8. Binding of SCML2 to nucleosomes. (a,b). Gel shift assays showing the interaction of the unmodified or H2AK36me1-containing recombinant nucleosome with the SCML2 (2xMBT) fragment (panel a) or full length protein (panel b). A constant amount (5 μmol) of NCP was titrated with increasing amounts of the protein as indicated, and the gel was stained with SafeView for visualization of nucleic acids.

[Figure 3.9. Gel images with lanes labelled by the amount of protein (μmol) added to a constant amount of DNA; the free 601 DNA, complex and DNA aggregate bands and the MW markers (bp) are indicated.
a. Widom 601 DNA + SCML2 (2xMBT): 0, 2.7, 5.4, 10.8, 21.6, 32.4, 43.2, 54, 135, 270, 540 μmol of protein.
b. Widom 601 DNA + SCML2 (FL): 0, 2.7, 5.4, 10.8, 18.9, 27, 40.5, 54, 81 μmol of protein.
c. Widom 601 DNA + SCML2 (2xMBT+basic region): 0, 2.7, 5.4, 10.8, 21.6, 32.4, 43.2, 54, 81, 135, 270 μmol of protein.]

Figure 3.9. SCML2 binds to DNA with its basic region. (a-c). Gel shift assays showing the interaction of the 147 bp Widom 601 DNA with the SCML2 (2xMBT) fragment (a), full length protein (b) and a construct that encompasses the MBT domains with the basic region (c). A constant amount (2.7 μmol) of DNA was titrated with increasing amounts of the protein as indicated, and the gel was stained with SafeView for visualization of nucleic acids.

In order to identify the minimal region within SCML2 required for the DNA and histone PTM recognition on the nucleosome, we tested a shorter construct covering just the MBT repeats and the basic region, encompassing residues 1 to 325 (Fig. 3.10). This region showed strong binding to the tested recombinant nucleosomes. The strongest binding was with the H2AK36me1 modified nucleosome, followed by about 5x weaker binding to the unmodified nucleosome and ~8x weaker binding to the H3K9me1-containing nucleosome. Both the aromatic cage mutant (D182N) and the mutant within the symmetry-related molecule (N137A) showed reduced binding to the monomethylated nucleosomes (Fig. 3.10). This suggests that the DNA binding activity of SCML2 cooperates with the H2AK36me1-binding activity to increase the affinity for the nucleosome.

3.4 Discussion

This work identifies histone substrates for the human MBT-containing proteins and describes the requirements for Kme1 vs Kme2 recognition, establishing a powerful resource for future functional studies of the MBT family and chromatin reader domains. In order to determine the biological significance of the histone-MBT interaction, one needs to assess the function of the histone PTM itself. In many cases we observed specific recognition of the methylated H2AK36-containing histone peptides. Monomethylation of the H2AK36 site has not yet been shown to exist in vivo, although acetylation and crotonylation have been identified by proteomics mass spectrometry (Zhang, Eugeni et al. 2003; Tan, Luo et al. 2011). Nonetheless, it is possible that the H2AK36me1/2 marks also exist, likely at low levels, and they may be tissue specific, present at a specific time during development or cell cycle and induced by specific environmental cues. Another possibility is that SCML2 recognizes a sequence similar to the H2AK36 peptide (RVHRLLRK36GNYAERV) found elsewhere in the genome. In fact, similar sequences are found within chromatin-associated proteins like NSD1 (KGNYA motif) and MYST4 (KRxxR).

[Figure 3.10. Gel images with lanes labelled by the amount of protein (0, 0.5, 1, 2.5, 5, 7.5, 10, 20, 30, 40, 60 μmol) added to a constant amount of NCP; the free NCP and complex bands and the MW markers (bp) are indicated.
a. WT NCP + SCML2 (1-325) wt, N137A and D182N.
b. H2AK36me1 NCP + SCML2 (1-325) wt, N137A and D182N.
c. H3K9me1 NCP + SCML2 (1-325) wt, N137A and D182N.]

Figure 3.10. SCML2 interacts specifically with the H2AK36me1-containing nucleosomes. (a-c). Gel shift assays showing the interaction of the SCML2 (2xMBT+basic region) protein and its mutants with the unmodified (panel a), H2AK36me1 (panel b) or H3K9me1-containing (panel c) recombinant nucleosome core particles (NCPs). A constant amount (5 μmol) of NCP was titrated with increasing amounts of the protein as indicated, and the gels were stained with SafeView for visualization of nucleic acids. Asterisk denotes minimal amount required for complex formation.

There are numerous examples where “reader” modules recognize non-histone proteins. Some examples include the MBT domains of L3MBTL1, which recognize monomethylated K382 on the C-terminus of p53 (West, Roy et al. 2010); the chromodomain of HP1, which recognizes the G9a methyltransferase (Chin, Esteve et al. 2007; Sampath, Marazzi et al. 2007); and the chromodomain of Cbx7, which is proposed to recognize SETDB1 (Kaustov, Quyang et al. 2010). Interestingly, the MBT domains within SCMH1 and SFMBT2 do not recognize any histone substrates, while SFMBT1 recognizes a dual modification of H4T30ph and K31me1. The latter modification has been identified in cells (Garcia, Busby et al. 2005; Beck, Nielsen et al. 2006) where it plays a role in telomeric and ribosomal DNA silencing (Hyland, Cosgrove et al. 2005). Nevertheless, the question remains whether this combination is biologically relevant and what role SFMBT1 plays in cells. Overall, it appears that these MBTs bind to methylated substrates on non-histone proteins and some of them are even capable of recognizing dual modifications.

The MBT domains do not exist in isolation and the presence of additional domains within the MBT-containing proteins may contribute to their functional versatility. In addition to the MBT domains, all MBT family proteins, except L3MBTL2 and MBTD1, contain SAM domains, which have been shown to be involved in homo- and heterodimerization and polymerization (Tomotsune, Takihara et al. 1999; Boccuni, MacGrogan et al. 2003; Knight, Leettola et al. 2011) and could contribute to the incorporation of MBT proteins within larger chromatin remodeling complexes. Several MBT proteins also contain different types of zinc fingers (FCS and C2HC), the functions of which are not known (Lechtenberg, Allen et al. 2009; Wang, Ilangovan et al. 2011). We showed here that SCML2 contains a basic region that is required for efficient binding to the nucleosomes. A similar sequence is also found within SCMH1 at a similar distance from the MBT domains, and may also act as a DNA-binding domain.

Proteins containing epigenetic reader modules have been implicated in the development of many diseases (Baker, Allis et al. 2008; Bonasio, Lecona et al. 2010). Thus, targeting lysine “reader” modules and enzymes that methylate (KMTs) or demethylate (KMDTs) lysine residues is an attractive strategy for the treatment of various disorders (Daigle, Olhava et al. 2011; Vedadi, Barsyte-Lovejoy et al. 2011). The recent demonstration of anti-inflammatory and antitumor activity of acetyllysine competitive inhibitors of Bromo domains is a remarkable proof of concept for targeting epigenetic “reader” domains in disease (Filippakopoulos, Qi et al. 2010; Delmore, Issa et al. 2011; Zuber, Shi et al. 2011). Importantly, Frye and colleagues have recently reported methyllysine competitive antagonists of the MBT domains of L3MBTL1, paving the way for future development of modulators of MBT domain-protein interactions (Herold, Wigle et al. 2010).

Since some MBT domains are highly conserved, one of the challenges is to design inhibitors that are selective for a particular MBT module. Our extensive mutational data provide the basis for understanding the differential recognition of Kme1 and Kme2, and show that the MBT binding pockets have variability as well as a small degree of substrate preference. We identified two residues (SR1 and SR2) that at first glance appear to be distant from the methylated lysine, but that play a critical role in distinguishing between the Kme1 and Kme2 states. However, our mutagenesis data also revealed a complex interrelationship among the residues in the active site.

It may be possible to exploit these observations to develop selective or pan-MBT inhibitors.

These reagents will also enable us to isolate the role of MBT domains from the rest of the structural motifs found within these proteins. Overall, our work provides a foundation for the

rational design of potent and selective MBT inhibitors which will further our understanding of the role of epigenetic reader modules in health and disease.

3.5 Methods

Cloning, protein expression, and purification

The cDNA encoding human MBT domains was cloned into a modified pET28a-MHL

bacterial expression vector encoding an N-terminal hexahistidine fusion tag with a TEV protease

cleavage site. The constructs used for each protein are indicated below.

Protein    NCBI Accession    Construct domains        Construct boundaries, aa
SCMH1      Q96GD3            2xMBT                    27-238
SCML2      Q9UQR0            2xMBT                    29-243
SCML2      Q9UQR0            2xMBT + AT-hook like     1-325
SCML2      Q9UQR0            FL (full length)         1-700
L3MBTL1    NP_056293         3xMBT                    200-522
L3MBTL2    Q969R5            4xMBT                    170-625
L3MBTL3    Q96JM7            3xMBT                    225-555
L3MBTL4    Q8NA19            3xMBT                    44-371
SFMBT1     Q9UHJ3            4xMBT                    16-455
SFMBT2     Q5VUG0            4xMBT                    38-484
MBTD1      Q05BQ5            4xMBT                    130-566

Mutated cDNAs were made by using QuikChange II XL Site-Directed Mutagenesis Kit

(Stratagene); mutations were confirmed by sequencing complete cDNAs. The proteins were

expressed in E. coli BL21 (DE3) grown in LB in the presence of 50 µg/ml of kanamycin, and

induced at OD600 0.8-1.0 with 1.0 mM isopropyl-1-thio-D-galactopyranoside at 15°C for 16 hrs.

The proteins were purified using standard affinity and gel filtration chromatography. Briefly, the cell pellet from a 2 L culture was re-suspended in 40 ml lysis buffer (for SCMH1 and L3MBTL2 the buffers contained 25 mM NaPi, pH 6.2, for all others it consisted of 20 mM Tris-HCl, pH

8.0, supplemented with 500 mM NaCl, 0.1% Triton X-100, 15% glycerol, 5 mM β-

mercaptoethanol, 2 mM benzamidine, 0.5 mM phenylmethyl sulfonyl fluoride (PMSF) and

DNaseI). The cells were sonicated and centrifuged at 15,000 rpm for 30 min. The clarified cell

lysate was applied to 3-4 ml TALON metal affinity resin (Clontech). The beads were washed

with 40 ml Wash buffer A (25 mM NaPi, pH 6.2 or 20 mM Tris-HCl, pH 8.0 with 500 mM

NaCl, 5 mM imidazole, 5 mM β-mercaptoethanol, 2mM benzamidine, 0.5 mM PMSF) and 40

ml Wash buffer B (Wash buffer A with 10mM imidazole). The protein was eluted with 10-15 ml

of Elution buffer (25 mM NaPi, pH 6.2 or 20 mM Tris-HCl, pH 8.0 with 500 mM NaCl, 500 mM

imidazole, 5 mM β-mercaptoethanol, 2mM benzamidine, 0.5 mM PMSF). For crystallographic

studies the N-terminal His-tag was removed by overnight incubation with TEV protease at 4ºC.

For binding assays we did not observe any interference from the tag and it was left uncut. All

proteins were further purified by gel filtration on a HiLoad 26/60 Superdex 75 column (GE

Healthcare) equilibrated with either 25 mM NaPi, pH 6.2 or 20 mM Tris-HCl, pH 8.0

supplemented with 250 mM NaCl, 1mM benzamidine, 0.5 mM PMSF, 5 mM β-

mercaptoethanol, 1 mM tris(2-carboxyethyl)phosphine (TCEP), 2 mM dithiothreitol (DTT).

Proteins were concentrated to ~150 μM for L3MBTL2 and MBTD1, ~400 μM for SCMH1 and 800-1100 μM for all other wild-type and mutant proteins. Recombinant proteins were

stored at 4ºC for 2 – 3 weeks and at -80ºC for longer periods. Most MBT domains precipitate at

low salt concentrations, thus in order to make meaningful comparisons across the entire family,

all binding assays and structure determination buffers contained at least 250 mM NaCl.


Crystallization, data collection, structure solution and refinement

A 35 mg/ml solution of SCML2 (aa 29-243) was incubated for 30 min with

H2AK36me1 (VHRLLRKme1GNYAERVGA) peptide in a 1:5 molar ratio prior to crystallization. Crystals of the complex were grown using the hanging drop vapor diffusion method at 18 °C by mixing 1 μl of the protein-peptide solution with 1 μl of precipitant solution containing 1.6M NH4PO4 and 0.1M Tris·HCl, pH 9.5. The crystals were harvested straight from the drop, frozen and stored in liquid nitrogen. Data from crystals of SCML2 and H2AK36me1 were collected at SBC-CAT 19ID at the Advanced Photon Source and processed using the

HKL2000 program suite (Otwinowski and Minor 1997). Data from all crystals were collected at

100 K. The complex structure was solved by molecular replacement in Phaser (McCoy,

Grosse-Kunstleve et al. 2007) using the SCML2 2xMBT structure (1OI1; (Sathyamurthy, Allen et al. 2003)) as a search model. Manual model building was carried out using the graphics program Coot (Emsley and Cowtan 2004). Refinement was carried out using the CCP4 program

REFMAC (Murshudov, Vagin et al. 1997). In the later stages of refinement, TLS and restrained refinement were carried out, with the initial TLS parameters obtained from the TLSMD webserver (Painter and Merritt 2006). The MolProbity Ramachandran plot showed that 97.2% of the residues were in the most favoured region, while the rest were in the allowed region.

Histone peptide SPOT-blot peptide array screen

Peptides were synthesized directly on a modified cellulose membrane with a polyethylene glycol linker using the peptide synthesizer MultiPep (Intavis). The binding reaction was initially performed using a library of 26 control peptides (4 positive and 22 negative controls) and 561 membrane-immobilized peptides corresponding to 10 – 15 residue-long stretches of

histones H2A, H2B, H3 and H4 sequences with either non-modified or variously modified arginine, lysine, serine and threonine residues (one modified residue per peptide); the designed membrane is exactly as described previously (Nady, Min et al. 2008). The subsequent peptide libraries were designed specifically to test the binding preference of SFMBT1 (4xMBT) to peptides with dual modifications.

Fluorescence-polarization binding assays

FP assays to check the preference of MBT domains for mono- and dimethyllysine states were done using H3K36me1/2 peptides for SCML2 wild type and mutant proteins, and

H4K20me1/2 peptides for L3MBTL1 and L3MBTL3 wild type and mutant proteins. For all other FP studies, the peptides used are indicated in the figures and have the following sequences:

Peptide            Sequence
H2AK36me1          VHRLLRKme1GNYSERVG
H3K4me             ART[Kme1/2]QTARKST
H3K9me             ARTKQTAR[Kme1/2]STGGKA
H3K27me1           QLATKAARKme1SAPATG
H3K36me            GGV[Kme0/1/2/3]KPHRY
H3K37me1           GGVKKme1PHRY
H3K79me1           VREIAQDFKme1TDLRFQ
H4K20me            GGAKRHR[Kme0/1/2/3]VLRDNIQ
H4 unmodified      HRKVLRDNIQGITKPAI
H4 R23me2(a)       HRKVLRme2(a)DNIQGITKPAI
H4 K31me1          HRKVLRDNIQGITKme1PAI
H4 T30ph/K31me1    HRKVLRDNIQGITphKme1PAI

All peptides were synthesized, N-terminally labeled with fluorescein and purified by

Tufts University Core Services (Boston, MA, USA). Binding assays were performed in a 10 µl volume at a constant labeled peptide concentration of 40 nM, and the protein was used at saturating concentrations ranging between 800 and 1300 µM in buffer containing either

1.3 mM NaH2PO4/23.7 mM Na2HPO4 or 20 mM Tris pH 8.0 with 250 mM NaCl, 1 mM DTT, 1 mM TCEP, 5 mM β-mercaptoethanol, 1 mM benzamidine, 1 mM PMSF, 0.01% Tween-20. FP

assays were performed in 384 well plates using Synergy 2 microplate reader (BioTek, Vermont,

USA). The excitation wavelength of 485nm and the emission wavelength of 528nm were used.

In the earlier experiments the temperature was ~26 ± 0.2 °C, and later it was increased above ambient to ~29 ± 0.2 °C. The Kd values varied with temperature; however, all experiments in which we make direct Kd comparisons were performed at the same time under identical conditions. The data were corrected for the background of the free labeled peptides. To determine Kd values, the data were fit to a hyperbolic function using Sigma Plot software (Systat Software, Inc., CA, USA). For curves that did not exceed a reading of 50 relative fluorescence polarization units, Kd values were not calculated and are reported as Kd > 2000 μM. The Kd values represent averages ± standard error for at least three independent experiments.
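The hyperbolic fit described above can be illustrated with a minimal sketch. It assumes a one-site model, FP = Bmax·[P]/(Kd + [P]), and replaces the Sigma Plot routine with a simple log-spaced search over Kd written in plain Python. The function names, the search range and the synthetic data are hypothetical and are not taken from the thesis.

```python
# Sketch only: one-site hyperbola FP = Bmax * [P] / (Kd + [P]).
# Assumption: a log-spaced search over Kd stands in for the Sigma Plot fit.

def fit_hyperbola(conc, signal, kd_min=0.1, kd_max=5000.0, steps=2000):
    """Return (Kd, Bmax) minimising the sum of squared residuals."""
    best = None
    ratio = (kd_max / kd_min) ** (1.0 / steps)
    kd = kd_min
    for _ in range(steps + 1):
        x = [c / (kd + c) for c in conc]
        # For a fixed Kd the optimal Bmax follows from linear least squares.
        bmax = sum(xi * yi for xi, yi in zip(x, signal)) / sum(xi * xi for xi in x)
        sse = sum((yi - bmax * xi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
        kd *= ratio
    return best[1], best[2]


def report_kd(conc, signal, threshold=50.0):
    """Apply the reporting rule: curves that never exceed the threshold are not fitted."""
    if max(signal) <= threshold:
        return "Kd > 2000 uM"
    kd, _ = fit_hyperbola(conc, signal)
    return "Kd = %.0f uM" % kd


# Synthetic, noise-free titration (hypothetical values, not thesis data).
true_kd, true_bmax = 25.0, 200.0
conc = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]  # protein, uM
signal = [true_bmax * c / (true_kd + c) for c in conc]   # background-corrected FP
kd_fit, bmax_fit = fit_hyperbola(conc, signal)
print(report_kd(conc, signal))
```

Because Bmax enters the model linearly, only Kd needs to be searched, which keeps the sketch free of external dependencies. A dedicated nonlinear least-squares routine would be the more usual choice for real data.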

Thermostability studies

The thermostabilities of SCML2 (2xMBT), L3MBTL1 (3xMBT), L3MBTL3 (3xMBT) and their mutant proteins were studied using differential static light scattering (Senisterra,

Markin et al. 2006; Vedadi, Niesen et al. 2006), monitoring protein stability by its aggregation properties. Protein samples at 0.2 mg/ml were heated from 25 to 80°C at a rate of 1°C/min in clear bottom 384-well plates (Nunc, Rochester, NY) in 50 μl of buffer (50 mM Hepes, 150 mM

NaCl, 1mM TCEP, pH 7.5). For these scans 10 μl of mineral oil (Sigma, St. Louis, MO) was layered on top of the protein solution to prevent evaporation. Protein aggregation was measured by recording the scattered light using a CCD camera and taking images of the plate every 0.5°C.

The pixel intensities in a preselected region of each well were integrated using proprietary software to generate a value representative of the total amount of scattered light in that region. These total intensities were then plotted against temperature for each sample well and fit to the Boltzmann equation by nonlinear regression. The point of inflection of each resulting curve was defined as the Tagg.
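As an illustration of how an inflection point can be extracted from such a curve, the sketch below assumes the common four-parameter Boltzmann sigmoid, I(T) = Imin + (Imax − Imin)/(1 + exp((Tagg − T)/s)). The proprietary software used in this work is not reproduced here. The grid-search fit, the function names and the synthetic data are hypothetical.

```python
import math

# Sketch only: Boltzmann sigmoid and a coarse separable least-squares fit.
# Assumption: I(T) = i_min + (i_max - i_min) / (1 + exp((t_agg - T) / slope)).

def boltzmann(t, i_min, i_max, t_agg, slope):
    return i_min + (i_max - i_min) / (1.0 + math.exp((t_agg - t) / slope))


def fit_tagg(temps, intensities, slopes=(0.5, 1.0, 1.5, 2.0, 3.0), step=0.1):
    """Grid search over (t_agg, slope); baseline and amplitude are solved linearly."""
    n = len(temps)
    t_lo, t_hi = min(temps), max(temps)
    my = sum(intensities) / n
    best = None
    for k in range(int(round((t_hi - t_lo) / step)) + 1):
        t_agg = t_lo + step * k
        for s in slopes:
            x = [1.0 / (1.0 + math.exp((t_agg - t) / s)) for t in temps]
            mx = sum(x) / n
            sxx = sum((xi - mx) ** 2 for xi in x)
            if sxx == 0.0:
                continue
            amp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, intensities)) / sxx
            base = my - amp * mx
            sse = sum((yi - (base + amp * xi)) ** 2 for xi, yi in zip(x, intensities))
            if best is None or sse < best[0]:
                best = (sse, t_agg, s)
    return best[1], best[2]


# Synthetic scan from 25 to 80 degrees C in 0.5 degree steps (hypothetical values).
temps = [25.0 + 0.5 * i for i in range(111)]
curve = [boltzmann(t, 100.0, 5000.0, 52.0, 1.5) for t in temps]
t_agg_fit, slope_fit = fit_tagg(temps, curve)
print("Tagg ~ %.1f C" % t_agg_fit)
```

In this parameterisation the inflection point of the fitted curve coincides with the t_agg parameter, so no separate derivative calculation is needed.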

The thermostabilities of SCML2 (1-325) and its mutant proteins were studied using

differential scanning fluorimetry (Vedadi, Niesen et al. 2006; Senisterra and Finerty 2009). A

real-time PCR device (iCycler from Bio-Rad, Hercules, CA) was used to monitor protein

unfolding by the increase in the fluorescence of the fluorophore SYPRO Orange (Invitrogen,

Carlsbad, CA). Protein samples (50-200 μg/ml) in 50 mM Hepes buffer (pH 7.5) containing 150 mM NaCl, 1mM TCEP were prepared in a reaction volume of 20 μl and incubated in 96-well

PCR microplates (ABGene, Surrey, U.K.). Optical foil was used to cover the plates in the

FluoDia T70 fluorescence plate reader (Photon Technology Int., Birmingham, UK). The samples were heated at 1°C per min, from 25°C to 95°C. The fluorescence intensity was measured every

1°C. Fluorescence intensities were plotted as a function of temperature by using the same, internally developed software package as was used for the static light scattering data.

Preparation of nucleosome core particles with methylated histones

Unmodified and methylated Xenopus laevis (H3 and H4) and human histones (H2A and

H2B) were prepared as described previously (Dyer, Edayathumangalam et al. 2004; Simon, Chu et al. 2007) and incorporation of the methyl moiety was confirmed by mass spectrometry.

Histone octamers containing unmodified, H2AK36Cme1 or H3K9Cme1 protein were assembled first, followed by assembly of the nucleosome core particles (NCPs). NCPs were assembled with the 146 bp Widom 601 strong positioning DNA sequence (Lowary and Widom 1998) and their corresponding histone octamers using a salt dialysis method and analyzed on a 5% polyacrylamide gel as described previously (Dyer, Edayathumangalam et al. 2004). The DNA was amplified using PCR (forward primer: 5'-TGG AGA ATC CCG GTG CCG AGG-3'; reverse primer:

5'-CAC AGG ATG TAT ATA TCT GAC AC-3'). The Widom 601 DNA sequence is:

5’- TGG AGA ATC CCG GTG CCG AGG CCG CTC AAT TGG TCG TAG ACA GCT CTA
3’- ACC TCT TAG GGC CAC GGC TCC GGC GAG TTA ACC AGC ATC TGT CGA GAT

    GCA CCG CTT AAA CGC ACG TAC GCG CTG TCC CCC GCG TTT TAA CCG CCA AGG
    CGT GGC GAA TTT GCG TGC ATG CGC GAC AGG GGG CGC AAA ATT GGC GGT TCC

    GGA TTA CTC CCT AGT CTC AAG GCA CGT GTC AGA TAT ATA CAT CCT GTG -3’
    CCT AAT GAG GGA TCA GAG TTC CGT GCA CAG TCT ATA TAT GTA GGA CAC -5’
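A short consistency check of the primers against the template can be sketched as follows. It takes the top (5’ to 3’) strand and the two primer sequences as printed above and confirms that the forward primer matches the start of the top strand and that the reverse primer is the reverse complement of its end. The variable and function names are hypothetical.

```python
# Sketch only: check that the PCR primers flank the 601 sequence as printed.

TOP_STRAND = (
    "TGG AGA ATC CCG GTG CCG AGG CCG CTC AAT TGG TCG TAG ACA GCT CTA "
    "GCA CCG CTT AAA CGC ACG TAC GCG CTG TCC CCC GCG TTT TAA CCG CCA AGG "
    "GGA TTA CTC CCT AGT CTC AAG GCA CGT GTC AGA TAT ATA CAT CCT GTG"
).replace(" ", "")

FORWARD_PRIMER = "TGG AGA ATC CCG GTG CCG AGG".replace(" ", "")
REVERSE_PRIMER = "CAC AGG ATG TAT ATA TCT GAC AC".replace(" ", "")

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# The forward primer should be identical to the 5' end of the top strand.
forward_ok = TOP_STRAND.startswith(FORWARD_PRIMER)
# The reverse primer anneals to the top strand, so its reverse complement
# should be identical to the 3' end of the top strand.
reverse_ok = TOP_STRAND.endswith(reverse_complement(REVERSE_PRIMER))

print("forward primer matches:", forward_ok)
print("reverse primer matches:", reverse_ok)
```

The same helper can be used to regenerate the bottom strand from the top strand when checking the printed duplex for transcription errors.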

Electromobility shift assays

Various concentrations of SCML2 (2xMBT; 1-325; full length) protein samples and their

mutants were incubated with 2.7 pmol (0.25 μg) of Widom 601 DNA or 5 pmol (1 μg) of

unmodified, H2AK36Cme1 or H3K9Cme1 NCP. Binding was performed in reaction buffer

containing 20 mM Tris, pH 7.5, 100-150 mM KCl, 1mM EDTA, 1mM DTT. The samples were

incubated at room temperature for 30 minutes and were run at 150 V on a pre-electrophoresed

5% polyacrylamide gel (59:1) for 601 DNA or 7% polyacrylamide gel (59:1) for NCP. The

running buffer contained 0.25x TBE and the gels with 601 DNA and NCP were run at 4ºC for one or two hours, respectively. The gels were stained with SafeView (Applied Biological

Materials) to image DNA or Coomassie Blue to visualize protein. The gels stained with

SafeView were inverted for better contrast.
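The conversion between mass and molar amount for the DNA and NCP used in the binding reactions can be checked with a few lines of arithmetic. This sketch assumes an average mass of about 660 g/mol per base pair of double-stranded DNA and a nucleosome core particle mass of roughly 200 kDa. Both values are approximations introduced here and are not stated in the text.

```python
# Sketch only: convert micrograms to picomoles for the EMSA components.
# Assumptions: ~660 g/mol per base pair of dsDNA and ~200 kDa per NCP.

AVERAGE_BP_MASS = 660.0      # g/mol per base pair (approximation)
DNA_LENGTH_BP = 146          # length of the 601 fragment quoted in the text
NCP_MASS = 2.0e5             # g/mol, approximate mass of octamer plus DNA


def picomoles(mass_ug, molar_mass):
    """Convert a mass in micrograms to picomoles for a given molar mass (g/mol)."""
    return mass_ug * 1e-6 / molar_mass * 1e12


dna_pmol = picomoles(0.25, DNA_LENGTH_BP * AVERAGE_BP_MASS)
ncp_pmol = picomoles(1.0, NCP_MASS)

print("0.25 ug DNA ~ %.1f pmol" % dna_pmol)
print("1 ug NCP ~ %.1f pmol" % ncp_pmol)
```

Under these assumptions both quantities fall in the low picomole range, which is a useful sanity check when setting up protein titrations against a fixed amount of DNA or NCP.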

3.6 References

Adams-Cioaba, M. A. and J. Min (2009). "Structure and function of histone methylation binding proteins." Biochem Cell Biol 87(1): 93-105.
Addou-Klouche, L., J. Adelaide, et al. (2010). "Loss, mutation and deregulation of L3MBTL4 in breast cancers." Mol Cancer 9: 213.
Arai, S. and T. Miyazaki (2005). "Impaired maturation of myeloid progenitors in mice lacking novel Polycomb group protein MBT-1." EMBO J 24(10): 1863-73.
Baker, L. A., C. D. Allis, et al. (2008). "PHD fingers in human diseases: disorders arising from misinterpreting epigenetic marks." Mutat Res 647(1-2): 3-12.

Beck, H. C., E. C. Nielsen, et al. (2006). "Quantitative proteomic analysis of post-translational modifications of human histones." Mol Cell Proteomics 5(7): 1314-25.
Boccuni, P., D. MacGrogan, et al. (2003). "The human L(3)MBT polycomb group protein is a transcriptional repressor and interacts physically and functionally with TEL (ETV6)." J Biol Chem 278(17): 15412-20.
Bonasio, R., E. Lecona, et al. (2010). "MBT domain proteins in development and disease." Semin Cell Dev Biol 21(2): 221-30.
Chin, H. G., P. O. Esteve, et al. (2007). "Automethylation of G9a and its implication in wider substrate specificity and HP1 binding." Nucleic Acids Res 35(21): 7313-23.
Daigle, S. R., E. J. Olhava, et al. (2011). "Selective killing of mixed lineage leukemia cells by a potent small-molecule DOT1L inhibitor." Cancer Cell 20(1): 53-65.
Delmore, J. E., G. C. Issa, et al. (2011). "BET Bromodomain Inhibition as a Therapeutic Strategy to Target c-Myc." Cell 146(6): 904-17.
Dyer, P. N., R. S. Edayathumangalam, et al. (2004). "Reconstitution of nucleosome core particles from recombinant histones and DNA." Methods Enzymol 375: 23-44.
Emsley, P. and K. Cowtan (2004). "Coot: model-building tools for molecular graphics." Acta Crystallogr D Biol Crystallogr 60(Pt 12 Pt 1): 2126-32.
Eryilmaz, J., P. Pan, et al. (2009). "Structural studies of a four-MBT repeat protein MBTD1." PLoS One 4(10): e7274.
Filippakopoulos, P., J. Qi, et al. (2010). "Selective inhibition of BET bromodomains." Nature 468(7327): 1067-73.
Gao, C., J. M. Herold, et al. (2011). "Biophysical probes reveal a "compromise" nature of the methyl-lysine binding pocket in L3MBTL1." J Am Chem Soc 133(14): 5357-62.
Garcia, B. A., S. A. Busby, et al. (2005). "Resetting the epigenetic histone code in the MRL-lpr/lpr mouse model of lupus by inhibition." J Proteome Res 4(6): 2032-42.
Grimm, C., A. G. de Ayala Alonso, et al. (2007). "Structural and functional analyses of methyl-lysine binding by the malignant brain tumour repeat protein Sex comb on midleg." EMBO Rep 8(11): 1031-7.
Grimm, C., R. Matos, et al. (2009). "Molecular recognition of histone lysine methylation by the Polycomb group repressor dSfmbt." EMBO J 28(13): 1965-77.
Guo, Y., N. Nady, et al. (2009). "Methylation-state-specific recognition of histones by the MBT repeat protein L3MBTL2." Nucleic Acids Res 37(7): 2204-10.
Gurvich, N., F. Perna, et al. (2010). "L3MBTL1 polycomb protein, a candidate tumor suppressor in del(20q12) myeloid disorders, is essential for genome stability." Proc Natl Acad Sci U S A 107(52): 22552-7.
Herold, J. M., T. J. Wigle, et al. (2010). "Small-molecule ligands of methyl-lysine binding proteins." J Med Chem 54(7): 2504-11.
Honda, H., K. Takubo, et al. (2011). "Hemp, an mbt domain-containing protein, plays essential roles in hematopoietic stem cell function and skeletal formation." Proc Natl Acad Sci U S A 108(6): 2468-73.
Hyland, E. M., M. S. Cosgrove, et al. (2005). "Insights into the role of histone H3 and histone H4 core modifiable residues in Saccharomyces cerevisiae." Mol Cell Biol 25(22): 10060-70.
Kato, T., H. Sato, et al. (2011). "Segmental copy number loss of SFMBT1 gene in elderly individuals with ventriculomegaly: a community-based study." Intern Med 50(4): 297-303.

Kaustov, L., H. Quyang, et al. (2010). "Recognition and specificity determinants of the human Cbx chromodomains." J Biol Chem.
Knight, M. J., C. Leettola, et al. (2011). "A human sterile alpha motif domain polymerizome." Protein Sci.
Lechtenberg, B. C., M. D. Allen, et al. (2009). "Solution structure of the FCS zinc finger domain of the human polycomb group protein L(3)mbt-like 2." Protein Sci 18(3): 657-61.
Li, H., W. Fischle, et al. (2007). "Structural basis for lower lysine methylation state-specific readout by MBT repeats of L3MBTL1 and an engineered PHD finger." Mol Cell 28(4): 677-91.
Lowary, P. T. and J. Widom (1998). "New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning." J Mol Biol 276(1): 19-42.
McCoy, A. J., R. W. Grosse-Kunstleve, et al. (2007). "Phaser crystallographic software." J. Appl. Cryst. 40: 658-74.
Min, J., A. Allali-Hassani, et al. (2007). "L3MBTL1 recognition of mono- and dimethylated histones." Nat Struct Mol Biol 14(12): 1229-30.
Murshudov, G. N., A. A. Vagin, et al. (1997). "Refinement of macromolecular structures by the maximum-likelihood method." Acta Crystallogr D Biol Crystallogr 53(Pt 3): 240-55.
Nady, N., J. Min, et al. (2008). "A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research." Trends Biochem Sci 33(7): 305-13.
Northcott, P. A., Y. Nakahara, et al. (2009). "Multiple recurrent genetic events converge on control of histone lysine methylation in medulloblastoma." Nat Genet 41(4): 465-72.
Ogawa, H., K. Ishiguro, et al. (2002). "A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells." Science 296(5570): 1132-6.
Otwinowski, Z. and W. Minor (1997). "Processing of X-ray diffraction data collected in oscillation mode." Macromolecular Crystallography, Pt A 276: 307-326.
Painter, J. and E. A. Merritt (2006). "TLSMD web server for the generation of multi-group TLS models." J. Appl. Crystallogr. 39(1).
Perna, F., N. Gurvich, et al. (2010). "Depletion of L3MBTL1 promotes the erythroid differentiation of human hematopoietic progenitor cells: possible role in 20q- polycythemia vera." Blood 116(15): 2812-21.
Sampath, S. C., I. Marazzi, et al. (2007). "Methylation of a histone mimic within the histone methyltransferase G9a regulates protein complex assembly." Mol Cell 27(4): 596-608.
Santiveri, C. M., B. C. Lechtenberg, et al. (2008). "The malignant brain tumor repeats of human SCML2 bind to peptides containing monomethylated lysine." J Mol Biol 382(5): 1107-12.
Sathyamurthy, A., M. D. Allen, et al. (2003). "Crystal structure of the malignant brain tumor (MBT) repeats in Sex Comb on Midleg-like 2 (SCML2)." J Biol Chem 278(47): 46968-73.
Senisterra, G. A. and P. J. Finerty, Jr. (2009). "High throughput methods of assessing protein stability and aggregation." Mol Biosyst 5(3): 217-23.
Senisterra, G. A., E. Markin, et al. (2006). "Screening for ligands using a generic and high-throughput light-scattering-based assay." J Biomol Screen 11(8): 940-8.
Simon, M. D., F. Chu, et al. (2007). "The site-specific installation of methyl-lysine analogs into recombinant histones." Cell 128(5): 1003-12.

Takada, Y., K. Isono, et al. (2007). "Mammalian Polycomb Scmh1 mediates exclusion of Polycomb complexes from the XY body in the pachytene spermatocytes." Development 134(3): 579-90.
Tan, M., H. Luo, et al. (2011). "Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification." Cell 146(6): 1016-28.
Taverna, S. D., H. Li, et al. (2007). "How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers." Nat Struct Mol Biol 14(11): 1025-40.
Tomotsune, D., Y. Takihara, et al. (1999). "A novel member of murine Polycomb-group proteins, Sex comb on midleg homolog protein, is highly conserved, and interacts with RAE28/mph1 in vitro." Differentiation 65(4): 229-39.
Trojer, P., A. R. Cao, et al. (2011). "L3MBTL2 protein acts in concert with PcG protein-mediated monoubiquitination of H2A to establish a repressive chromatin structure." Mol Cell 42(4): 438-50.
Vedadi, M., D. Barsyte-Lovejoy, et al. (2011). "A chemical probe selectively inhibits G9a and GLP methyltransferase activity in cells." Nat Chem Biol 7(8): 566-74.
Vedadi, M., F. H. Niesen, et al. (2006). "Chemical screening methods to identify ligands that promote protein stability, protein crystallization, and structure determination." Proc Natl Acad Sci U S A 103(43): 15835-40.
Wang, R., U. Ilangovan, et al. (2011). "Identification of nucleic acid binding residues in the FCS domain of the polycomb group protein polyhomeotic." Biochemistry 50(22): 4998-5007.
West, L. E., S. Roy, et al. (2010). "The MBT repeats of L3MBTL1 link SET8-mediated p53 methylation at lysine 382 to target gene repression." J Biol Chem 285(48): 37725-32.
Zhang, L., E. E. Eugeni, et al. (2003). "Identification of novel histone post-translational modifications by peptide mass fingerprinting." Chromosoma 112(2): 77-86.
Zuber, J., J. Shi, et al. (2011). "RNAi screen identifies Brd4 as a therapeutic target in acute myeloid leukaemia." Nature.


Chapter 4

Recognition of multivalent histone states associated with heterochromatin by Tandem Tudor Domains of UHRF1

The work presented here was performed in collaboration with the groups of Sirano Dhe-Paganon and Fred Chédin. The conserved region within UHRF1 and the project were originally conceived by Sirano Dhe-Paganon; the crystal structures were obtained and solved in his group. Cell-based assays were performed by Michael S Kareta in the Chédin laboratory. Alexander Lemak taught and oversaw the NMR data processing, analysis and structure calculation. All experiments involving binding assays with peptide arrays, dual specificity interactions, mutational data and NMR experiments were designed, performed and analyzed by the author.

The work partially described herein is published in: Nataliya Nady, Alexander Lemak, John R. Walker, George V. Avvakumov, Michael S. Kareta, Mayada Achour, Sheng Xue, Shili Duan, Abdellah Allali-Hassani, Xiaobing Zuo, Yun-Xing Wang, Christian Bronner, Frédéric Chédin, Cheryl H. Arrowsmith and Sirano Dhe-Paganon. Recognition of multivalent histone states associated with heterochromatin by UHRF1 protein. J Biol Chem. 2011 Jul 8;286(27):24300-11.

4.1 Summary

Histone modifications and DNA methylation represent two layers of heritable epigenetic

information that regulate eukaryotic chromatin structure and gene activity. UHRF1 is a unique

factor that bridges these two layers; it is required for maintenance DNA methylation at hemi-methylated CpG sites, which are specifically recognized through its SRA domain, and also interacts with histone H3 trimethylated on lysine 9 (H3K9me3) in an unspecified manner. This chapter describes a novel Tandem Tudor Domain (TTD) within UHRF1 protein that recognizes

H3 tail peptides with the heterochromatin-associated modification state of trimethylated lysine 9 and unmodified lysine 4 (H3K4me0/K9me3). Solution NMR and crystallographic data reveal the

TTD simultaneously recognizes H3K9me3 through a conserved aromatic cage in the first Tudor subdomain, and unmodified H3K4 within a groove between the tandem subdomains. The subdomains undergo a conformational adjustment upon peptide binding, distinct from previously reported mechanisms for dual histone mark recognition. Mutant UHRF1 protein deficient for

H3K4me0/K9me3 binding shows altered localization to heterochromatic chromocenters. These results demonstrate a novel recognition mechanism for the combinatorial readout of histone modification states associated with gene silencing, and add to the growing evidence for

coordination of, and cross-talk between the modification states of H3K4 and H3K9 in regulation

of gene expression.

4.2 Introduction

Histone modifications and DNA methylation represent two layers of heritable epigenetic

information that regulate chromatin structure and gene activity in eukaryotic organisms.

Methylated DNA sequences are generally associated with long term transcriptional silencing

through the recruitment of repressor complexes including methyl-binding proteins, histone

deacetylases, and chromatin remodeling machinery (Klose and Bird 2006; Denslow and Wade

2007). Likewise, specific histone methylation states can recruit multivalent adaptor proteins,

which lead to chromatin condensation, further inhibiting gene expression. Accumulating

evidence shows that these two methylation systems act cooperatively to establish the epigenetic

state of the cell (Ooi, Qiu et al. 2007; Meissner, Mikkelsen et al. 2008; Bartke, Vermeulen et al.

2010); however, the mechanisms of this cooperation remain unclear.

During replication, CpG methylation patterns are maintained in mammals by the DNA

methyltransferase 1 (DNMT1) with hemi-methylated CpG dinucleotides serving as a substrate.

This enzyme is aided by UHRF1 (Ubiquitin-like, PHD and RING Finger containing 1, also known as ICBP90 in humans and NP95 in mouse), which interacts with DNMT1 and specifically recognizes hemi-methylated CpG dinucleotides through its SRA domain (Bostick, Kim et al.

2007; Sharif, Muto et al. 2007). UHRF1 has also been implicated in histone methylation-associated activities related to pericentric heterochromatin (Uemura, Kubo et al. 2000; Miura,

Watanabe et al. 2001; Bostick, Kim et al. 2007; Papait, Pistore et al. 2007; Sharif, Muto et al.

2007; Karagianni, Amazit et al. 2008; Papait, Pistore et al. 2008). For example, UHRF1 is found in a complex with both methylated histones (Citterio, Papait et al. 2004; Karagianni, Amazit et al. 2008) and histone-modifying enzymes such as HDAC1 and KMT1C/G9a (Unoki, Nishidate et al. 2004; Kim, Esteve et al. 2009). UHRF1 also interacts with H3K9me3-containing nucleosomes and this interaction is potentiated by DNA methylation (Bartke, Vermeulen et al.

2010). Thus, it is not surprising that UHRF1 deficiency leads not only to decreased levels of

DNA methylation (Bostick, Kim et al. 2007; Sharif, Muto et al. 2007), but also to impaired maintenance of heterochromatin structure (Woo, Pontes et al. 2007; Karagianni, Amazit et al. 2008; Papait, Pistore et al. 2008) and increased transcription of major satellites, regions that make up the bulk of pericentric heterochromatin.

Figure 4.1. A novel evolutionarily conserved domain within UHRF1 corresponds to TTD. Domain composition of UHRF1 that includes UBL (ubiquitin-like), TTD (Tandem Tudor Domain; TTDN, N-terminal Tudor subdomain; TTDC, C-terminal Tudor subdomain), PHD (Plant Homeobox Domain), SRA (SET and RING associated domain), and RING (Really Interesting New Gene). The human UHRF1 protein sequence (aa 119-298) is aligned with that of other species (bt, Bos taurus; cf, Canis familiaris; dr, Danio rerio; fc, Felis catus; gg, Gallus gallus; hs2, Homo sapiens; md, Monodelphis domestica; mm, Mus musculus; pt, Pan troglodytes; rn, Rattus norvegicus; xl, Xenopus laevis); the lower-case sequences are flexible regions in the liganded X-ray structure. The residue numbering corresponds to the human UHRF1 sequence. Drawn above the number line are secondary structure elements (β1 to β5 in TTDN, β1’ to β5’ in TTDC) and their labels. Residues that are fully conserved (cons), that undergo medium and strong (uppercase) and weak (lowercase) chemical shifts with addition of K4me0/K9me3-containing H3 peptide (NMR) and those that form the K9me3 aromatic cage and K4me0 cage (cage) are shown below.

UHRF1 has significant affinity for H3K9me3 (Citterio, Papait et al. 2004; Karagianni,

Amazit et al. 2008), but the mechanism of this interaction remains unclear. In addition to the

SRA domain, UHRF1 contains other conserved domains (Fig. 4.1), including a ubiquitin-like domain (UBL), a RING E3 ligase domain, a plant homeobox domain (PHD) that has been implicated in the UHRF1 binding to DNMT1 (Bostick, Kim et al. 2007), H3K9me3 (Karagianni,

Amazit et al. 2008), and most recently to unmodified H3R2 (Hu, Li et al. 2011; Rajakumara,

Wang et al. 2011; Wang, Shen et al. 2011). Here we provide biochemical and cell-based evidence for the mechanism of the UHRF1 binding to histone H3 in which Lys9 is trimethylated and Lys4 is unmodified or monomethylated. Furthermore, the structural analysis revealed a novel mode of interaction enabling combinatorial readout of a multivalent state within a single

H3 tail.

4.3 Results

4.3.1 UHRF1 contains a Tandem Tudor Domain that binds H3K9me3 in vitro

In order to understand the mechanism by which UHRF1 recognizes H3K9me3, we screened SPOT-blot histone peptide arrays (Nady, Min et al. 2008) with recombinant UHRF1 and its domains (Fig. 4.2a). In agreement with previous studies (Citterio, Papait et al. 2004;

Karagianni, Amazit et al. 2008), full length UHRF1 bound to H3K9me3-containing peptides.

However, in contrast to earlier reports (Karagianni, Amazit et al. 2008), the PHD, SRA, or tandem PHD-SRA domains did not interact. Recombinant protein encompassing the region between residues 121 and 286 showed robust binding to methylated H3K9me peptides in SPOT-blot screens, but not to other methyl histone marks, consistent with a recent report (Rottach, Frauer et al. 2010). Fluorescence polarization (FP) studies using labeled H3K9me3 peptides (residues 1-12 of H3) confirmed the SPOT-blot results, showing no interaction with either SRA, PHD, or tandem PHD-SRA domains, while the conserved, and until recently uncharacterized, region corresponding to residues 121-286 showed reproducible and robust binding (Fig. 4.2b). Further FP experiments confirmed that this domain favored binding to trimethylated Lys9 H3 peptide compared to peptides corresponding to lower Lys9 methylation states (Fig. 4.2c).

Figure 4.2. A novel TTD domain within UHRF1 binds H3 histone tail with H3K9me3. (a). SPOT-blot peptides corresponding to the amino-terminal tail of histone H3 with or without modification at the Lys9 position (1-mono; 2-di; 3-trimethylation) were probed for binding with different UHRF1 domains. (b). FP assays confirm binding of the aa121-286 to a modified peptide corresponding to residues 1-11 of H3 (ARTKQTARKme3ST). (c). Binding of the UHRF1 (TTD) construct spanning aa121-286 to the series of peptides that have different methylation state of Lys9 was measured using fluorescence polarization (FP). The peptides corresponding to residues 1-11 of H3, ARTKQTAR[Kme0/1/2/3]ST with Lys9 unmodified, mono-, di- or tri-methylated were used. (d). The Tandem Tudor Domain (TTD) is shown in ribbon format with TTDN in light-brown and TTDC in light-blue. A stick representation of the H3K9me3 peptide is shown in magenta; electron density was only observed for three residues – R8K9me3S10. (e). Close-up view of the aromatic cage in the crystal structures of TTD in its apo form (green) and in complex with H3K9me3 (magenta). The Asp145 and Asn194 appear to have some plasticity, since their side-chains rotate to present the apolar faces towards the ligand, consistent with the ability of the domain to interact with lower states of K9 methylation.
Panel legends: (b) Kd = 18 ± 5 μM for aa 121-286 and >1000 μM for PHD + SRA and for SRA; (c) Kd >1000 μM for H3K9, 90 ± 13 μM for H3K9me1, 78 ± 19 μM for H3K9me2 and 25 ± 7 μM for H3K9me3. Axes in (b) and (c): fraction bound versus protein concentration (μM).

The 2.4 Å crystal structure of residues 126-285 of human UHRF1 (PDB ID: 3DB4) was

determined by the Dhe-Paganon group in the SGC. The structure revealed a tightly packed pair

of Tudor domains (Hashimoto, Horton et al. 2009), each having the signature five-stranded β-barrel fold seen in Royal family domains, many of which are involved in the recognition of methylated lysine histone marks (Maurer-Stroh, Dickens et al. 2003). The closest structural match is that between the N-terminal UHRF1 subdomain (residues 126-206) and the N-terminal

Tudor domain of 53BP1 (PDB entry 2G3R; RMSD of 1.3 Å for 80 Cα positions) including a

conserved aromatic cage, which in the case of 53BP1 binds H4K20me2 (Botuyan, Lee et al.

2006). We therefore hypothesized that recognition of the H3K9me3 mark by UHRF1 could be

attributed to the tandem tudor domain (TTD). The SGC group obtained a second crystal structure of the UHRF1 TTD in complex with a short H3K9me3 peptide (residues 6-11; PDB ID: 3DB3,

Fig. 4.2d). This complex structure confirmed that the aromatic cage (Phe152, Tyr188 and

Tyr191) of the N-terminal Tudor subdomain (TTDN), indeed interacts with the

trimethylammonium moiety in the canonical fashion seen in 53BP1 and other Royal family

members (Maurer-Stroh, Dickens et al. 2003; Botuyan, Lee et al. 2006), along with two polar

residues, Asn194 and Asp145, which complete the binding pocket and provide countercharge.

Interestingly, Asp145 occupies an approximately conserved position that often forms hydrogen bonds with dimethylammonium moieties within aromatic cages that recognize lower methylation states, such as those of 53BP1 and some MBT domains (Botuyan, Lee et al. 2006;

Grimm, de Ayala Alonso et al. 2007; Li, Fischle et al. 2007; Min, Allali-Hassani et al. 2007;

Santiveri, Lechtenberg et al. 2008; Grimm, Matos et al. 2009; Guo, Nady et al. 2009). However, in UHRF1 the side chains of both Asn194 and Asp145 are rotated away from the trimethylammonium moiety of H3K9me3 and participate in an alternative H-bond network that effectively widens the binding pocket to accommodate trimethyllysine (Fig. 4.2e).

4.3.2 TTD recognizes hallmarks of heterochromatin

The Histone Code hypothesis emphasizes the combinatorial nature of histone posttranslational modifications (Strahl and Allis 2000). For example, pericentric heterochromatin, which is highly methylated at CpG sites, is characterized not only by high levels of H3K9me3 but also by low levels of modified H3K4 (Chen and Townes 2000; Agalioti,

Chen et al. 2002; Binda, LeRoy et al. 2010) (Fig. 4.3a). To explore the possibility that UHRF1 simultaneously recognizes this unique heterochromatic state, an array of doubly-modified and/or mutated H3 peptides was screened for interaction with TTD (Fig. 4.3b). Results showed that in addition to H3K9me3, binding required H3K4 to be either unmodified or monomethylated while di- and trimethylation of Lys4, its deletion or substitution with an alanine eliminated binding.

Binding was not observed for H3K27me3 peptides encompassing residues 21-33, which contain the ARK27S motif (similar to ARK9S) but lack a lysine residue equivalent to the K4 position

(Fig. 4.3b). FP assays confirmed these data: binding to H3K4me3/K9me3, while still detectable, had an approximately 5-fold lower affinity compared to the

[Figure 4.3: panels a-c. Panel c plots fraction bound against TTD domain concentration (μM). Kd values shown in panel c: H3-K4me0/K9me3, 22 ± 6 μM; H3-K4me3/K9me3, 93 ± 21 μM; H3-K4A/K9me3, 211 ± 54 μM.]

Figure 4.3. UHRF1 recognizes multivalent histone signatures associated with heterochromatin. (a). Schematic representation of the multivalent nature of the histone H3 tail. Residues known to be modified in vivo are numbered. Stably repressed genes and heterochromatic regions are characterized by the K4me0/K9me3 state of H3. By contrast, “poised” genes are marked by the bivalent K4me3/K27me3 state accompanied in some cases by K9me3 (Bilodeau, Kagey et al. 2009). The region boxed in red is the key sequence studied here. (b). A series of H3 peptides were analyzed for binding by the TTD using SPOT-blot arrays. Unmodified H3 peptide (aa 1-13) showed no binding, while trimethylated H3K9me3 peptide showed strong binding. The effect of dual modifications, alanine replacement and deletion (.) in the background of K9me3 was tested. Arrows indicate peptides for which binding was lost. (c). FP assays measuring the binding of the purified TTD to H3 peptides that have Lys9 trimethylated and Lys4 that is either mutated or methylated to different degrees.

H3K4me0/K9me3 peptide (Fig. 4.3c). Likewise, conversion of Lys4 to alanine reduced the

binding affinity by an order of magnitude, highlighting the importance of this residue.

Interestingly, T6 phosphorylation also exerted an inhibitory effect on the binding, but S10

phosphorylation did not (Fig. 4.3b).
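The relative affinities quoted above can be restated as fold changes and as approximate binding free-energy penalties using the relation ΔΔG = RT ln(Kd/Kd,ref). The short Python sketch below uses the Kd values reported in Fig. 4.3c and the FP assay temperature of 29.0 °C; the function names are illustrative, and the experimental uncertainties on each Kd are ignored for simplicity.

```python
import math

R_KCAL = 1.987e-3        # gas constant, kcal/(mol*K)
TEMP_K = 273.15 + 29.0   # FP assays were performed at 29.0 C

# Kd values (uM) as reported in Fig. 4.3c; uncertainties omitted
kd_um = {
    "H3-K4me0/K9me3": 22.0,
    "H3-K4me3/K9me3": 93.0,
    "H3-K4A/K9me3": 211.0,
}

def fold_change(kd, kd_ref):
    """Ratio of dissociation constants relative to the reference peptide."""
    return kd / kd_ref

def ddg_kcal(kd, kd_ref, temp_k=TEMP_K):
    """Free-energy penalty relative to the reference: ddG = RT ln(Kd / Kd_ref)."""
    return R_KCAL * temp_k * math.log(kd / kd_ref)

reference = kd_um["H3-K4me0/K9me3"]
for name, kd in kd_um.items():
    print(f"{name}: {fold_change(kd, reference):.1f}-fold, "
          f"ddG = {ddg_kcal(kd, reference):.2f} kcal/mol")
```

On these numbers, trimethylation of Lys4 and substitution with alanine weaken binding by roughly 4-fold and 10-fold, respectively, which corresponds to penalties of about 0.9 and 1.4 kcal/mol.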

To confirm the significance of the unmodified K4 residue in the binding specificity of

UHRF1, we monitored the NMR spectra of amide resonances of 15N-labeled TTD upon titration

with a short H3K9me3 (residues 6-11) and a longer H3K4me0/K9me3 peptide (residues 1-11).

The titration data reinforce several key features of the interaction. Consistent with FP studies and the crystal structure, the short H3K9me3 peptide lacking Lys4 displayed weak binding, with a dissociation constant, Kd, over 1 mM, involving only a cluster of residues in and around the

aromatic cage (Fig. 4.4a,c). Titration with the longer H3K4me0/K9me3 peptide, on the other

hand, resulted in significantly more chemical shift perturbations with greater values compared to

the short peptide, reflecting greater affinity (Kd = 22 μM; Fig. 4.4b,c). The difference of roughly two orders of magnitude in Kd between short and long peptides indicates an important role for

residues upstream of Thr6 in the peptide, consistent with our SPOT blot results. Residues

affected by addition of the short peptide were a subset of those affected by the addition of the

long peptide consistent with the same mode of binding by H3K9me3 in the short and long

peptides. The additional residues affected by the long peptide are predominantly those that are at

the interface between the TTDN and TTDC subdomains and those that link the two subdomains.

The extensive nature of the chemical shift changes, which include buried residues at the interface between TTDN and TTDC that are not surface exposed, suggests a conformational adjustment of the two subdomains relative to one another in order to accommodate a multivalent histone tail.
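To illustrate how a dissociation constant relates to the shifts observed in such a titration, the sketch below evaluates the exact 1:1 binding isotherm. It assumes fast exchange on the chemical shift timescale, so that the observed shift is the population-weighted fraction of the maximal shift. The protein concentration of 0.5 mM is an assumption within the 0.45-0.6 mM range of the NMR samples, the Kd of the short peptide is set to 1 mM although only a lower limit was determined, and the 5-fold excess for the long peptide is chosen for illustration.

```python
import math

def bound_fraction(protein_mm, ligand_mm, kd_mm):
    """Fraction of protein bound for 1:1 binding, from the exact quadratic solution."""
    b = protein_mm + ligand_mm + kd_mm
    complex_mm = (b - math.sqrt(b * b - 4.0 * protein_mm * ligand_mm)) / 2.0
    return complex_mm / protein_mm

def observed_shift(protein_mm, ligand_mm, kd_mm, delta_max_ppm):
    """Fast-exchange shift change: bound fraction times the maximal (fully bound) shift."""
    return delta_max_ppm * bound_fraction(protein_mm, ligand_mm, kd_mm)

PROTEIN_MM = 0.5  # assumed; NMR samples were 0.45-0.6 mM

# Short peptide: Kd taken as 1 mM (reported only as >1 mM), 35-fold molar excess
short_sat = bound_fraction(PROTEIN_MM, 35 * PROTEIN_MM, 1.0)
# Long peptide: Kd = 22 uM, illustrative 5-fold molar excess
long_sat = bound_fraction(PROTEIN_MM, 5 * PROTEIN_MM, 0.022)

print(f"short peptide, 35-fold excess: {short_sat:.2f} bound")
print(f"long peptide, 5-fold excess:   {long_sat:.2f} bound")
```

Under these assumptions the weak binder needs a large molar excess to approach saturation, whereas the long peptide saturates the protein at a far lower ratio. In practice Kd and the maximal shift would be fit together to the measured shifts of several residues.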

Figure 4.4. Minimal interactions between TTD and the short H3K9me3-containing peptide in solution. (a). 15N-1H heteronuclear single quantum coherence (HSQC) spectra detailing the interaction of UHRF1 TTD with the short H3 peptide, lacking the Lys4 residue. Molar excess of the peptide added is indicated by different colors, and a maximum of 35-fold molar excess of the peptide (compared to the protein) was added. A few residues (15%) are involved in the interaction with trimethylated Lys9, consistent with the crystal structure of this complex. The dissociation constant, Kd, was estimated by fitting the observed chemical shift changes of the Asn194 side chain and the Gly149, Ala150, Tyr188 and Gly236 amide backbone. (b). HSQC spectra detailing the interaction of UHRF1 TTD with the long K4me0/K9me3-containing peptide. The residues that were strongly affected are listed in the spectra. A large number of residues (37%) is involved in the interaction with the peptide corresponding to H3 with trimethylated Lys9 and unmodified Lys4. (c). Histogram representation of the weighted chemical shift changes with respect to residue number of the UHRF1 TTD domain. Changes after binding to the short version of the H3 tail peptide, which contains only K9me3 (aa6-11), are indicated in blue. Residue shifts observed upon addition of the long K4me0/K9me3-containing peptide (aa1-11) are indicated in red. Residues that had a very large shift, but could not be attributed to the corresponding peak in the apo form, are represented as bright red bars and given a constant value of 0.21 ppm. Residues that were strongly affected are listed on the histogram.
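A weighted chemical shift change of the kind plotted in Fig. 4.4c is commonly computed by combining the 1H and 15N shift differences after scaling the 15N dimension. The sketch below shows one widely used form. The scaling factor of 0.2 and the residue labels and shift values are assumptions made for illustration only; the exact weighting applied in this work is not restated here.

```python
import math

def weighted_shift(delta_h_ppm, delta_n_ppm, n_scale=0.2):
    """Combined 1H/15N perturbation; n_scale down-weights the larger 15N shift range.

    n_scale = 0.2 is an assumed, commonly used value.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)

# Hypothetical (1H, 15N) shift differences in ppm between apo and bound spectra
example_shifts = {"residue_A": (0.05, 0.40), "residue_B": (0.10, 0.25)}
perturbation = {name: weighted_shift(dh, dn) for name, (dh, dn) in example_shifts.items()}

for name, value in perturbation.items():
    print(f"{name}: {value:.3f} ppm")
```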

4.3.3 Structural basis for the recognition of the H3K4me0/K9me3 signature

To determine the mechanism by which H3K4 contributes to the interaction between TTD

and H3, the SGC attempted to crystallize UHRF1 TTD with H3K4me0/H3K9me3 peptides.

Despite repeated attempts, there were no crystals formed in the presence of the longer peptides.

As an alternative, NMR spectroscopy was used to obtain the solution structure of the

UHRF1 TTD (residues 124-285) in complex with a H3K4me0/H3K9me3 peptide (residues 1-11)

(PDB ID: 2L3R; BMRB: 17200, Table 4.1, Fig. 4.5 & 4.7). Both Tudor subdomains formed

extensive interactions with the peptide involving side chains of K9me3, T6 and K4 (Fig. 4.5), which resembles the cooperation between the two chromodomains of CHD1 that form a single pocket to recognize an extended methylated histone tail (Flanagan, Mi et al. 2005).

The K9me3 side chain sits within the conserved aromatic cage of TTDN as seen in the

crystal structure with the short peptide. H3 residues amino-terminal to K9me3 extend along a shallow groove at the interface between the two Tudor subdomains (Fig. 4.5a). The side chain of

K4 is recognized by a network of hydrogen bonds from carboxylates of Asp142 and Glu153

(Fig. 4.5c,d). This recognition mode of the unmodified H3 K4 is similar to that of PHD domains

(Lan, Collins et al. 2007; Ooi, Qiu et al. 2007; Otani, Nankumo et al. 2009). While

monomethylation can be tolerated, di- or trimethylation would be expected to disrupt this

hydrogen-bond network and also sterically interfere with binding, consistent with our SPOT-blot

and FP results. However, unlike many H3K4 recognition motifs, the TTD does not interact with the N-terminal NH3+ moiety. Interestingly, phosphorylation of T6, which has been shown to prevent the demethylation of H3K4 (Metzger, Imhof et al. 2010), would be expected to introduce both steric clashes and electrostatic repulsion (Fig. 4.5b). The extensive interactions with Lys4 and Thr6 upstream of the ARKS motif common to Lys9 and Lys27 explain the

Table 4.1. NMR data and refinement statistics

                                              TTD-H3K4me0/K9me3
NMR distance and dihedral constraints
  Distance restraints
    Total NOE                                 3352
    Intra-residue                             658
    Inter-residue                             2664
      Sequential (|i – j| = 1)                969
      Nonsequential (|i – j| > 1)             1695
    Hydrogen bonds                            71
    Protein–peptide intermolecular            30
  Total dihedral angle restraints             256
    Protein
      φ                                       130
      ψ                                       126
  Total RDCs                                  416

Structure statistics
  Violations (mean and s.d.)
    Distance constraints (Å)                  0.0435 ± 0.0117
    Dihedral angle constraints (º)            3.3492 ± 0.4605
    Max. dihedral angle violation (º)         4.4468
    Max. distance constraint violation (Å)    0.0747
  Deviations from idealized geometry
    Bond lengths (Å)                          0.0144 ± 0.0002
    Bond angles (º)                           1.2976 ± 0.0253
    Impropers (º)                             3.24 ± 0.13
  Average pairwise r.m.s.d.* (Å)
    Protein
      Heavy                                   1.40 ± 0.13
      Backbone                                0.76 ± 0.18
    Peptide
      Heavy                                   1.99 ± 0.35
      Backbone                                0.77 ± 0.19
    Complex
      Heavy                                   1.47 ± 0.13
      Backbone                                0.81 ± 0.16

*Ensemble of 15 lowest-energy structures out of 100 used in r.m.s. deviation calculations. For UHRF1 TTD protein, residues 135-160, 183-281 were used; H3K4me0K9me3 peptide, residues 4-10.
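The restraint counts in Table 4.1 are nested, and a quick arithmetic check makes the hierarchy explicit. The grouping tested below (total NOE as the sum of intra-residue, inter-residue and intermolecular restraints) is inferred from the numbers themselves and is an assumption about how the table was assembled.

```python
# Restraint counts as listed in Table 4.1
total_noe = 3352
intra_residue = 658
inter_residue = 2664
sequential = 969
nonsequential = 1695
intermolecular = 30
dihedral_total, phi, psi = 256, 130, 126

# Each entry records whether a parent count equals the sum of its sub-categories
checks = {
    "inter-residue = sequential + nonsequential":
        inter_residue == sequential + nonsequential,
    "total NOE = intra + inter + intermolecular":
        total_noe == intra_residue + inter_residue + intermolecular,
    "dihedral total = phi + psi":
        dihedral_total == phi + psi,
}

for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'inconsistent'}")
```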


Figure 4.5. Recognition of multivalent sites at the interface between the two Tudor subdomains. (a). Surface representation of the lowest-energy complex NMR structure of TTD bound to the histone H3 tail. The N- and C-terminal Tudor subdomains of UHRF1 are shown in cyan and slate colours, respectively. Key residues on the peptide are shown as red dots. (b). The TTD-H3 binding is stabilized by the hydroxyl and backbone carbonyl groups of H3 T6, which hydrogen bond with the UHRF1 Asp190 carboxylate and the Arg235 guanidinium group, respectively. Only protons participating in hydrogen bonding are shown. (c). The H3K4me0 pocket is formed by a hydrophilic wall (residues from TTDN: Asp142 and Glu153), an aromatic wall (TTDC: Trp238 and Phe278), as well as Met224 and residues from the linker between the two subdomains, Arg207 and Ala208. (d). Detailed interactions between the TTD and K4me0, showing that the side chain of K4 is “caged” by two hydrogen bonds.

selectivity of UHRF1 TTD for K9me3 over K27me3. Heteronuclear NOE values showed that in

solution the linker between TTDN and TTDC does not undergo motions faster than the TTD as a

whole, and therefore does not behave as a flexible linker (Fig. 4.6). Thus, the TTD behaves as a

single domain to recognize a single histone tail with the H3K4me0/K9me3 modification state as opposed to recognition of two separate H3 tails with the K4me0 and K9me3 states. This suggests that there may be cellular mechanisms for coordination of the modifications at the K4 and K9 sites on individual histone H3 tails.

4.3.4 Reorientation of TTD subdomains upon histone H3 binding

Overlay of the backbone atoms of TTDN in the apo crystal structure and the lowest energy member of the solution ensemble of the H3K4me0/K9me3 complex revealed a rigid body

movement of the second subdomain relative to the first (Fig. 4.7). Although TTDN in the apo and

bound conformations had only minor differences with an RMSD of 0.77 Å, superposition of the entire TTD revealed a nearly two-fold greater discrepancy between the structures, with an RMSD of 1.37 Å (Fig. 4.7d). In the bound form, we observed an adjustment of the TTDC corresponding to a 4.2 Å movement of the tip of the α1' helix, and smaller shifts in the β1'-β2' and the β2'-β3' loops (Fig.

4.7c). This difference in subdomain orientations was also observed between the solution structure and the X-ray structure bound to the short H3K9me3 peptide, suggesting that the two

TTD subdomains adjust their relative orientation for optimal recognition of the longer bivalent

H3K4me0/K9me3 peptide.

Residual dipolar coupling (RDC) measurements give direct information about the orientation of bond vectors relative to the molecular alignment tensor, and are extremely sensitive to the relative orientations of domains within a protein (Bax, Kontaxis et al. 2001;

[Figure 4.6: relative peak intensity plotted against UHRF1 TTD residue number (124-284) for the apo and complex forms; the β1-β5 strands of TTDN and the β1'-β5' strands of TTDC are marked above the plot.]

Figure 4.6. Tandem tudor domain behaves as a single rigid body. Motion on the picosecond to nanosecond timescale was probed by measuring 1H-15N heteronuclear NOE values. The values for the apo and complex structures are very similar and are greater than 0.7, as expected for a well-structured protein, with the exception of a few regions. These regions of high flexibility correspond to the extreme N- and C-termini, as well as a large loop in the TTDN (aa161-180), consistent with the apo crystal structure, in which no electron density was observed for this loop. Note that in solution the linker between TTDN and TTDC (aa204-214) does not undergo motions faster than the TTD as a whole, and therefore does not behave as a flexible linker.

[Figure 4.7: panels a-c show structural overlays of the apo and complex forms, with the 4.2 Å shift of the α1' helix marked in panel c. Panel d lists the RMSD (Å) between the apo crystal structure and the complex NMR structure: TTDN (residues 135-160, 183-206), 0.769; TTDC (residues 207-281), 1.166; both subdomains (residues 135-160, 183-281), 1.368.]

Figure 4.7. Structural re-adjustment of TTDC in order to accommodate the histone tail. (a). Solution NMR ensemble (15 structures) of the complex structure showing the polypeptide backbone of TTD (blue) and histone peptide (red). The structures were overlaid over the region within TTD residues 139-161 and 182-279. (b). Overlay of the solution TTD ensemble (residues 139-161, 182-206) in the bound form (blue) and X-ray apo structure (red). (c). Overlay of the apo crystal structure (green) and complex NMR structure (blue). The N- and C-termini are removed for clarity to better show the linker between the TTD subdomains. The extent of the TTDC shift relative to TTDN upon recognition of the H3K4me0/K9me3 peptide is indicated. (d). The r.m.s.d. (Å) between the apo crystal structure and the lowest-energy structure of the ensemble of TTD complex solution structures was calculated with MOLMOL.

Prestegard, Bougault et al. 2004). Our solution structure of the H3K4me0/K9me3-bound TTD was refined using five sets of RDCs (three 15N-1HN data sets, one 13C'-13CA data set and one 15N-13C' data set) to more accurately determine the relative orientations of the two subdomains. Analysis of these RDC measurements showed that they fit the apo crystal structure significantly more poorly than the solution structure of the complex, indicating that the bond vectors in the TTD-H3K4me0/K9me3 solution ensemble do not correspond well with the apo crystal structure (Fig. 4.8). The apo protein in solution suffers from selective line broadening, especially at the interface between

TTDN and TTDC, and was less amenable to a detailed RDC analysis. Nevertheless, a single RDC data set collected on the apo protein in solution has a better fit to the apo crystal structure

(compared to the RDCs collected on the TTD bound to H3K4me0/K9me3). Taken together, the data support a conformational adjustment of the two TTD subdomains relative to one another upon peptide binding, further underscoring the importance of the inter-subdomain peptide binding groove.
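The agreement between measured RDCs and those predicted from a structure is commonly summarized by a quality factor, Q, defined as the r.m.s. difference between observed and predicted couplings divided by the r.m.s. of the observed couplings. The Python sketch below implements this widely used definition. Whether it matches the exact normalization behind the Q values in Fig. 4.8 is an assumption, and the couplings are hypothetical numbers chosen for illustration.

```python
import math

def q_factor(observed_hz, predicted_hz):
    """Q = rms(D_obs - D_pred) / rms(D_obs); lower values indicate better agreement."""
    if len(observed_hz) != len(predicted_hz) or not observed_hz:
        raise ValueError("need two equal-length, non-empty lists of couplings")
    numerator = sum((o - p) ** 2 for o, p in zip(observed_hz, predicted_hz))
    denominator = sum(o ** 2 for o in observed_hz)
    return math.sqrt(numerator / denominator)

# Hypothetical couplings in Hz, not measured values
observed = [10.0, -5.0, 8.0, -12.0]
good_model = [9.0, -4.0, 9.0, -11.0]   # predictions close to the observations
poor_model = [6.0, -1.0, 12.0, -7.0]   # predictions that deviate more strongly

print(f"Q (good model) = {q_factor(observed, good_model):.3f}")
print(f"Q (poor model) = {q_factor(observed, poor_model):.3f}")
```

A structure whose bond-vector orientations differ from those in solution yields a higher Q, which is the sense in which the bound-form couplings fit the apo crystal structure less well than the complex structure.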

4.3.5 Mutational analysis confirms localization of TTD to heterochromatin in vivo

In order to investigate the biological relevance of UHRF1 TTD, we generated mutant proteins and tested their binding to histone H3 peptides. Mutations within the canonical aromatic cage greatly disrupted binding, and Kd values could not be accurately determined (Fig. 4.9a).

Mutations of Asp142 and Glu153, residues that contribute to the hydrogen bond network with

H3K4me0, disrupt the interaction with the H3 tail without affecting the protein folding (Fig.

4.9b,d). Lastly, mutation of Asp190, the TTD residue that interacts with H3 T6, diminished the interaction observed in FP assays (Fig. 4.9c,e).

Figure 4.8. Residual dipolar couplings (RDCs) collected on the TTD/H3K4me0K9me3 complex fit the apo crystal structure poorly. Plots of experimental vs theoretical RDCs were produced with MODULE1.0. Experimental RDCs measured in different media for the TTD/H3K4me0K9me3 sample were compared with predicted values for the apo crystal and complex NMR structures (a-e). A total volume of 3.3% peg/hexanol was used as the alignment medium to measure 15N-1HN (a), 13C'-13CA (b) and 15N-13C' (c) couplings. The same alignment tensor was used to fit the couplings in a,b,c. To measure 15N-1HN dipolar couplings, pf1 phages (d) and 6% peg/hexanol (e) were used as alignment media. (f). Goodness of the fit between the experimental RDC values measured for either the H3K4me0/K9me3-bound TTD or apo TTD in solution and the predicted values for the NMR solution structure and apo crystal structures. For couplings highlighted in grey the same alignment tensor was used for fitting.

Q values (panel f)                    apo (crystal)   complex (NMR)
RDCs in bound form
  15N-1HN (3.3% peg/hexanol)          0.462           0.228
  13C'-13CA (3.3% peg/hexanol)        0.517           0.295
  15N-13C' (3.3% peg/hexanol)         0.477           0.298
  15N-1HN (pf1 phages)                0.449           0.235
  15N-1HN (6% peg/hexanol)            0.434           0.174
RDCs in apo form
  15N-1HN (3.3% peg/hexanol)          0.361           0.382

[Figure 4.9: panels a-c plot fraction bound against TTD domain concentration (μM). Kd values (μM) shown in the panels: (a) WT, 25 ± 2; D145A, 114 ± 31; F152A, >1000; Y188A, >1000. (b) WT, 23 ± 4; E153A, 301 ± 75; D142A/E153A, >1000. (c) WT, 23 ± 4; D190A, 54 ± 10. Panels d and e show 1H-15N HSQC overlays of WT with D142A/E153A and of WT with D190A, respectively; the NH of E153 could not be assigned.]

Figure 4.9. Mutational analysis confirms localization of TTD to heterochromatin in mouse ES cells. (a,b,c). Binding of wild-type and TTD mutants that disrupt the H3K9me3 binding cage (a), the H3K4me0 binding cage (b) or T6 recognition (c) within the recombinant human TTD. The fluorescence polarization binding assays were performed using the long, H3K4me0/K9me3-containing histone peptide (aa1-11). (d,e). Overlay of HSQC spectra of wild-type (blue) and various mutant forms (red) of UHRF1 TTD. The mutant proteins are folded and have only minor local perturbations. (f). Stably integrated mouse Np95-/- ES cell lines expressing HA-tagged wild-type mUHRF1 (top panel) or mUHRF1F148A (lower panel) were stained for HA, H3K9me3 and DAPI. The F148A mutation in the mouse protein is equivalent to F152A in the human protein. The extent of mUHRF1 co-localization with H3K9me3 was quantitatively evaluated using 60 independent nuclei selected in an unbiased manner by the ImageJ Nucleus Counter plug-in. The HA-mUHRF1F148A protein shows a significantly reduced correlation coefficient compared to the wild-type protein (p-value = 0.0102; average and standard deviation are shown).

UHRF1 localizes to and contributes to the shape and density of pericentric

heterochromatin (Uemura, Kubo et al. 2000; Miura, Watanabe et al. 2001; Bostick, Kim et al.

2007; Papait, Pistore et al. 2007; Sharif, Muto et al. 2007; Karagianni, Amazit et al. 2008; Papait,

Pistore et al. 2008), which is typically condensed in DAPI-bright chromocenters that are

enriched with the H3K9me3 modification (Krauss 2008). In order to determine whether the TTD

contributes to the localization of UHRF1 to H3K9me3 enriched regions in cells, our

collaborators stably expressed cDNAs for the wild-type murine UHRF1 (mUHRF1wt; Np95)

and a mutant defective for H3K9me3 binding, mUHRF1F148A (mF148 is equivalent to hF152) in

UHRF1-deficient mouse ES cells (Muto, Kanari et al. 2002; Bostick, Kim et al. 2007; Sharif,

Muto et al. 2007). The wild-type protein was localized almost exclusively to the DAPI-bright,

H3K9me3-rich chromocenters, as expected (Fig. 4.9f). However, the mUHRF1F148A mutant

protein consistently showed a slightly more diffuse localization throughout the nucleus even

though staining at the pericentric heterochromatin could often be observed. This indicates that the TTD domain can assist the localization of UHRF1 to H3K9me3-marked heterochromatin regions in a manner dependent on the integrity of its trimethyllysine-binding aromatic cage.
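Co-localization of two fluorescence channels is often quantified with a per-nucleus Pearson correlation coefficient of pixel intensities. The sketch below shows that calculation in plain Python. It is an assumption that this is the coefficient reported in Fig. 4.9f, and the pixel intensities are hypothetical values chosen to contrast a focal with a more diffuse staining pattern.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / (spread_x * spread_y)

# Hypothetical pixel intensities sampled from one nucleus
h3k9me3 = [10, 20, 80, 90, 15, 85]
ha_focal = [12, 18, 75, 95, 10, 88]     # signal concentrated where H3K9me3 is bright
ha_diffuse = [40, 45, 55, 60, 50, 52]   # signal spread more evenly across the nucleus

print(f"focal pattern:   r = {pearson(h3k9me3, ha_focal):.2f}")
print(f"diffuse pattern: r = {pearson(h3k9me3, ha_diffuse):.2f}")
```

A more diffuse distribution lowers the coefficient even when some signal remains at the chromocenters, which is why the comparison is made across many nuclei rather than by inspection of single images.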

4.4 Discussion

The Tudor domain is found in many proteins involved in epigenetic regulation and can occur in isolation, in tandem, as a triad, or in arrays of as many as eleven copies (Taverna, Li et al. 2007;

Adams-Cioaba and Min 2009). Comparison with other proteins that contain tandem Tudor

domains suggests that although they all have the canonical five-stranded β-barrel fold, additional

secondary structure elements and the relative orientation and topology of the two domains

provide unique features and great versatility among this family. To our knowledge inter-domain

movement accompanying histone peptide recognition by effector domains has not been

described before. For example, recognition of histone tails by the double chromodomains of

CHD1 (Flanagan, Mi et al. 2005), tandem tudor domains of FXR2 (Adams-Cioaba, Guo et al.

2010), 53BP1 (Botuyan, Lee et al. 2006), JMJD2A (Huang, Fang et al. 2006) or Sgf29 (PDB:

3MEU; DOI: 10.2210/pdb3meu/pdb), and tandem PHD fingers of DPF3b (Zeng, Zhang et al.

2010) does not require the adjustment of the two domains. The recognition of two acetylation marks by a single Brdt bromodomain (Moriniere, Rousseaux et al. 2009) seems to preserve the overall fold, and the combinatorial rheostat-like readout of H3K4me3/T6ph by the ING2 PHD finger (Garske, Oliver et al. 2010) does not appear to involve structural rearrangement within the PHD finger. Thus,

previous modes of effector domain-histone peptide binding show minimal structural

perturbations and can be described as either “surface recognition” or “cavity insertion” models,

but neither involves subdomain rearrangements as observed here (Ruthenburg, Li et al. 2007;

Taverna, Li et al. 2007).

Taken together these data support a model in which the simultaneous readout of the

H3K4 and H3K9 modification states by the TTD is achieved by a single domain that undergoes a

conformational change to accommodate both lysines. There is increasing evidence for

coordination of the posttranslational status of H3K4 and H3K9 in mammalian cells. Binda et al.

(Binda, LeRoy et al. 2010) recently reported that the H3K9 methyltransferases, SETDB1,

SUV39H1, G9a and GLP preferentially methylate Lys9 of H3 histones that are depleted in the

K4me2/3 mark. This indicates cross talk between H3K4 and H3K9 in the writing of the

H3K9me3 mark, and it therefore stands to reason that “reader” domains may have also evolved

to read the status of both K4 and K9. It was recently reported that the ADD domain of ATRX can recognize the H3K4me0/K9me3 signature and may play a role in localizing ATRX to DAPI-dense chromocenters similar to that seen here for UHRF1 (Dhayalan, Rajavelu et al. 2010).

Furthermore, Bartke et al. showed that while UHRF1 from cellular extracts was enriched for

binding recombinant nucleosomes containing methyl-CpG (presumably via its SRA domain),

UHRF1 was also highly enriched at nucleosomes without CpG methylation, but only if they

contained the H3K4me0/K9me3 modification signature (Bartke, Vermeulen et al. 2010). There

was no enrichment of UHRF1 at H3K4me3 nucleosomes even if the DNA contained methyl-

CpG (Bartke, Vermeulen et al. 2010). These observations are consistent with an important role

for the TTD in contributing to the subnuclear localization of UHRF1 in addition to the SRA

domain (Rottach, Frauer et al. 2010). Given the critical role played by UHRF1 in maintenance

DNA methylation, our work suggests that the H3K4me0/K9me3 signature is highly associated

with methylated DNA, which is consistent with observations for heterochromatic regions and

stably repressed genes. By contrast, other chromatin states associated with transcriptional

repression such as bivalent domains (Bernstein, Mikkelsen et al. 2006; Bilodeau, Kagey et al.

2009) or Polycomb-repressed domains (Schuettengruber, Chourrout et al. 2007) are not expected

to undergo similar coupling between DNA and histone modifications, thus ensuring more

dynamic expression trajectories through development.

4.5 Experimental Procedures

Cloning, protein expression, and purification. The cDNA encoding residues 121 – 286 of

human UHRF1 protein was cloned into a modified pET28a bacterial expression vector encoding

an N-terminal hexahistidine fusion protein with a TEV protease cleavage site. Mutated cDNAs

were made using QuikChange II XL Site-Directed Mutagenesis Kit (Stratagene); mutations were

confirmed by sequencing complete cDNAs. The protein was expressed in E. coli BL21 (DE3)

grown in Terrific Broth in the presence of 50 µg/ml of kanamycin, and induced with 0.2 mM

isopropyl-1-thio-D-galactopyranoside (IPTG). The cell pellet from a 2 L culture was resuspended in 50 ml lysis buffer consisting of 50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 5% glycerol, 2 mM imidazole, 1 mM β-mercaptoethanol and 0.1 µM phenylmethyl sulfonyl fluoride (PMSF), and cells were lysed. After centrifugation at 40,000 x g for 30 min, the clarified cell lysate was applied to a column packed with 3 ml TALON metal affinity resin (Clontech). The column was then washed with 15 ml Wash buffer A (50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 5% glycerol, 10 mM imidazole, 1 mM β-mercaptoethanol), 15 ml Wash buffer B (Wash buffer A supplemented with 0.05% Tween 20), and 20 ml Wash buffer A. The protein was eluted with 6 ml of 50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 5% glycerol, 200 mM imidazole, and dithiothreitol (DTT) was added to the eluate to a final concentration of 2 mM. The N-terminal 6xHIS-tag was removed by overnight incubation with TEV protease at 4 °C. The protein was further purified by gel filtration on a HiLoad 16/60 Superdex 200 column (GE Healthcare) equilibrated with 20 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 5% glycerol, 2 mM DTT. Final purification was achieved by ion-exchange chromatography on a 5-ml HiTrapQ column using a 0 – 0.5 M linear gradient of NaCl in 20 mM Tris-HCl, pH 8.0, 5% glycerol, 2 mM DTT. Protein was concentrated to 30 – 40 mg/ml. LC/MS analysis was used to verify the correct molecular weight of the protein; its purity was assessed by SDS-PAGE. The concentrated purified protein was stored at 4 °C for 2 – 3 weeks and at -80 °C for longer periods. The protein was not stable at low salt concentrations; thus, buffers for binding assays and structure determination contained at least 250 mM NaCl.

For uniformly 15N- and 13C-labeled proteins, E. coli were grown in M9 minimal medium

supplemented with 15N-labeled ammonium chloride (0.8 g/l) for titration experiments, and with 15N-labeled ammonium chloride (0.8 g/l) and 13C-labeled glucose (2 g/l) for all other experiments. The

protein was induced with 1.0 mM IPTG. UHRF1 TTD-containing protein construct was purified

as described above. The final NMR samples had protein concentration of 0.45-0.6 mM and were

prepared in buffer containing 12.2 mM Na2HPO4, 7.8 mM NaH2PO4 (pH 7.0), 250 mM NaCl, 2 mM DTT, 1 mM benzamidine, 0.5 mM PMSF and supplemented with 10% (v/v) D2O.

Peptides used for NMR were purchased in purified form from Tufts University Core Services (Boston, MA, USA). Two peptides were used for these studies: peptide TARK(me3)ST, corresponding to the N-terminal histone H3 residues 6-11, hereafter referred to as the “short peptide” or H3K9me3, and peptide ARTKQTARK(me3)ST, corresponding to histone H3 residues 1-11, hereafter referred to as the “long peptide” or H3K4me0/K9me3.

Histone peptide SPOT-blot peptide array screen

Peptides were synthesized directly on a modified cellulose membrane with a polyethylene glycol linker using the MultiPep peptide synthesizer (Intavis). The binding reaction was initially performed using a library of 580 membrane-immobilized peptides comprising control peptides and 8 – 14 residue-long stretches of histone H2A, H2B, H3 and H4 sequences with either non-modified or variously modified arginine, lysine, serine and threonine residues (one modified residue per peptide), as described in Chapter 2 (Nady, Min et al. 2008). The subsequent peptide libraries were designed specifically to test the binding preference of the UHRF1 TTD domain for H3K4 and H3K9 marks. All recombinant proteins that were tested contained an N-terminal 6xHIS-tag.


Fluorescence-polarization binding assays

For FP studies, the peptides indicated in the figures were synthesized, N-terminally labeled with fluorescein and purified by Tufts University Core Services (Boston, MA, USA). Binding assays were performed in a 10 µl volume at a constant labeled peptide concentration of 36 nM. The protein was titrated to saturation, with maximum concentrations ranging between 800 and 1300 µM, in buffer containing 20 mM Tris pH 8.0, 250 mM NaCl, 1 mM DTT, 1 mM benzamidine, 1 mM PMSF, 0.01% Tween-20. FP assays were performed at 29.0°C in 384-well plates using a Synergy 2 microplate reader (BioTek, Vermont, USA). An excitation wavelength of 485 nm and an emission wavelength of 528 nm were used. The data were corrected for the background of the free labeled peptides. To determine Kd values, the data were fit to a hyperbolic function using SigmaPlot software (Systat Software, Inc., CA, USA). The Kd values represent averages ± standard error for at least three independent experiments.
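As an illustration of the hyperbolic fit described above, the following minimal Python sketch fits background-corrected FP values to signal = Bmax·[P]/(Kd + [P]). It is not the SigmaPlot procedure used in this work: the function name `fit_hyperbola`, the golden-section search over log10(Kd), and the synthetic data are all assumptions made for illustration only.

```python
import math

def fit_hyperbola(conc, signal):
    """Fit signal = bmax * conc / (kd + conc) by least squares.

    For any trial kd the best bmax has a closed form, so only kd is
    searched, here by golden-section search on log10(kd).
    """
    def best_bmax(kd):
        f = [c / (kd + c) for c in conc]
        return sum(s * fi for s, fi in zip(signal, f)) / sum(fi * fi for fi in f)

    def sse(log_kd):
        kd = 10.0 ** log_kd
        bmax = best_bmax(kd)
        return sum((s - bmax * c / (kd + c)) ** 2 for c, s in zip(conc, signal))

    # Search window: two decades either side of the sampled concentrations.
    positive = [c for c in conc if c > 0]
    lo = math.log10(min(positive)) - 2.0
    hi = math.log10(max(positive)) + 2.0
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if sse(a) < sse(b):
            hi = b
        else:
            lo = a
    kd = 10.0 ** ((lo + hi) / 2.0)
    return kd, best_bmax(kd)


# Synthetic, noise-free example (NOT thesis data): Kd = 5 uM, plateau = 120 mP.
protein_um = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 400, 800]
fp_mp = [120.0 * c / (5.0 + c) for c in protein_um]
kd_fit, bmax_fit = fit_hyperbola(protein_um, fp_mp)
print(f"Kd = {kd_fit:.2f} uM, plateau = {bmax_fit:.1f} mP")
```

With real data, each independent experiment would be fitted separately and the resulting Kd values averaged, as stated above.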

NMR spectroscopy and data analysis

Chemical shift mapping on the UHRF1 TTD domain was done by monitoring the 1H-15N HSQC spectra of the uniformly 15N-labeled TTD domain alone (0.45 mM) and with an excess of unlabeled short or long H3 peptide. Aliquots of unlabeled peptide were titrated into the labeled TTD domain up to a protein:peptide molar ratio of 1:35 for the short peptide and 1:5 for the long peptide, until no further changes in chemical shifts were detected in the 1H-15N HSQC spectrum. The HSQC spectra were recorded at 25°C in 12.2 mM Na2HPO4, 7.8 mM NaH2PO4 (pH 7.0), 250 mM NaCl, 2 mM DTT, 1 mM benzamidine, 0.5 mM PMSF and supplemented with 10% (v/v)

D2O on a Bruker Avance 800-MHz spectrometer. Composite chemical shift perturbation values shown were calculated using the equation $\Delta_{comp} = [\Delta\delta_{HN}^{2} + (\Delta\delta_{N}/6.5)^{2}]^{1/2}$. The dissociation constant, Kd, was estimated by fitting the observed chemical shift changes for selected residues to the following equation:

$\Delta = \Delta_{max}\,\dfrac{[L]_T + [P]_T + K_d - \left(([L]_T + [P]_T + K_d)^{2} - 4[L]_T[P]_T\right)^{1/2}}{2[P]_T}$,

in which Δ is the observed chemical shift change at a given total ligand concentration, [L]T, Δmax is the change in chemical shift at saturation and [P]T is the total protein concentration. Data was fitted using GraphPad Prism software.
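The two equations used for the titration analysis can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the GraphPad Prism workflow used here: the function names, the golden-section search over log10(Kd), and the synthetic titration points are hypothetical.

```python
import math

def composite_csp(d_hn, d_n, n_scale=6.5):
    """Composite shift change (ppm) from 1H and 15N shift changes (ppm)."""
    return math.sqrt(d_hn ** 2 + (d_n / n_scale) ** 2)

def bound_shift(l_tot, p_tot, kd, d_max):
    """Shift change predicted by the 1:1 binding equation for total
    ligand l_tot and total protein p_tot (same concentration units as kd)."""
    s = l_tot + p_tot + kd
    root = math.sqrt(max(s * s - 4.0 * l_tot * p_tot, 0.0))
    return d_max * (s - root) / (2.0 * p_tot)

def fit_kd(l_tot, shifts, p_tot):
    """Least-squares Kd and d_max; d_max is solved in closed form per trial Kd."""
    def best_dmax(kd):
        f = [bound_shift(l, p_tot, kd, 1.0) for l in l_tot]
        return sum(d * fi for d, fi in zip(shifts, f)) / sum(fi * fi for fi in f)

    def sse(log_kd):
        kd = 10.0 ** log_kd
        d_max = best_dmax(kd)
        return sum((d - bound_shift(l, p_tot, kd, d_max)) ** 2
                   for l, d in zip(l_tot, shifts))

    positive = [l for l in l_tot if l > 0]
    lo = math.log10(min(positive)) - 3.0
    hi = math.log10(max(positive)) + 3.0
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if sse(a) < sse(b):
            hi = b
        else:
            lo = a
    kd = 10.0 ** ((lo + hi) / 2.0)
    return kd, best_dmax(kd)


# Synthetic titration (NOT thesis data): 0.45 mM protein, Kd = 1.2 mM.
ligand_mm = [0.1, 0.2, 0.45, 0.9, 1.8, 3.6, 7.2, 15.75]
observed = [bound_shift(l, 0.45, 1.2, 0.20) for l in ligand_mm]
kd_fit, dmax_fit = fit_kd(ligand_mm, observed, 0.45)
print(f"Kd = {kd_fit:.2f} mM, d_max = {dmax_fit:.3f} ppm")
```

Because the equation uses total rather than free ligand concentration, it remains valid when the protein concentration is comparable to Kd, which is the regime of these NMR titrations.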

For structure determination, NMR spectra were recorded at 25°C on a Varian INOVA 600-MHz spectrometer equipped with a triple-resonance probe and on Bruker Avance 600- and 800-MHz spectrometers equipped with cryoprobes. NMR data were collected at high resolution from nonlinearly sampled spectra and processed using multidimensional decomposition (Gutmanas, Jarvoll et al. 2002; Orekhov, Ibraghimov et al. 2003) and NMRPipe software (Delaglio, Grzesiek et al. 1995). The data were analyzed with NMRView and Sparky software (Goddard and Kneller).

The assignment of 1H, 13C and 15N resonances of TTD bound to the long peptide (protein concentration 0.6 mM, peptide concentration 3.0 mM) was achieved to 92.3% completion using the ABACUS protocol (Lemak, Steren et al. 2008; Lemak, Gutmanas et al. 2010). Interproton distance restraints within the protein were obtained from multi-dimensional NOE spectra recorded with a 100 ms mixing time. The peptide’s 1H resonances were assigned using two-dimensional total correlation spectroscopy (TOCSY) and NOESY experiments. Initial intermolecular contacts were obtained from a three-dimensional 13C/15N isotope-filtered NOESY spectrum (Zwahlen, Legault et al. 1997). Further, HADDOCK (van Dijk, de Vries et al. 2005) was used to aid the structure calculation.

The solution structure was calculated using iterative cycles of CYANA (version 3.0) (Guntert, Mumenthaler et al. 1997), HADDOCK and CNS (Brunger, Adams et al. 1998). An initial structure calculation of the peptide-bound TTD was performed with CYANA based on restraints from five sets of RDCs, backbone hydrogen-bond restraints in regular secondary structure, φ and ψ torsion angles derived from TALOS (Cornilescu, Delaglio et al. 1999), NOE-derived distance restraints, and additional distance constraints between the two subdomains and between protein and peptide. The RDC restraints included those for 15N-1HN collected in three different media, and those for 15N-13C' and 13C'-13Ca that were collected in the same medium. The additional distance constraints included 24 interproton distance constraints between protein residues of the conserved aromatic cage and the trimethyl lysine of the peptide, derived from the crystal structure with the short peptide; 5 additional peptide-protein restraints were derived from confidently assigned peaks in the isotope-filtered experiments (Table 4.2). The structure was refined with CNS in explicit water; however, due to the low number of protein-peptide distance restraints, the peptide was not packed well against the protein.

To improve the packing of the peptide against the protein, HADDOCK was used. In the HADDOCK calculations we used the structure and restraints obtained from the first round of calculations described above. Additionally, data from the NMR titrations were used to generate ambiguous interaction restraints. The lowest energy structure from HADDOCK was refined using CNS and analyzed manually. From this result, 11 additional distance restraints between the protein and peptide were obtained by manually assigning peaks observed in both isotope-filtered and “regular” 13C,15N-edited NOE spectra that were consistent with the HADDOCK-derived structure (Table 4.2). For the final round of structure calculations we performed CYANA calculations with the new HADDOCK-derived constraints, followed by refinement using CNS.

The 15 lowest energy structures were selected for the final NMR ensemble (Fig. 4.7a). We evaluated the quality of the final structures using PSVS (Bhattacharya, Tejero et al. 2007).

Table 4.2. Intra- and intermolecular distance restraints used for TTD-H3K4me0/K9me3 calculations

Initial set of intermolecular protein-peptide distance restraints

Residue     Atom        Residue     Atom        Upper distance
(peptide)   (peptide)   (protein)   (protein)   restraint, Å
02 ARG      HE          272 ILE     QG2         5.50
02 ARG      HE          272 ILE     QD1         5.50
04 LYS      QB          226 ASN     H           5.00
09 LYS      QB          191 TYR     QE          4.00
09 LYS      QZ          145 ASP     HB2         3.47
09 LYS      QZ          145 ASP     HB3         4.72
09 LYS      QZ          152 PHE     HB3         4.59
09 LYS      QZ          152 PHE     HB2         4.71
09 LYS      QZ          152 PHE     HD1         4.12
09 LYS      QZ          152 PHE     HD2         4.17
09 LYS      QZ          152 PHE     HE1         3.62
09 LYS      QZ          152 PHE     HE2         3.60
09 LYS      QZ          152 PHE     HZ          3.31
09 LYS      QZ          188 TYR     HB3         4.91
09 LYS      QZ          188 TYR     HB2         4.71
09 LYS      QZ          188 TYR     HD1         4.07
09 LYS      QZ          188 TYR     HD2         4.98
09 LYS      QZ          188 TYR     HE1         3.77
09 LYS      QZ          188 TYR     HE2         4.74
09 LYS      QZ          188 TYR     HH          4.68
09 LYS      QZ          191 TYR     HB3         3.73
09 LYS      QZ          191 TYR     HB2         3.85
09 LYS      QZ          191 TYR     HD1         3.95
09 LYS      QZ          191 TYR     HD2         3.96
09 LYS      QZ          191 TYR     HE1         4.26
09 LYS      QZ          191 TYR     HE2         4.28
09 LYS      QZ          194 ASN     HB2         4.70
09 LYS      QZ          194 ASN     HB3         4.72
10 SER      HA          148 MET     QE          6.00

Final set of intermolecular protein-peptide distance restraints

Residue     Atom        Residue     Atom        Upper distance
(peptide)   (peptide)   (protein)   (protein)   restraint, Å
02 ARG      H           275 ASP     O           2.00
02 ARG      H           275 ASP     OD1         2.00
02 ARG      HE          275 ASP     OD1         2.00
04 LYS      HZ2         142 ASP     OD1         2.00
04 LYS      HZ3         142 ASP     OD2         2.00
04 LYS      HZ1         153 GLU     OE1         2.00
04 LYS      QD          208 ALA     QB          4.50
04 LYS      HA          226 ASN     HD22        4.00
06 THR      HG1         190 ASP     OD2         2.00
06 THR      O           235 ARG     HH11        2.00
09 LYS      QB          191 TYR     QE          4.00
09 LYS      QZ          145 ASP     HB2         3.47
09 LYS      QZ          145 ASP     HB3         4.72
09 LYS      QZ          152 PHE     HB3         4.59
09 LYS      QZ          152 PHE     HB2         4.71
09 LYS      QZ          152 PHE     HD1         4.12
09 LYS      QZ          152 PHE     HD2         4.17
09 LYS      QZ          152 PHE     HE1         3.62
09 LYS      QZ          152 PHE     HE2         3.60
09 LYS      QZ          152 PHE     HZ          3.31
09 LYS      QZ          188 TYR     HB3         4.91
09 LYS      QZ          188 TYR     HB2         4.71
09 LYS      QZ          188 TYR     HD1         4.07
09 LYS      QZ          188 TYR     HD2         4.98
09 LYS      QZ          188 TYR     HE1         3.77
09 LYS      QZ          188 TYR     HE2         4.74
09 LYS      QZ          188 TYR     HH          4.68
09 LYS      QZ          191 TYR     HB3         3.73
09 LYS      QZ          191 TYR     HB2         3.85
09 LYS      QZ          191 TYR     HD1         3.95
09 LYS      QZ          191 TYR     HD2         3.96
09 LYS      QZ          191 TYR     HE1         4.26
09 LYS      QZ          191 TYR     HE2         4.28
09 LYS      QZ          194 ASN     HB2         4.70
09 LYS      QZ          194 ASN     HB3         4.72
10 SER      HG          145 ASN     OD1         2.00
10 SER      HG          147 ASN     O           2.00
10 SER      HA          148 MET     QE          6.00

Intramolecular protein distance restraints

Residue     Atom        Residue     Atom        Upper distance
(protein)   (protein)   (protein)   (protein)   restraint, Å
128 ASP     H           218 GLU     HB2         4.08
128 ASP     H           218 GLU     HB3         4.08
128 ASP     H           218 GLU     QG          3.77
128 ASP     H           221 GLN     QB          3.44
128 ASP     H           221 GLN     HA          4.96
128 ASP     HA          221 GLN     QB          3.67
128 ASP     H           221 GLN     QG          3.96
133 GLY     H           283 PRO     HB3         4.65
133 GLY     H           283 PRO     HA          4.59
133 GLY     H           283 PRO     QG          4.71
148 MET     QE          229 PRO     HG3         4.75
150 ALA     QB          239 TYR     HA          5.00
150 ALA     QB          239 TYR     QD          5.00
150 ALA     QB          239 TYR     QE          5.00
208 ALA     QB          279 LYS     H           5.00
214 TRP     HE3         246 LYS     HA          5.50
214 TRP     HZ2         248 GLU     QB          3.97
214 TRP     HZ2         248 GLU     QG          4.62
214 TRP     HH2         248 GLU     QB          4.80
214 TRP     HH2         253 ARG     QG          5.31
214 TRP     HZ2         253 ARG     QG          4.85
214 TRP     HZ2         253 ARG     QD          4.64
214 TRP     HH2         253 ARG     HA          5.50
214 TRP     HH2         254 GLU     H           5.50
214 TRP     HE3         255 LEU     QD2         4.26
214 TRP     HH2         255 LEU     QD2         5.07
214 TRP     HE3         255 LEU     QD1         4.26
214 TRP     HH2         255 LEU     QD1         5.07
214 TRP     HE3         255 LEU     HA          5.50
214 TRP     HB3         255 LEU     QQD         5.44
214 TRP     HE3         255 LEU     QQD         3.57
214 TRP     HH2         255 LEU     QQD         4.17
*226 ASN    HB3         238 TRP     HZ3         3.50
*226 ASN    HB3         238 TRP     HZ3         3.50

* observed ring current effect

MolProbity Ramachandran plot statistics calculated for residues 135-160 and 183-281 show that 91.8% of the residues are in the most favoured regions; the Procheck all-dihedral-angle Z-score is -3.96 and the MolProbity Clashscore is -1.96. Figures were prepared using MOLMOL (Koradi, Billeter et al. 1996) and PyMOL (DeLano Scientific).

NMR relaxation measurements

In order to further characterize the mechanism of histone peptide recognition by UHRF1 we probed the backbone dynamics of both the apo TTD and the TTD-H3K4me0/K9me3 complex by measuring 1H-15N heteronuclear NOE values, which report on picosecond to nanosecond timescale motions. 15N−1HN heteronuclear NOE measurements were taken at 25 °C on a Bruker Avance 600 instrument using published pulse sequences (Dayie and Wagner 1994; Farrow, Muhandiram et al. 1994) with some modifications. For both apo and complex (TTD bound to H3K4me0/K9me3) protein samples, the 15N−1HN NOE saturated and unsaturated spectra were recorded in an interleaved manner. Data were processed using NMRPipe (Delaglio, Grzesiek et al. 1995) and analyzed using NMRView and Sparky (Goddard and Kneller).
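For reference, the heteronuclear NOE value of a residue is conventionally taken as the ratio of its peak intensity in the saturated spectrum to that in the unsaturated spectrum. The short Python sketch below shows that calculation; the function name `het_noe`, the noise-based uncertainty estimate, and the example intensities are assumptions for illustration and are not taken from the analysis performed here.

```python
import math

def het_noe(i_sat, i_unsat, noise_sat, noise_unsat):
    """Return (NOE, uncertainty) for one residue.

    NOE = I_sat / I_unsat. The uncertainty propagates the baseline
    noise of each spectrum through the ratio (an assumed, commonly
    used estimate; other error models are possible).
    """
    noe = i_sat / i_unsat
    err = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2
                               + (noise_unsat / i_unsat) ** 2)
    return noe, err


# Hypothetical intensities (arbitrary units), not thesis data.
noe, err = het_noe(i_sat=80.0, i_unsat=100.0, noise_sat=2.0, noise_unsat=2.0)
print(f"hetNOE = {noe:.2f} +/- {err:.3f}")
```

Values approaching the rigid-limit maximum indicate a backbone amide that is ordered on the picosecond to nanosecond timescale, whereas markedly lower values indicate flexibility.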

Residual dipolar coupling (RDC) measurements and analysis

Dipolar couplings were measured on isotropic and anisotropic samples containing 0.6 mM 15N, 13C TTD protein and 3 mM unlabeled H3K4me0/K9me3 peptide. 15N-1HN residual dipolar couplings were extracted from two-dimensional IPAP 15N-1HN HSQC spectra (Ottiger, Delaglio et al. 1998). The pulse sequences used for measurement of 13C'-13Ca couplings have been described earlier (Mittermaier and Kay 2001). Pulse sequences for 15N-13C' couplings were provided courtesy of Lewis Kay. The aligned sample from which 15N-1HN, 15N-13C', and 13C'-13Ca RDCs were extracted contained C12E5 PEG/hexanol media at 3.3% of the final sample volume (deuterium splitting, 13.2 Hz; linewidth, 1.5 Hz) (Rückert and Otting 2000); 219 couplings were used (collected on a Varian INOVA 600 MHz instrument). Two further aligned samples were used to extract 15N-1HN RDCs: one contained C12E5 PEG/hexanol media at 6% of the total volume (deuterium splitting, 25 Hz; linewidth, 1.8 Hz) (Rückert and Otting 2000), from which 102 couplings were used, and the other was prepared by adding 10 mg/ml Pf1 bacteriophage (deuterium splitting, 12 Hz; linewidth, 1.7 Hz) (Hansen, Mueller et al. 1998), from which 95 couplings were used (collected on a Bruker Avance 600 MHz spectrometer). The FuDA software (Hansen) was used to extract peak shape and intensity parameters from the J-evolution experiments for the 15N-13C' and 13C'-13Ca RDCs.

Another set of 15N-1HN residual dipolar couplings was measured on isotropic and anisotropic samples containing only 0.6 mM 15N, 13C TTD protein (collected on a Bruker Avance 600 MHz spectrometer). The aligned sample contained C12E5 PEG/hexanol media at 3.2% of the final sample volume (deuterium splitting, 16 Hz; linewidth, 1.8 Hz) (Rückert and Otting 2000); 73 couplings were used.

To obtain goodness-of-fit (Q) values between the experimental RDCs and the values back-calculated from the apo crystal structure and the complex (TTD/H3K4me0K9me3) NMR structures, the data were analyzed using PALES (Zweckstetter 2008). The plots were produced with MODULE v1.0 (Dosset, Hus et al. 2001).
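To make the quantities in this section concrete, the Python sketch below shows how an RDC is obtained as the difference between the splittings measured in the aligned and isotropic samples, and one widely used definition of the Q factor (the r.m.s. deviation between observed and back-calculated couplings divided by the r.m.s. of the observed couplings). The function names and example numbers are hypothetical, and the exact Q definition and sign conventions applied by PALES should be checked against its documentation; this is not a substitute for that program.

```python
import math

def rdc_from_splittings(aligned_hz, isotropic_hz):
    """RDC per residue: splitting in aligned medium (J + D) minus
    splitting in the isotropic sample (J). Sign conventions vary
    between coupling types and programs."""
    return [a - i for a, i in zip(aligned_hz, isotropic_hz)]

def q_factor(d_obs, d_calc):
    """Q = rms(D_obs - D_calc) / rms(D_obs); 0 means perfect agreement."""
    num = sum((o - c) ** 2 for o, c in zip(d_obs, d_calc))
    den = sum(o * o for o in d_obs)
    return math.sqrt(num / den)


# Hypothetical 15N-1HN splittings in Hz (not thesis data).
aligned = [-85.0, -98.5, -90.0]
isotropic = [-93.0, -93.5, -93.0]
d_obs = rdc_from_splittings(aligned, isotropic)
d_calc = [7.0, -4.0, 3.5]  # assumed back-calculated values
print(f"Q = {q_factor(d_obs, d_calc):.3f}")
```

A lower Q indicates that the structure used for back-calculation agrees better with the measured couplings, which is how the apo crystal structure and the complex NMR structures are compared here.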

4.6 References

Adams-Cioaba, M. A., Y. Guo, et al. (2010). "Structural studies of the tandem Tudor domains of fragile X mental retardation related proteins FXR1 and FXR2." PLoS One 5(11): e13559.
Adams-Cioaba, M. A. and J. Min (2009). "Structure and function of histone methylation binding proteins." Biochem Cell Biol 87(1): 93-105.
Agalioti, T., G. Chen, et al. (2002). "Deciphering the transcriptional histone acetylation code for a human gene." Cell 111(3): 381-92.
Bartke, T., M. Vermeulen, et al. (2010). "Nucleosome-interacting proteins regulated by DNA and histone methylation." Cell 143(3): 470-84.
Bax, A., G. Kontaxis, et al. (2001). "Dipolar couplings in macromolecular structure determination." Methods Enzymol 339: 127-74.
Bernstein, B. E., T. S. Mikkelsen, et al. (2006). "A bivalent chromatin structure marks key developmental genes in embryonic stem cells." Cell 125(2): 315-26.
Bhattacharya, A., R. Tejero, et al. (2007). "Evaluating protein structures determined by structural genomics consortia." Proteins 66(4): 778-95.
Bilodeau, S., M. H. Kagey, et al. (2009). "SetDB1 contributes to repression of genes encoding developmental regulators and maintenance of ES cell state." Genes Dev 23(21): 2484-9.
Binda, O., G. LeRoy, et al. (2010). "Trimethylation of histone H3 lysine 4 impairs methylation of histone H3 lysine 9: regulation of lysine methyltransferases by physical interaction with their substrates." Epigenetics 5(8): 767-75.
Bostick, M., J. K. Kim, et al. (2007). "UHRF1 plays a role in maintaining DNA methylation in mammalian cells." Science 317(5845): 1760-4.
Botuyan, M. V., J. Lee, et al. (2006). "Structural basis for the methylation state-specific recognition of histone H4-K20 by 53BP1 and Crb2 in DNA repair." Cell 127(7): 1361-73.
Brunger, A. T., P. D. Adams, et al. (1998). "Crystallography & NMR system: A new software suite for macromolecular structure determination." Acta Crystallogr D Biol Crystallogr 54(Pt 5): 905-21.
Chen, W. Y. and T. M. Townes (2000). "Molecular mechanism for silencing virally transduced genes involves histone deacetylation and chromatin condensation." Proc Natl Acad Sci U S A 97(1): 377-82.
Citterio, E., R. Papait, et al. (2004). "Np95 is a histone-binding protein endowed with ubiquitin ligase activity." Mol Cell Biol 24(6): 2526-35.
Cornilescu, G., F. Delaglio, et al. (1999). "Protein backbone angle restraints from searching a database for chemical shift and sequence homology." J Biomol NMR 13(3): 289-302.
Dayie, K. T. and G. Wagner (1994). "Relaxation-rate measurements for 15N-1H groups with pulsed-field gradients and preservation of coherence pathways." J Magn Reson 111A: 121-26.
Delaglio, F., S. Grzesiek, et al. (1995). "NMRPipe: a multidimensional spectral processing system based on UNIX pipes." J Biomol NMR 6(3): 277-93.
Denslow, S. A. and P. A. Wade (2007). "The human Mi-2/NuRD complex and gene regulation." Oncogene 26(37): 5433-8.
Dhayalan, A., A. Rajavelu, et al. (2010). "The Dnmt3a PWWP domain reads histone 3 lysine 36 trimethylation and guides DNA methylation." J Biol Chem 285(34): 26114-20.
Dosset, P., J. C. Hus, et al. (2001). "A novel interactive tool for rigid-body modeling of multi-domain macromolecules using residual dipolar couplings." J Biomol NMR 20(3): 223-31.
Farrow, N. A., R. Muhandiram, et al. (1994). "Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15N NMR relaxation." Biochemistry 33(19): 5984-6003.
Flanagan, J. F., L. Z. Mi, et al. (2005). "Double chromodomains cooperate to recognize the methylated histone H3 tail." Nature 438(7071): 1181-5.
Garske, A. L., S. S. Oliver, et al. (2010). "Combinatorial profiling of chromatin binding modules reveals multisite discrimination." Nat Chem Biol 6(4): 283-90.
Goddard, T. D. and D. G. Kneller. SPARKY 3. University of California, San Francisco.
Grimm, C., A. G. de Ayala Alonso, et al. (2007). "Structural and functional analyses of methyl-lysine binding by the malignant brain tumour repeat protein Sex comb on midleg." EMBO Rep 8(11): 1031-7.
Grimm, C., R. Matos, et al. (2009). "Molecular recognition of histone lysine methylation by the Polycomb group repressor dSfmbt." EMBO J 28(13): 1965-77.
Guntert, P., C. Mumenthaler, et al. (1997). "Torsion angle dynamics for NMR structure calculation with the new program DYANA." J Mol Biol 273(1): 283-98.
Guo, Y., N. Nady, et al. (2009). "Methylation-state-specific recognition of histones by the MBT repeat protein L3MBTL2." Nucleic Acids Res 37(7): 2204-10.
Gutmanas, A., P. Jarvoll, et al. (2002). "Three-way decomposition of a complete 3D 15N-NOESY-HSQC." J Biomol NMR 24(3): 191-201.
Hansen, F. D. FuDA. http://pound.med.utoronto.ca/software.
Hansen, M. R., L. Mueller, et al. (1998). "Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions." Nat Struct Biol 5(12): 1065-74.
Hashimoto, H., J. R. Horton, et al. (2009). "UHRF1, a modular multi-domain protein, regulates replication-coupled crosstalk between DNA methylation and histone modifications." Epigenetics 4(1): 8-14.
Hu, L., Z. Li, et al. (2011). "Crystal structure of PHD domain of UHRF1 and insights into recognition of unmodified histone H3 arginine residue 2." Cell Res 21(9): 1374-8.
Huang, Y., J. Fang, et al. (2006). "Recognition of histone H3 lysine-4 methylation by the double tudor domain of JMJD2A." Science 312(5774): 748-51.
Karagianni, P., L. Amazit, et al. (2008). "ICBP90, a novel methyl K9 H3 binding protein linking protein ubiquitination with heterochromatin formation." Mol Cell Biol 28(2): 705-17.
Kim, J. K., P. O. Esteve, et al. (2009). "UHRF1 binds G9a and participates in p21 transcriptional regulation in mammalian cells." Nucleic Acids Res 37(2): 493-505.
Klose, R. J. and A. P. Bird (2006). "Genomic DNA methylation: the mark and its mediators." Trends Biochem Sci 31(2): 89-97.
Koradi, R., M. Billeter, et al. (1996). "MOLMOL: a program for display and analysis of macromolecular structures." J Mol Graph 14(1): 51-5, 29-32.
Krauss, V. (2008). "Glimpses of evolution: heterochromatic histone H3K9 methyltransferases left its marks behind." Genetica 133(1): 93-106.
Lan, F., R. E. Collins, et al. (2007). "Recognition of unmethylated histone H3 lysine 4 links BHC80 to LSD1-mediated gene repression." Nature 448(7154): 718-22.
Lemak, A., A. Gutmanas, et al. (2010). "A novel strategy for NMR resonance assignment and protein structure determination." J Biomol NMR.
Lemak, A., C. A. Steren, et al. (2008). "Sequence specific resonance assignment via Multicanonical Monte Carlo search using an ABACUS approach." J Biomol NMR 41(1): 29-41.
Li, H., W. Fischle, et al. (2007). "Structural basis for lower lysine methylation state-specific readout by MBT repeats of L3MBTL1 and an engineered PHD finger." Mol Cell 28(4): 677-91.
Maurer-Stroh, S., N. J. Dickens, et al. (2003). "The Tudor domain 'Royal Family': Tudor, plant Agenet, Chromo, PWWP and MBT domains." Trends Biochem Sci 28(2): 69-74.
Meissner, A., T. S. Mikkelsen, et al. (2008). "Genome-scale DNA methylation maps of pluripotent and differentiated cells." Nature 454(7205): 766-70.
Metzger, E., A. Imhof, et al. (2010). "Phosphorylation of histone H3T6 by PKCbeta(I) controls demethylation at histone H3K4." Nature 464(7289): 792-6.
Min, J., A. Allali-Hassani, et al. (2007). "L3MBTL1 recognition of mono- and dimethylated histones." Nat Struct Mol Biol 14(12): 1229-30.
Mittermaier, A. and L. E. Kay (2001). "Chi1 torsion angle dynamics in proteins from dipolar couplings." J Am Chem Soc 123(28): 6892-903.
Miura, M., H. Watanabe, et al. (2001). "Dynamic changes in subnuclear NP95 location during the cell cycle and its spatial relationship with DNA replication foci." Exp Cell Res 263(2): 202-8.
Moriniere, J., S. Rousseaux, et al. (2009). "Cooperative binding of two acetylation marks on a histone tail by a single bromodomain." Nature 461(7264): 664-8.
Muto, M., Y. Kanari, et al. (2002). "Targeted disruption of Np95 gene renders murine embryonic stem cells hypersensitive to DNA damaging agents and DNA replication blocks." J Biol Chem 277(37): 34549-55.
Nady, N., J. Min, et al. (2008). "A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research." Trends Biochem Sci 33(7): 305-13.
Ooi, S. K., C. Qiu, et al. (2007). "DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA." Nature 448(7154): 714-7.
Orekhov, V. Y., I. Ibraghimov, et al. (2003). "Optimizing resolution in multidimensional NMR by three-way decomposition." J Biomol NMR 27(2): 165-73.
Otani, J., T. Nankumo, et al. (2009). "Structural basis for recognition of H3K4 methylation status by the DNA methyltransferase 3A ATRX-DNMT3-DNMT3L domain." EMBO Rep 10(11): 1235-41.
Ottiger, M., F. Delaglio, et al. (1998). "Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra." J Magn Reson 131(2): 373-8.
Papait, R., C. Pistore, et al. (2008). "The PHD domain of Np95 (mUHRF1) is involved in large-scale reorganization of pericentromeric heterochromatin." Mol Biol Cell 19(8): 3554-63.
Papait, R., C. Pistore, et al. (2007). "Np95 is implicated in pericentromeric heterochromatin replication and in major satellite silencing." Mol Biol Cell 18(3): 1098-106.
Prestegard, J. H., C. M. Bougault, et al. (2004). "Residual dipolar couplings in structure determination of biomolecules." Chem Rev 104(8): 3519-40.
Rajakumara, E., Z. Wang, et al. (2011). "PHD finger recognition of unmodified histone H3R2 links UHRF1 to regulation of euchromatic gene expression." Mol Cell 43(2): 275-84.
Rottach, A., C. Frauer, et al. (2010). "The multi-domain protein Np95 connects DNA methylation and histone modification." Nucleic Acids Res 38(6): 1796-804.
Rückert, M. and G. Otting (2000). "Alignment of Biological Macromolecules in Novel Nonionic Liquid Crystalline Media for NMR Experiments." J Am Chem Soc 122(32): 7793-7797.
Ruthenburg, A. J., H. Li, et al. (2007). "Multivalent engagement of chromatin modifications by linked binding modules." Nat Rev Mol Cell Biol 8(12): 983-94.
Santiveri, C. M., B. C. Lechtenberg, et al. (2008). "The malignant brain tumor repeats of human SCML2 bind to peptides containing monomethylated lysine." J Mol Biol 382(5): 1107-12.
Schuettengruber, B., D. Chourrout, et al. (2007). "Genome regulation by polycomb and trithorax proteins." Cell 128(4): 735-45.
Sharif, J., M. Muto, et al. (2007). "The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA." Nature 450(7171): 908-12.
Strahl, B. D. and C. D. Allis (2000). "The language of covalent histone modifications." Nature 403(6765): 41-5.
Taverna, S. D., H. Li, et al. (2007). "How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers." Nat Struct Mol Biol 14(11): 1025-40.
Uemura, T., E. Kubo, et al. (2000). "Temporal and spatial localization of novel nuclear protein NP95 in mitotic and meiotic cells." Cell Struct Funct 25(3): 149-59.
Unoki, M., T. Nishidate, et al. (2004). "ICBP90, an E2F-1 target, recruits HDAC1 and binds to methyl-CpG through its SRA domain." Oncogene 23(46): 7601-10.
van Dijk, A. D., S. J. de Vries, et al. (2005). "Data-driven docking: HADDOCK's adventures in CAPRI." Proteins 60(2): 232-8.
Wang, C., J. Shen, et al. (2011). "Structural basis for site-specific reading of unmodified R2 of histone H3 tail by UHRF1 PHD finger." Cell Res 21(9): 1379-82.
Woo, H. R., O. Pontes, et al. (2007). "VIM1, a methylcytosine-binding protein required for centromeric heterochromatinization." Genes Dev 21(3): 267-77.
Zeng, L., Q. Zhang, et al. (2010). "Mechanism and regulation of acetylated histone binding by the tandem PHD finger of DPF3b." Nature 466(7303): 258-62.
Zwahlen, C., P. Legault, et al. (1997). "Methods for Measurement of Intermolecular NOEs by Multinuclear NMR Spectroscopy: Application to a Bacteriophage λ N-Peptide/boxB RNA Complex." J Am Chem Soc 119(29): 6711-6721.
Zweckstetter, M. (2008). "NMR: prediction of molecular alignment from structure using the PALES software." Nat Protoc 3(4): 679-90.


Chapter 5

Conclusions and Future Directions

5.1. Conclusions

5.1.1 A SPOT on the chromatin landscape? Histone peptide arrays as a tool for epigenetic research

• SPOT-blot peptide arrays are a novel high-throughput approach that allows synthesis of up to 600 histone peptides on a single solid-support membrane

• Modified amino acids that correspond to various PTMs on histones can be faithfully incorporated into peptides that are ~15 residues in length

• The method works well for characterizing reagents for chromatin research, e.g. epitope mapping for antibodies, as demonstrated using the primary antibody against the H3S10ph and H3T11ph marks

• Peptide arrays can be used to identify protein/domain binding specificity, as demonstrated with the Cbx3, WDR5, DNMT3L, and L3MBTL1 proteins

5.1.2 Histone Substrate Recognition by Human MBT Domains

• Despite the high conservation of the MBT domains in various human proteins, MBTs recognize lysines in different methylation states and bind to peptides with various sequences surrounding the lysine

• There are two additional residues (SR1 and SR2) within the aromatic binding pocket that were not previously implicated in recognition of methylated lysines

• The presence of hydrophobic and aromatic residues (e.g. Ile, Phe) at the SR1 and SR2 positions can switch the specificity towards Kme2, whereas the absence of an aromatic group at SR2 or the presence of a polar residue at SR1 increases the preference towards Kme1

• L3MBTL1 and L3MBTL3 show promiscuous binding to histones; SCMH1, SFMBT1 and SFMBT2 do not bind histone peptides with appreciable affinity; MBTD1, L3MBTL2, L3MBTL4 and SCML2 recognize specific marks on histones

• The MBT domains of the SCML2 protein specifically recognize the H2AK36me1 mark on histone peptides and on recombinant nucleosomes; this binding is dependent on the presence of the basic region adjacent to the MBT domains

5.1.3 Recognition of multivalent histone states associated with heterochromatin by UHRF1

• UHRF1 contains an evolutionarily conserved tandem tudor domain (TTD) that until recently was functionally and structurally uncharacterized

• TTD binds to the H3K9me3 mark in vitro and in vivo, although high-affinity binding is observed only when dual marks, trimethylated Lys9 and unmodified or monomethylated Lys4 on histone H3, are present

• TTD recognizes two hallmarks of heterochromatin: H3K4me0, via a pocket at the interface between the two subdomains, and H3K9me3, via a canonical aromatic cage in the N-terminal subdomain

• The bivalent histone signature, H3K4me0/K9me3, is recognized by a novel mechanism whereby the two subdomains reorient themselves in order to fit both marks

5.2 Future directions

5.2.1 From a single SPOT to a polka dot on the chromatin landscape: next generation binding arrays

From the handful of modifications that were known in the early 2000s, the list has grown to over 150. The discovery of this abundance of novel PTMs populating the chromatin landscape was made possible by developments in mass spectrometry-based proteomics approaches. Most recently, modifications such as propionylation, butyrylation, and crotonylation of lysines have been described (Chen, Sprung et al. 2007; Tan, Luo et al. 2011). However, the function of these modifications, and the proteins that “write”, “erase” or “read” them, in many cases remain unknown. The SPOT-blot method is a suitable approach to incorporate these novel types of modifications into histone peptides and screen for interacting protein modules.

There are many examples of combinatorial recognition of histone modifications by protein domains, as reviewed by Ruthenburg (Ruthenburg, Li et al. 2007). In the future it will be pivotal to dissect the crosstalk between various histone marks and how they influence binding of chromatin-associated macromolecules. Antibodies raised against a single mark on a peptide may not bind to the mark if another modification is present nearby. Similarly, a protein may have a higher binding affinity for the dual mark. The SPOT-blot arrays can be used in a high-throughput, cost-effective manner to incorporate dual modifications into histone peptides and to test various combinations. As a next step we need to be able to distinguish whether dual modifications co-occur on the same histone/nucleosome molecule or on different ones in vivo. This issue can be partially addressed by the ChIP-reChIP technique and, more recently, by a method that uses nanofluidics and multicolor fluorescence microscopy to sort individual nucleosomes and assess their modification state (Furlan-Magaril, Rincon-Arano et al. 2009; Cipriany, Zhao et al. 2010).

In the future, efforts should be made to develop an approach that extends the SPOT-blot array method and uses nucleosomes as substrates in place of histone peptides. Small amounts of a variety of recombinant modified nucleosomes can now be produced using genetic and chemical methods. There are methods to introduce site-specific acetylation, methylation, and ubiquitylation marks (Simon, Chu et al. 2007; McGinty, Kim et al. 2008; Neumann, Peak-Chew et al. 2008; Neumann, Hancock et al. 2009; Nguyen, Garcia Alai et al. 2009; Chatterjee, McGinty et al. 2010; Nguyen, Garcia Alai et al. 2010). The solutions of individual nucleosomes can then be spotted onto a surface of choice, for example glass slides. These glass slides with the immobilized nucleosomes can be used in a similar manner to peptide arrays on a membrane to screen for protein interactions. Using nucleosomes would ensure a lower rate of false positive/negative results as compared to peptides, since peptides may not contain all the residues required for recognition. Overall, I believe that SPOT-blots will remain a good initial screening method that should be utilized to screen dual or novel PTMs, and that efforts should be made to produce comparable arrays with recombinant modified nucleosomes.

5.2.2 Development of the reagents to study the biological role of MBT-containing proteins

It is important to elucidate the mechanism of MBT-mediated gene regulation, since misexpression of MBT-containing proteins has been linked to the development and progression of various diseases (Bonasio, Lecona et al. 2010). Immediate directions of this research include identification of the binding partners of MBT domains, specifically of those for which we could not identify any histone substrates: SCMH1, SFMBT1, and SFMBT2. In parallel, one needs to obtain structures of the MBT domains within the SFMBT1, SFMBT2, L3MBTL3 and L3MBTL4 proteins and, once ligands are identified, structures of the complexes between the MBT domains and their respective ligands.

High resolution structures of the MBT-ligand complexes would provide the basis for designing antagonists that modulate the interaction, and even if these are not of therapeutic value, they can almost certainly be used for research purposes to study the effects of MBT domains in isolation. In fact, Stephen Frye’s laboratory at the University of North Carolina has already identified such an inhibitor for the L3MBTL1 protein (Herold, Wigle et al. 2010). Our extensive mutagenesis data provide insights into Kme1 vs Kme2 recognition and may allow chemists to design both specific and pan-inhibitors of the MBT domains.

A greater challenge, however, is to identify the biological function of the MBT domains and their associations in vivo. There is great confusion regarding the complexes that MBT-containing proteins belong to and whether they change depending on the cell type or stage of development. To gain a better understanding of the molecular associations of MBT proteins inside cells, we would need to perform immunoprecipitations followed by mass spectrometry profiling in different cell lines that represent disease and normal states. Additionally, in order to confirm in vivo the genome-wide correlation between certain MBT domains and the putative histone marks identified in vitro, chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) should be performed. ChIP-seq data for all MBT-containing proteins and selected chromatin marks would provide insight into the locations in the genome where the two co-occur. However, such studies are hindered by the lack of good quality antibodies for some MBTs or the marks they bind to (e.g. SCML2, L3MBTL3, H2AK36). To circumvent this issue, synthetic affinity reagents can be used. Synthetic antibodies produced from phage-displayed libraries have emerged as a powerful in vitro alternative to the conventional hybridoma technology (Bradbury, Velappan et al. 2003; Bradbury, Velappan et al. 2003; Brekke and Loset 2003; Hoogenboom 2005). Overall, it appears critical that the scientific community first develops good quality reagents, such as antibodies or small molecules, which, in turn, will enable us to study the biological implications of the MBT domains more effectively.
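As an illustration of how the co-occurrence of an MBT protein and a histone mark in ChIP-seq data might be quantified, the sketch below computes a base-pair Jaccard index between two peak sets. It is a minimal example under stated assumptions and not a pipeline used in this thesis: peaks are assumed to be already called and supplied as half-open (start, end) intervals per chromosome, and the function names (merge_intervals, overlap_bp, jaccard_index) and the example peak sets are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(peaks_a, peaks_b):
    """Number of base pairs covered by both interval lists (one chromosome)."""
    a, b = merge_intervals(peaks_a), merge_intervals(peaks_b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard_index(peaks_a, peaks_b):
    """Genome-wide Jaccard index (shared bp / covered bp) of two peak sets.

    Each argument maps a chromosome name to a list of (start, end) peaks.
    """
    shared = covered = 0
    for chrom in set(peaks_a) | set(peaks_b):
        a = peaks_a.get(chrom, [])
        b = peaks_b.get(chrom, [])
        ov = overlap_bp(a, b)
        len_a = sum(end - start for start, end in merge_intervals(a))
        len_b = sum(end - start for start, end in merge_intervals(b))
        shared += ov
        covered += len_a + len_b - ov
    return shared / covered if covered else 0.0


if __name__ == "__main__":
    # Hypothetical peak sets for an MBT protein and a histone mark.
    mbt_peaks = {"chr1": [(100, 200), (300, 400)]}
    mark_peaks = {"chr1": [(150, 250)], "chr2": [(0, 50)]}
    print(jaccard_index(mbt_peaks, mark_peaks))
```

A value near 1 would indicate that the two signals occupy largely the same regions, whereas a value near 0 would indicate little co-occurrence. In practice the index would need to be compared against a suitable background, for example shuffled peak positions.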

5.2.3 Interplay between the UHRF1 TTD and the binary switch of H3K9me3 and H3S10ph

In eukaryotic cells, DNA is tightly packaged with histones in the form of chromatin. The mechanisms underlying higher order chromatin folding and unfolding and chromosome segregation during the cell cycle remain unclear. During mitosis, DNA is compacted ~10,000-fold to ensure proper segregation of the genetic material to daughter cells. Genome stability and function rely on faithful condensation of chromatin upon entry into mitosis and decondensation during exit from mitosis.

Cis-effects of acetylation and phosphorylation on histones, which alter the overall charge on histones, correlate with dynamic aspects of the folding and unfolding of the chromatin fiber. However, trans-effects have also been implicated in this process. During the cell cycle, H3S10ph is the most characteristic PTM and is found associated with mitotic chromosomes during M phase (Wei, Mizzen et al. 1998). Interestingly, adjacent to it is Lys9, which can also be modified. Previous studies have identified a “binary switch” whereby specific proteins are sensitive to the modifications found on the histone H3 K9/S10 cassette (Fischle, Wang et al. 2003). Proteins like HP1 have a binding pocket that can only recognize H3K9me3 when the adjacent Ser10 is unmodified. As soon as the cell enters mitosis and H3S10 is phosphorylated, HP1 dissociates from the chromatin even though the H3K9me3 mark is still present (Fischle, Tseng et al. 2005; Hirota, Lipp et al. 2005). On the other hand, the tandem Tudor domains (TTD) of UHRF1 also recognize the H3K9me3 mark, but our SPOT-blot results showed that TTD is insensitive to the presence of a phosphate on H3S10 (Nady, Lemak et al. 2011). In collaboration with Brian Strahl at the University of North Carolina, we are investigating the role of this lack of sensitivity during the cell cycle in more depth.

First, NMR titration experiments will be performed using 15N-labeled TTD and unlabeled H3K9me3/S10ph peptide. The affinity of TTD for this peptide will be compared with its affinity for the H3K9me3 peptide. We will also analyze and compare the chemical shift perturbations observed upon addition of the H3K9me3 vs the H3K9me3/S10ph peptide. If TTD is insensitive to the presence of the H3S10ph mark, there will be no difference in binding affinity and only minimal local perturbations upon addition of the H3K9me3/S10ph peptide. Next, attempts will be made to design TTD mutants that are sensitive to the presence of H3S10ph and to test their binding in vitro. Lastly, the association with chromatin during the cell cycle and the cellular role of a TTD mutant sensitive to the H3S10ph mark will be investigated using UHRF1-deficient mouse embryonic stem cells (Sharif, Muto et al. 2007). This study would provide an elegant example of how complex modification patterns can be translated into different biological responses depending on the binding protein module.
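To make the proposed comparison of the two peptides concrete, the sketch below shows one way the NMR titration data could be analyzed. It is an illustrative example under stated assumptions, not the analysis protocol of this thesis. It assumes a 1:1 binding model with ligand depletion in fast exchange, combines 1H and 15N shift changes with a 15N scaling factor of 0.14 (one commonly used value; others are in use), and estimates the dissociation constant by a simple grid search. The function names and the synthetic titration values are hypothetical.

```python
import math


def combined_csp(delta_h, delta_n, n_scale=0.14):
    """Combined 1H/15N chemical shift perturbation (ppm).

    n_scale down-weights the 15N shift change; 0.14 is an assumed value.
    """
    return math.sqrt(delta_h ** 2 + (n_scale * delta_n) ** 2)


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding with ligand depletion.

    All concentrations and kd must share the same unit (e.g. micromolar).
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(p_total, ligand_concs, csps, kd_grid):
    """Grid search over Kd; returns (kd, d_max, sum_of_squared_errors).

    For each trial Kd, the saturating shift d_max is obtained in closed form
    by linear least squares, because the model is csp = d_max * fraction_bound.
    """
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in ligand_concs]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        d_max = sum(x * y for x, y in zip(f, csps)) / denom
        sse = sum((d_max * x - y) ** 2 for x, y in zip(f, csps))
        if best is None or sse < best[2]:
            best = (kd, d_max, sse)
    return best


if __name__ == "__main__":
    # Synthetic titrations (hypothetical values, micromolar) for two peptides.
    protein = 100.0
    ligand = [0.0, 25.0, 50.0, 100.0, 200.0, 400.0]
    grid = [float(k) for k in range(1, 101)]
    for name, true_kd in (("H3K9me3", 10.0), ("H3K9me3/S10ph", 12.0)):
        csps = [0.2 * fraction_bound(protein, l, true_kd) for l in ligand]
        kd, d_max, _ = fit_kd(protein, ligand, csps, grid)
        print(name, kd, d_max)
```

Under this model, insensitivity of TTD to H3S10ph would appear as similar fitted Kd values for the two peptides, with per-residue differences in combined perturbation confined to residues near Ser10.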

5.3 Closing remarks

The Histone Code hypothesis has emerged as an important framework connecting histone modifications with their downstream effects. The innovative method described in this dissertation provides a simple, high-throughput, and cost-effective way to reveal the binding spectrum of purified recombinant proteins for any of the histone marks. This method served as an excellent screening approach and has offered insight into the recognition of histone modifications by two protein domains, MBT and Tudor. Screening of the entire human MBT domain family revealed binding diversity within the family. Dual recognition of the hallmarks of heterochromatin, unmodified H3K4 and trimethylated H3K9, by the Tudor domains of UHRF1 added to the growing evidence in support of the Histone Code and combinatorial recognition.

Overall, the research described here attempted to address the types of modifications that are recognized by specific sets of proteins, but we still do not know how these proteins bring about reorganization of chromatin structure or how they influence cellular processes. Given the multifaceted functions of histone modifications, understanding their effects, interactions and alterations in health and disease should be a priority in biomedical research.

I find that the field of epigenetics was founded on trying to understand unusual phenomena. With the increasing amount of knowledge collected over the past several decades has come our understanding that epigenetic mechanisms are responsible for a considerable part of the phenotype of higher organisms. However, even today many questions remain, and as Watson noted in 2003, at the 50th anniversary of the proposal of the double helix structure of DNA, the intricacy lies in the chromatin, not the DNA sequence (Watson and Crick 2003). I am grateful that I was able to meet, interact and work alongside a number of great scientists who contributed seminal work to this field, and my hope is that the research described herein contributed useful findings to the expanding field of epigenetic research.

5.4 References

Bonasio, R., E. Lecona, et al. (2010). "MBT domain proteins in development and disease." Semin Cell Dev Biol 21(2): 221-30.
Bradbury, A., N. Velappan, et al. (2003). "Antibodies in proteomics I: generating antibodies." Trends Biotechnol 21(6): 275-81.
Bradbury, A., N. Velappan, et al. (2003). "Antibodies in proteomics II: screening, high-throughput characterization and downstream applications." Trends Biotechnol 21(7): 312-7.
Brekke, O. H. and G. A. Loset (2003). "New technologies in therapeutic antibody development." Curr Opin Pharmacol 3(5): 544-50.
Chatterjee, C., R. K. McGinty, et al. (2010). "Disulfide-directed histone ubiquitylation reveals plasticity in hDot1L activation." Nat Chem Biol 6(4): 267-9.
Chen, Y., R. Sprung, et al. (2007). "Lysine propionylation and butyrylation are novel post-translational modifications in histones." Mol Cell Proteomics 6(5): 812-9.
Cipriany, B. R., R. Zhao, et al. (2010). "Single molecule epigenetic analysis in a nanofluidic channel." Anal Chem 82(6): 2480-7.
Fischle, W., B. S. Tseng, et al. (2005). "Regulation of HP1-chromatin binding by histone H3 methylation and phosphorylation." Nature 438(7071): 1116-22.
Fischle, W., Y. Wang, et al. (2003). "Binary switches and modification cassettes in histone biology and beyond." Nature 425(6957): 475-9.
Furlan-Magaril, M., H. Rincon-Arano, et al. (2009). "Sequential chromatin immunoprecipitation protocol: ChIP-reChIP." Methods Mol Biol 543: 253-66.
Herold, J. M., T. J. Wigle, et al. (2010). "Small-molecule ligands of methyl-lysine binding proteins." J Med Chem 54(7): 2504-11.
Hirota, T., J. J. Lipp, et al. (2005). "Histone H3 serine 10 phosphorylation by Aurora B causes HP1 dissociation from heterochromatin." Nature 438(7071): 1176-80.
Hoogenboom, H. R. (2005). "Selecting and screening recombinant antibody libraries." Nat Biotechnol 23(9): 1105-16.
McGinty, R. K., J. Kim, et al. (2008). "Chemically ubiquitylated histone H2B stimulates hDot1L-mediated intranucleosomal methylation." Nature 453(7196): 812-6.
Nady, N., A. Lemak, et al. (2011). "Recognition of multivalent histone states associated with heterochromatin by UHRF1." J Biol Chem.
Neumann, H., S. M. Hancock, et al. (2009). "A method for genetically installing site-specific acetylation in recombinant histones defines the effects of H3 K56 acetylation." Mol Cell 36(1): 153-63.
Neumann, H., S. Y. Peak-Chew, et al. (2008). "Genetically encoding N(epsilon)-acetyllysine in recombinant proteins." Nat Chem Biol 4(4): 232-4.
Nguyen, D. P., M. M. Garcia Alai, et al. (2009). "Genetically encoding N(epsilon)-methyl-L-lysine in recombinant histones." J Am Chem Soc 131(40): 14194-5.
Nguyen, D. P., M. M. Garcia Alai, et al. (2010). "Genetically directing ε-N,N-dimethyl-L-lysine in recombinant histones." Chem Biol 17(10): 1072-6.
Ruthenburg, A. J., H. Li, et al. (2007). "Multivalent engagement of chromatin modifications by linked binding modules." Nat Rev Mol Cell Biol 8(12): 983-94.
Sharif, J., M. Muto, et al. (2007). "The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA." Nature 450(7171): 908-12.
Simon, M. D., F. Chu, et al. (2007). "The site-specific installation of methyl-lysine analogs into recombinant histones." Cell 128(5): 1003-12.
Tan, M., H. Luo, et al. (2011). "Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification." Cell 146(6): 1016-28.
Watson, J. D. and F. H. C. Crick (2003). "Celebrating the Genetic Jubilee: A Conversation with James D. Watson." Scientific American 288: 66-69.
Wei, Y., C. A. Mizzen, et al. (1998). "Phosphorylation of histone H3 at serine 10 is correlated with chromosome condensation during mitosis and meiosis in Tetrahymena." Proc Natl Acad Sci U S A 95(13): 7480-4.