Investigation of the specificity of protein methyltransferases

by

Qazi Muhammad Raafiq

A Thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Biochemistry

Approved, Thesis Committee

______Prof. Dr. Albert Jeltsch

______Prof. Dr. Sebastian Springer

______Dr. Muhammad Naseem

Date of defense: February 22, 2013

School of Engineering and Science


Acknowledgements

I would like to thank my supervisor Prof. Dr. Albert Jeltsch for providing me an opportunity to work in his lab. I am grateful for his invaluable guidance, discussions and encouragement in the exciting field of epigenetics.

I am thankful to Prof. Dr. Sebastian Springer and Dr. Muhammad Naseem, Research Scientist, University of Wuerzburg Germany for being the co-referees of my PhD thesis.

I am thankful to Prof. Dr. Arunkumar Dhayalan for his useful guidance and contribution to my lab work when I started to work afresh in the lab.

Many thanks are due to my lab-members for creating a friendly work atmosphere.

I would like to thank all my friends and colleagues Aamir Hussain, Abdul Wakeel, Dr. Abhishek Srivastava, Ali Haider, Farhan Khattak, Ibrahim Sadiq, Imran Khan, Inamullah Safi, Jayanta Gauchan, Jehangir Khan, Marek Malicki, Dr. Martin Kangwa, Dr. Masooma Ibrahim, Dr. Naveed Umar, Nemanja Ivanovski, Qasim Khan, Rehan Latafat, Sandra Becker, Dr. Sanjay Chahar, Dr. Shoaib, Tahirzeb Khan and Tariq Khan for providing me a lively time as well as a healthy distraction.

I would like to thank Andreas Beger, Heinz Seemann, Sabine Beth, Rudolph Sobek, Mercy Sobek and Ilsa Rappold for a wonderful time I had with them.

I am deeply thankful to my family members, especially my mother, my sisters and my wife, for their great support during my entire stay in Germany.


To my family

Be not afraid from the intensity of opposing wind, O eagle It only blows, to make you fly higher and higher. (A translated verse from Allama Iqbal)


Table of contents

Acknowledgements
Dedication
Table of contents
Abbreviations

1. Abstract

2. Introduction
2.1 Epigenetics
2.2 Chromatin
2.3 Histone acetylation
2.4 Histone methylation
2.5 Lysine methyltransferases (KMTs)
2.6 Structures of catalytically active SET domains
2.7 MLL1/KMT2A
2.9 NSD2
2.10 SET2 (KMT3A, SETD2, HYPB)

3. Objectives and Approach of this work

4. Results and discussion
4.1 MLL1 (KMT2A)
4.1.1 Substrate Specificity Analysis of MLL1
4.1.2 Deriving the specificity profile for MLL1
4.1.3 Search for the non-histone targets of MLL1
4.1.4 Auto-methylation of MLL1
4.1.5 Investigation of the auto-methylation activity of MLL1
4.2 NSD2
4.2.1 Screening of histone substrates methylated by NSD2
4.2.2 Screening of random peptides by NSD2
4.3 SET2 (KMT3A, SETD2, HYPB)
4.3.1 Substrate Specificity Analysis of SET2
4.3.2 Product Specificity Analysis of SET2
4.3.3 Deriving the specificity profile for SET2
4.3.4 Search for the non-histone targets of SET2

5. Conclusions

6. Materials and methods
6.1 Cloning
6.2 Site Directed Mutagenesis
6.3 Protein Expression and Purification
6.4 Synthesis of peptide SPOT arrays
6.5 Design of random peptide arrays
6.6 Methylation of peptide arrays
6.7 Methylation of purified Protein domains and MLL1 mutants
6.8 SDS-PAGE and Autoradiography
6.9 Scansite database
6.10 NSD2 Random Peptide Sequences

7. References


Abbreviations

AdoHcy/SAH : S-adenosylhomocysteine
CLIC1 : Chloride intracellular channel 1 protein
COMPASS : Complex of proteins associated with Set1
H1.5K168 : Histone H1.5 lysine 168
H1.5 (161-175) : Histone H1.5 peptide with residues 161-175
H2B (1-20) : Histone H2B peptide with residues 1-20
H3 : Histone 3
H3 (1-20) : Histone H3 peptide with residues 1-20
H3 (15-35) : Histone H3 peptide with residues 15-35
H3 (28-48) : Histone H3 peptide with residues 28-48
H3K18 : Histone 3 lysine 18
H3K36 : Histone 3 lysine 36
H3K4 : Histone 3 lysine 4
H4 : Histone 4
H4 (38-52) : Histone H4 peptide with residues 38-52
H4K20 : Histone 4 lysine 20
H4K44 : Histone 4 lysine 44
HAT : Histone Acetyltransferase
HDAC : Histone Deacetylase
HMTase : Histone methyltransferase
hnRNPL : Heterogeneous nuclear ribonucleoprotein L
HYPB : Huntingtin yeast partner B, Huntingtin interacting protein B
KAT : Lysine Acetyltransferase
KDAC : Lysine Deacetylase
KMT : Lysine methyltransferase
me1 : Mono-methylation
me2 : Di-methylation
me3 : Tri-methylation
MLL1 : Mixed lineage leukemia protein 1
NSD2 : Nuclear SET Domain containing protein 2
PHD : Plant homeodomain
PRMT : Protein arginine methyltransferase
PTD : Partial tandem duplications
PTM : Post-translational modification
PWWP : Proline-tryptophan-tryptophan-proline motif
SAM : S-adenosylmethionine
SET : Su(var)3–9, En(zeste) and Trithorax
WRAD : WDR5, RbBP5, Ash2L and DPY-30


1. Abstract

We used a specificity-profile-based approach to identify the preferred substrates and novel non-histone targets of the protein lysine methyltransferases MLL1, NSD2 and SET2. A specificity profile is the amino acid sequence motif that the active site of a given methyltransferase recognizes as a substrate.

The MLL1 methyltransferase plays a role in development and leukemia. In vivo, it exists in a multisubunit complex. We showed that MLL1 recognizes a sequence of five amino acid residues with the target lysine at the center. We used the specificity profile to search for possible non-histone targets of MLL1, first at the peptide level and then at the protein level in vitro. We found one novel non-histone target at the protein level, the MLL1 protein itself. This auto-methylation of MLL1 could have functional consequences. Of the two MLL1 constructs used in this study, the shorter MLL1(3812) (residues 3812–3969) was significantly more catalytically active than the longer MLL1(3745) (residues 3745–3969).

The NSD2 methyltransferase plays a role in some types of cancer. We showed that NSD2 methylates the peptides H3 (1-20), H3 (15-35), H3 (28-48), H4 (38-52) and H1.5 (161-175), which harbour the potential methylation sites H3K4, H3K9, H3K18, H3K27, H3K36, H4K44 and H1.5K168, but that it does not methylate the peptide H4 (10-30), which harbours the H4K20 site. Since NSD2 was not specific for any single histone tail peptide, we devised an unbiased random peptide array approach to investigate its substrate specificity and to search for non-histone targets. We found that NSD2 methylated substrates in which a hydrophobic residue preceded the target lysine. This led us to a novel histone site, K168 of histone H1.5, which was methylated more strongly than the known site K36 of histone H3 in vitro at the peptide level. The methylation of H1.5K168 could play a role in the regulation of gene expression and may be connected with nucleosome compaction.

The SET2 methyltransferase has tumor suppressive functions. We derived the substrate specificity profile for SET2 using peptide arrays. We showed that SET2 recognizes a sequence of six amino acid residues with the target lysine at position 4. We report that SET2 is a robust H3K36 mono- and di-methyltransferase at the peptide level in vitro. We also found three non-histone targets of SET2 at the peptide level in vitro: lysine 119 of chloride intracellular channel 1 (CLIC1, NCC27), lysine 2245 of A kinase anchoring protein 13 and lysine 683 of ASH1L (another H3K36 methyltransferase).


2. Introduction

2.1 Epigenetics

A PubMed search for the word "epigenetics" covering the years 2008 to 2011 returns more than 3000 publications, and the list grows each day, showing that epigenetics is a fast-growing area of science. Historically, in the early 1940s, the idea began to be expressed that something beyond the DNA level contributes to the phenotype. Essentially all cells in the human body have the same genetic makeup. It is the timely expression of genes at the right place that allows cells to take up different biological roles and eventually become part of distinct tissues, e.g., blood and brain (Cheung and Lau 2005). Conrad Waddington is credited with coining the word epigenetics to describe this phenomenon (Goldberg et al. 2007). The definition of epigenetics has changed in the literature over time. Zhu and Reinberg recently defined epigenetics as "the inheritance of variation (-genetics) above and beyond (epi-) changes in the DNA sequence" (Zhu and Reinberg 2011). The modifications governing epigenetic phenomena include methylation of DNA and post-translational modifications (PTMs) of histone proteins (Goldberg et al. 2007).

2.2 Chromatin

Chromatin is a nucleoprotein structure in eukaryotic cells. Functionally, chromatin exists in two states: transcriptionally active euchromatin and transcriptionally inactive heterochromatin (Quina et al. 2006). In eukaryotic genomes, the nucleosome (Figure 1) is the fundamental unit of chromatin; it consists of the nucleosome core, the linker DNA and the linker histone H1. The nucleosome core is an octamer consisting of two copies each of the histone proteins H2A, H2B, H3 and H4, around which DNA (~146 base pairs) is wrapped in 1.65 superhelical turns (Luger et al. 1997). Histones have unstructured N-terminal and C-terminal tails and a central globular domain (Tan et al. 2011; Peng et al. 2012). Histones are small, highly conserved, basic and hydrophilic DNA-binding proteins. Lysine residues are frequent in histone proteins (Sidoli et al. 2012; Peng et al. 2012). The linker histone H1 plays a role in the formation and compaction of higher order chromatin structures (Peng et al. 2012; Cheung and Lau 2005).

Figure 1: The crystal structure of the nucleosome core particle: brown and turquoise ribbon traces of ~146 bp DNA and the core histones (blue: H3; green: H4; yellow: H2A; red: H2B) (Taken from Luger et al. 1997).

The N-terminal and C-terminal tails, which extend from the central globular domain of the core histones, are subject to post-translational modifications (PTMs). The globular core domains of certain histones are also subject to such modifications. Many sites in histones are modified in different ways (Figure 2; Kouzarides 2007; Mersfelder and Parthun 2006; Tan et al. 2011). Specific enzymes place these PTMs on specific amino acid residues. For example, acetyltransferases transfer acetyl groups to lysine residues and methyltransferases place methyl groups on lysine and arginine residues. Similarly, specific enzymes remove these chemical marks (Kouzarides 2007; Berger 2007). In the case of histone H3, the N-terminal amino acids can also be cleaved off along with the associated modifications (Duncan et al. 2008; Santos-Rosa et al. 2009). Functionally, chromatin is affected by DNA methylation, which correlates with silencing of genes, and by histone PTMs, which affect gene expression in multiple ways (Hermann et al. 2004; Kouzarides 2007). The different histone PTMs on the same tail or on different tails are thought to act sequentially or in combination to specify a biological event (Strahl and Allis 2000). The PTMs provide interaction surfaces for other proteins, called reading domains or "effectors", and also influence the chromatin structure (Campos and Reinberg 2009).

Figure 2: Histone post-translational modifications. Abbreviations: me, mono-methylation; me2, di-methylation; me3, tri-methylation; fo, formylation; ac, acetylation; oh, hydroxylation; and cr, crotonylation. Amino acid residue number is indicated below its sequence. Gray and blank boxes indicate N-terminal and globular core domains, respectively (Taken from Tan et al. 2011).

2.3 Histone acetylation

Acetylation and methylation are the best characterized post-translational modifications of histones. Acetylation of histones is highly conserved in eukaryotic genomes (Choi and Howe 2009; Tan et al. 2011). It occurs exclusively on lysine residues. The sites of acetylation are in the N-terminal tails of all histones as well as in the globular core domains of certain histones (Figure 2; Tan et al. 2011). Acetylation neutralizes the positive charge of lysine residues. This potentially weakens the interactions between DNA and histones, leading to the unfolding of chromatin (Kouzarides 2007). The enzymes responsible for acetylation are called histone acetyltransferases (HATs, now called KATs) (Yang and Seto 2008). They use acetyl-coenzyme A as the acetyl group donor. The acetyl group is transferred to the ε-amino group of lysine residues (Rice and Allis 2001). The enzymes responsible for deacetylation are histone deacetylases (HDACs, now called KDACs) (Yang and Seto 2008). Acetyltransferases and deacetylases generally reside and operate in multisubunit complexes (Shahbazian and Grunstein 2007). Acetylation is correlated with activation of transcription and deacetylation with repression of transcription (Kouzarides 2007). Acetylation of chromatin also provides interaction surfaces, mostly for bromodomains (found, e.g., in some HATs) (Taverna et al. 2007) and for the recently reported PHD1 domain of the DPF3b tandem PHD fingers (Sanchez and Zhou 2011). Bromodomains and this PHD domain are reading domains that specifically read acetylated marks (Figure 3). Lysine residues in non-histone proteins have also been reported to be acetylated. Among these, p53 is important and well characterized because it is linked to cancer (Zhang and Dent 2005; Yang and Seto 2008).


Figure 3: Reading domains for acetylated lysine residues; (A) Recognition of H3 lysine 14 acetylated mark by novel PHD1 domain of DPF3b tandem PHD fingers (Adapted from Sanchez and Zhou 2011). (B) Recognition of H4 lysine 16 acetylated mark by the Gcn5p bromodomain. (Adapted from Taverna et al. 2007).

2.4 Histone methylation

Methylation of lysine residues in calf thymus histones was the first post-translational modification of histones to be identified, and it opened up new horizons in the study of histone modifications (Murray 1964; Sidoli et al. 2012). Methylation occurs on lysine and arginine residues. The sites of methylation are in the N-terminal tails, the globular core domains and the C-terminal tails of the histones (Figure 2; Tan et al. 2011). Methylation of histones differs from acetylation because it does not change the charge of the histones. Methylation is also more complex than other modifications, because lysine residues can be mono-, di- or tri-methylated. Arginine residues can be mono-methylated or di-methylated, and the di-methylation of arginine residues can be symmetric or asymmetric (Bannister and Kouzarides 2011). The enzymes responsible for histone methylation fall into three methyltransferase categories. Protein arginine methyltransferases (PRMTs) are responsible for arginine methylation. The SET domain-containing proteins (e.g., MLL1, SET2/HYPB and NSD2) and the non-SET DOT1L proteins are responsible for lysine methylation (Volkel and Angrand 2007). All these enzymes use S-adenosylmethionine (SAM) as the methyl group donor. In lysine methylation, the methyl group is transferred to the ε-amino group, and in arginine methylation, it is transferred to the ω-guanidino group (Bannister and Kouzarides 2011). Histone methylation is reversible. The enzymes responsible for histone demethylation fall into two demethylase categories that differ in their reaction mechanisms: the LSD demethylases and the JMJC demethylases (Kooistra and Helin 2012).

Acetylation of histones generally correlates with activation of transcription, as stated before, whereas methylation of histones at lysine residues correlates with either activation or repression of transcription. The biological outcome depends on which lysine residue is methylated and on the number of methyl groups added (mono-, di- or tri-methylation) (Martin and Zhang 2005). For example, tri-methylated K4 of histone H3 is correlated with activation of transcription. In contrast, tri-methylated K27 of histone H3 is correlated with repression of transcription. The well characterized methylated lysine residues in histones are K4, K9, K27, K36 and K79 of histone H3, and K20 of histone H4. The reported methylated arginine residues are R2, R8, R17 and R26 of histone H3, and R3 of histone H4 (Greer and Shi 2012). However, the focus of this study is on lysine methylation. Unlike acetylation, lysine methylation marks the chromatin with methyl groups on lysine residues, which act as interaction surfaces for a variety of reading domains (e.g., PHD, chromo, tudor, PWWP and MBT) (Yun et al. 2011). Lysine residues in non-histone proteins have also been reported to be methylated. The p53 protein is a model in this regard and probably the best characterized example (Huang and Berger 2008).


2.5 Lysine methyltransferases (KMTs)

Lysine methyltransferases (the human SUV39H1 and the murine Suv39h1) were first discovered in 2000. Since these two enzymes were found to methylate histone H3 at lysine 9, they were named histone methyltransferases (HMTases). They contain a characteristic structural domain called SET (Rea et al. 2000). It is a conserved domain of approximately 130 amino acid residues. This domain was first characterized in three Drosophila genes called "Su(var)3–9, En(zeste) and Trithorax", hence the name SET (Cheng et al. 2005). Later, SET domains were identified in other histone methyltransferases as well, with the single exception of DOT1L. These enzymes mostly methylate histone H3 at various lysine residues, and also some lysine sites in histones H4 and H1. However, some of them were found to methylate non-histone substrates as well (see Table 1). Because they methylate lysine residues in both histone and non-histone proteins, their generally accepted name in the literature became lysine methyltransferases (KMTs) (Table 1; Zhang et al. 2012). Our lab has been active and leading in the discovery of non-histone substrates (Rathert et al. 2008a; Rathert et al. 2008b; Rathert et al. 2008c; Dhayalan et al. 2011).

Two properties of KMTs deserve mention here: substrate specificity and product specificity. The substrate specificity of a KMT is its ability to methylate a specific lysine residue in its substrate or substrates. For example, SET7/9 (KMT7) and MLL1 (KMT2A) methylate lysine 4 of histone H3. The product specificity of a KMT is its ability to add one, two or three methyl groups to its substrate or substrates. On this basis, KMTs can be classified as lysine mono-, di- or tri-methyltransferases (Del Rizzo and Trievel 2011). It has been established that product specificity partially derives from a conserved structural motif, 'ELx(F/Y)DY', within the SET domain of KMTs (Upadhyay and Cheng 2011). KMTs that carry a tyrosine at the variable (F/Y) position of this motif are able to add only one methyl group to the lysine residue of their substrate. SET7/9 and MLL1 both contain a tyrosine residue in this motif and are, therefore, H3K4 mono-methyltransferases. In contrast, KMTs that carry a phenylalanine at this position, e.g., G9a (KMT1C), are able to add more than one methyl group to the lysine residue of their substrate (Upadhyay and Cheng 2011; Del Rizzo and Trievel 2011; Scharf and Imhof 2011).
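As a rough illustration of the F/Y rule described above, the motif can be written as a regular expression and the residue at the variable position read out. This is only a sketch: the function name and the three sequence fragments are made up for the example and are not real SET domain sequences. Since the motif accounts for product specificity only partially, the result should be read as a tendency, not as a prediction.

```python
import re

# ELx(F/Y)DY motif from the text; "x" is any residue, the group captures F or Y.
FY_SWITCH = re.compile(r"EL.([FY])DY")

def fy_switch_tendency(set_domain_sequence):
    """Return the product-specificity tendency implied by the F/Y position,
    or None if the motif is not found in the sequence."""
    match = FY_SWITCH.search(set_domain_sequence)
    if match is None:
        return None
    if match.group(1) == "Y":
        return "one methyl group (mono-methyltransferase)"
    return "more than one methyl group (di- or tri-methyltransferase)"

# Hypothetical fragments for illustration only.
print(fy_switch_tendency("GAELTYDYKS"))  # tyrosine at the switch position
print(fy_switch_tendency("GAELLFDYKS"))  # phenylalanine at the switch position
print(fy_switch_tendency("GAAAAAAAKS"))  # motif absent
```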

Family | KMT | Other name | Histone substrates | Non-histone substrates
KMT1 | KMT1A | SUV39H1 | H3K9me3 |
KMT1 | KMT1B | SUV39H2 | H3K9me3 |
KMT1 | KMT1C | G9a/EHMT2 | H3K9me2, H3K27me2, H1.4K26me2, H1.2K187me2 | p53, G9a, C/EBPβ, Reptin, RARα, DNMT1, CDYL1, WIZ, ACINUS
KMT1 | KMT1D | GLP/EHMT1 | H3K9me2 | p53
KMT1 | KMT1E | ESET/SETDB1 | H3K9me3 | Tat
KMT1 | KMT1F | SETDB2 | H3K9me3 |
KMT2 | KMT2A | MLL | H3K4me3 |
KMT2 | KMT2B | MLL2 | H3K4me3 |
KMT2 | KMT2C | MLL3 | H3K4me3 |
KMT2 | KMT2D | MLL4 | H3K4me3 |
KMT2 | KMT2E | MLL5 | H3K4me3 |
KMT2 | KMT2F | hSET1A | H3K4me3 | Dam1 (Saccharomyces cerevisiae)
KMT2 | KMT2G | hSET1B | H3K4me3 |
KMT2 | KMT2H | ASH2 | H3K4me3 |
KMT3 | KMT3A | SET2 | H3K36me3 |
KMT3 | KMT3B | NSD1 | H3K36me2, H4K20me2 | NFκB
KMT3 | KMT3C | SMYD2 | H3K36me2, H3K4me | p53, RB
KMT3 | KMT3D | SMYD1 | H3K4me |
KMT3 | KMT3E | SMYD3 | H3K4me3 | VEGFR
KMT4 | KMT4 | DOT1L | H3K79me2/3 |
KMT5 | KMT5A | SET8 | H4K20me1 | p53
KMT5 | KMT5B | SUV420H1 | H4K20me3 |
KMT5 | KMT5C | SUV420H2 | H4K20me3 |
KMT6 | KMT6A | EZH2 | H3K27me3 |
KMT6 | KMT6B | EZH1 | H3K27me3 |
KMT7 | KMT7 | SET7/9 | H3K4me1 | p53, TAF7, TAF10, ERα, AR, DNMT1, NFκB, PCAF, RB, E2F1, STAT3, Tat
KMT8 | KMT8 | PRDM2/RIZ1 | H3K9me3 |

Table 1: The human lysine methyltransferases (KMTs) and their histone and non- histone substrates (Adapted from Zhang et al. 2012).


2.6 Structures of catalytically active SET domains

The structures of some SET domains have been solved in complex with the substrate peptide and the reaction product S-adenosylhomocysteine (AdoHcy) (Figure 4). The SET domain consists of three parts called SET-N, SET-I and SET-C (Figure 4C shows the sub-domains of the MLL1 SET domain). The SET-I part varies among different KMTs. The region preceding SET-N is called pre-SET and is important for stabilizing the structure (Cheng et al. 2005; Upadhyay and Cheng 2011). The region following SET-C is called post-SET. The general structural characteristic of all SET domains is that they form "a novel β-fold with a series of curved β-strands forming several small sheets" (Cheng et al. 2005). The common feature among these structures is a knot-like structure that brings together the RFINHxCxPN and ELx(F/Y)DY motifs located in SET-C and thereby forms the catalytically active centre (Upadhyay and Cheng 2011). This centre consists of two separate sites, one for SAM binding and the other for substrate peptide binding, the latter formed with the help of residues from post-SET (Upadhyay and Cheng 2011). The substrate binding site forms a cleft for interaction with the substrate peptide and a characteristic hydrophobic channel into which the target lysine side chain is inserted. This channel is essential for two reasons: it provides a tight interaction between the hydrophobic amino acid residues of the channel walls and the target lysine side chain, and it enables efficient catalysis. What distinguishes SET domain-containing KMTs from other SAM-dependent methyltransferases is that the SAM binding site lies on the opposite face of the SET domain from the substrate binding site, and the channel passing through the core of the SET domain connects the substrate to the SAM binding site (Dillon et al. 2005).

However, it is worth mentioning that the SET domain of the MLL1 methyltransferase has an open catalytic centre in its solved crystal structure, whereas other solved crystal structures of SET domains possess a closed catalytic centre (Southall et al. 2009; Cosgrove and Patel 2010; Qian and Zhou 2006). Both MLL1 and SET7/9 are histone H3 lysine 4 methyltransferases, as stated before. The hydrophobic channel of the SET7/9 methyltransferase consists of residues from the SET-I, SET-C and post-SET regions (Figure 4D) (Southall et al. 2009). This narrow channel encloses the lysine side chain and positions it in the active centre of the enzyme, where the chemical environment stimulates the deprotonation of the lysine that is necessary for methyl transfer to take place (Southall et al. 2009). In contrast, in MLL1 the SET-I and post-SET regions form two separate sub-domains, and the substrate binds in a more open channel at the bottom of the cleft between these two parts. In this binding mode the target lysine remains relatively exposed and flexible, which might explain why a conformational change leading to closure of the two sub-domains is needed to activate the enzyme (Figure 4C) (Southall et al. 2009). In addition, the SAM binding site in MLL1 is displaced in comparison with other solved SET domain structures. These distinct structural features of the catalytically active site in MLL1 explain two things: its low methyltransferase activity, and its need for additional proteins, e.g., the WRAD complex described in the next section. Only the additional proteins can make the channel narrow and closed, similar to the channel found in SET7/9, so that it efficiently accommodates the side chain of the target lysine for catalysis (Southall et al. 2009; Cosgrove and Patel 2010).


Figure 4: Representative examples of SET domain structures with substrate peptide and AdoHcy. Ribbon diagrams of (A) MLL1 (B) SET7/9 (Taken from Upadhyay and Cheng 2011). (C) X-Ray crystal structure of MLL1 bound to AdoHcy (yellow) and histone H3 peptide (purple). pre-SET; white, SET-N; blue, SET-I; yellow, SET-C; green and post-SET; grey (Taken from Cosgrove and Patel 2010). (D) Surface representation of the lysine access channel viewed from the peptide-binding side in Set7/9. Key residues are shown in stick formation and the substrate lysine side chain is shown in yellow (Taken from Xiao et al. 2003).


2.7 MLL1/KMT2A

The Mixed Lineage Leukemia 1 gene was first characterized in chromosomal translocations. Later it was reported to encode a histone methyltransferase. The translocations in cancer patients affect MLL1 by two mechanisms, both of which lead to leukemia. First, the N-terminus of MLL1 is fused with one of more than 50 translocation partner genes; these fusions do not have methyltransferase activity. Second, some exons of MLL1 itself fuse with one another, creating MLL-PTD (partial tandem duplications); such fusions retain methyltransferase activity (Krivtsov and Armstrong 2007; Chi et al. 2010; Smith et al. 2011).

MLL1 is a histone H3 lysine 4 methyltransferase (Nakamura et al. 2002). Catalytically, it is a slow methyltransferase on its own (Southall et al. 2009; Patel et al. 2009). Like the yeast H3K4 methyltransferase Set1, MLL1 exists in a COMPASS (complex of proteins associated with Set1)-like complex (Smith et al. 2011). The complex members reported to associate with MLL1 are WDR5, RbBP5, Ash2L and DPY-30 (WRAD complex). Addition of these complex members to MLL1 has been found to enhance MLL1-driven H3K4 methylation (Southall et al. 2009; Patel et al. 2009; Odho et al. 2010). Knockdown of RbBP5, Ash2L and WDR5 has been found to reduce MLL1-driven di- and tri-methylation of histone H3 lysine 4, while mono-methylation remained in place (Dou et al. 2006; Cao et al. 2010). MLL1 contains an arginine-containing "WIN motif" with which one face of WDR5 interacts, while the opposite face of WDR5 interacts with a conserved motif in RbBP5 (Avdic et al. 2011).


2.9 NSD2

The Nuclear receptor SET domain-containing protein 2 (NSD2) histone methyltransferase belongs to the NSD family of proteins. This protein family includes two other members, NSD1 and NSD3. The three members share 70–75% similarity in their nucleotide sequences (Kurotaki et al. 2001). Little has been published about the NSD family of proteins. Among other reported substrates, H3K36 is the common substrate of the NSD family. Defects in any of the three NSD family members have been linked to various diseases, including some types of cancer (Wagner and Carpenter 2012).

NSD2 is a histone H3 lysine 36 methyltransferase (Li et al. 2009; Kuo et al. 2011). It is a mono- and di-methyltransferase in vitro but in vivo it exhibits only the di-methyltransferase function on histone H3 lysine 36 (Kuo et al. 2011). Its over-expression is linked to some diseases including some types of cancer (Kuo et al. 2011). Currently, NSD2 is not present in the KMT classification despite being a histone lysine methyltransferase (Table 1; Zhang et al. 2012). Its crystal structure has not been solved so far.

2.10 SET2 (KMT3A, SETD2, HYPB)

SET2 was initially found to interact with the huntingtin protein and was named Huntingtin interacting protein (Faber et al. 1998; Passani et al. 2000; Rega et al. 2001). It is a histone H3 lysine 36 methyltransferase (Sun et al. 2005) that acts as a mono-, di- and tri-methyltransferase in vitro. In vivo, however, it is responsible only for tri-methylation of histone H3 lysine 36 (Yuan et al. 2009). Its crystal structure is not yet known.


3. Objectives and Approach of this work

Protein lysine methylation was first discovered in histone proteins, hence the respective enzymes were named histone methyltransferases. However, lysine methylation is not confined to histone proteins: non-histone proteins are also methylated on their lysine residues by the so-called "histone methyltransferases" (see also Table 1) (Rathert et al. 2008a; Rathert et al. 2008b; Rathert et al. 2008c; Dhayalan et al. 2011). As part of a long-term effort to understand the role of lysine methylation as a post-translational modification and to refine our knowledge of the biological function of lysine methyltransferases, the aim of this work was to identify novel non-histone substrates of known enzymes previously categorized as "histone" methyltransferases. By finding novel methylation substrates, identifying the responsible enzymes and later studying the biological role of the modification, we aim to contribute to a better understanding of lysine methylation as a PTM of proteins.

To identify methyltransferases suitable for our work, I started this PhD study by cloning several prospective SET domain-containing histone methyltransferases as well as some putative reading domains. After activity analysis, we selected the histone methyltransferases MLL1, NSD2 and SET2 for further work, because they showed activity in our assay system. This system employs peptide arrays synthesized on cellulose membranes by the SPOT-synthesis technique (Frank 2002). As a starting point, we first screened the histone tail peptides with each of these methyltransferases. The main goal for each of the three methyltransferases was to determine its specificity profile. By specificity profile, we mean the amino acid sequence motif that is recognized as a substrate by the active site of the respective methyltransferase. Based on this profile, we predicted potential novel non-histone target proteins: we searched databases with the derived recognition motif to retrieve human proteins containing corresponding motifs.

Once the potential targets had been identified by the database search, the next step was to prepare a peptide array containing the predicted target sites of each novel substrate. In this way, the potential target sequences were first investigated at the peptide level for methylation by the respective methyltransferase. Some peptides were methylated while others were not. The sequences that were methylated represented possible methylation targets at the peptide level. The proteins corresponding to these peptide substrates were cloned either in whole or as sub-domains containing the target site. Subsequently, the purified target proteins were investigated for methylation by the respective methyltransferase. For NSD2, unlike for MLL1 and SET2, we designed novel random peptides, since it was not specific for any of the histone tail peptides.
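The step from a recognition motif to candidate peptides for an array can be sketched in a few lines of Python. This is not the database search used in this work, only a minimal illustration of the idea. It assumes that the motif is written as a regular expression and that peptides are cut with a fixed flank around the target lysine. The function name, the flank length and the toy sequences and motif are all hypothetical.

```python
import re

def candidate_peptides(proteins, motif, lysine_offset, flank=7):
    """Yield (protein name, 1-based lysine position, peptide) for every motif hit.

    proteins      : dict mapping a protein name to its amino acid sequence
    motif         : regular expression describing the recognition motif
    lysine_offset : 0-based index of the target lysine inside the motif
    flank         : residues kept on each side of the lysine in the peptide
    """
    scan = re.compile(f"(?=({motif}))")  # lookahead so overlapping hits are kept
    for name, seq in proteins.items():
        for hit in scan.finditer(seq):
            k = hit.start() + lysine_offset            # 0-based lysine index
            peptide = seq[max(0, k - flank): k + flank + 1]
            yield name, k + 1, peptide

# Toy input, for illustration only (not real proteins, not a measured motif).
toy_proteins = {"toyA": "MSTRLKQAGGDEKRW", "toyB": "AAAAKAAAA"}
toy_motif = r"[RL][^DG]K[QS]"   # target lysine is the third residue

for name, position, peptide in candidate_peptides(toy_proteins, toy_motif, 2):
    print(name, position, peptide)
```

Peptides near a sequence end come out shorter than 2 × flank + 1 residues, so a real array design would need to decide how to handle them.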


4. Results and discussion

4.1 MLL1 (KMT2A)

4.1.1 Substrate Specificity Analysis of MLL1

Lysine methylation of non-histone proteins by lysine methyltransferases (KMTs) is an emerging field (Table 1). A C-terminal fragment of the MLL1 methyltransferase (residues 3745–3969, including the pre-SET and post-SET segments) was cloned in pGEX-6P-2, because this part had been reported to be a catalytically active domain (Nakamura et al. 2002). Later, a second fragment of MLL1 comprising residues 3812–3969 (including the pre-SET and post-SET segments) was cloned in pGEX-6P-2. Both fragments were expressed as GST-MLL1 fusion proteins (Figure 5A). Both constructs yielded soluble proteins. These precipitated in dialysis buffer I, so we diluted them with elution buffer.

We first incubated the MLL1 methyltransferase with the histone tail screening array in the presence of radioactively labeled SAM. The histone tail screening array consists of ten peptides, each 20 amino acids long, containing the known lysine methylation sites of histone proteins. The transfer of methyl groups to the immobilized peptides was detected by autoradiography and showed that MLL1 specifically methylated the histone H3 peptide (residues 1-20) (Figure 5B, with the MLL1(3745) construct). To determine which lysine residue or residues in the histone H3 peptide (residues 1-20) were methylated by MLL1, we performed an additional experiment in which lysine 4 and lysine 9 were replaced with alanine. MLL1 did not methylate the peptide in which lysine 4 was exchanged to alanine, which means that the site of methylation in the histone H3 peptide (residues 1-20) was lysine 4 (Figure 5C), consistent with the site reported in the literature (Nakamura et al. 2002). This indicated that the histone H3 tail sequence containing lysine 4 could be used as the starting point for determining the specificity profile.


Figure 5: Methyltransferase activity of MLL1. (A) Coomassie Blue stained SDS-PAGE gel showing the two constructs, MLL1-3812 (~45 kDa) in lane 1 and MLL1-3745 (~55 kDa) in lane 2. (B) MLL1 specifically methylated the histone H3 (residues 1-20) peptide. (C) MLL1 specifically methylated the histone H3 (residues 1-20) peptide at lysine 4. In images B and C, the reaction components were SAM, MLL1-3745 and peptides, and the images were developed by autoradiography. Dark spots represent methylation activity.

4.1.2 Deriving the specificity profile for MLL1

The next experiment was designed to determine the critical substrate residues around lysine 4 that are required for successful methyl group transfer by MLL1. Each residue in the histone H3 tail (residues 1-15) was exchanged against each of the 20 natural amino acids, resulting in a peptide array comprising 320 peptide spots. This array included one wild-type histone H3 (residues 1-15) peptide at the beginning of each row as a positive control. The array was incubated with MLL1-3745 in the presence of SAM. This experiment showed that exchange of lysine at position 4 against any other amino acid residue abolished MLL1 methyltransferase function. This means that the wild-type lysine is the only substrate residue accepted at this position by the active site of MLL1 (we therefore designated this position as 0) (Figure 6A). The array further showed that the enzyme mainly recognizes the motif (RILK)-2 (Not DG)-1 (K)0 (RNQHMSTY)+1 (Not RK)+2, which is a relaxed motif. The accepted amino acid residues are also not all accepted equally well. For example, arginine at position -2, which is also the wild-type residue in the H3 tail (Figure 6A), is preferred over isoleucine, leucine and lysine, but we included the latter three in the specificity profile as well. At position -1, isoleucine, leucine, valine and tyrosine are preferred over the wild-type threonine (Figure 6A). The only residues at position -1 that led to complete loss of MLL1 methyltransferase function were aspartic acid and glycine, so we took all residues except aspartic acid and glycine into the specificity profile at this position. The wild-type glutamine at position +1 can be exchanged with arginine, asparagine, histidine, methionine, serine or threonine, which are accepted about equally well and better than the remaining residues. At position +2, the wild-type threonine (Figure 6A) as well as isoleucine and valine are the preferred residues. Based on the differences in methylation signal among peptides, the specificity profile can be narrowed down to include only the most preferred residues: (R)-2 (ILYV)-1 (K)0 (RNQHMST)+1 (TVI)+2.
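To illustrate how such a position-specific profile can be applied to sequences, the relaxed MLL1 motif can be written as a regular expression. The following is only a minimal sketch and not the search procedure used in this work; the function name find_target_lysines and the regular expression encoding are assumptions made for illustration. The test peptides are taken from table 2.

```python
import re

# Relaxed MLL1 profile, positions -2..+2 around the target lysine:
# (RILK)-2 (Not DG)-1 (K)0 (RNQHMSTY)+1 (Not RK)+2
# Wrapping the pattern in a lookahead lets overlapping sites be reported.
MLL1_RELAXED = re.compile(r"(?=[RILK][^DG]K[RNQHMSTY][^RK])")


def find_target_lysines(peptide, motif=MLL1_RELAXED):
    """Return 0-based indices of lysines whose -2..+2 context fits the motif.

    Hypothetical helper for illustration; the match starts at position -2,
    so the target lysine sits two residues further on.
    """
    return [m.start() + 2 for m in motif.finditer(peptide)]


h3_tail = "ARTKQTARKSTGGKAPRKQL"      # histone H3 (1-20), spots 1A/2A
peptide_8e = "MPMRFRHLKKTSKEAVGVYR"   # contains LKKTS
peptide_9e = "RSIQTDKREKYYDSKGIGCY"   # contains REKYY

print(find_target_lysines(h3_tail))     # [3], i.e. lysine 4 only
print(find_target_lysines(peptide_8e))  # [9]
print(find_target_lysines(peptide_9e))  # [9]
```

In the H3 tail only lysine 4 fits the motif, because lysines 9, 14 and 18 lack one of the accepted residues at position -2. A motif match of this kind only marks a candidate site; whether the site is methylated has to be tested experimentally.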


Figure 6: Peptide based specificity profile of MLL1. (A) Testing the library of peptides with all possible amino acid exchanges in histone H3 (residues 1-15) with MLL1. The exchanged amino acids are shown on the left side of the image. The wild-type H3 tail sequence is shown on top of the image with the target lysine 4 highlighted in red and designated as 0. The residues to the right of this lysine are designated with a + sign and those to the left with a - sign. The enzyme mainly recognizes the motif (RILK)-2 (Not DG)-1 (K)0 (RNQHMSTY)+1 (Not RK)+2. Not DG means that MLL1 accepts all residues at this position except aspartic acid and glycine. Not RK means that MLL1 accepts all residues at this position except arginine and lysine. The reaction components were SAM, MLL1-3745 and peptides, and the image was developed by autoradiography. Dark spots represent methylation activity. The experiment was repeated at least twice and was reproducible. (B) Standard deviations of two independent experiments.

Our data do not show whether the substrate specificity profile of MLL1 changes in the presence of the other proteins of the MLL1 complex (WRAD). However, the current study shows that the MLL1 SET domain alone, with its pre-SET and post-SET regions, is sufficient for substrate recognition. The specificity profile of MLL1 also revealed that, although both Set7/9 and MLL1 methylate the same substrate (histone H3 at lysine 4), the substrate specificity profiles of these two enzymes have little in common (Dhayalan et al. 2011).

4.1.3 Search for the non-histone targets of MLL1

We initially searched the Scansite database with the derived recognition motif (RILK)-2 (Not DG)-1 (K)0 (RNQHMSTY)+1 (Not RK)+2 and found numerous proteins matching this motif. We then searched the HPRD (Human Protein Reference Database) website for known MLL1 interaction partners and found 18 proteins interacting with MLL1, which included the MLL1 protein itself. All of these interacting proteins except Menin contained the above recognition motif, and all but two were nuclear proteins. Because of their interaction with MLL1, we considered these proteins the most promising novel non-histone targets. We took their amino acid sequences containing the putative methylation sites, with the lysine in the center, and synthesized a peptide array consisting of peptides of 20 amino acid residues. Subsequently, the peptide array was tested for methylation by MLL1 in the presence of SAM (Figure 7). We also included histone H3 (residues 1-20) peptides as positive controls, two at the beginning and two at the end of the array. The highest activity was observed on spots 12D and 13D, both of which contain sequences from MLL1 itself. These peptide sequences are located between the CXXC motif and the first PHD finger of MLL1. Other putative targets were also methylated, but their methylation signal was weaker than that of the H3 tail sequence (Table 2). For the remaining putative targets, we did not observe a methylation signal.


Figure 7: Putative non-histone target peptides of MLL1 based on the specificity profile. The reaction components were SAM, MLL1-3745 and peptides, and the image was developed by autoradiography. Dark spots represent methylation activity. The peptide sequences are listed in table 2. Each peptide spot consists of 20 amino acid residues. The first two peptides (1A and 2A) and the last two peptides (10E and 11E) are histone H3 (residues 1-20) peptides used as positive controls.

Peptide No.  Peptide sequence      Protein name
1A           ARTKQTARKSTGGKAPRKQL  Histone H3 (1-20)
2A           ARTKQTARKSTGGKAPRKQL
3A           HSGNTYFLRKQANLKEMCLS  ASH2 like
8A           PPLPLPPLKKRGNHSTGLCL  FAS ligand
15A          RWEWKRLKAKTPKNGPPPCP  Host cell factor C1
5B           VKGDGKSKKKQAGRPKGSKG  Retinoblastoma binding protein 5
11B          IIASAALENDKTIKLWKSDC  WD repeat domain 5
12B          VYKLVPGLFKNEMKRRRDFY  Polycomb complex protein BMI1
13B          MHLRKFLRSKMDIPNTFQID
19B          GFVCDNCLKKTGRPRKENKF  CREBBP
3C           CSLPSCQKMKRVVQHTKGCK
4C           VQHTKGCKRKTNGGCPVCKQ
7C           IVLMYRRKSKQALRDYKKVQ  Plexin B1
9C           CDTISQAKEKMLDQLYKGVP
16C          LPLRLQRRLKRPETPTHDPD  Protein phosphatase 1, subunit 15A
19C          CLTAPNYRLKSLQKKASTSA  TAF9 RNA polymerase II
12D          ETSVRGPRIKHVCRRAAVAL  The un-catalytic portion of MLL1
13D          PKFGGRNIKKQCCKMRKCQN
17D          WICTKCVRCKSCGSTTPGKG
18D          PLDLEGVKRKMDQGNYTSVL  MLL1 BROMO (The un-catalytic portion of MLL1)
19D          NYHFMCSRAKNCVFLDDKKV  MLL1 PHD4 (The un-catalytic portion of MLL1)
6E           QEARSNARLKQLSFAGVNGL  The un-catalytic portion of MLL1
8E           MPMRFRHLKKTSKEAVGVYR  The Catalytic portion in our cloned MLL1
9E           RSIQTDKREKYYDSKGIGCY  The Catalytic portion in our cloned MLL1
10E          ARTKQTARKSTGGKAPRKQL  Histone H3 (1-20)
11E          ARTKQTARKSTGGKAPRKQL

Table 2: The peptide sequences of MLL1 interacting proteins from HPRD website correspond to peptides in Figure 7.

Next, we cloned the methylated non-histone targets in pGEX-6P-2 and purified them. At the protein level, however, the methylation of the non-histone targets was not convincing in SDS gel based radioactive assays and required longer than usual X-ray film exposure times (data not shown). In these experiments, MLL1 was nevertheless active on its original substrate, recombinant histone H3. Of the two MLL1 constructs used in this study, the shorter MLL1-3812 was significantly more active than the longer MLL1-3745, although we used the same expression, purification and methylation conditions (Figure 8). Patel and colleagues used constructs similar to ours but did not see any difference in activity between the two (Patel et al. 2008). One difference in our setup was that our MLL1-3812 construct was one residue shorter than theirs. Alternatively, they may not have detected a difference in activity because their methylation buffer differed from ours.

4.1.4 Auto-methylation of MLL1

While investigating the methylation of putative non-histone proteins by MLL1, we observed a strong methylation signal at a position corresponding to the size of the respective GST-MLL1 protein itself. This MLL1 auto-methylation was much stronger than the methylation of any of the non-histone proteins (Figure 8). Next, we evaluated the specificity profile data in the light of the auto-methylation activity of MLL1. The protein sequences of the two cloned MLL1 constructs were checked for the substrate recognition motif of MLL1. We found two recognition motifs, LKKTS and REKYY, in which the central lysine is the predicted auto-methylation site. These two motifs are located in the SET-N region (K3825) and in the variable region of SET-I (K3873), and both are present in the two MLL1 constructs MLL1-3745 and MLL1-3812 (Figure 4C). Peptides 8E and 9E in figure 7 and table 2 represent these two auto-methylation sites.


Figure 8: Auto-methylation and recombinant H3 methylation by MLL1-3745 and MLL1-3812. Dark bands represent methylation activity. The reaction components were SAM, the respective MLL1 and the recombinant histone H3, and the image was developed by autoradiography. H3 was added as substrate first. MLL1 methylated its original substrate histone H3 and itself to a similar extent.

4.1.5 Investigation of the auto-methylation activity of MLL1

To find the auto-methylation site(s), we used site-directed mutagenesis to replace the central lysine residues of the two motifs, K3825 in LKKTS and K3873 in REKYY, with arginine in each of the MLL1 methyltransferase constructs. This resulted in two single-site mutants for MLL1-3745 and two single-site mutants for MLL1-3812.

In the methylation reactions, the wild-type MLL1 was included as a positive control (lanes 1 and 4, Figure 9). We expected that one of the mutants, MLL1 (K3825R) or MLL1 (K3873R), would not show the auto-methylation signal, because the potential target lysine had been exchanged to arginine. To test whether these lysine to arginine exchanges affect the active site of MLL1, we also included recombinant H3 in the same experiment to see whether the mutants still methylate the original substrate H3 (lanes 5 and 6, Figure 9). Surprisingly, neither the auto-methylation signal nor the substrate methylation signal was changed in these two mutants (lanes 2 and 3, Figure 9, showing the mutants of MLL1-3812).

Figure 9: The methyltransferase activity of the wild-type MLL1-3812, mutant MLL1-3812 K3825R and mutant MLL1-3812 K3873R. (A) Coomassie Blue stained SDS-PAGE gel showing the wild-type MLL1-3812, mutant MLL1-3812 K3825R, mutant MLL1-3812 K3873R and H3. (B) In the first three lanes, the reaction components were SAM and the respective MLL1. In the last three lanes, the reaction components were SAM, the respective MLL1 and the recombinant histone H3. H3 was added as substrate first. Dark bands represent methylation activity. The image was developed by autoradiography. The mutants show no loss or change of auto-methylation or of recombinant H3 methylation. The experiment was repeated twice. Information about the mutants is given in the site-directed mutagenesis section of materials and methods.


How can this result be explained? Based on the specificity profile, we did not find any other lysine residue in MLL1 that could be a target for auto-methylation. One possibility is that MLL1 methylates both of these residues (lysine 3825 and lysine 3873). Since we prepared only single mutants (K3825R and K3873R), one of the two potential target lysines was still available for methylation in each mutant, which could explain why we still observed auto-methylation activity. The next step in investigating auto-methylation would therefore be to generate a double mutant in which neither of the potential target lysines is available for methylation by MLL1.

In addition, the question arises whether the arginine that was used to replace the lysine (because of its similar physicochemical properties) may itself be methylated by MLL1, since arginine methylation is chemically possible and protein arginine methyltransferases are a known class of enzymes. To test this, the lysines need to be exchanged to a residue other than lysine or arginine, such as alanine, and the mutant then examined for loss of auto-methylation activity.

Alternatively, auto-methylation may occur at another site that was not tested in our experiment. This would require a search for the auto-methylation site(s) across the whole sequence of MLL1. For this, the next step would be to design a peptide array of 15 amino acid long peptides covering the whole amino acid sequence of MLL1 and to identify the auto-methylation site(s) by scanning this array. In addition, the MLL1 SET domain protein needs to be analyzed by mass spectrometry after tryptic digestion in order to identify the modified residue.
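The proposed scanning array of 15-mer peptides could be laid out as in the following sketch. It is only an illustration under stated assumptions: the function name tile_peptides and the step of 5 residues between neighbouring peptides are hypothetical choices, since no overlap was defined for this experiment, and the toy input simply joins two peptides from table 2 and is not a contiguous MLL1 region.

```python
def tile_peptides(sequence, length=15, step=5):
    """Return (1-based start, peptide) pairs that tile the whole sequence.

    `step` (the offset between neighbouring peptides) is an assumption;
    it must not exceed `length`, otherwise residues would be skipped.
    """
    if step < 1 or step > length:
        raise ValueError("step must be between 1 and length")
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window so the C-terminus is covered
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Toy input: two 20-mers from table 2 joined together (illustration only).
toy = "MPMRFRHLKKTSKEAVGVYR" + "RSIQTDKREKYYDSKGIGCY"
for start, peptide in tile_peptides(toy):
    print(start, peptide)
```

A smaller step gives more spots but places each lysine in more sequence contexts, which may help when the residues flanking the target lysine matter for recognition.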


The self-methylation of MLL1 has not been reported in the literature before. Histone H3 lysine 4 methylation is well documented for its roles in the regulation of gene expression. For the auto-methylation of MLL1, we cannot say at this point whether it affects epigenetic mechanisms. Auto-methylation may relate to the regulation of gene expression in a number of ways. It may regulate MLL1 activity, or it may provide a docking site for the binding of reading domains. MLL1 itself contains four PHD domains and one bromodomain that may bind to methylated MLL1. It is also an interesting question whether addition of the complex proteins (WRAD) to MLL1 would have any effect on the auto-methylation behavior of MLL1; they may, for example, bind to auto-methylated MLL1.


4.2 NSD2

4.2.1 Screening of histone substrates methylated by NSD2

A fragment of NSD2 (residues 991-1212), which includes the pre-SET and post-SET regions, was cloned in pGEX-6P-2 for expression as a GST-NSD2 fusion protein (Figure 10A). Earlier, we had cloned the SET domain of NSD2 alone, but this protein was inactive despite good purification (Figure 10A).

Incubation of the NSD2 methyltransferase with the histone tail screening array showed that NSD2 did not specifically methylate any single histone tail peptide (Figure 10). It showed almost equal methyltransferase activity on histone H3 (1-20), H3 (15-35) and H3 (28-48), which harbour potential methylation sites such as lysine 4, lysine 9, lysine 18, lysine 27 and lysine 36. NSD2 also showed methyltransferase activity on the histone H2B peptides H2B (1-20) and the C-terminal 20 residues of H2B (106-125), which harbour potential methylation sites (Figure 10 and Figure 2). Although the literature strongly suggests that histone H3 lysine 36 is the primary target of the NSD family of proteins including NSD2 (Li et al. 2009; Kuo et al. 2011), NSD2 was not specific for any one of the histone tail peptides in our experiments (Figure 10). The reported lysine 20 site in histone H4 (Pei et al. 2011) does not seem to be a true methylation site in our peptide array experiment (Figure 10), because the peptide H4 (10-30) was not significantly methylated.


Figure 10: Methyltransferase activity of NSD2. (A) Coomassie Blue stained SDS-PAGE gels showing the two proteins, GST-NSD2 with its pre-SET and post-SET regions (~47 kDa) and GST-NSD2 SET only (~39 kDa). (B) Dark spots represent methylation activity. The reaction components were SAM, NSD2 (~47 kDa) and peptides, and the image was developed by autoradiography. NSD2 methyltransferase activity is not specific for any of the histone tail sequences at the peptide level.

4.2.2 Screening of random peptides by NSD2

Based on the observation that NSD2 was not specific for any of the histone peptides, we devised a random peptide approach to study the specificity of NSD2, which may also be applied to other SET domains that did not show activity on the histone tails (e.g., SET5 and PRDM1). The basic principle of this approach is that each KMT recognizes its substrate by interaction with a specific recognition sequence. This recognition sequence might be found by offering all possible amino acid residues at different positions of the substrate; the activity and preference of the KMT on these substrates would then lead to its specific recognition sequence. The first random peptide array was designed by placing a lysine in the centre of a 15-mer peptide, because KMTs methylate lysine residues. The residues immediately adjacent to this central lysine (positions -1 and +1) were each substituted by 17 amino acid residues (all except cysteine, methionine and tryptophan) in every possible combination. The remaining 12 residues (6 on each side) were randomly assigned such that each amino acid was statistically represented at each position. This resulted in 17×17 = 289 peptides plus 4 histone tail peptides representing some lysine sites in histones H3 and H4, a total of 293 peptides (Figure 11). We tested these peptides for methylation by NSD2 in the presence of SAM. The peptide which showed the highest methylation was assigned an activity value of 100%, and the other peptides were assigned percent activity values relative to this peptide (Figure 11). From this experiment we selected 8 highly methylated peptides, which became the basis for the second round of random peptides.
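The layout of the first random peptide array can be sketched as follows. This is an illustration under assumptions and not the design script used for the array: the function name first_round_array is hypothetical, the flanking residues are drawn uniformly at random, and they are taken from the same 17 amino acids as positions -1 and +1, which is not stated explicitly for the original design.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 17 residues: all natural amino acids except cysteine, methionine, tryptophan
ALLOWED = [aa for aa in AMINO_ACIDS if aa not in "CMW"]


def first_round_array(seed=0):
    """Return 289 15-mers with a central lysine and all -1/+1 combinations.

    Assumption: the six flanking residues on each side are drawn uniformly
    at random from ALLOWED, so they may themselves contain lysines.
    """
    rng = random.Random(seed)
    peptides = []
    for minus1, plus1 in itertools.product(ALLOWED, repeat=2):
        left = "".join(rng.choice(ALLOWED) for _ in range(6))
        right = "".join(rng.choice(ALLOWED) for _ in range(6))
        peptides.append(left + minus1 + "K" + plus1 + right)
    return peptides


array = first_round_array()
print(len(array))   # 289, i.e. 17 x 17
print(array[0])     # first peptide, central lysine at index 7
```

Because the flanks are random, additional lysines can occur outside the centre, so a signal on such a peptide cannot be assigned to the central lysine without further controls.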

Figure 11: NSD2 relative methyltransferase activity on the first random peptide array with 293 peptide spots. Dark spots represent methylation activity. The reaction components were SAM, NSD2 (~47 kDa) and peptides, and the image was developed by autoradiography. At the top right are the eight selected peptides which were strongly methylated in the array shown at the left and which were the basis for the second round of random peptides. All the peptide sequences for this array are given in Table 8.


We then designed a second round of random peptides for NSD2 methylation, using the information from the methylated peptides of the first random peptide array. The residues in these methylated peptides indicated which substrate residues NSD2 prefers around the central lysine, which helped us to narrow down the choice of amino acids around the central lysine. We placed the preferred residues around the central lysine, from position -4 on one side to position +4 on the other side. The remaining 3 amino acid residues on each side of the 15-mer peptide were randomly selected such that each amino acid residue was statistically represented at each position. This resulted in an array of 168 peptide spots, with the histone H3 lysine 36 peptide at the beginning. In this array we also included the 5 best peptides from the first random peptide array, because they contained multiple lysines; they were included here with single lysine to alanine exchanges (the last 17 peptides in figure 12). We tested this peptide array for methylation by NSD2 in the presence of SAM. We observed that a hydrophobic residue was present at position -1 in most of the highly methylated peptides. We also observed that when one of the lysine residues was converted to alanine, the remaining lysine residues in the peptide still showed some residual methylation signal. This indicates that multiple lysines contribute additively to the signal (Figure 12; peptides 5H-21H).


Figure 12: NSD2 relative methyltransferase activity on the second random peptide array with 168 peptide spots. Dark spots represent methylation activity. The reaction components were SAM, NSD2 (~47 kDa) and peptides, and the image was developed by autoradiography. All the peptide sequences for this array are given in Table 9.

Unlike the first and second random peptide arrays, the third random peptide array was designed to include only a single lysine in each peptide, to exclude the effect of multiple lysines. We took the information from the methylated peptides in the first and second random peptide arrays and, as before, shuffled the preferred residues in all possible ways around the central lysine. This resulted in a total of 175 peptides, which also included the histone H3 (residues 10-25) peptide harboring the H3 lysine 18 site, the H3 (residues 30-45) peptide harboring the H3 lysine 36 site, the H4 (residues 38-52) peptide harboring the H4 lysine 44 site and the H1.5 (residues 161-175) peptide harboring the H1.5 lysine 168 site. We included these peptides because their sequences were similar to the histone H3K36 sequence. We did not include the histone H4 lysine 20 peptide, because it was not similar to the H3K36 sequence and we had shown before that it was not methylated. We also included lysine to alanine mutants of the H3K36, H4K44 and H1.5K168 peptides. We tested this peptide array for methylation by NSD2 in the presence of SAM. Among the sequences selected for the presence of a single lysine, we observed four highly methylated peptides (peptides 7C, 16C, 4D and 2F; Figure 13 and table 3). All four histone peptides were also methylated. The histone peptides in which the predicted target lysine was mutated to alanine did not show any methylation signal. This means that NSD2 methylated histone H3 at lysine 36, histone H4 at lysine 44 and histone H1.5 at lysine 168 (peptides 1I, 2I, 4I and 6I; Figure 13 and table 3). Moreover, the most highly methylated peptide was that of H1.5K168 (residues 161-175) (6I; Figure 13).

Figure 13: NSD2 relative methyltransferase activity on the third random peptide array with 175 peptide spots. Dark spots represent methylation activity. The reaction components were SAM, NSD2 (~47 kDa) and peptides, and the image was developed by autoradiography. All the peptide sequences for this array are given in Table 10.


Peptide No.  Peptide sequence  Protein
7C           LARFSGFKTLHPDRQ   Randomized sequence
16C          ANPDHRVKTLSRQIP   Randomized sequence
4D           GGHVSGIKGVNPNGR   Randomized sequence
2F           YNGFSFIKAVQPRMG   Randomized sequence
1I           STGGKAPRKQLATKA   H3K18 (residues 10-25) peptide
2I           SAPATGGVKKPHRYR   H3K36 (residues 30-45) peptide
4I           ARRGGVKRISGLIYE   H4K44 (residues 38-52) peptide
6I           KPAAAGVKKVAKSPK   H1.5K168 (residues 161-175) peptide

Table 3: The peptide sequences of highly methylated substrates which correspond to peptides in Figure 13.

For the eight highly methylated peptides from the third random peptide experiment (table 3 and Figure 13), we designed an alanine scan experiment. Each amino acid residue in each of the eight peptides was sequentially replaced with alanine, to find the residues that contribute most to substrate recognition by NSD2. This resulted in a peptide array comprising 128 peptides. The first peptide in each row represented the wild-type sequence, and each of the remaining peptides carried a single amino acid exchange to alanine (Figure 14 and table 3). This alanine scan experiment confirmed the earlier observation that NSD2 preferentially methylates the peptide (residues 161-175) representing H1.5 lysine 168 (Figure 14). It also showed the importance of amino acid residues at crucial sites, most of which were adjacent to the target lysine. Exchange of the central target lysine to alanine abolished the methyltransferase activity of NSD2 on all the histone and random peptides. All peptides except H3K18 (residues 10-25) contained a hydrophobic residue directly before the target lysine, and its replacement with alanine either abolished the methyltransferase activity of NSD2 or weakened it to some extent. This means that NSD2 prefers a hydrophobic residue directly preceding the target lysine in its substrate peptides, with the exception of the H3K18 peptide. Among all these peptide substrates, the H1.5K168 peptide was the most preferred substrate, which has not been reported in previous studies of NSD2. Its sequence is similar to H4K44, H3K36 and, to some extent, H3K18. At present we do not know whether it is methylated in cells or what the potential biological consequences are. Histone H1, in general, compacts chromatin (Cheung and Lau 2005; Peng et al. 2012), and histone H1.5 is a variant of H1. The methylation of histone H1.5 at lysine 168 may therefore be connected with nucleosome compaction. It may also provide a binding surface for some reading domains, because methylated histone H1.4 lysine 26 has been reported to provide an interaction surface for L3MBTL1 (Min et al. 2007).
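The layout of the alanine scan described above can be reproduced with a few lines of code. The sketch below is only an illustration; the function name alanine_scan is hypothetical, and the eight sequences are those listed in table 3.

```python
def alanine_scan(peptide):
    """Return the wild-type peptide followed by one variant per position.

    Each variant carries alanine at exactly one position. Positions that
    are already alanine yield a variant identical to the wild type.
    """
    variants = [peptide]
    for i in range(len(peptide)):
        variants.append(peptide[:i] + "A" + peptide[i + 1:])
    return variants


# The eight highly methylated peptides from table 3
table3 = {
    "7C": "LARFSGFKTLHPDRQ",
    "16C": "ANPDHRVKTLSRQIP",
    "4D": "GGHVSGIKGVNPNGR",
    "2F": "YNGFSFIKAVQPRMG",
    "1I": "STGGKAPRKQLATKA",
    "2I": "SAPATGGVKKPHRYR",
    "4I": "ARRGGVKRISGLIYE",
    "6I": "KPAAAGVKKVAKSPK",
}

rows = {name: alanine_scan(seq) for name, seq in table3.items()}
print(sum(len(row) for row in rows.values()))  # 128 spots: 8 rows of 1 + 15
```

Each row therefore holds 16 spots, which agrees with the 128 spots of the array. Spots at positions that are already alanine in the wild-type sequence act as additional wild-type replicates.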

We did not proceed to determine the specificity profile of NSD2. In future, the H1.5K168 sequence rather than the H3K36 sequence should be used for this purpose and for the search for non-histone targets, because NSD2 prefers the H1.5K168 peptide over the H3K36 peptide as substrate. One study in human cells found acetylation of lysine 168 of histone H1.5 in vivo by mass spectrometry (Wisniewski et al. 2007), which may also hint at lysine 168 being a probable methylation site. Since H1.5 is a variant of histone H1, all variants of histone H1 could be tested for methylation by NSD2 in a next step.


Figure 14: NSD2 methyltransferase activity on the peptide spots (4 histone and 4 random peptides selected from the third random peptide experiment) with alanine exchanges along the entire peptide sequence. Dark spots represent methylation activity. The reaction components were SAM, NSD2 (~47 kDa) and peptides, and the image was developed by autoradiography. The peptide array comprised 128 peptide spots; the rows have been separated for labeling purposes. The wild-type peptide is at the beginning of each row, with its name above it and its sequence next to its name. The amino acid letter above each peptide spot indicates the residue that was exchanged to A (alanine), as shown on the left side of the image. Peptide H1.5K168 is the most highly methylated peptide.


4.3 SET2 (KMT3A, SETD2, HYPB)

4.3.1 Substrate Specificity Analysis of SET2

A C-terminal fragment of SET2 (residues 939–1195) including the pre-SET and post-SET regions was cloned in pGEX-6P-2 for expression as a GST-SET2 fusion protein. Earlier, we had cloned only the SET domain of SET2, but it was inactive (Figure 15A), although both constructs gave soluble proteins. Methylation of the histone tail screening array showed that SET2 specifically methylated the histone H3 peptide (residues 28-48) containing K36 (Figure 15B). To confirm that lysine 36 is the residue methylated by SET2 in the histone H3 peptide (residues 28-48), we replaced it with alanine and studied methylation of this peptide. Indeed, we did not observe methylation of the histone H3K36A peptide, which means that SET2 specifically methylated lysine 36 in the histone H3 peptide (residues 28-48) (Figure 15C). This result is consistent with the site reported in the literature (Sun et al. 2005; Yuan et al. 2009). Because the histone H4 lysine 44 peptide resembles the histone H3 lysine 36 peptide, we also included the histone H4 lysine 44 peptide (residues 32-55) and its variant with lysine 44 mutated to alanine in this experiment. This showed that SET2 also methylates the histone H4 lysine 44 peptide, but with a weaker signal than for histone H3 lysine 36 (Figure 15C). These results indicated that the histone H3 tail sequence containing lysine 36 could be used as the starting point for determining the specificity profile.


Figure 15: Methyltransferase activity of SET2 (A) Coomassie Blue stained SDS PAGE gels showing SET2 with its pre-SET and post-SET (~52 kDa) and SET2 with only SET (~42.3 kDa). (B) SET2 specifically methylated histone H3 (residues 28-48) peptide. (C) SET2 specifically methylated histone H3 (residues 28-48) peptide at lysine 36 position and histone H4 (residues 32-55) peptide at lysine 44 position. (D) SET2 exclusively methylated the H3K36 unmodified and H3K36 mono-methylated peptides (residues 28-48). In images B, C and D, the reaction components were SAM, SET2 (~52 kDa) and peptides, and the images were developed by autoradiography. Dark spots represent methylation activity.

4.3.2 Product Specificity Analysis of SET2

To investigate the product specificity of SET2 at the peptide level, we prepared a peptide array containing the histone H3 (residues 28-48) peptide with lysine 36 unmodified, mono-methylated, di-methylated or tri-methylated. This peptide array was incubated with SET2 in the presence of SAM. SET2 methylated only the unmodified and the mono-methylated histone H3 lysine 36 peptides, with no methylation of the other peptides (Figure 15D). This means that SET2 acts as both a mono-methyltransferase and a di-methyltransferase at the peptide level in vitro.

4.3.3 Deriving the specificity profile for SET2

To derive the specificity profile of SET2, each residue in the histone H3 tail (residues 31-50) was exchanged against each of the 20 natural amino acids, resulting in a peptide array comprising 420 peptide spots. This included the wild-type histone H3 peptide (residues 31-50) at the beginning of each row. The array was incubated with SET2 in the presence of SAM to determine the critical substrate residues required for successful methyl group transfer. This experiment showed that the enzyme mainly recognizes the motif (GPA)-3 (GMF)-2 (VILF)-1 (K)0 (not EGP)+1 (PGAIVCLST)+2 (Figure 16A), which includes the most preferred residues at each position. For example, glycine at position -3, which is also the wild-type residue in the histone H3 tail, is strongly preferred over all other amino acid residues. Alanine and proline at position -3 show a methylation signal that is significant in comparison to the other amino acid residues at this position. Similarly, at position -2, glycine, methionine and phenylalanine show a significant methylation signal (Figure 16A). At position -1, isoleucine is preferred over the wild-type valine, and leucine and phenylalanine also show a significant methylation signal (Figure 16A). At position +1, a variety of amino acid residues are accepted more readily than the wild-type lysine, but glutamic acid, glycine and proline are not accepted (Figure 16A). The wild-type proline at position +2 can be replaced by alanine, cysteine, glycine, isoleucine, leucine, serine, threonine and valine (Figure 16A). However, with a view to finding novel non-histone substrates of SET2, we were concerned that a target peptide could be a valid candidate even if it is only weakly methylated, provided that the corresponding protein binds strongly to SET2 and thereby presents the target peptide close to the enzyme. We therefore decided to define a more relaxed profile, with the aim of including all possible substrate sequences and excluding only sequences that are not favorable for methylation. This profile was defined as (GPAIMFV)-3 (GMFYW)-2 (VILF)-1 (K)0 (not EGP)+1 (PGAIVCLST > other aa but not RKED)+2 (Figure 16A). In this relaxed motif, we additionally included isoleucine, methionine, phenylalanine and valine at position -3 and tyrosine and tryptophan at position -2 (Figure 16A). At position +2, only arginine, lysine, glutamic acid and aspartic acid gave no detectable methylation signal, and we therefore defined this position as “not RKED”.
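The difference between the strict and the relaxed SET2 profile can be made explicit by storing each profile as a table of allowed residues per position. The following sketch is an illustration only; the dictionary layout and the function name matching_lysines are assumptions. The first two test sequences are the H3K36 and H4K44 peptides from table 3, and the third is a hypothetical variant constructed for this example.

```python
AA = set("ACDEFGHIKLMNPQRSTVWY")

# Allowed residues per position relative to the target lysine (position 0)
SET2_STRICT = {
    -3: set("GPA"), -2: set("GMF"), -1: set("VILF"), 0: set("K"),
    1: AA - set("EGP"), 2: set("PGAIVCLST"),
}
SET2_RELAXED = {
    -3: set("GPAIMFV"), -2: set("GMFYW"), -1: set("VILF"), 0: set("K"),
    1: AA - set("EGP"), 2: AA - set("RKED"),
}


def matching_lysines(peptide, profile):
    """Return 0-based indices of lysines whose context fits the profile."""
    hits = []
    for i, residue in enumerate(peptide):
        if residue != "K":
            continue
        # every profile position must lie inside the peptide and be allowed
        if all(0 <= i + off < len(peptide) and peptide[i + off] in allowed
               for off, allowed in profile.items()):
            hits.append(i)
    return hits


h3k36 = "SAPATGGVKKPHRYR"      # table 3, peptide 2I
h4k44 = "ARRGGVKRISGLIYE"      # table 3, peptide 4I
variant = "SAPATVGVKKPHRYR"    # hypothetical: glycine at -3 replaced by valine

print(matching_lysines(h3k36, SET2_STRICT))     # [8], i.e. lysine 36
print(matching_lysines(h4k44, SET2_STRICT))     # [6], i.e. lysine 44
print(matching_lysines(variant, SET2_STRICT))   # []
print(matching_lysines(variant, SET2_RELAXED))  # [8]
```

The hypothetical variant is rejected by the strict profile but accepted by the relaxed one, which is the intended effect of relaxing the profile before a database search: more candidates are retained, at the cost of more sequences that turn out not to be methylated.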

Figure 16: Peptide-based specificity profile of SET2. (A) Testing the library of peptides with all possible amino acid exchanges in histone H3 (residues 31-50) with SET2. The exchanged amino acids are shown on the left side of the image. The wild-type H3 tail sequence is shown on top of the image with the target lysine 36 highlighted in red and designated as 0. The residues to the right of this lysine are designated with a + sign and those to the left with a - sign. The enzyme mainly recognizes the motif (GPAIMFV)-3 (GMFYW)-2 (VILF)-1 (K)0 (not EGP)+1 (not RKED)+2. "Not EGP" means that SET2 accepts all residues at this position except glutamic acid, glycine and proline. "Not RKED" means that SET2 accepts all residues at this position except arginine, lysine, glutamic acid and aspartic acid. The reaction components were SAM, SET2 (~52 kDa) and peptides, and the image was developed by autoradiography. Dark spots represent methylation activity. The experiment was repeated at least three times and was reproducible. (B) Specificity plot of SET2. (C) Standard deviations of three independent experiments.
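The layout of the substitution library, with 20 rows (one per exchanged amino acid) and 21 spots per row (the wild-type peptide followed by one exchange per position), can be sketched as follows. This is a minimal illustration under stated assumptions: the H3 (31-50) sequence is written out from the canonical human histone H3 sequence, the alphabetical row order is arbitrary, and the function name `substitution_array` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 natural amino acids (row order assumed)
H3_31_50 = "ATGGVKKPHRYRPGTVALRE"     # histone H3 residues 31-50 (assumed canonical)


def substitution_array(wild_type, alphabet=AMINO_ACIDS):
    """Build one row per amino acid: wild type first, then one exchange per position."""
    rows = []
    for aa in alphabet:
        row = [wild_type]  # wild-type peptide at the beginning of each row
        for pos in range(len(wild_type)):
            row.append(wild_type[:pos] + aa + wild_type[pos + 1:])
        rows.append(row)
    return rows


array = substitution_array(H3_31_50)
print(sum(len(row) for row in array))  # 20 rows x 21 spots = 420
```

Spots in which the exchanged residue equals the wild-type residue reproduce the wild-type peptide, so each column also contains one internal positive control.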

4.3.4 Search for the non-histone targets of SET2
We searched the Scansite database with the SET2 recognition motif (GPAIMFV)-3 (GMFYW)-2 (VILF)-1 (K)0 (not EGP)+1 (not RKED)+2 derived from the specificity profile and found numerous proteins matching this motif. From these hits, we selected a set of nuclear non-histone proteins for analysis and designed a peptide array with 91 putative non-histone target peptides. In addition, we included two histone H3 lysine 36 peptides (residues 30-49; 1A/10E) as positive controls and two variants in which lysine 36 was replaced with alanine (residues 30-49; 2A/11E) as negative controls. Each peptide was 15 amino acid residues long with the target lysine in the centre. Subsequently, we investigated the methyltransferase activity of SET2 on these peptides in the presence of SAM (Figure 17; Table 4). The highest methyltransferase activity of SET2 was observed on the histone H3 lysine 36 peptide (residues 30-49; 1A/10E). Lower but significant activity was observed on the chloride intracellular channel 1 (9B) and ASH1L (14C) peptides (Figure 17A). A few more peptide spots showed a weak methylation signal, and the rest of the peptides did not show any methylation signal.

We next took the peptide sequences that were methylated and prepared another array that also included their lysine-to-alanine exchanges, to confirm the site of methylation. We incubated this array with SET2, again in the presence of SAM. The same peptides were methylated with the same relative methylation signal. Their lysine-to-alanine mutated peptides did not show any methylation signal, which confirmed that the methylation occurred at the central lysines (Figure 17B). The signal of the methylated non-histone peptides was weaker than that of the methylated histone H3 lysine 36 peptide. Based on this observation, we did not pursue the investigation of methylation at the protein level in vitro and in vivo.
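The construction of a 15-residue peptide with the target lysine in the centre, together with its lysine-to-alanine control, can be sketched as below. The function names and the flanking residues of the example protein string are hypothetical; only the embedded CLIC1 peptide (9B) is taken from Table 4. How peptides near a protein terminus were handled is not stated in the text, so this sketch simply rejects them.

```python
def centered_peptide(protein, lysine_index, flank=7):
    """Return the (2 * flank + 1)-mer centred on the lysine at lysine_index."""
    if protein[lysine_index] != "K":
        raise ValueError("position is not a lysine")
    start, end = lysine_index - flank, lysine_index + flank + 1
    if start < 0 or end > len(protein):
        raise ValueError("lysine too close to a terminus for a full window")
    return protein[start:end]


def lysine_to_alanine(peptide):
    """Replace the central lysine by alanine to obtain the negative control."""
    centre = len(peptide) // 2
    if peptide[centre] != "K":
        raise ValueError("central residue is not a lysine")
    return peptide[:centre] + "A" + peptide[centre + 1:]


# Hypothetical protein fragment: CLIC1 peptide 9B with made-up flanking residues.
fragment = "MSTE" + "AKFSAYIKNSNPALN" + "GLDQ"
peptide = centered_peptide(fragment, lysine_index=11)
print(peptide)                     # AKFSAYIKNSNPALN
print(lysine_to_alanine(peptide))  # AKFSAYIANSNPALN
```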

Figure 17: Putative non-histone target peptides of SET2 based on the specificity profile. (A) The reaction components were SAM, SET2 (~52 kDa) and peptides, and the image was developed by autoradiography. Dark spots represent methylation activity. Peptides 1A/10E are H3K36 (residues 30-49) and peptides 2A/11E are H3K36A (residues 30-49), serving as positive and negative controls, respectively. The peptide sequences are listed in Table 4 along with the names of the proteins. Each peptide spot consists of 15 amino acid residues. (B) The methylated peptides from panel A, each placed next to its lysine-to-alanine mutated peptide.

Peptide No.  Peptide sequence               Protein
1A/10E       P A T G G V K K P H R Y R P G  Histone H3K36 (30-49)
2A/11E       P A T G G V A K P H R Y R P G  Histone H3K36A (30-49)
2B           R D G S V F L K N A A G R L K  A kinase anchoring protein 13
9B           A K F S A Y I K N S N P A L N  Chloride intracellular channel 1 (CLIC1, NCC27)
13B          A E M T I Y I K Q L G R R I F  ATP-dependent RNA helicase A
2C           N R G K A G V K R A A E M Y G  Heterogeneous nuclear ribonucleoprotein C
14C          F S N K P F L K L G A V S A S  ASH1L

Table 4: The peptide sequences of putative non-histone targets of SET2 correspond to the peptides in figure 17.

The target lysine residue 119 of the chloride intracellular channel 1 protein (CLIC1, NCC27) has been reported to be acetylated (HPRD). This indicates that it is accessible for post-translational modification and makes it a plausible site of methylation at the protein level. CLIC1 (NCC27) is a member of the highly conserved class of chloride ion channels that exists in both soluble and integral membrane forms in the nucleus (Harrop et al. 2001). hnRNPL (heterogeneous nuclear ribonucleoprotein L) forms a complex with SET2 in vivo (Yuan et al. 2009) and contains one recognition motif in which the predicted lysine is methylated at the peptide level, although only at a barely detectable level. The methylated lysine 683 in the peptide derived from ASH1L (another H3K36 methyltransferase) also needs further investigation at the protein level, because several PTMs have been reported in ASH1L (HPRD).

5. Conclusions
We planned this study to characterize the specificity profiles and mechanism of lysine methyltransferases. We showed that MLL1 specifically methylated the histone H3 peptide and that the site of methylation was lysine 4. We employed this H3K4 peptide as a starting point to derive the specificity profile for MLL1. The specificity analysis of MLL1 demonstrated that it recognizes a five amino acid motif (-2 to +2). We showed that the MLL1 methyltransferase methylates itself and its original substrate, recombinant histone H3, equally well at the protein level. Similarly, we showed that SET2 specifically methylated histone H3, in this case at lysine 36. We employed this H3K36 peptide to derive the specificity profile for SET2 and demonstrated that SET2 recognizes a six amino acid motif (-3 to +2). We characterized SET2 as a mono- and dimethyltransferase at the peptide level. Unlike MLL1 and SET2, NSD2 was not specific for any of the histone tail peptides. To investigate its specificity, we designed and employed a randomized peptide array approach, which led us to a novel and stronger histone site, lysine 168 of histone H1.5, in vitro at the peptide level. We demonstrated that NSD2 methylates those substrates in which a hydrophobic residue precedes the target lysine.

6. Materials and methods
6.1 Cloning
The sequences encoding MLL13745 (residues 3745–3969), MLL13812 (residues 3812–3969), NSD2 (residues 991–1212) and SET2 (residues 939–1195), all of which include the Pre-SET and Post-SET domains, were amplified (Advantage® HD Polymerase Mix, Clontech) from cDNA obtained by reverse transcription of RNA extracted from HEK293 cells (PureLink™ Micro-to-Midi Total RNA Purification System, Invitrogen™). The primers used for the amplification of the respective gene segments are given in table 6. The amplification conditions are given in table 5. Both MLL1 constructs were cloned into pGEX-6P-2 (GE Healthcare) using the restriction sites BamHI and XhoI for expression as GST-MLL1 fusion proteins. NSD2 was cloned into pGEX-6P-2 (GE Healthcare) using the restriction sites BamHI and XhoI for expression as a GST-NSD2 fusion protein. SET2 was cloned into pGEX-6P-2 (GE Healthcare) using the restriction sites BamHI and SalI for expression as a GST-SET2 fusion protein. Before ligation into the pGEX-6P-2 cassette, both the vector and the respective PCR products were digested with the respective restriction enzymes (NEB) at 37 °C for two hours. One µl of restriction enzyme (10 units) was used to digest 1 µg of DNA. For ligation, 60 ng of vector was combined with 30 ng of insert for an insert size of 1000 bp (the amount of insert was adjusted to the insert size) in a total reaction volume of 20 µl and incubated at 22 °C for one hour. The ligated products were transformed into the electrocompetent E. coli XL1-Blue strain (Novagen) using a BioRad™ Multipulser at 1.8 kV for 4 milliseconds. Correct insertions were confirmed by colony PCR screening with Taq DNA polymerase (NEB), followed by insert release using the respective restriction enzymes.
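The statement that the amount of insert depends on the insert size corresponds to keeping the insert-to-vector molar ratio constant. The sketch below shows the arithmetic; the vector length of 4985 bp for pGEX-6P-2 is an assumption made here for illustration and should be checked against the vector map, and the function names are hypothetical.

```python
PGEX_6P_2_BP = 4985  # assumed vector length in bp; verify against the vector map


def molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp=PGEX_6P_2_BP):
    """Insert:vector molar ratio; moles of dsDNA scale as mass / length."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def insert_mass_ng(insert_bp, ratio, vector_ng=60.0, vector_bp=PGEX_6P_2_BP):
    """Insert mass (ng) needed to reach a given molar ratio for a given insert size."""
    return vector_ng * (insert_bp / vector_bp) * ratio


# 30 ng of a 1000 bp insert with 60 ng of vector, as stated in the text
ratio = molar_ratio(insert_ng=30.0, insert_bp=1000, vector_ng=60.0)
print(round(ratio, 2))  # roughly 2.5 insert molecules per vector molecule

# Same molar ratio for a hypothetical 500 bp insert
print(round(insert_mass_ng(insert_bp=500, ratio=ratio), 1))
```

Under the assumed vector length, the stated amounts correspond to an insert-to-vector molar ratio of about 2.5:1, so an insert half as long would need half the mass.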

Step  Segment               1-Primer amplification          2-Colony PCR
                            Time       Temperature (°C)     Time       Temperature (°C)
1     Initial denaturation  3 min      95                   5 min      95
2     Denaturation          10 sec     98                   45 sec     94
3     Annealing             10 sec     55                   35 sec     55
4     Extension             1 min      72                   1 min      72
5     Cycle step 2-4        30 cycles                       30 cycles
6     Final extension       5 min      72                   5 min      72
7     Final hold            ∞          8                    ∞          8

Table 5: PCR amplification programs. 1-Primer amplification program was used in cloning as well as generation of megaprimer PCR. 2-Colony PCR was used to screen colonies in cloning as well as in site directed mutagenesis.

Primer sequences

1 MLL13745
  Forward Primer: 5`GAGTGGATCCTTCCGTTTCCACAAGCCAGAGGA3`
  Reverse Primer: 5`GCTACTCGAGTTAGTTTAGGAACTTCCGGCA3`

2 MLL13812
  Forward Primer: 5`GAGTGGATCCATGGATCTGCCAATGCCCAT3`
  Reverse Primer: 5`GCTACTCGAGTTAGTTTAGGAACTTCCGGCA3`

3 MLL1 site directed mutagenesis
  Mutagenic Reverse Primer with DraI restriction site: 5`TGCCTCCTTAGAAGTCCTTTTTAAATGCCGGAA3`
  Mutagenic Reverse Primer with BstUI restriction site: 5`CTTGCTGTCGTAATACCTTTCGCGCTTGTCAGT3`

4 NSD2 with pre-SET and post-SET
  Forward Primer: 5`GTTCGGATCCTACAAGCACATCAAGGTGAA3`
  Reverse Primer: 5`GAGTCTCGAGAAGGGTCGTCGAGGTCTTTGGTCTA3`

5 NSD2 with SET domain only
  Forward Primer: 5`GTTCGGATCCGGGTGGGGCCTGGTCGCCAAG3`
  Reverse Primer: 5`GAGTCTCGAGGTTGTAGTTAAAAGTCAGCT3`

6 SET2 with pre-SET and post-SET
  Forward Primer: 5`GTTCGGATCCTCAGCACTGGTTGGGCCCTCCTGTG3`
  Reverse Primer: 5`GAGTGTCGACTCTGATGCTGACTCTGTTTTCTCC3`

7 SET2 with SET domain only
  Forward Primer: 5`GTTCGGATCCGCAGATGTGGAAGTCATACTCA3`
  Reverse Primer: 5`GAGTGTCGACTTTTCCATATCTCTGGAACTG3`

Table 6.1: Primers used in PCR amplification and site directed mutagenesis.
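As a quick consistency check, the primers can be scanned for the recognition sequences of the restriction enzymes named in the cloning and mutagenesis sections. The sketch below uses the standard recognition sequences of BamHI, XhoI, SalI, DraI and BstUI; the function name `sites_in` is hypothetical, and because all five sites are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences (all palindromic)
SITES = {
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "SalI": "GTCGAC",
    "DraI": "TTTAAA",
    "BstUI": "CGCG",
}


def sites_in(primer):
    """Return the names of all listed enzymes whose site occurs in the primer."""
    seq = primer.upper()
    return sorted(name for name, site in SITES.items() if site in seq)


primers = {
    "MLL13745 forward": "GAGTGGATCCTTCCGTTTCCACAAGCCAGAGGA",
    "MLL13745 reverse": "GCTACTCGAGTTAGTTTAGGAACTTCCGGCA",
    "SET2 reverse": "GAGTGTCGACTCTGATGCTGACTCTGTTTTCTCC",
    "MLL1 mutagenic (DraI)": "TGCCTCCTTAGAAGTCCTTTTTAAATGCCGGAA",
    "MLL1 mutagenic (BstUI)": "CTTGCTGTCGTAATACCTTTCGCGCTTGTCAGT",
}

for name, seq in primers.items():
    print(name, sites_in(seq))
```

For the primers shown, each forward primer carries a BamHI site, the MLL1 reverse primer an XhoI site, the SET2 reverse primer a SalI site, and the two mutagenic primers the DraI and BstUI marker sites, respectively.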

6.2 Site Directed Mutagenesis
The K3825R and K3873R mutations in the SET-N and the variable region of SET-I of MLL1, respectively, were introduced using a PCR-megaprimer mutagenesis method (Jeltsch & Lanio 2002). Briefly, a two-step PCR protocol was followed using Advantage® HD Polymerase Mix (Clontech). In the first PCR reaction, the pGEX-6P-2 forward primer was used with the respective mutagenic reverse primer (Table 6) containing a restriction site (DraI for MLL1 mutant K3825R and BstUI for MLL1 mutant K3873R). The template was 30 ng of wild-type MLL1 in pGEX-6P-2. The amplification program is given in table 5. This generated a megaprimer for each of the mutants, 300 ng of which was used in the second, rolling circle PCR reaction (which amplifies the entire plasmid). The template in the second PCR reaction was 50 ng of wild-type MLL1 in pGEX-6P-2. The PCR program differed from the previous one and is given in table 7. The resulting PCR product was digested with DpnI to remove the methylated wild-type plasmid. The remaining newly synthesized plasmid was transformed into the electrocompetent E. coli XL1-Blue strain (Novagen) using a BioRad™ Multipulser at 1.8 kV for 4 milliseconds. Colony PCR screening with Taq DNA polymerase (NEB; colony PCR program in table 5) was followed by analysis of the restriction marker sites (DraI for K3825R and BstUI for K3873R), which had been introduced during the design of the mutagenesis primers, and by DNA sequencing to confirm the presence of the mutations.

Rolling circle PCR

Step  Segment               Time       Temperature (°C)
1     Initial denaturation  3 min      95
2     Denaturation          10 sec     98
3     Annealing             10 sec     55
4     Extension             6 min      72
5     Cycle step 2-4        30 cycles
6     Final extension       10 min     72
7     Final hold            ∞          8

Table 7: PCR amplification conditions for rolling circle PCR.

6.3 Protein Expression and Purification
The E. coli BL21 strain (Novagen) was used for expression of the cloned proteins. BL21 cells transformed with the corresponding pGEX-6P-2 vectors were grown for over-expression in LB (Luria-Bertani) medium at 37 °C to an OD600 of approximately 0.6. The cell cultures were then shifted to 22 °C for 15-20 minutes and induced overnight with 1 mM IPTG (isopropyl β-D-thiogalactoside). Afterwards, the cells were harvested and resuspended to homogeneity in 20 mM HEPES (pH 7.5), 0.5 M KCl, 0.2 mM DTT, 1 mM EDTA, 10% glycerol and protease inhibitor cocktail (1 ml per 20 g of cells; Sigma-Aldrich). The cells were subsequently disrupted by sonication (10 cycles of 15 seconds bursting and 30 seconds cooling at 30% amplitude). The cell lysate was centrifuged at 20,000 RPM (Avanti Ultracentrifuge) for one hour and 20 minutes, and the supernatant was passed through a glutathione sepharose matrix (Amersham Biosciences) to trap the respective protein. Washing was done with the same buffer used for cell disruption. The bound proteins were eluted with the same buffer additionally containing 40 mM reduced glutathione, dialyzed against 20 mM HEPES (pH 7.5), 0.2 M KCl, 0.2 mM DTT, 1 mM EDTA and 10% glycerol for 2-3 hours, and then transferred to 20 mM HEPES (pH 7.5), 0.2 M KCl, 0.2 mM DTT, 1 mM EDTA and 60% glycerol overnight. Only the MLL1 constructs precipitated in the first dialysis step; to avoid this precipitation, we added 2-4 ml of elution buffer. The purified proteins were then analyzed by SDS-PAGE to check their quality prior to performing the activity assay.

6.4 Synthesis of peptide SPOT arrays
Peptide arrays were synthesized by Dr Srikanth Kudithipudi on cellulose membranes by the SPOT-synthesis technique (Frank 2002). Each peptide spot contained approximately 9 nmol of peptide and covered a diameter of 2 mm on the cellulose membranes (Autospot Reference Handbook, Intavis AG). Successful synthesis of each peptide was confirmed by bromophenol blue staining of the membranes.

6.5 Design of random peptide arrays
The first random peptide array was designed with a lysine in the centre of a 15-mer peptide. The residues immediately adjacent to this central lysine (positions -1 and +1) were varied over 17 amino acids (all except cysteine, methionine and tryptophan) in every possible combination. The remaining 12 residues (6 on each side) were randomly assigned such that there was a statistical representation of each amino acid at each position. This resulted in 17×17 = 289 peptides plus 4 histone tail peptides representing some lysine sites in histones H3 and H4, a total of 293 peptides. The highly methylated peptides from the first randomization experiment became the basis for the second round of randomized peptides, which took into account the residues that appeared around the central lysine from position -4 on one side to position +4 on the other. The remaining 3 amino acids on each side of the 15-mer peptide were randomly assigned as in the first round. The third round of randomization was designed to exclude the effect of multiple lysine methylations by selecting highly methylated peptides containing a single lysine from the first and second rounds. In addition, the preferred amino acids observed around the central lysine in the second randomized peptide array were taken into account, and the rules for shuffling the amino acids around the central lysine remained the same as in the first and second rounds.
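The design of the first randomized array can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the flanking residues are drawn uniformly at random from the same 17-letter alphabet, which only approximates the statistical representation described above, and the function name `first_round_array` and the seeding are hypothetical. The four histone tail control peptides are not generated here.

```python
import itertools
import random

# 20 natural amino acids without cysteine (C), methionine (M) and tryptophan (W)
ALPHABET_17 = "ADEFGHIKLNPQRSTVY"


def first_round_array(seed=0, alphabet=ALPHABET_17, flank=6):
    """Generate 15-mers with every (-1, +1) combination around a central lysine."""
    rng = random.Random(seed)  # seeded so that the design is reproducible
    peptides = []
    for minus1, plus1 in itertools.product(alphabet, repeat=2):
        # Assumption: flanking residues are sampled uniformly and independently.
        left = "".join(rng.choice(alphabet) for _ in range(flank))
        right = "".join(rng.choice(alphabet) for _ in range(flank))
        peptides.append(left + minus1 + "K" + plus1 + right)
    return peptides


peptides = first_round_array()
print(len(peptides))      # 17 x 17 = 289
print(len(peptides) + 4)  # 293 spots including the 4 histone tail peptides
```

A uniform draw does not guarantee equal counts of each residue per position; a balanced design would instead shuffle a fixed pool of residues for every flanking position.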

6.6 Methylation of peptide arrays
All membranes containing the peptide SPOTs (for MLL13745, MLL13812, SET2 and NSD2) were incubated in methylation buffer (50 mM Tris/HCl (pH 8.5), 50 mM NaCl and 0.5 mM DTT (dithiothreitol)) for 5 minutes at ambient temperature. The membranes were then incubated for 1-2 hours in the same methylation buffer additionally containing 0.35 µM labeled [methyl-3H]-SAM (specific activity: 2.7 TBq/mmol; Perkin Elmer) and the corresponding enzyme (3-7 µM MLL1, 1-3 µM SET2 and NSD2). Afterwards, the membranes were washed four times for 5 minutes with washing buffer containing 100 mM NH4HCO3 and 0.1% SDS. The membranes were then incubated for 10 minutes in Amplify NAMP100V solution (GE Healthcare) and dried between Whatman paper (Whatman GmbH). The membranes were exposed to Hyperfilm™ high performance autoradiography films (GE Healthcare) in the dark at -80 °C for 3-7 days. Methylation was detected by autoradiography using an AGFA Curix 60 developing machine (Agfa Deutschland Vertriebsgesellschaft mbH & Co. KG) (Rathert et al. 2008b). Phoretix™ Array software was used to quantify the methylation signal intensities of the individual peptide spots.
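The radioactivity concentration in the methylation reaction follows directly from the SAM concentration and its specific activity. The short calculation below is only a unit-conversion sketch; the function name is hypothetical and the result is derived from the two values stated above, not measured.

```python
def activity_kbq_per_ml(conc_um, specific_activity_tbq_per_mmol):
    """Convert a SAM concentration (µM) and specific activity (TBq/mmol) to kBq/ml."""
    mmol_per_l = conc_um * 1e-3                                   # µmol/l -> mmol/l
    bq_per_l = mmol_per_l * specific_activity_tbq_per_mmol * 1e12  # TBq -> Bq
    return bq_per_l / 1e3 / 1e3                                   # Bq/l -> Bq/ml -> kBq/ml


# 0.35 µM [methyl-3H]-SAM at 2.7 TBq/mmol
print(round(activity_kbq_per_ml(0.35, 2.7)))  # 945 kBq per ml of reaction buffer
```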

6.7 Methylation of purified Protein domains and MLL1 mutants
Methylation of protein domains and MLL1 mutants was performed in methylation buffer (50 mM Tris/HCl (pH 8.5), 50 mM NaCl and 0.5 mM DTT) supplemented with 0.76 µM tritium-labeled SAM (specific activity: 2.7 TBq/mmol; Perkin Elmer), 1-2 µM MLL1 and 4-5 µM of target protein. The reactions were incubated overnight at ambient temperature and stopped by the addition of SDS loading dye, followed by boiling at 100 °C for 5 minutes.

6.8 SDS-PAGE and Autoradiography
Methylated reaction products of MLL13745 and MLL13812 were separated on a 16% SDS-PAGE gel. Afterwards, the gel was incubated with Amplify NAMP100V solution (GE Healthcare) for 45-60 minutes and subsequently dried on Whatman paper (Whatman GmbH) in a gel drier (Model 583, BIORAD) at 80 °C for two hours before being exposed to Hyperfilm™ high performance autoradiography films (GE Healthcare) in the dark at -80 °C for 7-15 days. Methylation was detected by autoradiography using an AGFA Curix 60 developing machine (Agfa Deutschland Vertriebsgesellschaft mbH & Co. KG).

Table 6.2: Primers used in PCR amplification of the MLL1 non-histone targets.

Primer sequences

1 ASH2 like
  Forward Primer: 5`GTTCGGATCCGGTGAGGTAGAGCTGCAATGTGG3`
  Reverse Primer: 5`GAGTCTCGAGATGAGAGGCCCTCACCATAGAGT3`

2 FAS ligand
  Forward Primer: 5`GTTCGGATCCATGCAGCAGCCCTTCAATTA3`
  Reverse Primer: 5`GAGTCTCGAGCCTTGAGTTGGACTTGCCT3`

3 Host cell factor C1
  Forward Primer: 5`GTTCGGATCCATGGCTTCGGCCGTGTCGCC3`
  Reverse Primer: 5`GAGTCTCGAGCCACGTCAGGGTGTCAATATC3`

4 Retinoblastoma binding protein 5
  Forward Primer: 5`GTTCGGATCCGAGGTAGAAGACCCAGAAGA3`
  Reverse Primer: 5`GAGTCTCGAGTAACAGTTCTGAGATTGCTC3`

5 WD repeat domain 5
  Forward Primer: 5`GTTCGGATCCGGGAAGTGCCTGAAGACGTAC3`
  Reverse Primer: 5`GAGTCTCGAGGCAGTCACTCTTCCACAGTT3`

6 Polycomb complex protein BMI1
  Forward Primer: 5`GTTCGGATCCATGCATCGAACAACGAGAATC3`
  Reverse Primer: 5`GAGTCTCGAGAGTAGGTCGAACTCTGTATT3`

7 Protein phosphatase 1
  Forward Primer: 5`GTTCGGATCCACAGAGGAAGAGGAAGCTGC3`
  Reverse Primer: 5`GAGTCTCGAGGCCCTGGCGGGCGGCCTGGGCCGGC3`

8 TAF9 RNA polymerase II
  Forward Primer: 5`GTTCGGATCCATGGAGTCTGGCAAGACGGCT3`
  Reverse Primer: 5`GAGTCTCGAGCAGATTATCATAGTCATCAT3`

9 MLL segment A2
  Forward Primer: 5`GAGTGAATTCTGCTCCTAAAAAAGGCCAAAGCTCA3`
  Reverse Primer: 5`GAGTCTCGAGTGGTTTTTGTTTTACAGGGAT3`

10 MLL segment B1
  Forward Primer: 5`GAGTGAATTCTGACAAAGCAGCTGCTGGAGTG3`
  Reverse Primer: 5`GAGTCTCGAGCTCATAGTCATCATCATCAT3`

11 MLL segment C1
  Forward Primer: 5`GTTCGGATCCAAGTCCTTCTTCATTCGGCA3`
  Reverse Primer: 5`GAGTCTCGAGGCAGTCGATTGTCATAGACC3`

12 MLL segment D2
  Forward Primer: 5`GTTCGGATCCAATTTCAGCTCCCCACTGAT3`
  Reverse Primer: 5`GAGTCTCGAGGTTTAGGAACTTCCGGCATT3`

13 CREBBP a1
  Forward Primer: 5`GTTCGGATCCGACATCGTAAAGAATCCCATG3`
  Reverse Primer: 5`GAGTCTCGAGGAAGAAATGAATACTATCCAG3`

14 CREBBP b2
  Forward Primer: 5`GTTCGGATCCAAGATGGTGAAGTGGGGGCTG3`
  Reverse Primer: 5`GAGTCTCGAGGCGGAGCTTGTGTTTGATGTTGA3`

15 Plexin B1
  Forward Primer: 5`GTTCGGATCCCGCTTCTCCCTGGGTCACGT3`
  Reverse Primer: 5`GAGTCTCGAGCTGGACCTCAGAAGTGACATC3`

16 MLL PHD3
  Forward Primer: 5`GAGTGGATCCTTCTGCCCTCTCTGTGACAAATG3`
  Reverse Primer: 5`GCTACTCGAGAGTACAGTTCACACAAGTGTAG3`

6.9 Scansite database
Motif searches were carried out with the Scansite database (http://scansite.mit.edu/).

6.10 NSD2 Random Peptide Sequences

Table 8: First random peptide array with 293 spots, corresponding to Figure 11.

1 H D G V T P A K A V S N L H T 147 K Q P D R S L K P K Y I P Y P 2 D R L A R A A K D L L L P R D 148 T E E H Y D L K Q F V N A H Y 3 I Q Q N K K A K E G K L E S V 149 F T R H E K L K R P V R T K S 4 P T N L G L A K F I H H L T L 150 L R I S R H L K S N H F E T T 5 A L P R L V A K G E S S L Q Q 151 N H H A R F L K T F F H V N N 6 H A A F S I A K H Q D T D E G 152 Y F S V D T L K V I D L H R S 7 P G F D D T A K I A L G V S I 153 G E N E G L L K Y F T I T A G 8 L P N A F Y A K K I R V E F Q 154 S N T H H T N K A N N Q L V T 9 T A F K V E A K L Y Y K L F Q 155 H A L S F D N K D V E N Q A L 10 Y A L E S L A K N K V T Y V D 156 V Q K P V R N K E E E E Q N A 11 H K I D K Q A K P G S S P E F 157 T P Y S T P N K F D R R F E R 12 P H F P D D A K Q T V L T P G 158 Q I K R I G N K G A F P K A V 13 A S N K A F A K R D Q G E S Y 159 Q H N T A Q N K H P P G T I R 14 K H T G H Q A K S L L L N E Y 160 T N G Q D F N K I F V K L T E 15 E H E D D H A K T Y D A H Q N 161 K D Y F L I N K K Q R F T P N 16 P S D P E S A K V E R Y S V A 162 N T L K P Y N K L I V L H H L 17 F Y D Y T P A K Y S I R A K F 163 A G V N S S N K N L Q L H E A

18 I R K N P V D K A L L D I G F 164 E Q S G T T N K P T Y D R Y K 19 A E T K E R D K D P T D A Y I 165 E G V Q L Q N K Q Y E A Y L Q 20 S P A E H N D K E R T L P A N 166 I Q S V K G N K R A A S I F N 21 F Q E V E Q D K F E D G I D P 167 Y N I S F L N K S K F K T H D 22 F V G F A A D K G Q I Y F K Y 168 H K Y R Y K N K T L Y T V Y E 23 I K E I P S D K H K P Q L D N 169 Q R E D A G N K V E S R V H Y 24 E A P E N D D K I I L Y K R G 170 L N K D D T N K Y R T A I K D 25 I H S R V Q D K K V S N G R N 171 Y V P Q R F P K A Y R A R R F 26 F Y F V N H D K L P Q A R R K 172 D H R G G S P K D L N F A L H 27 P I S V P E D K N T N A E T D 173 D Y T S P F P K E Y L N P V P 28 Y R H N V G D K P D E R K N S 174 S K D Y E Y P K F N D E G Q K 29 V F N Q Q Q D K Q D P D I D T 175 A A V F K Y P K G E K E E V S 30 S I I P Y T D K R T V H T Q I 176 F G A E K G P K H T R E Y I F 31 Q S K V Q P D K S S S I Q G N 177 Y K I L D G P K I E D F T G P 32 H N G N H D D K T R D Q V N D 178 V N G H I L P K K Y H H F A Y 33 P L L G E P D K V E F S L L V 179 G Y G Q K H P K L K V T R D Y 34 H V N T Q P D K Y A N S D H K 180 R D G G N N P K N F H P K N P 35 S V I H K G E K A P I V E E A 181 N Q D I R Q P K P P G I H F E 36 Q V P N S E E K D I S R L L E 182 A I T D A E P K Q E I K A N Y 37 T K Y K T E E K E L Y S V N P 183 R R A V N L P K R I P Y Y S T 38 N T V V K H E K F P Y I A P Q 184 P R V E A N P K S Q K F D G S 39 K Y I N L F E K G T A E A L Q 185 L I A F F S P K T T I E Y G R 40 T F H A A S E K H N R N I I T 186 E S T R E P P K V K H H L K R 41 L G Y A N Y E K I R V R K R S 187 H E T H L S P K Y T F E G F S 42 S A I E Q G E K K A P H Q L H 188 T R I E Q A Q K A G F D Y P D 43 V T Q H I L E K L S G T D S Q 189 S E K A T A Q K D N E G Q Q G 44 R L N A N S E K N A I G I Q I 190 P I Q R I N Q K E Q Q K N V H 45 Q Y I F D L E K P H H V D L I 191 G Y L E P R Q K F Q A I R S V 46 Y G F L Y D E K Q K R P D P G 192 H A T V R K Q K G R T F Y D I 47 I N I G V R E K R N R I T N K 193 T N S E F S Q K H H 
Y A Q P T 48 L S L L N Y E K S S N G Q K V 194 N H F P Q D Q K I S T I N S G 49 G K S Y Y H E K T L N P G V G 195 N P Y Y Y T Q K K D N K L H Q 50 L Y R I H D E K V P I H G P N 196 G P N Q A A Q K L S K S I F T 51 S F S L E Y E K Y K G R K G E 197 S K D P I F Q K N V Q Q N Y S 52 D Q K P L E F K A G Y Q K I N 198 K L Q K I P Q K P G V V R K L 53 K D R T I N F K D R Q I A Q Q 199 D K H R P V Q K Q S S A K T D 54 Y A P Q I F F K E R P P N Y V 200 L E P Q V V Q K R D A T S F D 55 T P H D V I F K F G D K A G D 201 I T Y E T I Q K S A A P S I H 56 N P A D K K F K G N R Q E A A 202 D P Q L V D Q K T I N R F K K 57 N Q A H F G F K H D I F T G N 203 L G A R N D Q K V I Q G D Y L

58 F T I G S H F K I V N A P E P 204 I P F E R V Q K Y D F T S V L 59 P D F V H F F K K I K N V I A 205 A F A Y R F R K A Q P S I E K 60 K S P A G K F K L F G L F Q I 206 L T P G Y N R K D I H T P V F 61 G I P F H Y F K N H P K Y F R 207 K Q A R H E R K E E T K D V H 62 R L V R D A F K P Q A F P G K 208 E G K G Y P R K F A T P R L G 63 R Y F I L R F K Q K Y Q N D D 209 R V N T S R R K G F Q Q I D S 64 V R Q V V T F K R Q F A Q H S 210 Y K R L D D R K H Y I Q R N I 65 K I D T I L F K S A F L V V Q 211 R I G I F E R K I V E I T H Y 66 A K P D N P F K T K N L F I K 212 Q D Q K R S R K K F H I N D G 67 Q Y E T A I F K V P K K G P N 213 H F S K Y H R K L H D T F D V 68 G H T S L N F K Y V Y R S P K 214 V Y T A R V R K N G N I Y I A 69 Q G H R E E G K A L H Y Q G E 215 A R H P Q K R K P L G P T L E 70 R H A I E N G K D D N D V K I 216 A S V S S I R K Q T H N H R F 71 H S D A E S G K E Q E P T H D 217 R N Y L G A R K R T V P V E I 72 P K E G P I G K F Y P E G V Q 218 S V L D P I R K S H R E A N L 73 K Y K A G L G K G L G R R E R 219 D Q A S T S R K T T H K G K A 74 E F E Q I D G K H R F F S D Y 220 D F V K G P R K V K V D H R P 75 T E G K S A G K I V R S H I R 221 P D H S P S R K Y S P D S N H 76 E Q F L N T G K K I Y A R V E 222 Q R A V F L S K A H L Y F T P 77 F R N I H F G K L P G I P I R 223 F L D L Q V S K D Y Q G L Q E 78 Y L D H I T G K N H A G Q L E 224 G D Q T E R S K E F L D V H F 79 P E H H T F G K P N P D T F D 225 D A R Y D F S K F E S P Y K V 80 G E L N A A G K Q G P F H T D 226 S I E V F H S K G R G P E G L 81 L A R S N K G K R P T N K E T 227 H A I Y S K S K H P S E P N G 82 A N S T T V G K S P A L Y F T 228 F L H F I A S K I N H G A N H 83 V G E I N G G K T I I H T L I 229 P P E Y R K S K K F A D E R Y 84 N A N S K I G K V G N L K I Y 230 G S T N H T S K L K T V K L K 85 Q G K N R P G K Y H D T I R L 231 K S E K E G S K N K F S N T V 86 S S V Q T K H K A A L K R A E 232 V L V Q V E S K P Y R K K D N 87 T S G R N H H K D I H Q L T E 233 R E P E D P S K Q G 
D Y R T R 88 I D R Q V K H K E Q E G Y A V 234 Q F R F D A S K R A K S P K Q 89 D R L F H G H K F N Q F D G R 235 S S I S F A S K S V Y H F E V 90 H P T Q S E H K G V Q F G A L 236 Q N Y L D N S K T G F G L N H 91 S Y S N E R H K H K A S F A I 237 F H P H F L S K V S L F V Q I 92 K V Q P A D H K I D S Y F Y R 238 K P S F L R S K Y R Y S N L N 93 E P Y A G A H K K Q I L I Y L 239 I V Y K G H T K A A N A G Y F 94 L V G K I Q H K L S F N D P K 240 R T E P Q I T K D T H T Q S Q 95 N H D T I R H K N N Q V G I H 241 H I L E H Y T K E G E D H A A 96 H E G N I N H K P Y G N A F I 242 R R S Y I N T K F V I G Y L A 97 F L L Y L I H K Q R E R F P G 243 Y I K P Y V T K G A E F I D V

98 Y V Y D V K H K R S Y V H F E 244 S D L A P Q T K H E I R P D F 99 I K T R P Y H K S F I H G H G 245 A S P V I E T K I A K V D S T 100 I I D D S I H K T H G H G A K 246 P L P P S N T K K H D Y L E R 101 F K Q I P L H K V L G Q S T A 247 S H Q D Y F T K L Q F A A F V 102 Y V R H E V H K Y Q P V P S Q 248 D H Q R H Y T K N S T R V N H 103 F L R R A T I K A F P A K F R 249 G G G Y T P T K P R A E R S K 104 G K V K Q R I K D H T D Q R A 250 P N S L Y S T K Q D R H H A V 105 T D G S R Y I K E H H K E I P 251 T F L P T D T K R E G D D R H 106 D F K K K P I K F H P I F P E 252 G K I P F H T K S I N P H Y L 107 N H F R Y I I K G V L K E H H 253 N P H L Q G T K T E H V A K L 108 I R V E F F I K H Q K V D V I 254 T T P G L Q T K V A L S K F S 109 A N D T V A I K I S V N N P G 255 F I V I A S T K Y L E F V H S 110 D E E H Y D I K K P G Q N S R 256 Y S N A G V V K A S P R E G E 111 V N R Y Q H I K L V A A I Q F 257 V E T D H T V K D T D R S S L 112 F T T S A R I K N G K S L P F 258 E F R A V P V K E R L H K K S 113 P D K I P V I K P D F F H E K 259 V Q H I A L V K F G V R P Y P 114 I P R T L D I K Q E G V E I T 260 Q G I Q Q E V K G G A Q V F G 115 E R R H P R I K R T E Y K E Y 261 I G Y S T N V K H G T F S S P 116 E S I A N A I K S F K Y P L H 262 L Q E N L Y V K I L Y K D D L 117 R L D R H R I K T D I E Q V T 263 R T E Y V I V K K Y Y L R R T 118 L A D P Q K I K V N S Q E L V 264 Q L H H E H V K L I G P S N N 119 N V A T N Y I K Y V Y V G E A 265 E A K G H Y V K N F K T R P S 120 E D Y V K K K K A P S H K G S 266 K N H A S K V K P A E E P T H 121 D H H I R G K K D L I I N D A 267 Q L N N G N V K Q K V Q N R I 122 Q V N Y S V K K E H R E T G N 268 G F F K L R V K R Y F Y S V L 123 H T V Y P P K K F R P A A Q F 269 H V A V K N V K S E F T Y K P 124 R N T E S T K K G F S S S Y R 270 N G V I L R V K T P V E V I F 125 T F D F D F K K H Y A Y S N Q 271 N D G F F T V K V H A V H S V 126 A T K Q G I K K I Y K D R Y P 272 T S D I G H V K Y R R P E Q I 127 F Y S L A V K K K A T I I 
H L 273 G E L G L A Y K A I E Q H H P 128 Y T Q Y R N K K L K L H G T D 274 A Y Y S G E Y K D T Q V P T P 129 I F G L Q N K K N N E G T Q F 275 L Q Q N Y F Y K E V N T G D P 130 R V S T K L K K P T T P S V H 276 T T N S N Q Y K F R L L D Q S 131 Y E N H Q E K K Q F L V N E H 277 D F E G G T Y K G L G Y F G F 132 G E Y F D H K K R F D Q Y A Q 278 A Y F L K R Y K H N K P D S K 133 K N V G F Q K K S D A Y F Y Y 279 E L Q T H Q Y K I D E V R T V 134 V D T G F L K K T T D H A T F 280 V Y Y H T G Y K K H S I E R S 135 V I K Q E Q K K V Y K N Q I A 281 I A S P V V Y K L D Q T D A T 136 V G Q T R Y K K Y V Q T Q Q G 282 N R V Y S K Y K N S R E N K D 137 V Q A T S G L K A Q D N Q K H 283 K T L F K V Y K P R S K Y A T

138 V D R D V S L K D S I Y Q Y R 284 Y H H K F V Y K Q V Q D V Y N 139 L G A Q P E L K E N K N H Q T 285 G A F N T Q Y K R Q S D N S E 140 R D K G G I L K F S T T Y F E 286 S I R F K T Y K S G T G F T A 141 L K H L T E L K G H N Y F R A 287 D I F I Y A Y K T D L N I P D 142 D P F P Q L L K H P A S I P Y 288 A E Q E Y G Y K V K V A S D Y 143 E V P T A Q L K I L D G N L Y 289 N Q G D L Y Y K Y N G L L G K 144 E F F N N D L K K E Q N S A G 290 A R T K Q T A R K S T G G K A 145 K P L F D I L K L N K E G I Q 291 K Q L A T K V A R K S A P A T 146 P L D I G H L K N Y H H A H R 292 P A T G G V K K P H R Y R P G 293 G G A K R H R K V L R D N I Q

Table 9: Second Random peptide array with 168 spots corresponds to Figure 12.

1 P A T G G V K K P H R Y R P G 85 R V L L T R L K T I N N R N Q 2 P E P L A E P K G V N L E G R 86 H L E D G F P K K R R L S H P 3 F Y E Q N P F K N R G R I T T 87 I R T L H G P K A I R P I G I 4 Y T I V N P I K N Y F P D E Y 88 A M D P R V V K A R R R P T Y 5 G L D D A N P K T F H L F V Q 89 R H R G S G F K A R Q P R P G 6 D Q S T N N I K R V G P H E V 90 G M D F S F T K N F F P D L G 7 I E M G S Q H K N R R L F E L 91 I Q D T H F F K T R P F A T V 8 M S E P F G L K N F G P E S E 92 F H V R F V H K A F H P R D H 9 H L V G G V L K K L Q R L A N 93 H F I V H E P K T I S R I H G 10 V Y N G R F I K K V H N D S Q 94 S V N V S F P K K R R H Y Q F 11 N F E L T F H K G Y R R R E V 95 D R F G S R L K K R S R G L S 12 V E D G N F T K K V R P Y Q M 96 N G T R R G L K T V H F H F F 13 Y A V G F R F K K F P P P H M 97 V P S V N E F K G V S F L E H 14 R N H G Y R I K K L P R P N H 98 N N Y R T F I K T L H N I F A 15 T N M L N F P K N V R N D Q D 99 Y S I T F Q F K R R P P P F D 16 H S V G F Q F K K F Q R I F L 100 M M Q T G E H K N L R R Y N T 17 R T N V N F T K N Y R P N R Q 101 L G N V R F H K N Y H F I T P 18 F F T Q S R V K K L P H D I P 102 M Y I L N P P K N V P I V V G 19 D S L P R P Y K R I P L Q D Y 103 Q M N V S F F K R V S I A I T 20 V Y D Q Y G P K V F R P S L Y 104 E T I P R F I K T I P H N S V 21 D M I L F T P K T R F F I M T 105 S Q N V T L V K K I R P N D A 22 D P R T Y G F K V V S P D I L 106 L Q I P T E V K L F F R M M F 23 E Y Q V Y P F K R R F P H H M 107 F I N R G L I K R V Q P R V A 24 T P P Q S F T K L P G H V M F 108 G E F A F G P K V L S N H F T 25 P E V V F N L K P V S Y D Y D 109 Y T A A R L H K P R R P Y A P 26 D R D R Y F V K R F S P A V L 110 I I V V Y V H K G I P P F M M

27 Y N H G T P L K K V N P T I Y    111 F P M G T Q P K P L F P T A I
28 A E I F T R F K K P P L V D Q    112 N L F F G G L K N Y R H M N D
29 G N Y G F T L K K R F R V S Q    113 Q I E A S E V K L L R R A F P
30 E E G R A L L K N I N H A H N    114 H N G D N R L K A R H F G V Y
31 F F E T Y F V K P V S N Q M R    115 A T L A H P F K A P N N E N H
32 G H R V N G P K G R R L T A G    116 D Y E L Y G I K R L H F S T Q
33 G H D G T G I K L R F P Q H P    117 H D T F Y F V K N F Q R R R D
34 L L G A H V L K V V R Y Y T E    118 L I R T Y F V K N V S L I I T
35 R N A R T N P K L V R N V R D    119 M Y F V S N P K T Y R L P D R
36 T R G F S P V K N V R R M I D    120 S A Q Q G F F K N I H H G T S
37 Q F H T F P V K G Y R L A H R    121 A M Q V T R H K N Y G L T V H
38 D G F L F V T K R L H R N N S    122 G Y F R G P I K N Y R L G L G
39 T E T L F T P K P R F R M P F    123 F Q I T F R I K G F G L R L Q
40 T T T G S Q P K T V R F A R D    124 L F F D N R F K L V P I R M G
41 N P E V T P L K V I G L Y A T    125 D T A L H G P K A I R N V M G
42 A L Q P R P V K N R H L L T I    126 I L E G F F Y K T F Q Y L G R
43 M T V G N L L K N I S N T L S    127 Y V T L T N V K T V H H L Q S
44 N L Y P S F V K K F R I I L G    128 A Y Y G G F Y K G Y Q I D M H
45 M A S P S T F K P V F Y G Y G    129 H H Q V H F F K K P H N Q Y I
46 S D T G T R P K A Y Q H F A Y    130 Y R P F H P L K R I F N F L G
47 M G S V H R P K K R S H I H H    131 P P I D Y R P K T R R F R Y I
48 F R G L H E P K R Y R R M S Q    132 Y H L G R F L K P I G P R L R
49 D Q P T R N P K P Y R I I R N    133 H H N G H P P K K V N I H P I
50 D P S D T P P K V V H P G A I    134 E M T G S P V K T I R P E E L
51 Q S I V T F T K K R P R Q A D    135 Y Y G P T G I K T V N I F V D
52 Q H F V S V L K N R R L H I I    136 R R L P N F T K P V R Y V V A
53 R E Y R S R F K K P S I N L L    137 G F Q F Y F Y K K Y N I D T S
54 D G P D H P L K R R Q P I V S    138 V Y S G T P F K A F H F I L Y
55 L G L V H N I K R R H I P I M    139 V L T Q Y P H K R R R F M F R
56 A F G V N V F K R V R F G D T    140 D N Q A N N F K T P P N P E V
57 S E Y A A P L K P R R I F R F    141 G H G F N Q P K A Y R P L A D
58 N H H V S R F K P V F N T P S    142 D N E R S P Y K R P G L R V A
59 G P Y V Y V F K T R S N T Q D    143 E L I V Y N L K T R H R T F R
60 M Y H G Y R F K T P H P Q R I    144 S F Y G G F F K L Y P P D Q L
61 E G A L H G L K G R Q R P A V    145 H A N F Y F I K G V N H I I S
62 P N Q L Y F L K K V R P V G L    146 V L L A Y N V K G V G P H N N
63 G I I A T R F K K P F P Q R H    147 R E D D A P P K N F H R L G S
64 N F Y F S Q F K T I R R H Q V    148 M G F T N L F K P R P H L F R
65 L F D G Y E F K G Y H I M R L    149 R D E D R Q F K K V S L E S D
66 D Q L A Y Q F K V V R N T L S    150 V G Q A Y T F K T Y R R G S G
67 Q E F G G E V K A L Q N S Q I    151 G P P A S F H K V V H H F T H
68 N G H L N N H K V I R I S A F    152 Q I K R I G N K G A F P K A V
69 H H M V G T H K T L P R M A G    153 Q I A R I G N K G A F P K A V
70 A H P A R F L K N Y H P L P H    154 Q I K R I G N A G A F P K A V
71 S D T L N R L K R V N H S H L    155 Q I K R I G N K G A F P A A V
72 D I L L Y Q L K R V H N Q I Y    156 G S T N H T S K L K T V K L K
73 E D V T N F F K T R H H G Y Q    157 G S T N H T S A L K T V K L K
74 T D H P Y T Y K V P N Y H I E    158 G S T N H T S K L A T V K L K
75 R T S F N F Y K A R R P I T S    159 G S T N H T S K L K T V A L K
76 F S A L F G L K A L R N N E M    160 P D F V H F F K K I K N V I A
77 S F M G F T Y K K R H R E G M    161 P D F V H F F A K I K N V I A
78 S Y P Q H F P K R Y R F A D D    162 P D F V H F F K A I K N V I A
79 E L A V F F I K K I H N H L M    163 P D F V H F F K K I A N V I A
80 H E M V F V I K L R R R L F M    164 H T V Y P P K K F R P A A Q F
81 T N Q V S N I K K Y Q R D E N    165 H T V Y P P A K F R P A A Q F
82 R T P D R E P K K R R H R T A    166 H T V Y P P K A F R P A A Q F
83 Q R L G N Q F K K V R P H H M    167 A K P D N P F K T K N L F I K
84 Y H M R T P L K T I P N Q A D    168 A K P D N P F A T K N L F I K

Table 10: Third random peptide array with 175 spots, corresponding to Figure 13.

1 H R T V F E P K P P H D S P Y    88 A H T D Y R I K L R Q R R H P
2 F H L F Y F A K T L H N Q P M    89 Y Q L V Y E I K N L Q H Q D R
3 D E G G Y T I K L V Q H S G S    90 P H E G T E R K T F Q F F M E
4 L Y L V G F F K G S H P N S Q    91 V S G V S F L K F V Q F A S T
5 Q H T G T R H K T R Q D V E N    92 Q N G F S F H K T R F R N I A
6 E V G V F F R K N S N F G F R    93 R Q N V Y R H K T F R F H D P
7 L V G V N S P K Y P H F F R Y    94 A I M G Y L P K F F H R F Y G
8 A T R G Y R F K G L F F D G D    95 P L G F G R H K A V H D H E D
9 L F L G G L I K G P Q D A T Q    96 E I Y S N E A K T I F F T N I
10 D F P V P R R K F R Q H G Y V    97 P V H F S L A K L R F D Y Q F
11 Y T A G Y L V K T V S F H F T    98 T Q A G Y E F K T F H R S Q N
12 H D P Q Y R I K P R H R V Q F    99 P S R L G R I K G R Q R G S Q
13 T G F Q Y T H K T R P P R D A    100 P F P V Y S L K T R Q P Y H M
14 S L L F S G F K T S H P Y L M    101 S G S Q T G F K N I S R D Y E
15 L Q M V G V H K A R Q P Y L P    102 N G F G N R H K A R F P I Y L
16 H G T D H F P K L R R H N A A    103 I T G V G E A K F R H P E V P
17 N E D F T G H K T R F N T R S    104 R A M F T G H K F I H N D E A
18 D P P V Y R L K A S S R F R L    105 S I G D S G H K L V N R Y T R
19 S V M V F S I K G R S N G D Q    106 D N H F Y F I K T L Q F N H G
20 A S H S Y S I K L V Q R H N A    107 Y N G F S F I K A V Q P R M G
21 R G T S Y G I K T R H R D L A    108 Y H I G Y E F K A R R F Q S E
22 T H Y L Y E I K L F S P G F M    109 N Q I V T G H K N I P H M L I
23 Q S D V F T I K Y R H R Q E N    110 I F L S F F A K G L Q N D D R
24 Y Q E G T S V K T I F D H P H    111 G N I V Y V L K P R H R P F G
25 E S N F Y G P K G V Q H M T R    112 G E F F H T P K G V S N F L A
26 L G D D N G I K L I H H R T M    113 R A F G S L F K L P N R A D A
27 M G M S T V I K L L H H A E Y    114 G L G F N F I K L L H F Q L R
28 A M V F H R H K T I F P I R L    115 H R M G G T F K A R Q P P G I
29 T I P V Y S R K G V H H E H Y    116 V Q R V G F I K P L H R M R G
30 T P V Q G G H K F R S R T A D    117 L V N F Y F H K T R H P Y N M
31 M T V L G L H K F I F P D E H    118 A L A G G R L K T R H P H Y P
32 Q I G V S F F K T F F H S G S    119 G F M Q G R I K F R F H G P Y
33 Y G A Q S L H K Y P N R H P F    120 Q R Q Q S R I K Y I Q F V M N
34 Q D I G Y E I K G P Q N V M H    121 N D Y V Y V V K F V S R D E R
35 F Y F V S G F K Y I P R A P Q    122 D N L S G S A K T R N R S S I
36 F R F G G V F K L R Q P F R S    123 S P V F Y V I K L P H D M R L
37 Q M P Q G L H K L S H P T D F    124 V T P Q S T F K A V P F D T P
38 D Q A G Y L F K F R P P E A D    125 N F H Q F E F K T F Q R I L T
39 F I Y F Y E I K F R R R Q N H    126 G S D D T G H K T L H R F R L
40 M Q R V F F F K T L F F L Y A    127 S P E F F V I K T R Q R F V D
41 Q V T F F R F K G L Q P M F P    128 N G Y G S R A K A I F D P S T
42 Y D G S S V P K N V H F I L D    129 S L E V Y R F K T V Q R F F Q
43 T T T F G R I K F V F H G E F    130 N P E G T G F K T R R F I E H
44 P Y N V G T F K N R H N Q H T    131 P Q M D Y S V K T R S H V N Y
45 M N Q S N R H K P V Q R R Y N    132 R P F F S R L K T F N D L Q D
46 L H V G N E H K T I P N R T Y    133 E G V V G E F K L L N D M H E
47 P G Y Q Y R H K L R N D E M N    134 A F E S F L H K F P P N P H V
48 H H R D Y T F K G S R R L V V    135 M T A V N S I K L P N F V I P
49 L A R F S G F K T L H P D R Q    136 Y G E G S V P K A P F F S I A
50 R M I L S F I K L R P D Q N N    137 S N A G S E V K L R P R F M M
51 N T V G G E F K T R H F N I F    138 G H D D N F I K T I H F N V Q
52 V R M Q S R I K Y R S H V L H    139 V F D L N R A K T V S R D T E
53 P V T S Y R I K L V R R D L A    140 L M F Q Y F I K G F P P H H R
54 S R Y F Y F I K N V F R Y E P    141 E Y I V H F I K T I R H I L I
55 Q L P V S T R K Y V Q D E V Q    142 P H V G S E H K G V H F P A L
56 I F F F P T L K P R P R V D V    143 I G M G S L P K L V H H M T G
57 F F N Q Y T F K N P F R A D T    144 Q P Q F G L P K A P S P M S G
58 A N P D H R V K T L S R Q I P    145 H N Q L S G I K T F P H V Y E
59 G Q S G Y L H K Y I Q P M L V    146 A P T V G G F K A R H N F S E
60 S N I L N F L K A R H H V P L    147 A P S G S E F K Y R H P E R G
61 A E D D F L V K T R N F P Y V    148 M H Q G Y F R K G R S R V T R
62 I N M G Y G F K L V P D I S R    149 N Q Y L N R L K L V Q R A E P
63 R H I D Y S A K F V Q P L T T    150 F V N V F V L K T P H H V D V
64 R Q M G Y F V K P R F R P T Q    151 L E Y V S F L K G I Q N P N I
65 H R N V F E H K L V Q F F T T    152 V L A F Y L P K L L H R R E N
66 G T Q V N F A K P V F N L Q P    153 D Y E G S R P K L V Q R S M E
67 G G H V S G I K G V N P N G R    154 L I F L G E I K P V H H S V N
68 M A P G Y R P K N R Q H E G F    155 E A T D Y F V K T F S R Q P H
69 T L L F G R L K N R R F V E R    156 Q R I V Y F L K G L N R E V S
70 I P N S Y F I K T F P P F N I    157 A V T D N V P K N V Q P V R M
71 F E S V Y T H K N P F F M G F    158 N H M F S R R K G V H P L E Y
72 S D F F F L V K N R F H I E R    159 G I P Q G F P K T R Q F G D T
73 D L E V F F F K A L F N G S F    160 Q G A V F R V K L V N R G T A
74 H T P V S F A K L I Q N H Q E    161 D E S F Y V F K F L P P P Q N
75 I E F G T R F K G R F P R L S    162 F T S V G E I K G P P F R Q F
76 L P M V F E F K L I P F Q I H    163 P S Q G G R L K F V P R L D H
77 V V V G F G F K A P Q N Y M L    164 D N G V S G F K A R Q R G F I
78 M I E G F R F K G V Q P S R L    165 L P F V Y F R K T L Q H F Y Q
79 Y R D G N R I K N R F R G D A    166 G D P V T E F K T V R P Y Q M
80 R H A G F E F K N I Q R N G P    167 L P Q G T E F K T I Q H F R L
81 E S E F G F P K Y V H R Y D Q    168 F P P G H F I K T S F P S F N
82 L L I L S V A K T R F R Q V G    169 S T G G K A P R K Q L A T K A
83 Y E N G Y G R K L R N F D H F    170 S A P A T G G V K K P H R Y R
84 M Y H G G F P K G I H P A V G    171 S A P A T G G V A K P H R Y R
85 Y I P G F L F K P I Q H Q V T    172 A R R G G V K R I S G L I Y E
86 I N V V G F R K T L P P I I D    173 A R R G G V A R I S G L I Y E
87 Y S N L H F L K A L H R G D S    174 K P A A A G V K K V A K S P K
175 K P A A A G V A K V A K S P K

References

Aka, J. A., Kim, G. W., & Yang, X. J. (2011). K-acetylation and its enzymes: Overview and new developments. Handbook of Experimental Pharmacology, 206, 1-12.

Avdic, V., Zhang, P., Lanouette, S., Groulx, A., Tremblay, V., Brunzelle, J., et al. (2011). Structural and biochemical insights into MLL1 core complex assembly. Structure (London, England : 1993), 19(1), 101-108.

Bannister, A. J., & Kouzarides, T. (2011). Regulation of chromatin by histone modifications. Cell Research, 21(3), 381-395.

Berger, S. L. (2007). The complex language of chromatin regulation during transcription. Nature, 447(7143), 407-412.

Campos, E. I., & Reinberg, D. (2009). Histones: Annotating chromatin. Annual Review of Genetics, 43, 559-599.

Cao, F., Chen, Y., Cierpicki, T., Liu, Y., Basrur, V., Lei, M., et al. (2010). An Ash2L/RbBP5 heterodimer stimulates the MLL1 methyltransferase activity through coordinated substrate interactions with the MLL1 SET domain. PloS One, 5(11), e14102.

Cheng, X., Collins, R. E., & Zhang, X. (2005). Structural and sequence motifs of protein (histone) methylation enzymes. Annual Review of Biophysics and Biomolecular Structure, 34, 267-294.

Cheung, P., & Lau, P. (2005). Epigenetic regulation by histone methylation and histone variants. Molecular Endocrinology, 19(3), 563-573.

Chi, P., Allis, C. D., & Wang, G. G. (2010). Covalent histone modifications--miswritten, misinterpreted and mis-erased in human cancers. Nature Reviews. Cancer, 10(7), 457-469.

Choi, J. K., & Howe, L. J. (2009). Histone acetylation: Truth of consequences? Biochemistry and Cell Biology = Biochimie Et Biologie Cellulaire, 87(1), 139-150.

Cosgrove, M. S., & Patel, A. (2010). Mixed lineage leukemia: A structure-function perspective of the MLL1 protein. The FEBS Journal, 277(8), 1832-1842.

Del Rizzo, P. A., & Trievel, R. C. (2011). Substrate and product specificities of SET domain methyltransferases. Epigenetics : Official Journal of the DNA Methylation Society, 6(9)

Dhayalan, A., Kudithipudi, S., Rathert, P., & Jeltsch, A. (2011). Specificity analysis-based identification of new methylation targets of the SET7/9 protein lysine methyltransferase. Chemistry & Biology, 18(1), 111-120.

Dillon, S. C., Zhang, X., Trievel, R. C., & Cheng, X. (2005). The SET-domain protein superfamily: Protein lysine methyltransferases. Genome Biology, 6(8), 227.

Dou, Y., Milne, T. A., Ruthenburg, A. J., Lee, S., Lee, J. W., Verdine, G. L., et al. (2006). Regulation of MLL1 H3K4 methyltransferase activity by its core components. Nature Structural & Molecular Biology, 13(8), 713-719.

Duncan, E. M., Muratore-Schroeder, T. L., Cook, R. G., Garcia, B. A., Shabanowitz, J., Hunt, D. F., et al. (2008). Cathepsin L proteolytically processes histone H3 during mouse embryonic stem cell differentiation. Cell, 135(2), 284-294.

Faber, P. W., Barnes, G. T., Srinidhi, J., Chen, J., Gusella, J. F., & MacDonald, M. E. (1998). Huntingtin interacts with a family of WW domain proteins. Human Molecular Genetics, 7(9), 1463-1474.

Frank, R. (2002). The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports--principles and applications. Journal of Immunological Methods, 267(1), 13-26.

Goldberg, A. D., Allis, C. D., & Bernstein, E. (2007). Epigenetics: A landscape takes shape. Cell, 128(4), 635-638.

Greer, E. L., & Shi, Y. (2012). Histone methylation: A dynamic mark in health, disease and inheritance. Nature Reviews. Genetics, 13(5), 343-357.

Harrop, S. J., DeMaere, M. Z., Fairlie, W. D., Reztsova, T., Valenzuela, S. M., Mazzanti, M., et al. (2001). Crystal structure of a soluble form of the intracellular chloride ion channel CLIC1 (NCC27) at 1.4-Å resolution. The Journal of Biological Chemistry, 276(48), 44993-45000.

Hermann, A., Gowher, H., & Jeltsch, A. (2004). Biochemistry and biology of mammalian DNA methyltransferases. Cellular and Molecular Life Sciences : CMLS, 61(19-20), 2571-2587.

Huang, J., & Berger, S. L. (2008). The emerging field of dynamic lysine methylation of non-histone proteins. Current Opinion in Genetics & Development, 18(2), 152-158.

Jeltsch, A., & Lanio, T. (2002). Site-directed mutagenesis by polymerase chain reaction. Methods in Molecular Biology (Clifton, N.J.), 182, 85-94.

Kooistra, S. M., & Helin, K. (2012). Molecular mechanisms and potential functions of histone demethylases. Nature Reviews. Molecular Cell Biology, 13(5), 297-311.

Kouzarides, T. (2007). Chromatin modifications and their function. Cell, 128(4), 693-705.

Krivtsov, A. V., & Armstrong, S. A. (2007). MLL translocations, histone modifications and leukaemia stem-cell development. Nature Reviews. Cancer, 7(11), 823-833.

Kuo, A. J., Cheung, P., Chen, K., Zee, B. M., Kioi, M., Lauring, J., et al. (2011). NSD2 links dimethylation of histone H3 at lysine 36 to oncogenic programming. Molecular Cell, 44(4), 609-620.

Kurotaki, N., Harada, N., Yoshiura, K., Sugano, S., Niikawa, N., & Matsumoto, N. (2001). Molecular characterization of NSD1, a human homologue of the mouse Nsd1 gene. Gene, 279(2), 197-204.

Li, Y., Trojer, P., Xu, C. F., Cheung, P., Kuo, A., Drury, W. J., 3rd, et al. (2009). The target of the NSD family of histone lysine methyltransferases depends on the nature of the substrate. The Journal of Biological Chemistry, 284(49), 34283-34295.

Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F., & Richmond, T. J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 389(6648), 251-260.

Martin, C., & Zhang, Y. (2005). The diverse functions of histone lysine methylation. Nature Reviews. Molecular Cell Biology, 6(11), 838-849.

Mersfelder, E. L., & Parthun, M. R. (2006). The tale beyond the tail: Histone core domain modifications and the regulation of chromatin structure. Nucleic Acids Research, 34(9), 2653-2662.

Min, J., Allali-Hassani, A., Nady, N., Qi, C., Ouyang, H., Liu, Y., et al. (2007). L3MBTL1 recognition of mono- and dimethylated histones. Nature Structural & Molecular Biology, 14(12), 1229-1230.

Murray, K. (1964). The occurrence of epsilon-N-methyl lysine in histones. Biochemistry, 3, 10-15.

Nakamura, T., Mori, T., Tada, S., Krajewski, W., Rozovskaia, T., Wassell, R., et al. (2002). ALL-1 is a histone methyltransferase that assembles a supercomplex of proteins involved in transcriptional regulation. Molecular Cell, 10(5), 1119-1128.

Odho, Z., Southall, S. M., & Wilson, J. R. (2010). Characterization of a novel WDR5-binding site that recruits RbBP5 through a conserved motif to enhance methylation of histone H3 lysine 4 by mixed lineage leukemia protein-1. The Journal of Biological Chemistry, 285(43), 32967-32976.

Passani, L. A., Bedford, M. T., Faber, P. W., McGinnis, K. M., Sharp, A. H., Gusella, J. F., et al. (2000). Huntingtin's WW domain partners in Huntington's disease post-mortem brain fulfill genetic criteria for direct involvement in Huntington's disease pathogenesis. Human Molecular Genetics, 9(14), 2175-2182.

Patel, A., Dharmarajan, V., Vought, V. E., & Cosgrove, M. S. (2009). On the mechanism of multiple lysine methylation by the human mixed lineage leukemia protein-1 (MLL1) core complex. The Journal of Biological Chemistry, 284(36), 24242-24256.

Patel, A., Vought, V. E., Dharmarajan, V., & Cosgrove, M. S. (2008). A conserved arginine-containing motif crucial for the assembly and enzymatic activity of the mixed lineage leukemia protein-1 core complex. The Journal of Biological Chemistry, 283(47), 32162-32175.

Pei, H., Zhang, L., Luo, K., Qin, Y., Chesi, M., Fei, F., et al. (2011). MMSET regulates histone H4K20 methylation and 53BP1 accumulation at DNA damage sites. Nature, 470(7332), 124-128.

Peng, Z., Mizianty, M. J., Xue, B., Kurgan, L., & Uversky, V. N. (2012). More than just tails: Intrinsic disorder in histone proteins. Molecular bioSystems, 8(7), 1886-1901.

Qian, C., & Zhou, M. M. (2006). SET domain protein lysine methyltransferases: Structure, specificity and catalysis. Cellular and Molecular Life Sciences : CMLS, 63(23), 2755-2763.

Quina, A. S., Buschbeck, M., & Di Croce, L. (2006). Chromatin structure and epigenetics. Biochemical Pharmacology, 72(11), 1563-1569.

Rathert, P., Dhayalan, A., Ma, H., & Jeltsch, A. (2008a). Specificity of protein lysine methyltransferases and methods for detection of lysine methylation of non-histone proteins. Molecular bioSystems, 4(12), 1186-1190.

Rathert, P., Dhayalan, A., Murakami, M., Zhang, X., Tamas, R., Jurkowska, R., et al. (2008b). Protein lysine methyltransferase G9a acts on non-histone targets. Nature Chemical Biology, 4(6), 344-346.

Rathert, P., Zhang, X., Freund, C., Cheng, X., & Jeltsch, A. (2008c). Analysis of the substrate specificity of the dim-5 histone lysine methyltransferase using peptide arrays. Chemistry & Biology, 15(1), 5-11.

Rea, S., Eisenhaber, F., O'Carroll, D., Strahl, B. D., Sun, Z. W., Schmid, M., et al. (2000). Regulation of chromatin structure by site-specific histone H3 methyltransferases. Nature, 406(6796), 593-599.

Rega, S., Stiewe, T., Chang, D. I., Pollmeier, B., Esche, H., Bardenheuer, W., et al. (2001). Identification of the full-length huntingtin-interacting protein p231HBP/HYPB as a DNA-binding factor. Molecular and Cellular Neurosciences, 18(1), 68-79.

Rice, J. C., & Allis, C. D. (2001). Histone methylation versus histone acetylation: New insights into epigenetic regulation. Current Opinion in Cell Biology, 13(3), 263-273.

Sanchez, R., & Zhou, M. M. (2011). The PHD finger: A versatile epigenome reader. Trends in Biochemical Sciences, 36(7), 364-372.

Santos-Rosa, H., Kirmizis, A., Nelson, C., Bartke, T., Saksouk, N., Cote, J., et al. (2009). Histone H3 tail clipping regulates gene expression. Nature Structural & Molecular Biology, 16(1), 17-22.

Scharf, A. N., & Imhof, A. (2011). Every methyl counts--epigenetic calculus. FEBS Letters, 585(13), 2001-2007.

Shahbazian, M. D., & Grunstein, M. (2007). Functions of site-specific histone acetylation and deacetylation. Annual Review of Biochemistry, 76, 75-100.

Sidoli, S., Cheng, L., & Jensen, O. N. (2012). Proteomics in chromatin biology and epigenetics: Elucidation of post-translational modifications of histone proteins by mass spectrometry. Journal of Proteomics,

Smith, E., Lin, C., & Shilatifard, A. (2011). The super elongation complex (SEC) and MLL in development and disease. Genes & Development, 25(7), 661-672.

Southall, S. M., Wong, P. S., Odho, Z., Roe, S. M., & Wilson, J. R. (2009). Structural basis for the requirement of additional factors for MLL1 SET domain activity and recognition of epigenetic marks. Molecular Cell, 33(2), 181-191.

Strahl, B. D., & Allis, C. D. (2000). The language of covalent histone modifications. Nature, 403(6765), 41-45.

Sun, X. J., Wei, J., Wu, X. Y., Hu, M., Wang, L., Wang, H. H., et al. (2005). Identification and characterization of a novel human histone H3 lysine 36-specific methyltransferase. The Journal of Biological Chemistry, 280(42), 35261-35271.

Tan, M., Luo, H., Lee, S., Jin, F., Yang, J. S., Montellier, E., et al. (2011). Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell, 146(6), 1016-1028.

Taverna, S. D., Li, H., Ruthenburg, A. J., Allis, C. D., & Patel, D. J. (2007). How chromatin-binding modules interpret histone modifications: Lessons from professional pocket pickers. Nature Structural & Molecular Biology, 14(11), 1025-1040.

Upadhyay, A. K., & Cheng, X. (2011). Dynamics of histone lysine methylation: Structures of methyl writers and erasers. Progress in Drug Research. Fortschritte Der Arzneimittelforschung. Progres Des Recherches Pharmaceutiques, 67, 107-124.

Volkel, P., & Angrand, P. O. (2007). The control of histone lysine methylation in epigenetic regulation. Biochimie, 89(1), 1-20.

Wagner, E. J., & Carpenter, P. B. (2012). Understanding the language of Lys36 methylation at histone H3. Nature Reviews. Molecular Cell Biology, 13(2), 115-126.

Wisniewski, J. R., Zougman, A., Kruger, S., & Mann, M. (2007). Mass spectrometric mapping of linker histone H1 variants reveals multiple acetylations, methylations, and phosphorylation as well as differences between cell culture and tissue. Molecular & Cellular Proteomics : MCP, 6(1), 72-87.

Xiao, B., Wilson, J. R., & Gamblin, S. J. (2003). SET domains and histone methylation. Current Opinion in Structural Biology, 13(6), 699-705.

Yang, X. J., & Seto, E. (2008). Lysine acetylation: Codified crosstalk with other posttranslational modifications. Molecular Cell, 31(4), 449-461.

Yuan, W., Xie, J., Long, C., Erdjument-Bromage, H., Ding, X., Zheng, Y., et al. (2009). Heterogeneous nuclear ribonucleoprotein L is a subunit of human KMT3a/Set2 complex required for H3 lys-36 trimethylation activity in vivo. The Journal of Biological Chemistry, 284(23), 15701-15707.

Yun, M., Wu, J., Workman, J. L., & Li, B. (2011). Readers of histone modifications. Cell Research, 21(4), 564-578.

Zhang, K., & Dent, S. Y. (2005). Histone modifying enzymes and cancer: Going beyond histones. Journal of Cellular Biochemistry, 96(6), 1137-1148.

Zhang, X., Wen, H., & Shi, X. (2012). Lysine methylation: Beyond histones. Acta Biochimica Et Biophysica Sinica, 44(1), 14-27.

Zhu, B., & Reinberg, D. (2011). Epigenetic inheritance: Uncontested? Cell Research, 21(3), 435-441.

Qazi Muhammad Raafiq

Career Objectives

To utilize my potential and capabilities to meet the challenges of an organization, especially in the field of science.

Personal Information

Date of Birth: December 01, 1980
Sex: Male
Marital status: Married
Nationality: Pakistani
Cell: +49 17637418921
Email: [email protected]

Academic Record

Exam | Pass Year | Division | Marks/CGPA | Percentage | Board/University
PhD Biochemistry | 2013 | - | - | - | Jacobs University Bremen, Germany
M.Phil Biotechnology & Genetic Engineering | 2005 | 1st | 3.85/4.00 | 89.21% | IBGE, NWFP Agricultural University Peshawar, Pakistan
B.Sc. (Hons.) Agriculture | 2002 | 1st | 3.73/4.00 | 85.16% | NWFP Agricultural University Peshawar, Pakistan
H.S.S.C | 1998 | 1st | 711/1100 | 64.64% | FBISE Islamabad, Pakistan
S.S.C | 1996 | 1st | 661/850 | 77.76% | BISE Peshawar, Pakistan

Awards and Positions

• Currently hold a faculty position at Kohat University of Science and Technology, Kohat, Pakistan.
• Worked as a Research Associate in the BMBF project, which partially funded my PhD, in the laboratory of Prof. Albert Jeltsch.
• Received a PhD scholarship under the Human Resource Development (HRD) Program of Kohat University of Science and Technology, Kohat, Pakistan.
• Worked as a lecturer at Sarhad University of Science and Information Technology, Peshawar.
• Worked as an Officer at Khushhali Bank.
• Secured 2nd position in MPhil Biotechnology and Genetic Engineering, session 2003-2004.
• Secured 2nd position in the Entomology department in BSc (Hons.), session 1998-2002.
• Received the HEC (Higher Education Commission, Pakistan) subsistence allowance for one year during MPhil under the title "Support to Scientific Talent".
• Received the University merit scholarship for one year during MPhil.
• Received the University merit scholarship for three years during BSc (Hons.).