Structural and Functional Elucidation of PRDM Proteins

by

Danton Ivanochko

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Medical Biophysics
University of Toronto

© Copyright by Danton Ivanochko, 2021

Structural and Functional Elucidation of PRDM Proteins

Danton Ivanochko

Doctor of Philosophy

Department of Medical Biophysics
University of Toronto

2021

Abstract

Epigenetic signalling dictates the dynamic patterns of gene expression that are required for life. In humans, epigenetic lysine methylation is deposited by chromatin-associated proteins that contain a SET (Su(var)3-9-E(z)-Trx-homology) domain or a Rossmann-fold domain. The PRDM (PRDI-BF1-RIZ homology domain containing) proteins are identified by an N-terminal PR/SET domain that shares the canonical SET domain fold, but with just 20-30% amino acid sequence identity. Humans possess 19 PRDM-coding genes with roles in cellular proliferation and differentiation, and dysregulated PRDM gene expression is often associated with disease. I hypothesize that only a subgroup of PRDM proteins are active lysine methyltransferases, and the objective of this thesis is to expand the field of PRDM biology by revealing novel insights into the enzymatic and non-enzymatic properties of PRDM proteins. To address this objective, I first characterized MRK-740, the first and only chemical probe for a PRDM protein. Using biophysical and structural biology techniques, I characterized the specificity of MRK-740 within the PRDM family and uncovered its mechanism of inhibition of its target, PRDM9. Remarkably, a structural survey of all chemical probes targeting protein methyltransferases indicated that MRK-740 functions by a previously unobserved mechanism of inhibition. Next, I examined how PRDM proteins regulate epigenetic signalling through a mechanism that is independent of intrinsic methyltransferase activity. Here I examined the PRDM paralogs PRDM3 (also known as MECOM or MDS1-EVI1) and PRDM16 (also known as MEL1), which may lack intrinsic methyltransferase activity, and identified how they directly interact with the Nucleosome Remodeling Deacetylase (NuRD) complex. Finally, to identify proteins with enzymatic potential, I examined 13 of the 19 human PR/SET domains for their ability to bind a chemical analog of the enzyme cofactor. I used nuclear magnetic resonance spectroscopy to detect binding of fluorinated S-adenosyl-L-homocysteine and identified several previously uncharacterized PR/SET domains that likely bind the cofactor. Taken together, these findings provide the impetus, tools and insights for further research into the roles of PRDM proteins in health and disease.

Acknowledgments

First and foremost, I have found myself incredibly fortunate to have Dr. Cheryl Arrowsmith as my thesis advisor for these past four and a half years. To Cheryl, I thank you for the opportunity that you have provided me. Your guidance has helped to expand my research horizons farther than I thought I could see. You provided me the freedom to explore my own ideas and always nudged me back on course whenever I got lost in the weeds. I am incredibly grateful to be part of your research ecosystem, spanning labs at UHN and the SGC as well as collaborations with Boehringer Ingelheim and MSD. When I first reached out to you about a PhD, I sought training in a field that would open doors for me in both academia and industry. Thank you for helping me fulfill this goal and for helping to make me a better scientist.

Just like the African proverb that states “It takes a village to raise a child”, it has become apparent to me that my PhD has unquestionably benefited from the large community that has supported me throughout my academic journey.

To my thesis supervisory committee, Drs. Gil Prive and Matthieu Schapira, thank you for lending your time to share your expertise with me and for helping to guide me towards the best possible PhD thesis that I could achieve. To Matthieu, thank you for going above and beyond your role in the committee and as a collaborator. I am grateful that you included me as a co-author in your review article, and thank you for sharing your knowledge of methyltransferases, which has been invaluable to my education and research.

My PhD would not have been possible without the mentorship of an extensive network of researchers and scientists. To Shili Duan, thank you for your support and guidance. You are one of the most pleasant and patient people I have ever met, and working with you made my day-to-day activities in the lab a joy to take part in. To Dr. Evelyne Lima-Fernandes, thank you for helping me to get started researching the PRDMs and for fostering a fun social culture among the group. To Drs. Levon Halabelian and Scott Houliston, thank you for taking the time to train me in X-ray crystallography and NMR spectroscopy. These are two of the most interesting techniques I have ever learned, and I am incredibly grateful for your explanations, guidance and insights. To Drs. Masoud Vedadi and Dalia Barsyte-Lovejoy, thank you for providing me opportunities to work within your groups and for the helpful advice that has guided me towards being a better scientist. To Drs. Rachel Harding, Abdellah Allali-Hassani, Guillermo Senisterra, and Mani Ravi, your advice, guidance, and patience formed an indispensable component of my training and research. To Dr. Jark Boettcher, thank you for taking me under your wing during my time in Vienna. You instantly made me feel like I was a part of the team, and I am very grateful that you opened the door to the inner workings of pharmaceutical development for me. I had an excellent experience and look forward to our next meeting!

To my friends and family, I thank you for your cheer. To my parents, I am only here today because you have always nurtured my goals and supported my endeavours. To all my friends, I want you to know that this would have been a drag without you. Whether around the foosball table, on the rugby pitch, or just kicking back on a patio, you all have been a stable source of energy that has kept me moving forward. To Marcus, Mike, Max, and Julian, thank you for standing with me on my wedding day. Three of you are friends that I can call my brothers, and one of you is a brother I am glad to call my friend.

Finally, to my beautiful and brilliant wife, Victoria, thank you for being the smiling face beside me on this adventure. You have been my stability in times of uncertainty and my inspiration on a cloudy day. You showed me how to find my motivation and since then, the two of us have been moving forward together. I can’t wait to add the third!


Table of Contents

Acknowledgments
Table of Contents
List of Figures
List of Tables

Chapter 1
General Introduction
1.1 Background
1.1.1 The genome and beyond
1.1.2 On epigenetics – “above” the gene
1.1.3 On epigenetic lysine methylation
1.1.4 Writers, readers and erasers of lysine methylation
1.1.5 On SET-fold as KMTs
1.1.6 On drugging SET-fold domains
1.2 The PRDM proteins
1.2.1 The PRDM family
1.2.2 Discovery of the PRDM family
1.2.3 Biological functions of PRDMs in health and disease
1.2.4 PRDMs: enzymes or pseudoenzymes?
1.2.5 Comparing PR/SET domains with canonical SETs
1.2.6 Structural dynamics in PR/SET enzymes
1.3 Structural and functional elucidation of PRDM proteins
1.3.1 Rationale for investigating PRDMs
1.3.2 Research objective
1.3.3 Aim 1. Characterization of the first chemical probe for a PRDM protein
1.3.4 Aim 2. Characterization of the PRDM3 and PRDM16 interaction with the NuRD complex
1.3.5 Aim 3. Identification of potential cofactor interactions among PR/SET domains

Chapter 2
Characterization of the first PRDM9 methyltransferase inhibitor
2.1 Authorship attribution statement
2.2 Introduction
2.3 Results
2.3.1 MRK-740 specifically inhibits PRDM9 methyltransferase activity
2.3.2 MRK-740 is a substrate-competitive, cofactor-dependent PRDM9 inhibitor
2.3.3 PRDM9 overexpression is a rare occurrence in diverse tumor types
2.4 Discussion
2.4.1 On the specificity of inhibition
2.4.2 An unusual mechanism of action
2.4.3 On PRDM9 in cancer
2.5 Methods
2.5.1 Protein production
2.5.2 DSF assays
2.5.3 Isothermal titration calorimetry (ITC)
2.5.4 NMR spectroscopy
2.5.5 Crystallization and structure determination
2.5.6 Methyltransferase inhibitor survey
2.5.7 TCGA data analysis

Chapter 3
Methyltransferase inhibitors that exploit the bound cofactor
3.1 Authorship attribution statement
3.2 Introduction
3.3 Results
3.3.1 Exploiting the bound cofactor
3.3.2 PRMT5
3.3.3 Type I PRMTs
3.3.4 EHMT1/2 (G9a/GLP)
3.3.5 SETD7, SMYD2, and SUV420H1/2
3.3.6 PRDM9
3.3.7 Neurotransmitter methylation
3.4 Discussion
3.5 Methods

Chapter 4
Interrogating PR/SET domains for the ability to bind SAH
4.1 Authorship attribution
4.2 Introduction
4.3 Results
4.3.1 Establishment of the F-SAH binding assay
4.3.2 Screening human PR/SET domains with the F-SAH binding assay
4.4 Discussion
4.4.1 Using F-SAH to probe KMT domains
4.5 Interpreting the absence of binding evidence
4.5.1 Future analysis of PRDMs
4.6 Methods
4.6.1 Protein production
4.6.2 F-SAH binding by 19F NMR

Chapter 5
Direct interaction between the PRDM3 and PRDM16 tumour suppressors and the NuRD chromatin remodelling complex
5.1 Authorship attribution
5.2 Introduction
5.3 Results
5.3.1 PRDM3 is encoded by the MECOM gene and is depleted in solid tumors
5.3.2 N-termini of Prdm3 and Prdm16 interact with the NuRD complex
5.3.3 N-termini of PRDM3 and PRDM16 bind directly to RBBP4
5.3.4 Co-crystal structure of RBBP4 with PRDM3/PRDM16 N-terminal peptide
5.4 Discussion
5.5 Methods
5.5.1 Gene and isoform expression analysis
5.5.2 Co-Immunoprecipitation and mass spectroscopy
5.5.3 Co-IP validation experiment and Western blot
5.5.4 Mass Spectrometry analysis
5.5.5 Cellular colocalization analysis
5.5.6 Protein expression and purification
5.5.7 Structural determination
5.5.8 Isothermal titration calorimetry (ITC)
5.5.9 In silico alanine scan

Chapter 6
Characterization of BTRC/FBXW11 binders
6.1 Authorship attribution statement
6.2 Preamble
6.3 Introduction
6.4 Results
6.4.1 Identifying BTRC substrate pocket binders
6.4.2 Structural characterization of FBXW11
6.5 Discussion
6.6 Methods
6.6.1 Protein production
6.6.2 19F-NMR library screen
6.6.3 Crystallization and structure determination
6.6.4 Surface plasmon resonance

References

List of Figures

Figure 1. Lysine methyltransferase (KMT) reaction mechanism.
Figure 2. PRDMs are a subset of SET fold proteins.
Figure 3. Comparison of the SET and PR/SET amino acid sequences.
Figure 4. Structures of all PR/SET domains available from the .
Figure 5. The PRDM family is less represented in academic research compared to canonical SET proteins.
Figure 6. MRK-740 is a potent, cell-active PRDM9 inhibitor.
Figure 7. MRK-740 selectively inhibits PRDM9 in vitro.
Figure 8. MRK-740 binding among the PRDM family.
Figure 9. Assessing MRK-740 off-target binding.
Figure 10. Binding of H3 peptide and SAM to PRDM9 by NMR.
Figure 11. MRK-740 mechanism of binding by NMR.
Figure 12. MRK-740 mechanism of action by enzyme activity assay.
Figure 13. Crystal structure of SAM-bound PRDM9 in complex with the MRK-740 inhibitor.
Figure 14. Structural basis for MRK-740 inhibition of PRDM9.
Figure 15. MRK-740 binding is incompatible with SAH.
Figure 16. MRK-740 binding is unique among methyltransferase inhibitors.
Figure 17. PRDM9 is aberrantly expressed in a variety of cancers.
Figure 18. PRDM9 expression does not stratify patient survival outcomes.
Figure 19. Cofactor binding induces structural rearrangements in methyltransferase domains.
Figure 20. SAM or SAH interface bound by methyltransferase inhibitors.
Figure 21. Substrate-competitive methyltransferase inhibitors interact with SAM and SAH.
Figure 22. Substrate-competitive methyltransferase inhibitors interact with SAM and SAH for neurotransmitter methylation.
Figure 23. Characterization of F-SAH binding to PRDM9.
Figure 24. Screening for F-SAH binders among PR domains.
Figure 25. Full-length PRDM3 depletion is prevalent in MECOM-deficient solid tumors.
Figure 26. The N-termini of full-length PRDM3 and PRDM16 isoforms function as protein-protein interaction scaffolds.
Figure 27. The NuRD complex is enriched in full-length PRDM3 and PRDM16 prey.
Figure 28. The NuRD complex members are enriched interactors with full-length PRDM3 and PRDM16.
Figure 29. Endogenous full-length PRDM3, but not EVI1, co-immunoprecipitates with RBBP4.
Figure 30. The first 12 amino acids of full-length PRDM3 and PRDM16 enable RBBP4 binding.
Figure 31. The RBBP4-Histone H3 interaction is disrupted by K4 methylation.
Figure 32. The first 12 residues of PRDM3/16 interact with RBBP4.
Figure 33. Weak cellular co-localization of N-terminally truncated PRDM3 with RBBP4.
Figure 34. Crystal structure of RBBP4 in complex with the PRDM3 (1-12 amino acid) peptide.
Figure 35. Crystal structure of RBBP4 in complex with the PRDM16 (1-12 amino acid) peptide.
Figure 36. Amino acid interactions at the PRDM3 peptide-RBBP4 interface.
Figure 37. Electron density maps of PRDM3 and PRDM16 peptides bound to RBBP4.
Figure 38. A conserved lysine in PRDM3 and PRDM16 is crucial for RBBP4 binding.
Figure 39. PRDM3 and PRDM16 mimic the RBBP4-histone H3 interaction.
Figure 40. PRDM3 and PRDM16 lack the conserved catalytic tyrosine observed in SET domain structures of Michaelis complexes.
Figure 41. Targeting the BTRC and FBXW11 paralogs for PROTAC handles.
Figure 42. Finding BTRC binders from a 19F library screen.
Figure 43. FBXW11-SKP1 in complex with a pSer33/pSer37 β-catenin peptide.
Figure 44. Binding characterization of pSer33/pSer37 β-catenin peptide on FBXW11.

List of Tables

Table 1. Reported methyltransferase-related functions of PRDMs.
Table 2. Crystallographic data collection and refinement statistics of the PRDM9:SAM:MRK-740 complex.
Table 3. The Cancer Genome Atlas (TCGA) studies with >10 patient-matched healthy tissue control samples.
Table 4. PRDM9 mRNA expression statistics in human tumors.
Table 5. PRDM9 expression does not stratify patient survival outcomes.
Table 6. Known dissociation constant (KD) and half-maximal inhibitory concentration (IC50) values of SAM and SAH for SET domain proteins.
Table 7. TCGA patient information.
Table 8. Statistical analysis of differential MECOM gene expression.
Table 9. Data for X-ray crystal structure of PRDM3/16 1-12aa with RBBP4.
Table 10. Interaction distances at the interface of the PRDM3 N-terminal peptide and RBBP4.
Table 11. Crystallographic data collection and refinement statistics for FBXW11-SKP1 complex.
Table 12. Protein-peptide interactions between FBXW11 and β-catenin.

Chapter 1

General Introduction

1.1 Background

1.1.1 The genome and beyond.

Advancements in genomics over the past two decades have enabled the development of medical genetics, offering the promise of precision or “personalized” medicine. Genomic sequencing has provided pragmatic insights into disease mechanisms and has led to the creation of genetic therapies, such as those for sickle cell anemia [1] and haemophilia [2]. But as many genetic therapies progress, shortcomings in some areas suggest that genetic medicine may not be the panacea that was promised [3]. So, if the genome is not the summit for modern medicine to overcome, what lies “above the gene”?

1.1.2 On epigenetics – “above” the gene.

The term epigenetics describes the transient, yet heritable, information encoded upon genomic DNA (the Greek prefix “epi” meaning “upon” or “above”). Epigenetics explains how cells can share an identical genomic sequence yet diverge into a multitude of distinct cell types during development. Genetic information in DNA is stored as polymers of nucleotides, distinguished by four aromatic bases, adenine (A), cytosine (C), guanine (G), and thymine (T), wound into a duplex of complementary strands. In all eukaryotic organisms, including humans, genomic DNA is stored within an organelle called the nucleus. Here the double-stranded DNA wraps around histone octamers (two copies each of H2A, H2B, H3, and H4), 147 base pairs at a time, forming nucleosomes, which are in turn packaged into a structure called chromatin. Within chromatin, epigenetic processes control gene expression by altering the chromatin environment at specific genetic loci. Epigenetic mechanisms include covalent chromatin modifications, such as cytosine methylation and histone post-translational modifications; chromatin remodeling, such as the repositioning of nucleosomes and incorporation of histone variants; and non-coding RNAs, such as microRNAs (miRNAs), which affect gene expression after transcription. Epigenetic regulators are vitally important for health and development, while epigenetic dysregulation is recognized to drive a variety of diseases, such as cancers and developmental disorders. Over the past decade, pharmacological targeting of epigenetic pathways has progressed from “a new frontier in drug discovery” [4] to an established drug class with ~12 approved epigenetic drugs, as well as at least three dozen in various stages of clinical trials [5]. Furthering our understanding of how specific epigenetic mechanisms, such as lysine methylation on [6], function at the molecular level is expected to provide insights for understanding health and disease, as well as reveal new opportunities for therapeutic development.

1.1.3 On epigenetic lysine methylation.

The concept that epigenetic information is encoded through covalent modifications of histone proteins was summarized in 2000 as the “histone code hypothesis” [7]. It was recognized that lysine residues on the disordered histone tails could be post-translationally modified by the addition of methyl moieties, and later that year, an enzyme called suppressor of variegation 3–9 homologue 1 (SUV39H1) was identified as the first lysine methyltransferase (KMT; lysine is denoted by the single-letter amino acid code K) in humans [8]. KMTs catalyze the transfer of up to three methyl groups from donor S-adenosyl-L-methionine (SAM) molecules to the sidechain ε-amino group of lysine residues on proteins in a site-specific manner (Figure 1a). Well-described lysine methylation marks include K4, K9, K27, K36, and K79 on histone H3 and K20 on histone H4 [9]. Epigenetic information controlling gene expression is encoded by specific methyl-lysine signatures, described by the presence or absence of specific lysine methylation marks, the degree of lysine methylation (i.e. mono-, di- or trimethylated), as well as cross-talk with other histone modifications, such as arginine methylation, lysine acetylation and mono-ubiquitination, serine and threonine phosphorylation, and other non-histone epigenetic signals. Bearing that in mind, methylation of histone H3 at K4, K36 and/or K79 is in general associated with active transcription, whereas methylation of H3 at K9 and/or K27, as well as of histone H4 at K20, can be associated with repressed gene expression [9]. Moreover, bivalent promoters at genes poised for rapid activation during cellular differentiation are marked by both the active mark of trimethylated lysine 4 on histone H3 (H3K4me3) and the repressive mark H3K27me3, and await specific developmental cues signalling for H3K27me3 removal [10]. Before its role in epigenetic signalling was recognized, methylated lysine was first detected in 1959 on a flagellin protein of the bacterium Salmonella typhimurium [11], and it has since been recognized in many non-histone proteins in humans, such as the tumor-suppressor [12]. Nevertheless, as an epigenetic signal, lysine methylation is essential for health and critically dysregulated in disease [13] [9].
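The site-level associations summarized above can be expressed as a small lookup, shown here as a minimal Python sketch. The names `ACTIVE_SITES`, `REPRESSIVE_SITES`, `association` and `is_bivalent` are hypothetical, and the classification encodes only the broad tendencies stated in the text (methylation degree, cross-talk and genomic context are ignored), so it should not be read as a predictor of transcriptional output.

```python
from __future__ import annotations

# Hypothetical lookup built only from the general associations stated in the
# text; real chromatin states depend on context and cross-talk between marks.
ACTIVE_SITES = {("H3", 4), ("H3", 36), ("H3", 79)}
REPRESSIVE_SITES = {("H3", 9), ("H3", 27), ("H4", 20)}


def parse_mark(mark: str) -> tuple[str, int, int]:
    """Split a mark such as 'H3K4me3' into ('H3', 4, 3)."""
    histone, rest = mark.split("K", 1)
    position, degree = rest.split("me", 1)
    return histone, int(position), int(degree)


def association(mark: str) -> str:
    """Return 'active', 'repressive' or 'unclassified' for a methyl-lysine mark."""
    histone, position, degree = parse_mark(mark)
    if degree not in (1, 2, 3):
        raise ValueError(f"unexpected methylation degree in {mark!r}")
    site = (histone, position)
    if site in ACTIVE_SITES:
        return "active"
    if site in REPRESSIVE_SITES:
        return "repressive"
    return "unclassified"


def is_bivalent(marks: set[str]) -> bool:
    """Treat a promoter as bivalent if it carries both H3K4me3 and H3K27me3."""
    return {"H3K4me3", "H3K27me3"} <= marks


if __name__ == "__main__":
    for mark in ("H3K4me3", "H3K36me2", "H3K9me2", "H4K20me1", "H3K18me1"):
        print(mark, association(mark))
    print(is_bivalent({"H3K4me3", "H3K27me3"}))
```

Keying the lookup on (histone, position) rather than on the full mark string mirrors the text, which groups marks by site and treats the degree of methylation as a separate layer of information.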


Figure 1. Lysine methyltransferase (KMT) reaction mechanism. Reaction diagrams indicating (a) KMT-catalyzed mono-, di- and trimethylation using the methyl donor SAM, with generation of the reaction by-product SAH, as well as (b) the SN2 reaction mechanism indicating the transition state (‡). (c) A structural model of how SET-fold domain tyrosine residues (pink) coordinate methyl transfer from SAM (yellow) to a substrate lysine sidechain (blue). This model was assembled from PRDM9 structures (PDB accession codes: 6nm4 and 4c1q).


1.1.4 Writers, readers, and erasers of lysine methylation.

The “histone code hypothesis” proposes a language encoded onto histone proteins and implies the presence of writers, readers, and erasers of these epigenetic marks. Writers of lysine methylation, called KMTs, use SAM as a cofactor and can be grouped into proteins that contain either a Su(var)3-9-E(z)-Trx-homology (SET)-fold domain or a Rossmann-fold domain [14]. Presently, SET-folds are associated only with lysine methylation [15], while Rossmann-folds are associated with lysine and arginine methylation, as well as other enzymatic reactions that use the cofactors nicotinamide adenine dinucleotide (NAD) or flavin adenine dinucleotide (FAD) [16]. Lysine methylation does not alter the positive charge of the residue, but the mono-, di- and trimethylated states of lysine (Kme1, Kme2 and Kme3, respectively) modify the size and shape of the sidechain, which dictates compatibility with certain protein-protein interactions (Figure 1a). Accordingly, several lysine methylation reader domains (e.g. WDR, Tudor, PHD, MBT, chromo, PWWP, BAH, ADD, ankyrin repeat, and zf-CW) have been identified that recognize lysine residues at specific histone regions harbouring particular lysine methylation states [14].

Lysine methylation was once speculated to be a static and irreversible modification, but the discovery of lysine demethylases, the epigenetic erasers, demonstrated the dynamic nature of lysine methylation [17]. Lysine demethylases fall into two groups characterized by the presence of either an amine oxidase-like (AOL) domain or a Jumonji C (JmjC) domain. Demethylation by AOL domains is driven by an amine oxidation reaction, while JmjC domains catalyze hydroxylation to remove the methyl group from the lysine sidechain [18]. Over 20 lysine demethylases possess JmjC domains, while only two proteins, LSD1 and LSD2, possess an AOL domain [14].

1.1.5 On SET-fold as KMTs.

Proteins characterized by SET-fold domains compose the largest class of KMTs in humans [15]. The SET domain was first observed in the Drosophila melanogaster proteins Su(var)3-9, E(z), and Trx, and has since been detected across every branch of the eukaryotic family tree, as well as in some bacteria, archaea, and viruses [19]. As KMTs, SET-fold domains catalyze the transfer of a methyl group from SAM to the deprotonated, neutral ε-amino group of a lysine sidechain, followed by release of the reaction by-product S-adenosyl-L-homocysteine (SAH) (Figure 1a). The catalytic mechanism is believed to proceed through an SN2 reaction, whereby hydroxyl groups on tyrosine sidechains geometrically coordinate a nucleophilic attack by the lysine amino group on the methyl group of the SAM sulfonium, stabilizing the reaction transition state before separation of the methylated lysine and SAH (Figure 1b-c) [20] [21] [22] [23]. It is proposed that in various SET-fold domains, the capacity for additional rounds of methylation producing Kme2 and Kme3 is specified by a tyrosine/phenylalanine switch, which sterically accommodates distinct methylation states [24] [22]. The SET-fold family is further diversified by accessory domains, which impart a broad range of functionalities that affect substrate specificity, cellular localization, and regulation of enzymatic activity.
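The stoichiometry of sequential methyl transfer (one SAM consumed and one SAH released per methyl group, with a cap on the final methylation state) can be sketched as follows. This is a bookkeeping illustration only, assuming that the product specificity attributed to the tyrosine/phenylalanine switch can be represented by a single hypothetical parameter, `max_state`; it does not model the SN2 chemistry, binding, or whether an enzyme acts processively or distributively. The `MethylationState` and `methylate` names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MethylationState:
    lysine_state: int = 0  # 0 = unmodified K, 1 = Kme1, 2 = Kme2, 3 = Kme3
    sam: int = 0           # SAM (methyl donor) molecules available
    sah: int = 0           # SAH by-product molecules released


def methylate(state: MethylationState, max_state: int) -> MethylationState:
    """Add methyl groups one at a time, one SAM per transfer.

    `max_state` is a hypothetical stand-in for an enzyme's product
    specificity (e.g. as set by a Tyr/Phe switch): 1 = mono-, 2 = di-,
    3 = trimethylase. Each transfer consumes one SAM and releases one SAH.
    """
    if max_state not in (1, 2, 3):
        raise ValueError("max_state must be 1, 2 or 3")
    lysine, sam, sah = state.lysine_state, state.sam, state.sah
    while lysine < max_state and sam > 0:
        sam -= 1      # one methyl group leaves the SAM sulfonium...
        sah += 1      # ...leaving SAH behind...
        lysine += 1   # ...and the lysine gains one methyl group
    return MethylationState(lysine, sam, sah)


if __name__ == "__main__":
    print(methylate(MethylationState(sam=5), max_state=1))
    print(methylate(MethylationState(sam=5), max_state=3))
    print(methylate(MethylationState(sam=2), max_state=3))
```

The third call shows that the final state can also be limited by cofactor supply rather than by the enzyme's intrinsic product specificity.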

1.1.6 On drugging SET-fold domains.

Aberrant epigenetic lysine methylation by SET-fold enzymes underlies several disease mechanisms in cancers, as well as in immunological and developmental disorders [4]. This has driven medicinal chemistry towards the discovery of several chemical modulators of KMT function [25] and has led to the approval of the first-in-class oncology drug targeting the SET-fold domain of EZH2 [26]. This EZH2-targeting drug, tazemetostat, has been approved to treat a rare type of solid tumor called epithelioid sarcoma, and has also received accelerated approval for relapsed or refractory follicular lymphoma, a common form of blood cancer. Tazemetostat is the first epigenetic drug approved for the treatment of a solid tumor (other epigenetic drugs had already been approved for blood cancers), as well as the first approved drug that targets a histone methyltransferase. Tazemetostat inhibits the EZH2 SET-fold domain by a cofactor-competitive, substrate-noncompetitive mechanism [27]. This means that tazemetostat blocks SAM from binding but is indifferent to the presence of the substrate histone H3 protein. Following the success of tazemetostat, additional small-molecule inhibitors of EZH2 are currently in clinical trials, as are another KMT inhibitor targeting DOT1L and two inhibitors targeting the lysine demethylase LSD1 [5]. But future success in drug development targeting KMTs will require a broader understanding of SET-fold proteins [28]. For instance, while many SET-fold domains have chemical probes, certain branches of the SET-fold family tree remain critically underrepresented among available chemical probes [25] [29].
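The cofactor-competitive, substrate-noncompetitive behaviour described for tazemetostat can be illustrated with a minimal sketch, assuming an idealized rapid-equilibrium model in which SAM and the histone substrate bind independently and the inhibitor competes only with SAM. Under these assumptions the Cheng-Prusoff relationship IC50 = Ki(1 + [SAM]/Km,SAM) holds, so apparent potency weakens as SAM rises but does not change with substrate concentration. The function names and the Ki and Km values are hypothetical and are not measured parameters for EZH2.

```python
def rate(sam: float, substrate: float, inhibitor: float, *,
         vmax: float = 1.0, km_sam: float = 1.0, km_sub: float = 1.0,
         ki: float = 0.01) -> float:
    """Idealized rapid-equilibrium rate: SAM and substrate bind independently,
    and the inhibitor competes only with SAM (all parameters hypothetical)."""
    sam_term = sam / (km_sam * (1.0 + inhibitor / ki) + sam)
    sub_term = substrate / (km_sub + substrate)
    return vmax * sam_term * sub_term


def fractional_activity(sam, substrate, inhibitor, **params):
    """Activity with inhibitor relative to the uninhibited reaction."""
    return rate(sam, substrate, inhibitor, **params) / rate(sam, substrate, 0.0, **params)


def cheng_prusoff_ic50(sam: float, *, km_sam: float = 1.0, ki: float = 0.01) -> float:
    """IC50 of a cofactor-competitive inhibitor: Ki * (1 + [SAM] / Km,SAM)."""
    return ki * (1.0 + sam / km_sam)


if __name__ == "__main__":
    for sam in (1.0, 5.0, 20.0):
        i50 = cheng_prusoff_ic50(sam)
        # At the predicted IC50, activity is halved whatever the substrate level.
        acts = [fractional_activity(sam, s, i50) for s in (0.1, 1.0, 100.0)]
        print(f"[SAM]={sam:5.1f}  IC50={i50:.3f}  activity={[round(a, 3) for a in acts]}")
```

Because the substrate term cancels in the ratio, the inhibitor's effect is indifferent to substrate concentration in this model, which is the practical meaning of "substrate-noncompetitive" in the text.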

1.2 The PRDM proteins.

1.2.1 The PRDM family.

The PRDI-BF1-RIZ1 homologous domain-containing (PRDM) proteins comprise a distinct family of epigenetic proteins, known for their diverse roles in development, homeostasis and disease [30] [31] [32]. The first proto-PRDM gene emerged in metazoans, and subsequent genetic expansion in vertebrates has culminated in the largest sets of PRDM genes in higher-order primates [33] [34]. The PRDM proteins are a subgroup of the SET-fold containing proteins, phylogenetically distinct from other canonical SET-fold domains (Figure 2a). The PRDM protein family is identified by the presence of an N-terminal PR/SET domain that adopts the canonical SET-fold structure, but shares only 20-30% amino acid sequence identity with canonical SET domains [35] [36]. For example, the EHMT1 SET-fold domain and PRDM9 PR/SET domain adopt similar folds, but share only 24.1% sequence identity (Figure 2b). The PRDM proteins are further distinguished by a variable number of C-terminal C2H2 zinc finger (ZnF) motifs. Humans possess 19 PRDM-coding genes, some of which are regulated by differential splicing events that result in transcripts encoding the full-length proteins, as well as shorter isoforms lacking either the PR/SET domain or the ZnF region (Figure 2c).
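Percent sequence identity, the metric behind figures such as 20-30% or 24.1%, is computed from a pairwise alignment. The sketch below assumes the alignment has already been produced by some aligner and counts identities over ungapped columns only. This is one of several conventions, so the function name `percent_identity` and the toy sequences are illustrative and will not reproduce the published EHMT1/PRDM9 value.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Only columns where neither sequence has a gap are counted. Other
    conventions (dividing by the full alignment length or by the shorter
    sequence) give different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy, hypothetical alignment (not EHMT1 or PRDM9): 5 identities in 7 ungapped columns.
    print(f"{percent_identity('MKV-LSTA', 'MRVQLSSA'):.1f}%")
```

Because the denominator is a choice, identity values quoted from different tools are only comparable when the same convention and alignment method are used.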


Figure 2. PRDMs are a subset of SET fold proteins. (a) A phylogenetic tree showing the relatedness of proteins with canonical SET-fold domains (yellow) and PR/SET domains (blue). (b) The EHMT1 SET domain and PRDM9 PR/SET domain adopt similar folds, but only share 24.1% sequence identity. (c) Generic domain diagrams for PRDM proteins and isoforms lacking the PR/SET domain (ΔPR/SET) or zinc finger motifs (ΔZnF) are also shown. The tree diagram was generated from amino acid sequences using the R programming language (version 4.0) [37] with the ape (v5.4.1) and phytools (v0.7.70) packages. The PDB accession codes for EHMT1 and PRDM9 are 3hna and 6nm4, respectively.

1.2.2 Discovery of the PRDM family.

The discovery of the PRDM family arose from a foundation of research that aimed to uncover key genetic regulators of developmental pathways and oncogenes. In 1988, PRDM3, otherwise known as MECOM or MDS1-EVI1, was identified from a screen of retrovirus-induced myeloid leukemia cell lines that were examined for viral insertions in proto-oncogenes [38]. Next, in 1991, PRDM1, then known as PRDI-BF1 and then briefly as BLIMP1, was identified as a novel zinc finger transcriptional repressor of beta-interferon gene expression [39]. PRDM2, then known as RIZ1, was found in 1995 as a novel binder of the retinoblastoma protein (Rb) [40]. Buyse et al. identified a homologous domain shared between PRDI-BF1 (PRDM1) and RIZ (PRDM2) that was called the PRDI-BF1 and RIZ (PR) domain [40], and a subsequent study from the same research group identified the PR domain of MDS1-EVI1 (PRDM3) [41]. In 1998, a homology analysis revealed that the PR domain is a derivative of the SET domain, implicating the PR/SET domain in chromatin regulation [36]. PRDM4 was the first protein to bear the “PRDM” moniker [42], and most subsequent additions to the family have adopted the same naming convention, with the exception of the most recent additions, FOG1/ZFPM1 and FOG2/ZFPM2 [43].

1.2.3 Biological functions of PRDMs in health and disease.

The 19 human PRDM-coding genes have been extensively reviewed for their roles in cellular proliferation and differentiation, and for how dysregulated PRDM gene expression is often associated with disease [30] [31] [44] [32] [45]. Certain PRDMs are associated with tumor suppressor activity. For example, PRDM1 regulates normal B-lymphocyte differentiation and is a tumor suppressor that is frequently silenced in B cell lymphomas [46] [47] [48]. PRDM5 is another tumor suppressor, which is silenced in breast, liver and ovarian tumors; in these cancer types, PRDM5 overexpression was shown to induce apoptosis [49].

Other PRDMs are linked to roles that may be oncogenic. For example, PRDM14 expression is normally restricted to embryonic stem cells but, when mis-expressed, acts as an oncogene in diverse cancerous tissues [50] [51] [52] [53] [54]. PRDM12 is expressed in nociceptor neuronal progenitors, where it is essential for differentiation, and loss-of-function mutations in PRDM12 are known to cause a neuronal disorder called congenital insensitivity to pain [55]. Like PRDM12, PRDM13 is expressed in neural progenitors, and both PRDM12 and PRDM13 have been observed to be overexpressed in multiple cancer types [56] [57]. PRDM9 expression is normally restricted to male and female fetal gonad tissues and to testes, in cells undergoing meiosis, where it binds to specific genomic loci and epigenetically marks sites of meiotic recombination to ensure that DNA crossover events occur only between homologous chromosomes [58] [59] [60] [61] [62]. PRDM9 may also act as an oncogene, as suggested by studies finding that PRDM9 overexpression in tumors correlates with genomic instability associated with chromosomal rearrangements at PRDM9 binding sites [63] [57] [64] [65] [66].

Some PRDMs are expressed as full-length proteins as well as ΔPR/SET and/or ΔZnF isoforms (Figure 2c), and dysregulated expression of these isoforms has also been linked to several cancers. Under normal circumstances, the expression of full-length and ΔPR/SET isoforms is tightly controlled by switching transcriptional initiation between distinct transcription start sites (TSSs) upstream and downstream of the PR/SET domain-encoding exon. For example, the differential expression of PRDM3 isoforms regulates haematopoiesis, wherein the full-length PRDM3 isoform appears to be important for haematopoietic stem cell (HSC) quiescence [67], whereas the ΔPR/SET PRDM3 isoform appears to promote HSC expansion by suppressing differentiation [68]. The PRDM3 gene can be disrupted by retroviral insertions that block the PR/SET-containing exons from transcription, resulting in exclusive expression of the ΔPR/SET isoform in several cancer types [69]. For example, the ΔPR/SET PRDM3 isoform is associated with highly aggressive acute myeloid leukemia (AML) [70] and drives metastasis in colon cancer [71]. The PRDM1, PRDM2, and PRDM16 genes each also encode a full-length isoform and a ΔPR/SET isoform. As with PRDM3, the ΔPR/SET isoforms are considered to be oncogenic, while the full-length proteins act as tumor suppressors [72] [36] [73] [74] [75] [76] [77].
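The logic by which TSS position determines whether a transcript can encode the PR/SET domain can be sketched as a simple coordinate comparison. The `Exon` type, the `classify_isoform` function and the coordinates in the usage example are hypothetical; the sketch assumes 1-based genomic coordinates and ignores splicing, alternative start codons and retroviral insertions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int  # genomic start, 1-based inclusive (hypothetical coordinates)
    end: int    # genomic end, 1-based inclusive

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("exon start must not exceed end")


def classify_isoform(tss: int, pr_set_exon: Exon, strand: str = "+") -> str:
    """Label a transcript by whether its TSS lies upstream of the PR/SET exon.

    A TSS upstream of the exon gives a transcript that can include the
    PR/SET-encoding sequence ("full-length"); a TSS at or downstream of it
    cannot ("ΔPR/SET"). Splicing and start-codon choice are ignored.
    """
    if strand == "+":
        upstream = tss < pr_set_exon.start
    elif strand == "-":
        upstream = tss > pr_set_exon.end
    else:
        raise ValueError("strand must be '+' or '-'")
    return "full-length" if upstream else "ΔPR/SET"


if __name__ == "__main__":
    exon = Exon(10_000, 10_300)  # hypothetical PR/SET exon
    for tss in (9_000, 25_000):
        print(tss, classify_isoform(tss, exon))
```

Handling the strand explicitly matters because "upstream" means lower coordinates on the plus strand but higher coordinates on the minus strand.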


1.2.4 PRDMs: enzymes or pseudoenzymes?

While PRDM2 was the first KMT reported in the PRDM family [78], PRDM9 is the most extensively characterized enzyme. The PRDM9 PR/SET domain catalyzes the mono-, di- and trimethylation of lysine 4 and lysine 36 on histone H3 (H3K4me1/2/3 and H3K36me1/2/3), as reported in vitro and in vivo [61] [79] [80]. An additional study has also reported in vitro KMT activity for H3K9me1/2, H3K18me1 and H4K20me1/2 [81]. PRDM7 is closely related to PRDM9 but catalyzes the trimethyl mark at a 100-fold slower rate and modifies only H3K4, not H3K36 [82]. PRDM9 and PRDM7 are also the only PRDM KMTs for which enzyme kinetics data on methyltransferase activity have been reported [79] [82]. At present, KMT activity has been reported for only 8 of the 19 PRDM proteins, and some PRDMs are believed to be pseudoenzymes that regulate histone methylation by binding to other KMTs (Table 1). Two considerations are important here: first, the absence of evidence for KMT activity is not evidence of its absence; second, many ostensibly active PR/SET domains lack residues found in the majority of active SET-fold KMTs [83]. Demonstrating KMT activity in previously uncharacterized proteins addresses the first consideration, whereas reconciling the second will require a deeper understanding of PR/SET domains, which could revise currently accepted paradigms of KMT-mediated catalysis.


Table 1. Reported methyltransferase-related functions of PRDMs. PRDM proteins are listed indicating any reports of intrinsic lysine methyltransferase activity, as well as protein-protein interactions with other methyltransferases. Histone tail substrates are also shown. PRDM aliases are written in parentheses.

| Protein (alias) | Intrinsic KMT activity: substrate | Study | Methyltransferase interaction: substrate (enzyme) | Study |
|---|---|---|---|---|
| PRDM1 (BLIMP1) | | | H3K9me3 (G9a); H4R3me2s (PRMT5) | [84]; [85] |
| PRDM2 (RIZ1) | H3K9me1 | [78] [86] | H4K20me1 (Set7) | [86] |
| PRDM3 (MECOM) | H3K9me1 | [87] | H3K27me3 (EZH2) | [88] |
| PRDM4 | | | H4R3me2s (PRMT5) | [89] |
| PRDM5 | | | H3K9me2 (G9a) | [90] |
| PRDM6 | H4K20me | [91] | | |
| PRDM7 | H3K4me3 | [82] | | |
| PRDM8 | H3K9me2 | [92] | | |
| PRDM9 | H3K4me3, H3K9me1/2, H3K36me3, H3K18me1, H4K20me1/2 | [61] [81] [79] [80] | | |
| PRDM10 | | | | |
| PRDM11 | | | | |
| PRDM12 | | | H3K9me2 (G9a) | [55] [93] |
| PRDM13 | H3K9me, H3K27me | [56] | | |
| PRDM14 | | | H3K27me3 (EZH2) | [94] |
| PRDM15 (ZNF298) | | | | |
| PRDM16 (MEL1) | H3K9me1, H3K4me2 | [87] [56] [95] | | |
| PRDM17 (ZNF408) | | | | |
| FOG1 (ZFPM1) | | | | |
| FOG2 (ZFPM2) | | | | |
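The count of 8 out of 19 PRDM proteins with reported intrinsic KMT activity can be cross-checked against Table 1. The Python sketch below simply transcribes the intrinsic-activity column into a dictionary (the dictionary and function names are illustrative, not part of any published analysis) and counts the non-empty entries.

```python
# Intrinsic KMT substrates transcribed from Table 1; an empty list means
# no intrinsic activity is listed for that protein.
INTRINSIC_KMT = {
    "PRDM1": [], "PRDM2": ["H3K9me1"], "PRDM3": ["H3K9me1"], "PRDM4": [],
    "PRDM5": [], "PRDM6": ["H4K20me"], "PRDM7": ["H3K4me3"], "PRDM8": ["H3K9me2"],
    "PRDM9": ["H3K4me3", "H3K9me1/2", "H3K36me3", "H3K18me1", "H4K20me1/2"],
    "PRDM10": [], "PRDM11": [], "PRDM12": [], "PRDM13": ["H3K9me", "H3K27me"],
    "PRDM14": [], "PRDM15": [], "PRDM16": ["H3K9me1", "H3K4me2"], "PRDM17": [],
    "FOG1": [], "FOG2": [],
}


def reported_active(table):
    """Return the proteins with at least one reported intrinsic KMT substrate."""
    return sorted(name for name, substrates in table.items() if substrates)


active = reported_active(INTRINSIC_KMT)
print(f"{len(active)}/{len(INTRINSIC_KMT)} PRDMs with reported KMT activity")
```

Interaction-only entries (for example PRDM1 with G9a) are deliberately excluded, since they describe recruitment of other methyltransferases rather than intrinsic activity.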


1.2.5 Comparing PR/SET domains with canonical SETs.

A core prerequisite for KMT activity is the ability to bind the cofactor SAM, but based on amino acid conservation it is unclear whether all PR/SET domains (including reportedly active ones) can bind SAM (Figure 3). All domains with a SET fold are understood to interact with SAM through a conserved hydrogen-bonding network split across two spatially distinct regions of the domain [96]. Comparing residue conservation between PR/SET and canonical SET domains in SAM-binding region 1 (Figure 3a-b), we observe lower shared identity among the PR/SET proteins. The contrast is starker in SAM-binding region 2 (Figure 3a, c), where many PR/SET domains lack an asparagine-histidine (NH) pair that is perfectly conserved in all canonical SET domains except the paralogs SETD5 and MLL5. Furthermore, KMTs are understood to use the hydroxyl group of a “catalytic tyrosine” residue to stabilize the transfer of the methyl group from SAM to lysine [97]. All canonical SET domains in humans possess this “catalytic tyrosine” (Figure 3a, region 3), except SETD5 and MLL5, which possess a phenylalanine instead. Notably, both SETD5 and MLL5 reportedly lack intrinsic methyltransferase activity [98] [99] [100]. The “catalytic tyrosine” is poorly conserved among the human PR/SET domains and is absent in PRDMs 3, 7, 16 and 17, FOG1 and FOG2 (Figure 3a, d). Of the PR/SET domains lacking a “catalytic tyrosine”, only PRDM7 possesses a hydroxyl-containing residue (serine), which may explain how PRDM7 retains KMT activity [82]. Studies validating the reported activity of PRDMs 2, 3, 6, 8, 13 and 16, which lack some or all of the residues thought to be important for the methylation mechanism, will broaden our understanding of epigenetic biology and enzyme biochemistry.
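Sequence logos such as those in Figure 3b-d scale each alignment column by its information content. A minimal Python sketch of that quantity, and of the hydroxyl check applied above to the “catalytic tyrosine” position, is shown below. It assumes an uncorrected Shannon entropy over a 20-letter amino acid alphabet with gaps ignored (logo software usually adds a small-sample correction), and the toy alignment and its sequence names are hypothetical rather than the Figure 3 sequences.

```python
import math
from collections import Counter

HYDROXYL_RESIDUES = {"Y", "S", "T"}  # tyrosine, serine, threonine side chains carry -OH


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, i.e. the stack height
    of a sequence logo. Gaps ('-') are ignored and no small-sample correction
    is applied, so this is a simplified version of what logo tools report."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def hydroxyl_at(aligned, position):
    """For each aligned sequence, report whether the residue at `position`
    (0-based alignment index) carries a side-chain hydroxyl group."""
    return {name: seq[position] in HYDROXYL_RESIDUES for name, seq in aligned.items()}


# Hypothetical toy alignment (not the real Figure 3 sequences).
toy = {"set_a": "GNHY", "set_b": "GNHY", "pr_a": "GDQF", "pr_b": "GNHS"}
columns = ["".join(seq[i] for seq in toy.values()) for i in range(4)]
for i, col in enumerate(columns):
    print(i, col, round(column_information(col), 2))
print(hydroxyl_at(toy, 3))
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, whereas a column mixing tyrosine, phenylalanine and serine scores lower; the hydroxyl check separates a serine-bearing sequence (like PRDM7) from a phenylalanine-bearing one.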


Figure 3. Comparison of the SET and PR/SET amino acid sequences. (a) Amino acid alignment of all human SET and PR/SET domains, highlighting three regions of the cofactor pocket. (b-d) Amino acid sequence logos comparing SET and PR/SET domains. A SAM-bound PRDM9 structure (PDB accession code 6nm4) is shown, highlighting the cofactor-binding residues in regions 1, 2 and 3. The “catalytic tyrosine” is located at the third-to-last position of region 3.

1.2.6 Structural dynamics in PR/SET enzymes.

Additionally, PR/SET domains must be able to adopt a catalytically competent conformation, in which the post-SET residues fold over the cofactor and substrate to enable methylation [83]. Structural evidence for this dynamic catalytic mechanism in PR/SET domains exists only for PRDM9 [101] [102]. In PRDM9, the post-SET residues move from an open, inactive conformation (Figure 4a) to a closed, active conformation (Figure 4b). Interestingly, PRDM9 can be inhibited by locking the PR/SET domain in the inactive conformation (Figure 4c). All previously determined structures of PR/SET domains from other PRDMs were solved in the apo state, with the post-SET residues positioned away from the cofactor/substrate site (Figure 4d). Structural elucidation of PR/SET domains in catalytically competent conformations may provide insights into their enzymatic function.


Figure 4. Structures of all PR/SET domains available from the Protein Data Bank. Structural variability of the post-PR/SET regions (highlighted in orange) of PR/SET domain structures. Cartoon representations of PRDM9 in (a) apoenzyme, (b) holoenzyme and (c) chemically inhibited structural conformations. Stick representations of SAH, histone H3 peptide, SAM and MRK-740 are shown in cyan, green, blue and red, respectively. (d) Structures of PR/SET domains with the post-PR/SET regions oriented away from the cofactor- and substrate-binding pocket are shown. X-ray crystal structures are shown unless otherwise indicated as NMR solution structures. Disordered regions are depicted as smoothed loops for visual clarity.


1.3 Structural and functional elucidation of PRDM proteins

1.3.1 Rationale for investigating PRDMs.

Innovative therapeutics that target a variety of epigenetic regulators, including protein methyltransferases, are currently being developed [26] [28] [103]. However, these innovations are limited by the scope of actionable targets, and research on the SET fold is heavily skewed towards a few proteins (Figure 5a). Furthermore, the PRDMs are categorically understudied compared to proteins with canonical SET domains: PRDMs represent 36% of human SET-fold proteins but only 11% of research articles referencing SET-fold proteins (Figure 5b). The distribution of research articles referencing SET-fold proteins resembles that of kinases, in both cases skewing a high proportion of publications towards a narrow subset of proteins [104]. This publication skew may create a positive-feedback loop in which continued focus on well-known targets leads to the neglect of potential new ones [104]. Fortunately, the kinase research field recognized this trend early on [105], and efforts to understand the wider range of kinase biology have helped make protein kinases the second-largest group of drug targets, behind G-protein-coupled receptors [106] [107]. The development of drugs and chemical probes targeting kinases has been enabled by a thorough understanding of the family, facilitated by kinase assay development and selectivity screening, the determination of cellular targets and rigorous structural characterization, all of which contributed to the discovery of selective chemical probes [108] [109] [106]. Additionally, modern approaches to chemical probe development, such as proteolysis-targeting chimeras (PROTACs) that target kinases [110], provide a mechanism to target pseudokinases [111], which adopt a protein fold similar to kinases but lack enzymatic activity.
Following this example, it is imperative to characterize the PRDM subgroup of the SET-fold family, both to better understand it and to potentially identify actionable targets for therapeutic intervention.


Figure 5. The PRDM family is less represented in academic research compared to canonical SET proteins. Tally of research articles referencing proteins that contain SET and PR/SET domains, depicted on (a) linear and (b) logarithmic scales. Protein names and aliases were retrieved from www..org, and the number of research articles (excluding review articles) referencing these names and aliases in the title, abstract or keywords was enumerated from www.scopus.com.
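The tally in Figure 5 was obtained with Scopus searches. As a rough local illustration of the same counting rule (a name or alias in the title, abstract or keywords, with reviews excluded), the Python sketch below works on hypothetical bibliographic records. The record fields (`title`, `abstract`, `keywords`, `doctype`) are assumptions, and Scopus query semantics differ from this whole-word regular-expression match.

```python
import re


def count_articles(records, aliases):
    """Count non-review records whose title, abstract or keywords mention any
    alias as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b", re.IGNORECASE
    )
    count = 0
    for rec in records:
        if rec.get("doctype") == "review":  # review articles are omitted, as in Figure 5
            continue
        text = " ".join(
            [rec.get("title", ""), rec.get("abstract", ""), " ".join(rec.get("keywords", []))]
        )
        if pattern.search(text):
            count += 1
    return count


# Hypothetical records, not real bibliographic entries.
records = [
    {"title": "PRDM16 controls brown fat identity", "doctype": "article"},
    {"title": "PRDM1 in immunity", "doctype": "review"},
    {"title": "BLIMP1 in plasma cells", "keywords": ["PRDM1"], "doctype": "article"},
]
print(count_articles(records, ["PRDM1", "BLIMP1"]))
```

The whole-word boundary matters for numbered gene names: without it, a search for PRDM1 would also count every mention of PRDM16.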


1.3.2 Research objective.

Structural and functional elucidation of PRDM proteins can reveal information pertinent to understanding human development and disease. I hypothesize that only a subset of PRDM proteins are active lysine methyltransferases. The objective of this thesis is to expand the fields of PRDM and KMT biology by revealing novel insights into the enzymatic and non-enzymatic properties of PRDM proteins. I employed techniques from structural biology, biophysics and bioinformatics. Specifically, X-ray crystallography was used to determine the atomic model of a PR/SET domain, as well as the structures of PRDM-mediated protein-protein interactions. Nuclear magnetic resonance spectroscopy, isothermal titration calorimetry and dynamic light scattering were used to interrogate protein-ligand interactions. Bioinformatic analyses were used to examine PRDM gene expression in cancers and to investigate common structural features of SET-fold chemical probes. In accomplishing this objective, I was aided by collaborators from the University of Toronto, the Structural Genomics Consortium, the University of Oxford and the Agency for Science, Technology and Research (A*STAR) in Singapore, as well as by scientists from the pharmaceutical companies MSD and Boehringer Ingelheim. To address my objective, I present the following studies:

1.3.3 Aim 1. Characterization of the first chemical probe for a PRDM protein.

PRDM9 is the most extensively characterized PR/SET domain-containing KMT; it is involved in important developmental pathways related to meiosis and is associated with genomic instability in cancer. Prior to MRK-740, there were no chemical probes for PRDM9 or any other PRDM protein, which hindered the study of disease-relevant biology. Here, I structurally and biophysically characterized MRK-740, the first chemical probe discovered for a PRDM protein. I demonstrated how MRK-740 binds to PRDM9, revealing how the chemical probe uniquely inhibits PRDM9 KMT activity. Additionally, I demonstrated that MRK-740 selectively binds to the PRDM9 PR/SET domain over other human PR/SET domains and provided an in-depth examination of the binding selectivity of MRK-740 for PRDM9 over PRDM7. Furthermore, I identified rare occurrences of PRDM9 gene overexpression in cancer patients that may be associated with survival outcomes. Finally, I compared my structure of MRK-740-bound PRDM9 with other methyltransferase inhibitors and found that MRK-740 inhibits PRDM9 through a novel mechanism that may be specific to the inhibition of PR/SET domains.

1.3.4 Aim 2. Characterization of the PRDM3 and PRDM16 interaction with the NuRD complex.

Expression of the full-length and ΔPR/SET isoforms of PRDM3 and PRDM16 is tightly regulated during development, and aberrant isoform expression is causally associated with oncogenesis. It remains unclear how loss of the full-length PRDM3 and PRDM16 isoforms affects genetic pathways relevant to cancer. By determining how full-length PRDM3 and PRDM16 proteins structurally interact with the NuRD chromatin remodeling complex via RBBP4, a protein containing a tryptophan-aspartate 40 repeat (WDR) domain, I identified a potential novel epigenetic pathway that may become dysregulated by loss of the full-length isoforms in some cancers. Additionally, my experience working with a WDR-containing protein enabled my participation in an ancillary project focused on the discovery and characterization of binders to the WDR-domain-containing proteins FBXW11 and BTRC.

1.3.5 Aim 3. Identification of potential cofactor interactions among PR/SET domains.

Lysine methyltransferase activity has been reported for some PRDMs, while others may be pseudoenzymes. The cofactor-binding residues in human PR/SET domains are more diverse than the equivalent residues in canonical SET domains, and many PR/SET domains lack the “catalytic tyrosine” associated with KMT activity. SAM is the universal cofactor for KMT enzymes, but it is unclear whether all PR/SET domains can bind SAM or SAH. Here, I demonstrated how a fluorinated analog of SAH can be used to investigate the potential for catalytic function among PR/SET domains, and I identified several PR/SET domains that are likely candidates for cofactor binding and, potentially, KMT activity.


Chapter 2

Characterization of the first PRDM9 methyltransferase inhibitor

2.1 Authorship attribution statement.

This chapter incorporates unpublished work and portions of a publication on which I share co-first authorship with two other individuals [102]. To properly contextualize my data from the publication, I have included some Figures and data that are attributed to my coauthors and indicated this in the appropriate Figure legends. These include in vitro enzyme activity assays performed by Drs. Abdellah Allali-Hassani and Fengling Li, cellular assays performed by Dr. Magdalena Szewczyk and molecular simulations performed by Dr. John Sanders. To accurately present their data, I have maintained their original Figure captions but have reinterpreted their data in my own words in the Results section. Additionally, I have reproduced text from the publication that was written by me with editing by Drs. Matthieu Schapira, Cheryl Arrowsmith and Masoud Vedadi, as well as all my coauthors. I am responsible for the generation and interpretation of all remaining data in this chapter, with the following contributions noted. Shili Duan provided guidance and technical support for the production of protein used for crystallography and NMR. The proteins used in DSF and ITC assays were kindly provided by Taraneh Hajian and Elisa Gibson from the laboratory of Dr. Masoud Vedadi. Dr. Scott Houliston advised on and supported all NMR experiments, and Drs. Houliston and Cheryl Arrowsmith aided in data analysis and interpretation. Drs. Levon Halabelian and Mani Ravichandran advised on protein crystallization. Dr. Halabelian acquired the synchrotron data and aided in structure determination. Dr. Matthieu Schapira advised on the PDB survey and aided in interpretation of the protein structure. This project was conceptualized by group leaders from the SGC Toronto and Merck & Co. Drs. Cheryl Arrowsmith, Masoud Vedadi and Matthieu Schapira provided guidance throughout the study.


2.2 Introduction

Lysine methylation of histone proteins encodes epigenetic regulatory information vital for eukaryotic life. Lysine methyltransferases (KMTs) catalyze the transfer of methyl groups from S-adenosylmethionine (SAM) to the side-chain amino group of specific lysine residues in proteins. The Su(var)3-9, E(z) and Trx (SET) domain fold is found in the vast majority of human KMTs [15]. The PRDI-BF1 and RIZ1 (PR) domain is a subgroup of the SET domain, with a conserved structural fold but low sequence identity (20-30%) with canonical SET domains. Nineteen PR/SET domain-containing (PRDM) genes are found in the human genome, encoding epigenetic proteins with diverse roles in development, homeostasis and disease [reviewed in [30] [32]]. One PRDM family member that has garnered extensive attention for its importance in development and its dysregulation in disease is PRDM9.

First identified in 2005, mouse PRDM9 (Prdm9) was originally called “meiosis-induced factor containing a PR/SET domain and zinc-finger motif” (Meisetz) and was described for its specific expression in testis, where it functioned as a writer of the histone H3 lysine 4 trimethylation (H3K4me3) mark and was believed to be imperative for meiotic progression [61]. In 2009, Prdm9 was identified as the gene responsible for male hybrid sterility due to spermatogenic failure in crosses of the mouse subspecies Mus m. musculus and Mus m. domesticus [112]. A phylogenetic analysis detected orthologs of the PRDM9 gene in organisms of low complexity such as sea anemones, nematodes and ray-finned fish, up to organisms of higher complexity such as monotremes (e.g. platypuses), metatherians (e.g. opossums) and eutherians (e.g. cows, dogs, mice and primates), with a conspicuous absence in sauropsids (e.g. birds, lizards, snakes), amphibians and flies [62]. Mechanistic insight into this broadly distributed gene came in 2010, when three groups independently identified PRDM9 as the positional determinant of meiotic recombination loci in humans, chimpanzees and mice [59] [58] [113]. Meiotic recombination promotes evolutionary diversification through the shuffling of parental genetic information, and PRDM9 has since been identified as a positional determinant of meiotic recombination “hotspots” in humans and non-human primates, as well as in rodents, ruminants and equids (reviewed in [114]). Consistent with a role in meiosis, mouse RNA-seq data sets indicate that Prdm9 is expressed in adult testes and embryonic ovaries, when meiotic prophase occurs in males and females, respectively [115] [116].

The typical domain arrangement of PRDM proteins consists of an N-terminal PR/SET domain followed by a variable array of C2H2 zinc finger (ZnF) motifs. The human PRDM9 protein possesses an N-terminal atypical Krüppel-associated box (KRAB) domain, followed by a nuclear localization signal, a synovial sarcoma X breakpoint repression domain (SSXRD) and a pre-PR/SET zinc-knuckle (ZnK) motif preceding the PR/SET domain, which is followed by one post-PR/SET ZnF motif and a distal array of 13 ZnF motifs at the C-terminus [117] [101]. Genomic localization of PRDM9 is dictated by ZnF-dependent recognition of speciated DNA sequences [118] [119]. The PR/SET domain catalyzes the mono-, di- and trimethylation of lysine 4 and lysine 36 on histone H3 (H3K4me1/2/3 and H3K36me1/2/3), as reported in vitro and in vivo [61] [79] [80], and one additional study has also reported in vitro H3K9me1/2, H3K18me1 and H4K20me1/2 activity [81]. Elucidation of the human apoenzyme and mouse holoenzyme crystal structures revealed that the PRDM9 PR/SET domain undergoes a conformational change from an inactive, open state to a closed, active state that is folded over the histone tail substrate [101]. The N-terminal KRAB domain is common to a large group of ZnF-containing proteins and usually behaves as a transcriptional repressor through a KRAB-mediated interaction with Tripartite motif-containing 28 (TRIM28) scaffolding onto Heterochromatin Protein 1 (HP-1) (reviewed in [120] [121]). The SSX family of transcription factors possess an SSXRD that is accompanied by an atypical KRAB domain incapable of binding to TRIM28 [122]. In PRDM9, the atypical KRAB domain also lacks the ability to bind TRIM28 but functions as an interaction scaffold for the CXXC1, PIH1D1, CHAF1A, CEP70, FKBP6, IFT88 and MCRS1 proteins, while the SSXRD has been shown to be necessary for methyltransferase activity in cells but not in vitro [123] [124] [79].

Functionally, PRDM9 positions meiotic recombination hotspots by epigenetically marking double-strand DNA break (DSB) sites that enable DNA crossover events between homologous chromosomes [58] [59] [60] [61]. Together, the PR/SET-dependent H3K4me3 and H3K36me3 epigenetic signature and the KRAB-mediated protein-protein interactions are indispensable for faithful positioning of meiotic recombination hotspots at PRDM9-bound genomic loci recognized by the C-terminal ZnFs [125] [123] [119] [124]. Methylation of H3K4 and H3K36 is typically recognized as a marker of open chromatin with active gene expression, and although PRDM9 may localize to some promoters, genetic ablation of mouse Prdm9 did not change gene expression in mouse testes or in a spermatogonia-derived cell line [126] [124]. Interestingly, meiotic recombination still occurs in Prdm9 knockout mice, but the hotspots shift from loci determined by Prdm9 to other sites marked by H3K4me3, such as active promoters [127] [125]. A comprehensive understanding of how infertility arises is lacking. Hybrid infertility likely results from unmatched DSB formation caused by competition between speciated PRDM9 activities, whereas PRDM9 loss may result in defects in the timing, efficiency and/or regulation of DSB repair pathways (reviewed in [128]).

Given PRDM9’s essential role in meiotic recombination, it is no surprise that PRDM9 defects have been implicated in cases of human infertility, as well as in several other genetic diseases. For instance, two single nucleotide polymorphisms (SNPs) causing missense mutations in the pre-PR/SET and PR/SET domains of PRDM9 have been identified in a small subset of infertile patients with a rare form of azoospermia caused by meiotic arrest [129]. Furthermore, two additional SNPs causing missense mutations within the ZnF array of PRDM9 were identified in patients with azoospermia but were absent in a control group of fertile males [130]. As first reported in mice, the impact of Prdm9 loss is sexually dimorphic: males are sterile owing to apoptosis of spermatocytes, whereas females remain fertile. The same dimorphism occurs in humans, as evidenced by the discovery of a fertile woman carrying homozygous loss-of-function PRDM9 mutations [131]. Dozens of PRDM9 allelic variants with distinct DNA recognition profiles, caused by variations in the ZnFs, have been identified in humans [118] [132]. Although some crosses of different mouse subspecies carrying different Prdm9 alleles result in male sterility [112], humans with different PRDM9 alleles produce fertile offspring. Nonetheless, certain PRDM9 alleles have been associated with specific genetic diseases. For instance, while the PRDM9 A-allele is the most common allele in populations with European and African ancestry, A-allele meiotic recombination hotspots frequently overlap with disease-associated breakpoints responsible for non-homologous translocation events that cause X-linked ichthyosis, Charcot-Marie-Tooth disease, hereditary neuropathy with liability to pressure palsies, Hunter syndrome, Potocki-Lupski/Smith-Magenis syndrome and von Recklinghausen's disease [133] [134] [135]. Additionally, an increased occurrence of the PRDM9 C-allele was observed in European mothers of children with trisomy 21 (Down syndrome), which is believed to be caused by meiotic nondisjunction of chromosome 21 due to fewer C-allele binding sites on the chromosome 21 q-arm [136].

Many PRDM family members act as either tumor suppressors or oncogenes (reviewed in [30]). The first association between PRDM9 expression and cancer was reported in a study of meiosis-specific cancer/testis antigens, which found PRDM9 gene and protein expression in lymphoma and leukaemia cell lines, as well as in tissue samples from five ovarian carcinomas and one lung adenocarcinoma [64]. Next, two independent studies reported an association between the presence of parental PRDM9 C-alleles and aneuploid B-cell acute lymphoblastic leukemia in children, although small cohort sizes reduced the statistical power of these studies, increasing the probability of a spurious association [65] [66]. Several recent studies have found evidence of PRDM9 dysregulation in thousands of patient RNA-seq datasets derived from tumor and healthy tissue samples provided by The Cancer Genome Atlas (TCGA). An analysis of KMT genes in bladder cancer subjects from a TCGA study found that frequent copy number alterations, coding mutations and elevated gene expression of PRDM9 may be causally linked to oncogenesis [137]. A pan-cancer genomic study using TCGA data identified PRDM9 as one of the most frequently mutated genes in the PRDM family, with a mutational frequency greater than 5% in multiple cancer types, such as DLBCL, HNSCC, endometrial, esophageal, stomach and colon carcinomas, kidney and lung tumors, and melanoma [57]. A second pan-cancer analysis of TCGA datasets demonstrated that aberrant PRDM9 expression in cancers was strongly correlated with chromosomal structural variant breakpoints at PRDM9 recognition loci [63]. Taken together, aberrant PRDM9 expression has been hypothesized to drive a mechanism of genomic instability that could induce oncogenesis or enable tumor genome evolution by promoting recombination initiation through DSBs.

PRDM9 is an essential molecular determinant of development and disease. Here we describe MRK-740, the first small-molecule chemical probe for PRDM9, along with its inactive control compound MRK-740-NC. We combine biochemical and biophysical techniques to describe the activity, specificity and mechanism of action of MRK-740 for PRDM9. We demonstrate that MRK-740 is a potent, PRDM family-selective and cell-active chemical probe that inhibits PRDM9 methyltransferase activity in a unique substrate-competitive, cofactor-dependent manner, and we provide evidence supporting the hypothesis that PRDM9 could be an important target in cancer.


2.3 Results

2.3.1 MRK-740 specifically inhibits PRDM9 methyltransferase activity.

Inhibition of PRDM9 methyltransferase activity by MRK-740 and its inactive negative control analog MRK-740-NC was assessed in vitro and in cells. MRK-740 is distinguished by a para-conjugated 2-methylpyridine that is replaced by a phenyl group in MRK-740-NC (Figure 6a). A tritium-based methyltransferase assay was used to determine the half-maximal inhibitory concentration (IC50) of MRK-740 for in vitro methylation of a histone H3 peptide (residues 1-25) catalyzed by the PRDM9 PR/SET domain (residues 195-415). Methyltransferase reactions were performed in the presence of 3H-SAM across a range of inhibitor concentrations. After the reactions were quenched, incorporation of the tritium-labeled methyl group into purified H3 peptides was measured with a scintillation counter. The IC50 value of MRK-740 was determined to be 80 ± 16 nM (mean ± standard deviation), whereas MRK-740-NC did not reach total inhibition of PRDM9 at concentrations up to 100 μM (Figure 6b).
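The in vitro IC50 values were derived from dose-response data such as those in Figure 6b. The Python sketch below illustrates the general idea under simplifying assumptions: scintillation counts are background-corrected and expressed relative to an uninhibited control, and IC50 is estimated by interpolating on a log-concentration axis between the two points that bracket 50% activity. This interpolation is a stand-in for the curve fitting used to report the values above, and all numbers are illustrative.

```python
import math


def percent_activity(cpm, cpm_uninhibited, cpm_background):
    """Background-corrected scintillation counts as a percentage of the
    uninhibited (vehicle-only) reaction."""
    return 100.0 * (cpm - cpm_background) / (cpm_uninhibited - cpm_background)


def interpolate_ic50(concs, activities):
    """Estimate IC50 by linear interpolation on log10(concentration) between the
    two neighbouring points that bracket 50% activity. `concs` must be sorted
    ascending. Returns None if activity never drops through 50% in the tested
    range, as expected for a weak or inactive compound."""
    points = list(zip(concs, activities))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 50.0 >= a2 and a1 != a2:
            x1, x2 = math.log10(c1), math.log10(c2)
            return 10 ** (x1 + (a1 - 50.0) * (x2 - x1) / (a1 - a2))
    return None


# Illustrative numbers only (nM and % activity), not the measured PRDM9 data.
concs_nM = [10, 30, 100, 300, 1000]
print(interpolate_ic50(concs_nM, [95, 80, 45, 20, 5]))    # falls between 30 and 100 nM
print(interpolate_ic50(concs_nM, [100, 98, 90, 75, 60]))  # None: 50% is never crossed
```

Interpolating on the logarithm of concentration reflects the roughly sigmoidal shape of dose-response curves on a log axis; a full non-linear fit uses all points rather than only the two bracketing ones.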

The cellular IC50 of MRK-740 was obtained by co-expressing FLAG-tagged full-length PRDM9 and GFP-tagged histone H3 in human-derived HEK293T cells. Transfected cells were incubated with various doses of compound for 20 hours, after which the levels of GFP-tagged histone H3 and H3K4me3 were detected by western blot analysis. As expected, expression of wild-type PRDM9 induced trimethylation of GFP-tagged histone H3 on lysine 4 (H3K4me3), whereas a PRDM9 catalytic mutant (Y357S) did not produce detectable methylation (Figure 6c). The cellular IC50 value of MRK-740 was determined to be 0.8 ± 0.1 μM, whereas MRK-740-NC did not inhibit H3K4 trimethylation at concentrations up to 10 μM (Figure 6d). Taken together, these data indicate that MRK-740 is a potent and cell-active inhibitor of PRDM9.
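Figure 6d describes a non-linear fit of H3K4me3 intensities normalized to GFP. A minimal Python sketch of that workflow is shown below, assuming a four-parameter logistic model with the Hill slope fixed at 1 and the plateaus fixed at 100% and 0% of the DMSO control; the fitting software and constraints actually used are not stated here. A coarse grid search over log10(IC50) stands in for a proper non-linear least-squares optimizer, and the data are synthetic.

```python
import math


def percent_of_control(h3k4me3, gfp, control_h3k4me3, control_gfp):
    """H3K4me3 signal normalized to GFP-H3 in the same lane, expressed as a
    percentage of the normalized signal in the untreated (DMSO) lane."""
    return 100.0 * (h3k4me3 / gfp) / (control_h3k4me3 / control_gfp)


def four_pl(conc, ic50, hill=1.0, top=100.0, bottom=0.0):
    """Four-parameter logistic (Hill) dose-response model, in % of control."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fit_ic50_grid(concs, responses, lo=1e-3, hi=1e3, steps=2001):
    """Least-squares IC50 by grid search in log10 space. Only IC50 is fitted;
    hill, top and bottom stay at their defaults in four_pl."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best_ic50, best_sse = None, math.inf
    for i in range(steps):
        ic50 = 10 ** (log_lo + (log_hi - log_lo) * i / (steps - 1))
        sse = sum((four_pl(c, ic50) - r) ** 2 for c, r in zip(concs, responses))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50


# Synthetic, noise-free responses generated with IC50 = 0.8 µM (illustration only).
concs_uM = [0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
responses = [four_pl(c, 0.8) for c in concs_uM]
print(round(fit_ic50_grid(concs_uM, responses), 2))
```

Normalizing to GFP corrects each lane for differences in transfection efficiency and loading before the dose-response model is applied.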


Figure 6. MRK-740 is a potent, cell-active PRDM9 inhibitor. (a) Chemical structures of MRK-740 and MRK-740-NC are presented. (b) The effect of (●) MRK-740 and (▲) MRK-740-NC on the methyltransferase activity of PRDM9 on H3 (1-25) peptide was evaluated. Data points are the average ± standard deviation from n = 3 experiments. (c) MRK-740 but not MRK-740-NC decreases PRDM9-dependent K4 trimethylation of exogenous histone H3. HEK293T cells were co-transfected with H3-GFP and PRDM9-FLAG and treated with compounds at the indicated concentrations for 20 h. Mut denotes PRDM9 catalytic mutant (Y357S). (d) The graph represents a non-linear fit of H3K4me3 fluorescence intensities normalized to GFP intensities (MRK-740 n = 10, 4 separate experiments; MRK-740-NC n = 4, 2 separate experiments). This Figure is reproduced from [102]. Dr. Abdellah Allali-Hassani generated panel b and Dr. Magdalena Szewczyk generated panels c and d.


MRK-740 was discovered through structure-activity relationship (SAR) optimization of hit compounds identified in a screen of 7500 small molecules for inhibitory effects on PRDM9 methyltransferase activity [102]. The compound library spanned a diverse chemical space that included molecules similar to known methyltransferase inhibitors. We assessed potential off-target activity of MRK-740 and MRK-740-NC against other human methyltransferases using the same tritium-based methyltransferase assay employed for PRDM9. PRDM7 is a closely related paralog of PRDM9 but has an attenuated capacity to catalyze the H3K4me3 mark in vitro compared to PRDM9 (100-fold lower kcat), which is attributed to the Y357S amino acid difference [82]. Notably, we observed that the PRDM9 Y357S mutant is incapable of methylating histone H3 in cells (Figure 6c), suggesting that PRDM7 methyltransferase activity may be negligible in vivo. The IC50 of MRK-740 for in vitro methylation of a histone H3 peptide (residues 1-25) catalyzed by the PRDM7 PR/SET domain (residues 195-392) was measured as described above for PRDM9. Interestingly, MRK-740 was ~500-fold less potent at inhibiting PRDM7 (IC50 = 45 ± 7 μM) than PRDM9, and again MRK-740-NC did not reach total inhibition of PRDM7 at concentrations up to 100 μM (Figure 7a).
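The fold-selectivity quoted above follows from the ratio of the two in vitro IC50 values. The point estimate is about 560-fold, consistent with the rounded ~500-fold figure; pairing the extremes of the reported standard deviations gives a crude range (not a statistically propagated error) of roughly 400- to 800-fold.

```latex
\frac{\mathrm{IC}_{50}^{\mathrm{PRDM7}}}{\mathrm{IC}_{50}^{\mathrm{PRDM9}}}
  = \frac{45\ \mu\mathrm{M}}{0.080\ \mu\mathrm{M}} = 562.5,
\qquad
\frac{(45-7)\ \mu\mathrm{M}}{(0.080+0.016)\ \mu\mathrm{M}} \approx 396,
\qquad
\frac{(45+7)\ \mu\mathrm{M}}{(0.080-0.016)\ \mu\mathrm{M}} = 812.5
```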

To assess other potential off-target activities, MRK-740 and MRK-740-NC were tested at 1 µM and 10 µM in methyltransferase assays against a panel of 32 protein-lysine, protein-arginine, DNA and RNA methyltransferases in addition to PRDM9. Each enzyme was assayed using protocols established at the Structural Genomics Consortium (SGC) in Toronto, with 3H-SAM and substrate at concentrations close to the known Km values for each enzyme. Of all the methyltransferases examined, only PRDM9 was inhibited at either MRK-740 concentration (Figure 7b), while MRK-740-NC did not produce a detectable inhibitory effect against any of the 33 enzymes (Figure 7c). Taken together, the in vitro enzyme activity assays indicate that MRK-740 selectively inhibits PRDM9 over PRDM7 and 32 other diverse methyltransferases.
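Selectivity panels such as Figure 7b-c are often read by flagging enzymes whose residual activity drops below a cutoff at any tested concentration. The Python sketch below shows that bookkeeping for duplicate measurements; the 50% cutoff, the placeholder enzyme names and the values are hypothetical, and the text does not specify the exact criterion used to call inhibition.

```python
def flag_inhibited(panel, threshold=50.0):
    """Return (enzyme, lowest flagged concentration) pairs for enzymes whose mean
    residual activity (% of vehicle control) falls below `threshold` at any
    tested concentration. `panel` maps enzyme -> {concentration_uM: [replicates]}."""
    hits = []
    for enzyme, by_conc in panel.items():
        for conc in sorted(by_conc):
            replicates = by_conc[conc]
            if sum(replicates) / len(replicates) < threshold:
                hits.append((enzyme, conc))
                break  # report only the lowest concentration that crosses the cutoff
    return hits


# Hypothetical duplicate measurements at 1 and 10 µM (placeholder names and values).
panel = {
    "enzyme_A": {1: [12.0, 15.0], 10: [3.0, 4.0]},
    "enzyme_B": {1: [98.0, 101.0], 10: [95.0, 97.0]},
    "enzyme_C": {1: [88.0, 92.0], 10: [41.0, 47.0]},
}
print(flag_inhibited(panel))
```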


Figure 7. MRK-740 selectively inhibits PRDM9 in vitro. (a) Inhibitory effect of (●) MRK-740 and (▲) MRK-740-NC on the methyltransferase activity of PRDM7 on H3 (1-25) peptide was evaluated. Data points are the average ± standard deviation from n = 3 experiments. Selectivity of (b) MRK-740 and (c) MRK-740-NC against 33 methyltransferases was tested in duplicate at 1 µM and 10 µM compound. This Figure is reproduced from [102]. Dr. Abdellah Allali-Hassani generated panel a and Dr. Fengling Li performed the assays related to panels b and c, while I generated the Figures.

Evidence of methyltransferase activity has been reported for only a subset of PRDM family members (Table 1), and only PRDM9 and PRDM7 have established methyltransferase activity assays at the SGC Toronto. Therefore, to assess potential inhibition selectivity within the PRDM family, we tested binding of MRK-740 and MRK-740-NC to the PR/SET domains of PRDM9, PRDM7 and 11 other PRDM family members using differential scanning fluorimetry (DSF) thermal shift assays. This assay exploits the change in fluorescence that the small-molecule dye SYPRO Orange undergoes upon binding hydrophobic regions of proteins, which become exposed during thermal unfolding. In this qualitative assay, the melting temperatures of PR/SET domains saturated with a molar excess of SAM were assessed in the presence and absence of MRK-740 and MRK-740-NC. Any increase in the protein melting temperature (∆Tm) greater than 2 °C was taken as confirmation of binding. Of all the PRDMs tested, only PRDM9 and PRDM7 were stabilized by 250 μM MRK-740 with ∆Tm values greater than 2 °C (Figure 8a), while 250 μM MRK-740-NC did not increase the ∆Tm of any protein tested (Figure 8b). To confirm these findings, we examined the concentration dependence of the increase in thermostability of PRDM9 and PRDM7, as well as PRDM8, which displayed a marginal increase in ∆Tm upon MRK-740 binding. Both PRDM9 and PRDM7 displayed a concentration-dependent increase in thermostability induced by MRK-740 that was not observed for PRDM8 (Figure 8c-e).
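The 2 °C binding criterion can be illustrated with a short sketch. The chapter does not state how Tm was extracted from the melt curves; here, as an assumption, Tm is taken as the temperature of the steepest fluorescence increase (the maximum of dF/dT), a common way of reading DSF data. The curves are synthetic two-state (Boltzmann) traces, and the function names are hypothetical helpers, not part of any DSF software.

```python
import math


def boltzmann(temp, tm, slope=1.5, f_min=0.0, f_max=1.0):
    """Idealized two-state unfolding curve, used here only to make synthetic data."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp) / slope))


def melting_temperature(temps, fluor):
    """Estimate Tm as the temperature of maximal dF/dT.

    Assumes evenly spaced temperatures. The derivative is taken by central
    differences, and a parabola through the three points around its maximum
    refines the peak position between grid points.
    """
    step = temps[1] - temps[0]
    # deriv[i] is the slope at temps[i + 1]
    deriv = [(fluor[i + 1] - fluor[i - 1]) / (2 * step)
             for i in range(1, len(fluor) - 1)]
    k = max(range(1, len(deriv) - 1), key=lambda i: deriv[i])
    y0, y1, y2 = deriv[k - 1], deriv[k], deriv[k + 1]
    denom = y0 - 2 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    return temps[k + 1] + offset * step


def delta_tm_call(tm_with_ligand, tm_reference, threshold=2.0):
    """Return (dTm, binder?) using the 2 °C cutoff described in the text."""
    d_tm = tm_with_ligand - tm_reference
    return d_tm, d_tm > threshold


temps = [25.0 + 0.5 * i for i in range(111)]        # 25-80 °C in 0.5 °C steps
reference = [boltzmann(t, 50.0) for t in temps]     # protein + SAM (hypothetical)
with_ligand = [boltzmann(t, 53.4) for t in temps]   # protein + SAM + compound (hypothetical)

d_tm, binder = delta_tm_call(melting_temperature(temps, with_ligand),
                             melting_temperature(temps, reference))
print(round(d_tm, 1), binder)  # 3.4 True
```

Because the criterion is a fixed cutoff on a single shift, compounds near the threshold (such as the marginal PRDM8 shift) are better judged by a concentration series, which is the follow-up the text describes.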


Figure 8. MRK-740 binding among PRDM family members. Binding and selectivity of (a) MRK-740 and (b) MRK-740-NC at 250 µM against PRDM family members in the presence of 2 mM SAM were tested by differential scanning fluorimetry. Only MRK-740 binding to PRDM9 and PRDM7, with ∆Tm higher than 2 °C, was observed. Dose-response curves of (c) PRDM9, (d) PRDM7 and (e) PRDM8 upon the addition of MRK-740 in the presence of 2 mM SAM. Data points are the average ± standard deviation from n = 3 experiments. This Figure is reproduced from [102].


2.3.2 MRK-740 is a substrate-competitive, cofactor-dependent PRDM9 inhibitor.

To better understand the difference in methyltransferase inhibition observed for MRK-740 against PRDM9 and PRDM7, we measured the binding affinity of MRK-740 for the PRDM9 and PRDM7 PR/SET domains using isothermal titration calorimetry (ITC). In this assay, a buffered protein solution containing a 5.7-fold excess of SAM was titrated with an identically buffered solution of MRK-740 and SAM. Ligand binding to protein produces a change in enthalpy that was used to calculate the dissociation constant (KD) and interaction stoichiometry (n). We first examined the PRDM9 (195-415) and PRDM7 (195-392) constructs that were used in the enzyme activity and DSF assays. The KD of MRK-740 for PRDM9 in the presence of SAM was 0.189 ± 0.024 µM (mean ± standard deviation), with an approximately 1:1 stoichiometry (Figure 9a). The PR/SET domains of PRDM9 and PRDM7 share high sequence identity, differing at only three residues (N289S, W312S and Y357S), which led us to anticipate a negligible difference in binding affinity (Figure 9b). Interestingly, the KD of MRK-740 for PRDM7 in the presence of SAM was 7.07 ± 0.06 µM, an approximately 37-fold weaker affinity than for PRDM9 (Figure 9c).

To investigate whether the catalytic tyrosine residue located in the active site of PRDM9 stabilizes the interaction with MRK-740, we assayed MRK-740 binding to a residue-swapped mutant, PRDM7 (S357Y). The KD of MRK-740 for PRDM7 (S357Y) was 2.22 ± 0.18 µM, indicating that Y357 accounts for part of the observed difference in affinity (Figure 9d). The C-terminal residues of the PRDM9 (195-415) construct contain a C2H2 zinc finger (ZnF) motif that is not naturally encoded in the PRDM7 protein (Figure 9b). To assess the contribution of this ZnF, we examined a truncated PRDM9 (195-385) construct for binding to MRK-740. Remarkably, MRK-740 bound the PRDM9 (195-385) construct with a KD of 0.673 ± 0.014 µM, approximately 3.6-fold weaker than the longer construct, suggesting that the ZnF contributes to complex formation (Figure 9e). Taken together, the weaker affinity of MRK-740 for PRDM7 compared to PRDM9 is jointly attributed to S357 and the lack of a C-terminal ZnF motif, which together partially account for the discrepancy in methyltransferase inhibition between the two enzymes.
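The KD values above can be put on a common energetic scale by converting each fold change into a difference in binding free energy, ΔΔG = RT ln(KD / KD,ref). The sketch below does this for the reported mean KDs, assuming the titrations were run at 25 °C (the temperature is not stated here) and a 1 M standard state; the helper names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy dG = RT ln(KD), with KD in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


def compare_to_reference(kd_um, reference, temp_k=298.15):
    """Fold change in KD and ddG (kcal/mol) of each construct vs. the reference."""
    ref = kd_um[reference]
    out = {}
    for name, kd in kd_um.items():
        ddg = (binding_free_energy(kd * 1e-6, temp_k)
               - binding_free_energy(ref * 1e-6, temp_k))
        out[name] = (kd / ref, ddg)
    return out


# Mean ITC KD values (µM) reported in this section
kd_um = {
    "PRDM9 (195-415)": 0.189,
    "PRDM9 (195-385)": 0.673,
    "PRDM7 S357Y": 2.22,
    "PRDM7 WT": 7.07,
}
for name, (fold, ddg) in compare_to_reference(kd_um, "PRDM9 (195-415)").items():
    print(f"{name:16s} {fold:5.1f}-fold  ddG = {ddg:+.2f} kcal/mol")
```

On this scale the 37-fold loss for wild-type PRDM7 corresponds to roughly 2.1 kcal/mol, of which the S357Y swap recovers about 0.7 kcal/mol; because ΔΔG depends only on the KD ratio, the unit choice cancels out.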


Figure 9. Assessing MRK-740 off-target binding. (a) Binding affinity of MRK-740 for PRDM9 was measured by isothermal titration calorimetry (ITC) in the presence of 200 µM SAM. (b) Amino acid alignment of the PRDM9 and PRDM7 PR/SET domains highlighting identical (grey) and different (red, green and purple) residues. Binding affinity of MRK-740 for (c) WT PRDM7, (d) swap mutant PRDM7-S357Y and (e) C-terminally truncated PRDM9 lacking the proximal ZnF were measured by ITC in the presence of 200 µM SAM. Representative ITC binding curves from duplicate experiments are shown, along with mean ± standard deviation KD and n values.


Almost all small-molecule inhibitors of protein methyltransferases function by blocking the substrate or the cofactor from binding to the target enzyme [138]. We used two-dimensional nuclear magnetic resonance (2D-NMR) spectroscopy to investigate the binding and the mechanism of inhibition of MRK-740 towards PRDM9. We purified three 15N-labeled PRDM9 PR/SET domain constructs (195-415, 195-385 and 195-368) expressed in E. coli and assessed the quality and completeness of their 2D spectra using transverse relaxation-optimized spectroscopy (TROSY). Of the three constructs, PRDM9 (195-385) produced the most informative 1H,15N-TROSY spectra and, unlike the other two constructs, remained free of visible precipitation at room temperature at the required concentration of more than 150 µM. To identify PRDM9 residue signals involved in substrate and cofactor binding, we qualitatively assessed PRDM9 (195-385) spectra with increasing concentrations of histone H3 peptide (residues 1 to 11) and of SAM (Figure 10). We observed five unique chemical shift perturbations (CSPs) induced by the histone H3 peptide substrate (Figure 10a) and sixteen unique CSPs induced by SAM (Figure 10b), as well as three CSPs common to both ligands (indicated by black arrows). The CSPs induced by histone H3 peptide and SAM appeared to be in fast chemical exchange between the free and protein-bound forms of each ligand, indicating a relatively short lifetime of each protein-ligand complex. Taken together, these CSPs provided a signature of the PRDM9 residues involved in histone H3 and SAM binding.
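The CSPs in this chapter were assessed qualitatively. When CSPs are quantified, a common approach (not the one used here) combines the 1H and 15N changes into one weighted value, Δδ = sqrt(ΔδH² + (α·ΔδN)²), where α ≈ 0.14 scales the wider 15N shift range onto the 1H scale. The sketch below applies that formula to hypothetical peak positions; the peak labels and the 0.1 ppm cutoff are placeholders, not values from Figure 10.

```python
import math


def combined_csp(free, bound, alpha=0.14):
    """Weighted 1H/15N chemical shift perturbation (ppm).

    free and bound are (1H ppm, 15N ppm) tuples; alpha scales the 15N
    change onto the 1H scale (0.14 is a commonly used value).
    """
    d_h = bound[0] - free[0]
    d_n = bound[1] - free[1]
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbed_peaks(free_peaks, bound_peaks, cutoff):
    """Labels of assigned peaks whose combined CSP exceeds cutoff (ppm)."""
    return sorted(
        label for label, shift in free_peaks.items()
        if label in bound_peaks and combined_csp(shift, bound_peaks[label]) > cutoff
    )


# Hypothetical peak positions (1H ppm, 15N ppm); labels are placeholders
free = {"peak_A": (8.20, 120.0), "peak_B": (7.90, 115.5), "peak_C": (8.65, 125.3)}
bound = {"peak_A": (8.21, 120.1), "peak_B": (8.05, 116.5), "peak_C": (8.65, 124.2)}
print(perturbed_peaks(free, bound, cutoff=0.1))  # ['peak_B', 'peak_C']
```

This pairing of free and bound positions works directly for fast-exchange titrations, where a peak moves smoothly; in slow exchange the bound peak appears separately and must be matched by assignment.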


Figure 10. Binding of histone H3 peptide and SAM to PRDM9 by NMR. 1H,15N-TROSY spectra of the 15N-labeled PRDM9 PR/SET domain (residues 195-385) in the absence and presence of (a) histone H3 (residues 1-11) peptide and (b) SAM. Spectra were overlaid for 190 µM PRDM9 along with the indicated molar equivalents of histone H3 or SAM. Peak shifts indicating fast chemical exchange are highlighted by (a) yellow and (b) orange rectangles, while arrows indicate peak shifts common to H3 and SAM.

To investigate the binding of MRK-740, we collected 1H,15N-TROSY spectra of PRDM9 (195-385) with increasing concentrations of MRK-740 and compared any CSPs with those observed for histone H3 peptide and SAM. We observed seven CSPs induced by MRK-740 (Figure 11a), four of which were also detected upon SAM binding (green and orange boxes). This finding suggested that MRK-740 may inhibit PRDM9 by blocking the cofactor SAM from binding. To test this hypothesis, we titrated MRK-740 into a solution of SAM-saturated PRDM9, expecting to observe minimal CSPs because bound SAM would block MRK-740 binding. Contrary to our expectations, the addition of MRK-740 to SAM-saturated PRDM9 produced seventeen CSPs (Figure 11b), including four that were also detected upon SAM binding (purple and orange boxes). Together, these results indicate that MRK-740 binding is enhanced by the presence of SAM and that MRK-740 may bind PRDM9 adjacent to the SAM binding site, as suggested by the overlap in CSPs. Notably, in contrast to the fast exchange observed for the histone H3 peptide and SAM (Figure 10), many of the CSPs for MRK-740 binding to SAM-saturated PRDM9 were in slow chemical exchange (Figure 11b-e). Slow chemical exchange indicates a longer lifetime of the protein-ligand complex than for ligands undergoing fast exchange. Therefore, these data suggested that MRK-740 binds PRDM9 in a cofactor-dependent manner and that the inhibitor-bound complex persists longer than either the substrate- or cofactor-bound complexes.
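Whether a titration appears as fast or slow exchange depends on how the exchange rate kex compares with the frequency separation Δω between the free and bound peaks: fast exchange when kex is much larger than Δω, slow exchange when it is much smaller. For simple 1:1 binding, kex = kon·[L] + koff with koff = kon·KD, so a tight binder tends towards slow exchange. The sketch below assumes a 600 MHz spectrometer, a 0.1 ppm 1H shift, an association rate of 1e7 M⁻¹s⁻¹ and an arbitrary 3-fold margin between regimes; only the 0.19 µM KD comes from the ITC data, and the weak-ligand KD is hypothetical.

```python
import math


def delta_omega(delta_ppm, larmor_mhz):
    """Free/bound chemical shift difference converted to rad/s."""
    return 2.0 * math.pi * delta_ppm * larmor_mhz  # ppm * MHz = Hz


def exchange_rate(kd_molar, kon, free_ligand_molar=0.0):
    """kex = kon*[L] + koff for a simple 1:1 binding model, with koff = kon*KD."""
    return kon * (free_ligand_molar + kd_molar)


def exchange_regime(kex, d_omega, margin=3.0):
    """Classify NMR exchange; margin is an arbitrary illustrative cutoff."""
    if kex > margin * d_omega:
        return "fast"
    if kex < d_omega / margin:
        return "slow"
    return "intermediate"


# Hypothetical numbers: 0.1 ppm 1H shift at a 600 MHz spectrometer,
# and an assumed association rate of 1e7 M^-1 s^-1 for both ligands.
d_w = delta_omega(0.1, 600.0)                     # ~377 rad/s
tight = exchange_rate(kd_molar=0.19e-6, kon=1e7)  # ~1.9 s^-1
weak = exchange_rate(kd_molar=500e-6, kon=1e7)    # ~5000 s^-1
print(exchange_regime(tight, d_w), exchange_regime(weak, d_w))  # slow fast
```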


Figure 11. MRK-740 mechanism of binding by NMR. 1H,15N-TROSY spectra of the 15N-labeled PRDM9 PR/SET domain (residues 195-385) plus MRK-740 in the (a) absence and (b) presence of SAM. Spectra were overlaid for 190 µM PRDM9 along with the indicated molar equivalents of MRK-740, as well as with 760 µM SAM plus MRK-740 at 0, 95, 190 and 380 µM. MRK-740-induced peak shifts are indicated by (a) green and (b) purple rectangles, while orange rectangles indicate peaks that shifted in the presence of SAM (Figure 10b). (c-e) Examples of slow chemical exchange from (b) are shown at the indicated 1H ppm values, where black, red, blue and burgundy correspond to MRK-740 at 0, 95, 190 and 380 µM, respectively, in the vertical dimension.


To further investigate the mechanism of inhibition of MRK-740 for PRDM9, we performed enzyme activity assays to determine IC50 values of MRK-740 across a range of concentrations of SAM and of histone H3 (1-15) peptide substrate. The IC50 value of MRK-740 increased as the peptide concentration was increased, indicating a substrate-competitive pattern of PRDM9 inhibition (Figure 12a). Conversely, IC50 values decreased as the SAM concentration was increased, consistent with a cofactor-dependent (also known as cofactor-uncompetitive) pattern of inhibition (Figure 12b). These data support our earlier finding that MRK-740 binding to PRDM9 is enhanced by the cofactor SAM and provide evidence that MRK-740 competes with the peptide substrate. Altogether, these findings strongly suggest that MRK-740 functions by a substrate-competitive, SAM-dependent mechanism of action.
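The opposite IC50 trends follow from the limiting Cheng-Prusoff-type relations: for an inhibitor competing with a substrate, IC50 = Ki(1 + [S]/Km), which rises with [S]; for an inhibitor that binds only the enzyme-substrate complex (uncompetitive with respect to that substrate), IC50 = Ki(1 + Km/[S]), which falls with [S]. The sketch below treats each varied substrate as if the reaction were single-substrate with the other held fixed, which simplifies the real bisubstrate kinetics; Ki and both Km values are hypothetical and not fitted to Figure 12.

```python
def ic50_substrate_competitive(ki, km, s):
    """IC50 for an inhibitor competing with substrate S (Cheng-Prusoff form)."""
    return ki * (1.0 + s / km)


def ic50_uncompetitive(ki, km, s):
    """IC50 for an inhibitor that binds only the enzyme-S complex."""
    return ki * (1.0 + km / s)


# Hypothetical constants (µM); not fitted to the Figure 12 data
KI, KM_PEPTIDE, KM_SAM = 0.05, 5.0, 50.0
peptide = [2.5, 5.0, 10.0, 20.0, 40.0]
sam = [25.0, 50.0, 100.0, 200.0, 400.0]

ic50_vs_peptide = [ic50_substrate_competitive(KI, KM_PEPTIDE, s) for s in peptide]
ic50_vs_sam = [ic50_uncompetitive(KI, KM_SAM, s) for s in sam]
print(ic50_vs_peptide)  # rises with peptide concentration
print(ic50_vs_sam)      # falls with SAM concentration
```

The qualitative signature (rising IC50 with peptide, falling IC50 with SAM) is what distinguishes the two modes; extracting true constants would require a global fit to a bisubstrate model.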


Figure 12. MRK-740 mechanism of action by enzyme activity assay. The mechanism of action of MRK-740 was evaluated by determining IC50 values in the presence of (a) a fixed concentration of SAM (350 µM) and varying concentrations of peptide, and (b) a fixed concentration of H3 (1-25) peptide (20 μM) and varying SAM concentrations. Data points are the average ± standard deviation from n = 3 experiments. This Figure is reproduced from [102]. Dr. Abdellah Allali-Hassani generated this Figure.


To better understand the molecular interactions between PRDM9 and MRK-740, as well as the mechanism of inhibition, we determined the crystal structure of the PRDM9 catalytic domain in a ternary complex with SAM and MRK-740 (Table 2). The PR/SET domain of PRDM9 (residues 195–385) crystallized at 18 °C in the presence of a 5-fold molar excess of SAM and MRK-740. Small needle-like crystals grew to maximum size after 5-7 days and required a narrow-focus X-ray beam at the Advanced Photon Source (APS) synchrotron (beamline 24-ID-E) to produce diffraction images at 2.58 Å resolution. The structure was solved by molecular replacement using a search model derived from a previously solved human PRDM9 structure (PDB ID 4IJD) trimmed at residue Y357, to account for potential variability in the post-PR/SET residues. SAM was identified in a position similar to that of SAH in the structure of the mouse PRDM9 holoenzyme (PDB ID 4C1Q), and the remaining post-SET residues were built manually. Clear electron density for MRK-740 was located adjacent to the SAM binding site and allowed unambiguous placement of the inhibitor (Figure 13a-b). The structure was refined to final Rwork/Rfree values of 0.209/0.259. Two copies of the PRDM9:SAM:MRK-740 complex were observed in the asymmetric unit, with a root-mean-square deviation (RMSD) of 0.55 Å across all alpha carbon positions, indicating highly similar conformations.
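An RMSD such as the 0.55 Å between the two copies in the asymmetric unit is computed after optimally superposing one set of Cα atoms onto the other. The chapter does not state which program was used; the sketch below is a generic Kabsch superposition with NumPy, assuming the Cα atoms of the two chains are already paired residue by residue. The coordinates are random stand-ins, not the deposited 6NM4 model.

```python
import numpy as np


def superposed_rmsd(p, q):
    """RMSD between paired (N, 3) coordinates after optimal rigid superposition.

    Uses the Kabsch algorithm: center both sets, find the rotation that
    minimizes squared deviations via SVD, and correct for reflections.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Random stand-in for the Cα atoms of chain A (~190 residues)
rng = np.random.default_rng(0)
chain_a = rng.normal(scale=10.0, size=(190, 3))
theta = np.deg2rad(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
# "Chain B": rotated, translated and slightly perturbed copy of chain A
chain_b = (chain_a @ rz.T + np.array([5.0, -3.0, 12.0])
           + rng.normal(scale=0.5, size=(190, 3)))
print(round(superposed_rmsd(chain_a, chain_b), 2))
```

The rigid-body rotation and translation are removed by the superposition, so the reported value reflects only the per-atom positional differences.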


Figure 13. Crystal structure of SAM-bound PRDM9 in complex with MRK-740 inhibitor. Surface representation of PRDM9 PR/SET domain (white) bound by SAM (blue) and MRK-740 showing the (a) Fo – Fc difference electron density map and (b) the refined 2Fo – Fc electron density map of MRK-740 contoured at 2.5 σ (purple) and 1.0 σ (yellow), respectively. This Figure is reproduced from [102].


Table 2. Crystallographic data collection and refinement statistics of the PRDM9:SAM:MRK-740 complex (PDB ID 6NM4). This table is reproduced from [102].

Data collection
    Space group                      P 21 21 21
    Wavelength (Å)                   0.97918
    Cell dimensions
        a, b, c (Å)                  38.03, 74.80, 141.44
        α, β, γ (°)                  90, 90, 90
    Resolution (Å)                   39.88 – 2.58 (2.69 – 2.58)
    Rmerge                           0.13 (1.65)
    I/σI                             10.7 (1.3)
    Completeness (%)                 99.6 (99.4)
    Redundancy                       7.3 (6.3)

Refinement
    Resolution (Å)                   39.88 – 2.58
    No. reflections                  13297
    Rwork/Rfree                      0.2088/0.2588
    Wilson B factor (Å²)             61.6
    No. atoms
        Protein                      2924
        Ligand/ion                   124
        Water                        23
        Unidentified                 3
    B-factors (Å²)
        Average                      63.1
        Macromolecules               63.0
        Ligands                      65.8
    R.m.s. deviations
        Bond lengths (Å)             0.0104
        Bond angles (°)              1.471

Values in parentheses are for the highest-resolution shell.


The crystal structure revealed that MRK-740 binds PRDM9 adjacent to the cofactor SAM (Figure 14a). An analysis of the intermolecular contacts revealed a network of largely hydrophobic interactions between PRDM9 side chains and MRK-740 (Figure 14b). Additionally, the pyridine ring of MRK-740 forms a pi-stacking interaction with W356. The pyridine nitrogen is presumed to be protonated owing to the basicity of the isolated 4-aminopyridine group (pKa = 9.2) and forms a long-range electrostatic interaction with the side chain of Asp325 (Figure 14b). Together with the methyl group installed on the pyridine, which occupies a hydrophobic cavity, this polar contact likely explains the large difference in PRDM9 inhibition observed between MRK-740 and MRK-740-NC; the latter contains a phenyl group, which would be expected to interact less favorably with the protein. Interestingly, significant pi-stacking interactions were observed between the adenosine moiety of SAM and the two central rings of MRK-740 (Figure 14b). Collectively, the extensive interaction surface between SAM and MRK-740 explains why we observed enhanced inhibitor binding in the presence of SAM.
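The assumption that the pyridine nitrogen is protonated can be checked with the Henderson-Hasselbalch relation, fraction protonated = 1 / (1 + 10^(pH − pKa)). The sketch below uses the pKa of isolated 4-aminopyridine quoted above; the pH values are illustrative, since neither the assay nor the crystallization pH is given here, and the substituted pyridine in MRK-740 may have a somewhat different pKa.

```python
def fraction_protonated(pka, ph):
    """Fraction of a base in its protonated (conjugate acid) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa of isolated 4-aminopyridine from the text; pH values are illustrative
for ph in (6.0, 7.0, 7.4, 8.0):
    print(ph, round(fraction_protonated(9.2, ph), 3))
```

At pH 7.4 this gives about 98% protonated, so a cationic pyridinium interacting with Asp325 is a reasonable working assumption across near-neutral conditions.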

We examined residues within the binding site to better understand the specificity of MRK-740 for PRDM9 over PRDM7. Of the three residues differentiating the PRDM9 and PRDM7 PR/SET domains (N289S, W312S and Y357S), only Y357 came close to the inhibitor: the carbon of one of the methoxy groups lies 4.0 Å from a tyrosyl carbon (Figure 14b). Although this distance is just outside the range typically used to identify hydrophobic contacts (i.e. aliphatic carbons < 4.0 Å apart) [139] [140], any weak hydrophobic effect between PRDM9-Y357 and MRK-740 would be completely absent for PRDM7-S357. Additionally, N289 was observed 7.3 Å from MRK-740 and may contribute to the specificity of inhibition via second-shell effects, such as stabilizing residues that directly interact with the inhibitor.


Figure 14. Structural basis for MRK-740 inhibition of PRDM9. (a) Surface representation of PRDM9 PR/SET domain (white) bound by SAM (blue) and MRK-740. (b) Intermolecular interactions between MRK-740 and its binding pocket. Hydrophobic interactions (light blue), pi-pi interactions (yellow), CH-pi interactions (pink) and polar interactions (green) are represented by dashes. Structural alignment with the mouse holoenzyme (PDB accession code 4C1Q) showing steric clashes between MRK-740 and (c) the substrate lysine residue (green) and (d) the post-SET substrate recognition helix when it is in an active, substrate-bound conformation (orange). The corresponding helix in the inhibited human structure is shown in teal. This Figure is reproduced from [102].


To gain insight into the substrate-competitive aspect of MRK-740 inhibition, we compared our inhibitor-bound structure to that of mouse PRDM9 in complex with an H3K4me2 substrate peptide and S-adenosyl-homocysteine (SAH) (PDB ID 4C1Q). The binding position of the substrate lysine is sterically occluded by both methoxy groups of the MRK-740 benzene ring (Figure 14c). Furthermore, while the two PR/SET domains aligned with a small RMSD of 0.46 Å, the post-PR/SET substrate recognition helices differed markedly in conformation. In the MRK-740-bound protein, the substrate recognition helix is flipped in the opposite direction, thereby preventing the enzymatically active conformation seen in the mouse structure (Figure 14d). Taken together, this structural evidence corroborates our biochemical results indicating a SAM-dependent, peptide-competitive mechanism of action for MRK-740.


Figure 15. MRK-740 binding is incompatible with SAH. Surface plasmon resonance binding curve (top) and sensorgram (bottom) of MRK-740 with PRDM9 measured in the presence of 2 mM (a) SAM and (b) SAH. (c) WaterMap hydration sites for the PRDM9/SAM/MRK-740 complex. The hydration sites observed with SAM are overall very similar to those observed with the SAH-based complex, with the notable exception that the three high-energy hydration sites identified in the PRDM9/SAH/MRK-740 complex are absent. (d) WaterMap hydration sites for the PRDM9/SAH/MRK-740 complex. Three hydration sites were identified in the pocket between PRDM9, SAH, and MRK-740 and are labeled with their ΔG values of 9.3, 6.9, and 4.3 kcal/mol. For reference, green-colored hydration sites have ΔG values closer to zero. This Figure is reproduced from [102]. Dr. Abdellah Allali-Hassani generated panels a and b, and Dr. John M. Sanders generated panels c and d.


While our crystal structure of PRDM9 bound by MRK-740 and SAM clearly demonstrated the cofactor dependence of inhibitor binding, it was not immediately obvious whether the reaction by-product SAH could serve as an equivalent substitute. To address this question, we measured binding of MRK-740 to PRDM9 in the presence of either SAM or SAH using surface plasmon resonance (SPR). To control for the difference in affinity of SAM and SAH for PRDM9, we ran the experiments with 350 μM SAM, which is 5-fold greater than the reported Km value, and 750 μM SAH, which is 5-fold greater than its IC50 [79]. Interestingly, while the KD of MRK-740 in the presence of SAM was 132 nM (comparable to the ITC-measured KD of 189 nM; Figure 9), binding was substantially weaker in the presence of SAH, with a measured KD of 112 µM (Figure 15a-b).
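The text does not state whether the SPR KD values came from a steady-state affinity fit or from kinetic fitting. As an illustration only, a steady-state 1:1 (Langmuir) model, Req = Rmax·C / (KD + C), can be fitted with a simple grid search over KD, since for any fixed KD the best Rmax has a closed-form least-squares solution. The responses below are synthetic and noise-free, generated with the SAM-condition KD of 0.132 µM; the concentration series and Rmax are hypothetical.

```python
def langmuir(conc, kd, rmax):
    """Steady-state SPR response for 1:1 binding."""
    return rmax * conc / (kd + conc)


def fit_steady_state(conc, response, kd_grid):
    """Grid search over KD; for each KD the best Rmax has a closed form."""
    best = None
    for kd in kd_grid:
        x = [c / (kd + c) for c in conc]
        rmax = sum(xi * yi for xi, yi in zip(x, response)) / sum(xi * xi for xi in x)
        sse = sum((yi - rmax * xi) ** 2 for xi, yi in zip(x, response))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best


# Synthetic, noise-free responses generated with KD = 0.132 µM and Rmax = 40 RU
conc_um = [0.016, 0.047, 0.14, 0.41, 1.2, 3.7, 11.0]
resp_ru = [langmuir(c, 0.132, 40.0) for c in conc_um]
kd_grid = [10 ** (-3 + 0.001 * i) for i in range(6001)]  # 1 nM to 1 mM, in µM
kd_fit, rmax_fit, _ = fit_steady_state(conc_um, resp_ru, kd_grid)
print(f"KD = {kd_fit:.3f} uM, Rmax = {rmax_fit:.1f} RU")
```

A steady-state fit is only reliable when the analyte concentrations bracket KD, so characterizing the roughly 850-fold weaker SAH-condition interaction (112 µM versus 0.132 µM) this way would require concentrations extending well above 100 µM.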

The striking decrease in affinity of MRK-740 in the presence of SAH compared to SAM prompted us to hypothesize that the SAM sulfonium methyl group may sterically exclude water molecules that would otherwise form unfavourable hydrogen-bond interactions near the inhibitor binding site. To test this hypothesis, we performed WaterMap simulations to compare the positions and thermodynamic properties of water molecules around our structure of PRDM9 bound by MRK-740 and SAM with those of a hypothetical model of PRDM9 similarly bound by MRK-740 and SAH. The WaterMap simulation of the SAM-bound complex did not reveal any high-energy hydration sites between the sulfonium group of SAM and MRK-740 (Figure 15c). In contrast, the WaterMap simulation of the SAH-bound complex produced three new hydration sites with unfavourable (positive) free energies relative to bulk solvent, located in the void between the thioether of SAH and MRK-740 (Figure 15d). These simulations indicated that SAM sterically excludes energetically unfavourable water molecules that would otherwise destabilize MRK-740 binding in the presence of SAH, consistent with our experimental measurements.


Figure 16. MRK-740 binding is unique among methyltransferase inhibitors. (a) Atoms tallied at the interface (distance < 4 Å) between SAM or SAH and small-molecule inhibitors. Rossmann-type methyltransferases targeting nitrogen or oxygen are indicated in blue or orange, respectively. The tallied atoms (red) at the interface between SAM or SAH (teal) and small-molecule inhibitors (pink) are shown for (b) PRDM9 with MRK-740, (c) SUV420H1 with 9ZY, (d) SETD7 with (R)-PFI-2-a, (e) SMYD2 with AZ506, (f) HNMT with Quinacrine, (g) PRMT5 with EPZ015666, and (h) COMT with a Mg2+ cation and 43J. The post-SET helices are indicated in purple. PDB accession codes: 6NM4, 5WBV, 4JLG, 5KJN, 1JQE, 4X61 and 4XUE, respectively. This Figure is reproduced from [102].

Our crystal structure revealed that SAM forms an extensive structural component of the MRK-740 binding pocket, unlike any other methyltransferase inhibitor characterized by the SGC Toronto [138]. To evaluate the uniqueness of this binding mode on a broader scale, we analyzed all methyltransferase structures available in the PDB that had small-molecule inhibitors bound in close proximity to SAM or SAH (n = 97). For each structure, we tallied the number of ligand atoms at the interface (distance < 4 Å) with SAM or SAH, and the number of cofactor atoms at the interface with the ligand (Figure 16a). The interface between SAM and MRK-740 was by far the most extensive, involving 27 SAM atoms and 17 ligand atoms (Figure 16a-b). Although the PR/SET domain of PRDM9 is similar to the canonical SET domain fold, the interaction between MRK-740 and SAM was a conspicuous outlier compared to all other SET methyltransferase inhibitors. Other SAM-dependent, peptide-competitive SET inhibitors typically bound within the substrate pocket and interacted primarily with the labile methyl group, as seen for SUV420H1, SETD7 and SMYD2 inhibitors (Figure 16c-e). In these cases, the adenosyl moiety of SAM, which is accessible to MRK-740 in PRDM9, is buried within the SAM binding pocket, formed in part by the post-SET subdomain of canonical SET enzymes. In the crystal structures of SETD7 and SMYD2, the lysine-mimetic pyrrolidine moiety of PFI-2 and AZ506, respectively, makes van der Waals interactions with the departing methyl group of SAM, while the phenyl moiety of the SUV420H1 ligand (9ZY; PDB ID 5WBV) obstructs the methyl group of SAM. Several Rossmann-fold methyltransferases, such as HNMT, PRMT5 and COMT, had more extensive ligand-cofactor interactions, although still fewer than PRDM9-MRK-740 (Figure 16f-h).
Quinacrine bound to HNMT makes several van der Waals interactions with the ribose and thioether of SAH (Figure 16f), while the PRMT5 inhibitor EPZ015666 makes van der Waals interactions as well as a cation-π interaction with the departing methyl group on the positively charged sulfonium of SAM (Figure 16g). Taken together, our survey of methyltransferase inhibitors indicated that MRK-740 possesses an unusual mechanism of action.
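The interface tally used for Figure 16a reduces to counting, for each pair of molecules, the atoms of one that lie within 4 Å of any atom of the other. The sketch below shows that counting step on toy coordinates; in practice the coordinates would be read from the PDB entries, and the atom names here are hypothetical. Whether hydrogens were excluded is not stated, so the example simply counts whatever atoms it is given.

```python
import math


def interface_counts(cofactor, ligand, cutoff=4.0):
    """Count cofactor atoms within cutoff Å of any ligand atom, and vice versa.

    cofactor and ligand map atom names to (x, y, z) coordinates in Å.
    """
    cof_hits = [name for name, a in cofactor.items()
                if any(math.dist(a, b) < cutoff for b in ligand.values())]
    lig_hits = [name for name, b in ligand.items()
                if any(math.dist(a, b) < cutoff for a in cofactor.values())]
    return len(cof_hits), len(lig_hits)


# Toy coordinates (Å) with hypothetical atom names
sam_like = {"C1": (0.0, 0.0, 0.0), "C2": (1.5, 0.0, 0.0), "N1": (10.0, 0.0, 0.0)}
inhibitor_like = {"X1": (3.0, 0.0, 0.0), "X2": (20.0, 0.0, 0.0)}
print(interface_counts(sam_like, inhibitor_like))  # (2, 1)
```

Applied to 6NM4, a count of this kind is what yields the 27 SAM atoms and 17 MRK-740 atoms reported above; the two numbers differ because each molecule's atoms are counted independently.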


2.3.3 PRDM9 overexpression is a rare occurrence in diverse tumor types.

PRDM9 gene deregulation has been associated with infertility and cancer. We explored PRDM9 expression in cancer patient data available from The Cancer Genome Atlas (TCGA) to see whether MRK-740 might be useful for studying underlying disease mechanisms. We examined TCGA studies that reported more than 10 patient-matched adjacent healthy tissue control samples, to control for tissue-specific differences in PRDM9 expression (Table 3). First, we compared PRDM9 mRNA expression levels for all available tumor samples and all healthy tissue control samples. On visual inspection, low PRDM9 mRNA expression was typical of most healthy and tumor samples in all TCGA studies; however, a noticeable group of outliers with high PRDM9 mRNA expression was observed among the tumor samples in every study (Figure 17a). We performed unpaired, two-sample Wilcoxon rank-sum tests to determine whether PRDM9 mRNA expression differed significantly between tumor and control samples in any of the TCGA studies. Eight of the 14 studies examined were statistically significant with p-values less than 0.05, but after applying a Bonferroni correction for multiple hypothesis testing, only three studies (LIHC, LUAD and LUSC) had statistically significant adjusted p-values (referred to here as q-values) (Table 4).
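The Bonferroni step can be reproduced directly from the unmatched p-values in Table 4: each p-value is multiplied by the number of tests (m = 14) and capped at 1. Applied to the rounded values in the table, the sketch recovers the reported q-values up to rounding (for example, 0.028 × 14 ≈ 0.40 for BLCA) and the same three significant studies.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values (q = min(1, m * p)) for m tests."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}


# Unmatched (tumor vs. all controls) p-values from Table 4
unmatched_p = {
    "BLCA": 0.028, "BRCA": 0.020, "COAD": 0.090, "ESCA": 0.081, "HNSC": 0.022,
    "KICH": 0.18, "KIRC": 0.086, "KIRP": 0.026, "LIHC": 2.7e-7, "LUAD": 1.9e-4,
    "LUSC": 8.1e-8, "PRAD": 0.12, "STAD": 6.1e-3, "THCA": 0.58,
}
q = bonferroni(unmatched_p)
print(sum(p < 0.05 for p in unmatched_p.values()))        # 8
print(sorted(name for name, v in q.items() if v < 0.05))  # ['LIHC', 'LUAD', 'LUSC']
```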


Table 3. The Cancer Genome Atlas (TCGA) studies with >10 patient-matched healthy tissue control samples.

Study   Cancer type
BLCA    Bladder Urothelial Carcinoma
BRCA    Breast invasive carcinoma
COAD    Colon adenocarcinoma
ESCA    Esophageal carcinoma
HNSC    Head and Neck squamous cell carcinoma
KICH    Kidney Chromophobe
KIRC    Kidney renal clear cell carcinoma
KIRP    Kidney renal papillary cell carcinoma
LIHC    Liver hepatocellular carcinoma
LUAD    Lung adenocarcinoma
LUSC    Lung squamous cell carcinoma
PRAD    Prostate adenocarcinoma
STAD    Stomach adenocarcinoma
THCA    Thyroid carcinoma

Table 4. PRDM9 mRNA expression statistics in human tumors. The number (n) of tumor and healthy, matched control tissue samples assessed is indicated. p-values for unmatched patient samples were calculated using unpaired, two-sample Wilcoxon tests with a two-sided alternative hypothesis. p-values for matched samples were calculated using single-sample Wilcoxon signed-rank tests with a two-sided alternative hypothesis. A Bonferroni correction for multiple hypothesis testing was applied to calculate q-values from the corresponding set of p-values. Data relate to Figure 17.

Study   n Tumor   n Control   Unmatched p   Unmatched q   Matched p   Matched q
BLCA    408       19          0.028         0.40          7.8×10⁻³    0.11
BRCA    1103      109         0.020         0.28          0.019       0.27
COAD    302       26          0.090         1.0           0.25        1.0
ESCA    185       11          0.081         1.0           0.12        1.0
HNSC    523       43          0.022         0.31          6.1×10⁻⁵    8.5×10⁻⁴
KICH    66        25          0.18          1.0           0.022       0.32
KIRC    534       72          0.086         1.0           3.9×10⁻³    0.054
KIRP    291       32          0.026         0.36          0.016       0.23
LIHC    373       50          2.7×10⁻⁷      3.7×10⁻⁶      5.0×10⁻⁶    6.9×10⁻⁵
LUAD    518       58          1.9×10⁻⁴      2.6×10⁻³      2.4×10⁻⁴    3.4×10⁻³
LUSC    501       51          8.1×10⁻⁸      1.1×10⁻⁶      1.9×10⁻⁶    2.7×10⁻⁵
PRAD    498       52          0.12          1.0           0.035       0.49
STAD    418       32          6.1×10⁻³      0.085         4.9×10⁻³    0.068
THCA    509       59          0.58          1.0           0.12        1.0

To account for potential expression differences between individual patients, we compared PRDM9 mRNA expression for the subset of study participants that had both healthy tissue and tumor tissue data. Although there were roughly an order of magnitude fewer matched samples, we again observed a noticeable group of outliers with higher PRDM9 mRNA expression in the tumor samples than in the matched tissue (Figure 17b). We performed single-sample Wilcoxon signed-rank tests with a Bonferroni correction for multiple hypothesis testing and again found significantly higher PRDM9 mRNA expression levels in study participants from LIHC, LUAD and LUSC, as well as HNSC (Table 4). Taken together, statistically significant elevations of PRDM9 mRNA expression were observed in four of the fourteen TCGA studies, while outliers with abnormally high PRDM9 expression were detected across all TCGA studies, suggesting that rare, aberrant PRDM9 expression occurs independently of tumor type.
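For matched samples, the signed-rank test ranks the absolute tumor-minus-control differences and sums the ranks of the positive ones (W+). The chapter does not state which software computed the p-values; the sketch below is a minimal version that drops zero differences, averages tied ranks and uses the large-sample normal approximation without continuity or tie-variance corrections, so its p-values may differ slightly from those of statistical packages. The expression values are hypothetical.

```python
import math


def signed_rank_test(tumor, control):
    """Two-sided Wilcoxon signed-rank test on paired samples.

    Zero differences are dropped, tied |differences| get average ranks, and
    the p-value uses the large-sample normal approximation without
    continuity or tie-variance corrections.
    """
    diffs = [t - c for t, c in zip(tumor, control) if t != c]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical paired expression values: every tumor exceeds its control
control = [0.0] * 20
tumor = [0.1 * (k + 1) for k in range(20)]
w, p = signed_rank_test(tumor, control)
print(w, p)
```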


Figure 17. PRDM9 is aberrantly expressed in a variety of cancers. Boxplots showing PRDM9 mRNA expression data from the TCGA for (a) all tumor samples and all healthy tissue control samples, and (b) only patients with matched tumor and control samples. Circles represent outliers with expression values more than 1.5 times the interquartile range beyond the quartiles. Statistical significance is indicated as *, q < 0.01; **, q < 0.001; ***, q < 0.0001; ****, q < 0.00001, with values available in Table 4.


Since we observed aberrant PRDM9 mRNA expression in a subset of patient outliers across all TCGA studies, we tested the hypothesis that detection of PRDM9 mRNA expression could stratify patient survival outcomes. We performed a Kaplan-Meier analysis to assess whether unadjusted survival probabilities differed significantly between TCGA subjects whose tumors had detectable levels of PRDM9 mRNA and those whose tumors lacked detectable PRDM9 (Figure 18). We used a Cox proportional hazards regression model to estimate the hazard ratio (HR) between the PRDM9-detected and PRDM9-null subjects, along with a log-rank test to assess statistical significance. Detected PRDM9 expression in tumors from 5 of the 14 TCGA studies was associated with significantly lower survival probabilities (HR > 1 and p < 0.05); however, a Bonferroni correction for multiple hypothesis testing eliminated this statistical significance (q > 0.05) (Table 5). These data suggest that aberrant PRDM9 mRNA expression may not independently stratify patient survival outcomes, although a trend toward decreased survival probability was observed in HNSC, KICH, KIRC, LUSC and STAD. Notably, tumors from the HNSC and LUSC studies also expressed PRDM9 mRNA at significantly higher levels and showed a trend towards decreased survival probabilities.
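The Kaplan-Meier curves in Figure 18 are built from a product-limit estimator: at each distinct death time the running survival probability is multiplied by (1 − deaths / number at risk), and censored subjects simply leave the risk set. The sketch below shows that estimator together with the RSEM ≥ 1 split used to define the two arms; the patient records are hypothetical, and the Cox model and log-rank test reported in Table 5 are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def split_by_detection(patients, threshold=1.0):
    """Group (rsem, time, event) records into PRDM9-null and PRDM9-detected arms."""
    groups = {"null": ([], []), "detected": ([], [])}
    for rsem, time, event in patients:
        key = "detected" if rsem >= threshold else "null"
        groups[key][0].append(time)
        groups[key][1].append(event)
    return groups


# Hypothetical records: (PRDM9 RSEM, follow-up in days, death observed?)
patients = [(0.0, 100, 1), (0.2, 400, 0), (0.0, 900, 1), (3.5, 50, 1),
            (12.0, 200, 1), (1.0, 300, 0), (0.0, 1200, 0)]
for arm, (t, e) in split_by_detection(patients).items():
    print(arm, kaplan_meier(t, e))
```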


Figure 18. PRDM9 expression does not stratify patient survival outcomes. Kaplan-Meier curves showing survival probability analysis for patients from the indicated TCGA studies. Patients were stratified into those with no PRDM9 expression detected (PRDM9 null) and those with PRDM9 expression RSEM≥1 (PRDM9 detected). Vertical lines indicate censored subjects, for whom no event was recorded after the date of last follow-up, and 95% confidence intervals for point estimates of the survival curves are displayed. Statistical data are shown in Table 5.


Table 5. PRDM9 expression does not stratify patient survival outcomes. The numbers of patients with PRDM9 expression absent (RSEM<1) and detected (RSEM≥1) are indicated. A Cox proportional-hazards regression model was used to estimate hazard ratios (HR), where an HR > 1 indicates an increased risk of death. Survival curves were tested for significance using the log-rank test to calculate p-values, and a Bonferroni correction for multiple hypothesis testing was applied to calculate q-values from the corresponding set of p-values.

Study   Absent (n)   Detected (n)   HR           p       q
BLCA    232          173            0.914        0.55    1.0
BRCA    984          96             0.996        0.99    1.0
COAD    262          18             2.17         0.071   1.0
ESCA    117          56             0.866        0.59    1.0
HNSC    390          125            1.40         0.031   0.46
KICH    49           17             4.45         0.021   0.31
KIRC    416          113            1.44         0.033   0.50
KIRP    221          68             1.23         0.54    1.0
LIHC    195          174            1.14         0.46    1.0
LUAD    398          114            1.11         0.55    1.0
LUSC    302          194            1.35         0.029   0.43
PRAD    454          29             1.03×10⁻⁸    1.0     1.0
STAD    303          110            1.43         0.031   0.47
THCA    478          18             3.85×10⁻⁸    1.0     1.0


2.4 Discussion

2.4.1 On the specificity of inhibition.

In this chapter I described the characterization of MRK-740, a potent, selective and cell-active chemical probe for PRDM9, along with its negative control analog MRK-740-NC. While small-molecule chemical probes and drugs are available to target SET-domain protein lysine methyltransferases and Rossmann-fold arginine methyltransferases, MRK-740 is the first inhibitor that specifically targets any of the 19 human PRDM proteins. Compounds leading to MRK-740 were initially discovered in a diverse library of 7500 Merck & Co., Inc. compounds that shared structural similarities with known methyltransferase inhibitors. Hits that inhibited PRDM9 methyltransferase activity were used to generate compound analogs, leading to MRK-740, a potent PRDM9 inhibitor. We demonstrated that MRK-740 selectively inhibits PRDM9 (IC50 = 80 nM) compared to 33 other methyltransferase enzymes, with marginal off-target inhibition detected against the closely related PRDM7 (IC50 = 45 μM).

We examined our crystal structure of MRK-740 and SAM bound to PRDM9 to understand how MRK-740 could have such contrasting inhibitory effects against the PRDM9 and PRDM7 PR/SET domains, which differ only at N289S, W312S and Y357S. Only one of these three residues came close to the inhibitor: the side chain of Y357 lies 4.0 Å from MRK-740 and may provide a very weak hydrophobic contact that stabilizes inhibitor binding. Additionally, Y357 may contribute via a second-shell interaction, forming a pi-stack that stabilizes W356, which in turn makes a pi-stacking interaction with MRK-740 (Figure 14b). We compared the binding affinity of MRK-740 for PRDM9, PRDM7 and a PRDM7-S357Y substitution mutant and found that S357Y partially restores the affinity of the interaction. Surprisingly, we also observed that the ZnF proximal to the PRDM9 PR/SET domain contributes to MRK-740 binding affinity, perhaps by favouring the open and inactive conformation of the PR/SET domain that becomes trapped upon inhibitor binding. Nevertheless, the construct that crystallized with SAM and the inhibitor lacked this ZnF. In a previous analysis of PRDM7 enzyme kinetics, Blazer et al. reported that PRDM7 is a substantially less productive methyltransferase than PRDM9 [82]. The PR/SET domain of PRDM7 was found to have a ~100-fold lower apparent turnover number (kcat) for writing the H3K4me3 mark and lacked the ability to write the H3K36me3 mark, which was primarily attributed to the S357Y residue difference [82]. They also observed that the apparent Michaelis constant (Km) for SAM was much larger for WT PRDM7 (Km = 900 μM) than for the S357Y mutant (Km = 14 μM), suggesting that SAM binds PRDM7 more weakly than PRDM9 because of the S357Y difference. Interestingly, when PRDM9 adopts a closed, catalytically active conformation, the hydroxyl group of Y357 approaches the cofactor to within 4.0 Å [101], which may promote SAM binding. This, along with our evidence that MRK-740 binding to PRDM9 is SAM-dependent, suggests that the weaker PRDM7 inhibition may arise from the weaker affinity of both MRK-740 and SAM, leading to a less stable ternary complex.

Another noteworthy difference between PRDM9 and PRDM7 is that while PRDM7 does not possess methyltransferase activity for H3K36 in cells, PRDM9 is a confirmed H3K36 methyltransferase [82] [79] [80]. Here, we did not examine MRK-740 inhibition for PRDM9’s H3K36 methyltransferase activity, but based on our data, we do not see any reason to expect a different effect compared to H3K4 methyltransferase inhibition. We recommend H3K36 methylation levels be assessed in future studies of MRK-740 function in cells and in vitro.

2.4.2 An unusual mechanism of action.

Protein methyltransferase inhibition can occur by competitively obstructing access to the substrate or cofactor binding sites. Here we provided structural, biophysical and enzymological lines of evidence that MRK-740 inhibits PRDM9 by a substrate-competitive, cofactor-dependent mechanism of action (MOA). Furthermore, we reported that this MOA is unique among protein methyltransferase inhibitors. Our crystal structure shows that SAM forms a portion of the MRK-740 binding site on PRDM9, where MRK-740 structurally occludes the substrate lysine from binding, consistent with our enzyme activity assays, which demonstrated a substrate-competitive, cofactor-dependent MOA.

During the initial stages of this study, we concurrently pursued a structural characterization of MRK-740 bound to PRDM9 by NMR. Although the structure was ultimately determined by crystallography, our preliminary NMR assays independently established the cofactor-dependent MOA of MRK-740. Furthermore, MRK-740 binding was in slow chemical exchange, evident from the doubling of NMR signals upon inhibitor binding, whereas substrate and cofactor binding were in fast chemical exchange. This suggests that the inhibitor-bound complex outlives both the substrate- and cofactor-bound complexes, and that the inhibited PRDM9 complex may also sequester SAM in the cell. Therefore, the use of MRK-740 in cell types with high PRDM9 expression, such as those undergoing meiosis, could affect the local concentration of SAM and its balance with SAH, which may indirectly affect proximal methyltransferase reactions.
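The slow- versus fast-exchange argument rests on comparing the exchange rate constant (k_ex) with the frequency separation (Δω) between the free and bound signals: separate peaks appear when k_ex is much smaller than Δω, and a single averaged peak when it is much larger. The Python sketch below illustrates that comparison for the 1H dimension at 800 MHz. The chemical shift difference, the exchange rates and the ten-fold margin used as a cut-off are illustrative assumptions, not values measured for PRDM9.

```python
import math


def delta_omega_rad_s(delta_ppm: float, spectrometer_mhz: float) -> float:
    """Convert a chemical shift difference (ppm) into angular frequency (rad/s)."""
    delta_hz = delta_ppm * spectrometer_mhz  # e.g. 1 ppm at 800 MHz equals 800 Hz
    return 2.0 * math.pi * delta_hz


def exchange_regime(k_ex: float, delta_ppm: float,
                    spectrometer_mhz: float = 800.0, margin: float = 10.0) -> str:
    """Classify the NMR exchange regime by comparing k_ex (s^-1) with delta omega.

    `margin` is an illustrative rule-of-thumb factor, not a sharp physical boundary.
    """
    d_omega = delta_omega_rad_s(delta_ppm, spectrometer_mhz)
    if k_ex * margin < d_omega:
        return "slow"          # separate peaks for free and bound states
    if k_ex > margin * d_omega:
        return "fast"          # one population-weighted peak that shifts
    return "intermediate"      # typically broadened peaks


# Hypothetical exchange rates for a 0.2 ppm shift difference.
for k in (5.0, 1000.0, 1.0e5):
    print(k, exchange_regime(k, 0.2))
```

The margin parameter is kept explicit because the boundaries between regimes are gradual; in practice, the regime is judged from the observed line shapes during a titration.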

As a first-in-class PRDM inhibitor, MRK-740 binds to PRDM9 in a SAM-dependent manner with an unusually large SAM interaction interface, exploiting the aromatic system of the adenosyl group of SAM. This is a novel mode of binding compared to other SET and non-SET methyltransferase inhibitors and relies on the displacement of the post-SET region from its active conformation. The highly variable post-SET regions of both SET and PR/SET domains can interact with the adenosyl portion of SAM and contribute to the formation of the catalytic active site. This structural diversity of the post-SET regions of SET and PR/SET proteins suggests that the PRDM proteins may be more amenable to this type of strong SAM-dependent inhibition. Understanding how these PR/SET domains bind SAM would help to explain the specificity of MRK-740 among the PR/SET domains and may contribute to the development of additional PRDM inhibitors.

The development of SAM-competitive protein methyltransferase inhibitors is challenged by the need to find cell-penetrant molecules that also bind in the polar SAM-binding pocket [138] [25]. In contrast, substrate-competitive inhibitors are employed to target the chemically and structurally diverse substrate interfaces, which can also depend on interactions with either SAM or SAH. The extensive ligand-SAM interaction surface seen in the PRDM9-MRK-740 complex lies at one extreme of a continuum of degrees of interaction (Figure 16). Substrate-competitive inhibitors such as (R)-PFI-2-a for SETD7 or EPZ015666 for PRMT5 also cannot bind to their targets in the absence of a cofactor; although their cofactor interfaces are relatively small, they exemplify this underexplored avenue for the rational design of chemical probes targeting methyltransferases [141] [142]. These findings indicate that the rational design of substrate-competitive small-molecule inhibitors could be enhanced by considering the structural interface provided by the bound cofactor, as well as subclass-relevant structural dynamics. Furthermore, our finding that MRK-740 binding is incompatible with SAH highlights the importance of considering the distinct contributions of the enzyme cofactor and its by-product when including either during rational design and compound screening.

2.4.3 On PRDM9 in cancer.

Defects in the PRDM9 gene have recently been implicated in cancers, prompting the need for tool compounds to investigate these disease mechanisms. While small-scale studies have implicated PRDM9 in specific blood cancers and carcinomas [64] [65] [66], several large-scale analyses of the TCGA datasets have found that aberrant PRDM9 expression occurs in every cancer type examined [63] [57] [137]. Indeed, our comparison of PRDM9 expression in TCGA tumor samples and matched control tissues found that rare aberrant PRDM9 expression occurs in all 14 cancer types examined (Figure 17). Furthermore, our analysis indicated that high PRDM9 expression occurs in a significant proportion of head and neck squamous cell carcinoma (HNSC) and lung tumors (LUAD and LUSC), in agreement with previous reports [63] [57]. Because our analysis spanned 14 cancer studies, we attempted to control for false positive results due to multiple hypothesis testing by applying a Bonferroni correction to our initial tests for statistical significance. One of our initial tests found significantly higher PRDM9 expression (p-value = 7.8 × 10−3) in bladder tumors (BLCA) compared to control tissue, in agreement with a previous study that only examined the BLCA dataset [137]. However, after Bonferroni correction this difference was no longer significant (q-value = 0.11).
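The Bonferroni adjustment multiplies each raw p-value by the number of tests (here, the 14 TCGA studies) and caps the result at 1; equivalently, each raw p-value is compared against α divided by the number of tests. The minimal Python sketch below reproduces the BLCA numbers given above. In R, the same adjustment is available as `p.adjust(p, method = "bonferroni", n = 14)` in the stats package, although the exact call used in this analysis is not stated here; the function names below are illustrative.

```python
def bonferroni_adjust(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: min(1, p * n_tests)."""
    return min(1.0, p * n_tests)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


blca_adjusted = bonferroni_adjust(7.8e-3, 14)   # 7.8e-3 * 14 = 0.1092
print(round(blca_adjusted, 2))                   # rounds to 0.11
print(7.8e-3 < bonferroni_threshold(0.05, 14))   # False: 0.0078 > 0.05 / 14
```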

We tested whether the presence of PRDM9 expression in tumors could stratify the survival duration of patients in the 14 TCGA studies (Figure 18). We did not observe any statistically significant differences in any cancer type after Bonferroni correction, but observed a trend toward decreased survival probability in 5 cancer types (HNSC, KICH, KIRC, LUSC and STAD), 2 of which (HNSC and LUSC) also expressed PRDM9 mRNA at significantly higher levels. It is important to note that while a correction for multiple hypothesis testing helps to avoid false positive results, the conservative Bonferroni correction also increases the risk of false negatives. Furthermore, our survival analysis only assessed unadjusted survival probabilities and did not consider potential confounders such as age and sex. Therefore, given the evidence from our survival analysis and the overlap of our expression analysis with reports from other groups [63] [57], we propose further investigation of PRDM9 expression in head and neck squamous cell carcinoma (HNSC) and lung squamous cell carcinoma (LUSC).
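Unadjusted survival probabilities of this kind are usually estimated with the Kaplan-Meier product-limit estimator, in which survival is multiplied by (1 − deaths / patients at risk) at every time point with an observed death. The analysis itself used the R survival and survminer packages; the Python function below is only an illustrative re-implementation of the estimator, and the follow-up data are toy values, not TCGA records. Each group (for example, PRDM9-expressing and non-expressing tumors) would receive its own curve, and the statistical comparison between curves (such as a log-rank test) is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate at each time point with at least one death.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Pool all patients (deaths and censorings) that share this time point.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # patients leave the risk set after their time point
    return curve


# Toy follow-up data (months), not TCGA values.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```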

To investigate PRDM9 inhibition in cancer, MRK-740 and MRK-740-NC were compared for their effects on proliferation in a PRDM9-expressing multiple myeloma cell line and in two breast cancer cell lines that were matched to PRDM9-null cell lines of the same cancer type [102]. In these cell lines, no differences in proliferation were observed between MRK-740 and MRK-740-NC after 6 days of treatment [102]. Whether PRDM9 is an essential survival factor in specific cancer types remains undetermined. Genomic evidence suggests that the consequence of aberrant PRDM9 expression in tumors is increased genomic instability at PRDM9 binding loci [63], but a thorough understanding of the molecular mechanism is lacking. One interesting avenue for future investigation would be to assess MRK-740 in PRDM9-expressing cancer cells in conjunction with inducers of DNA breaks, such as ionizing radiation or platinum-based drugs (e.g. cisplatin). Additionally, PRDM9 inhibition combined with inhibitors of the recombination machinery may result in synthetic lethality due to simultaneous dysregulation of recombination positioning and repair pathways.

2.5 Methods

2.5.1 Protein production

PRDM9 (195-385) with an N-terminal His-tag and TEV protease site was expressed in Escherichia coli BL21 (DE3) codon plus cells from a pET28-MHL vector. For NMR experiments, cells were grown at 37 °C in M9 minimal medium supplemented with 1.0 g/l 15N-ammonium chloride in the presence of 50 µg/ml kanamycin and 50 µM ZnSO4 to an OD600 of 0.8, induced with isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 0.5 mM, and incubated overnight at 15 °C. For all other experiments, cells were grown in Luria broth (LB) under the same conditions. Cells were harvested by centrifugation at 7,000 rpm and cell pellets were stored at -80 °C. Cells were lysed on ice in lysis buffer (20 mM Tris pH 7.5, 300 mM NaCl, 10 mM imidazole, 50 µM ZnSO4, 5 mM β-mercaptoethanol, 0.5 mM TCEP, 0.1% Triton X-100, 2.5% glycerol, 1 mM PMSF, 2 mM benzamidine and Roche complete EDTA-free protease inhibitor cocktail tablet) using a probe sonicator. The crude extract was cleared by ultracentrifugation for 50 min at 50,000 × g and the supernatant was incubated with TALON Metal Affinity Resin (Takara) at 4 °C for 120 minutes with agitation. Resin was washed in wash buffer (lysis buffer with 0.01% Triton X-100 and lacking protease inhibitors) and bound proteins were eluted using elution buffer (wash buffer with 400 mM imidazole), monitored by Bradford analysis. Protein was dialyzed overnight at 4 °C in TEV dialysis buffer (20 mM Tris pH 7.5, 300 mM NaCl, 0.5 mM CHAPS, 20 µM ZnCl2, 2.5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP) and incubated with in-house produced His-tagged TEV protease at a 1:20 ratio by mass. TEV and uncleaved proteins were removed using TALON resin, and soluble protein was loaded onto a Superdex 75 Increase 10/300 GL column (GE Healthcare) equilibrated with 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 2.5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP at a flow rate of 0.7 ml/min. Fractions containing PRDM9 protein were pooled and dialyzed into low-salt buffer (20 mM Tris-HCl pH 8.5, 100 mM NaCl, 5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP). PRDM9 protein was further purified by anion-exchange chromatography on two tandemly joined 5 ml HiTrap™ DEAE FF (GE Healthcare) columns along a linear gradient up to 1.0 M NaCl.

Purified PRDM9 protein was dialyzed into a final buffer containing 20 mM Tris pH 7.5, 150 mM NaCl and 2 mM TCEP. The proteins PRDM1 (38-223), PRDM2 (2-160), PRDM3 (1-231), PRDM6 (194-405), PRDM7 (195-392), PRDM7-S357Y (195-392), PRDM8 (2-165), PRDM9 (195-415), PRDM10 (188-339), PRDM11 (79-314), PRDM12 (60-229), PRDM13 (2-187), PRDM14 (212-422), and PRDM16 (1-475) were purified from E. coli and kindly provided by Taraneh Hajian and Elisa Gibson from the laboratory of Dr. Masoud Vedadi at the SGC Toronto.

2.5.2 DSF assays

PRDM selectivity experiments were carried out by determining the effect of 250 µM MRK-740 on the thermal stability of PRDM9 and other PRDMs. DSF measurements were performed as previously described [143] using a LightCycler 480 II instrument from Roche Applied Science. All proteins tested in these selectivity experiments were used at a final concentration of 0.02 mg·mL−1 in 0.1 M HEPES pH 7.5 and 150 mM NaCl buffer in the presence of 2 mM SAM. SYPRO Orange was purchased from Invitrogen as a 5,000× stock solution and was diluted 1:1,000 to yield a 5× working concentration. The temperature scan curves were fitted to a Boltzmann sigmoid function, and the Tm values were obtained from the midpoint of the transition. DSF titration experiments were performed by varying MRK-740 concentrations from 500 nM to 500 µM in the presence or absence of 2 mM SAM.
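The Tm extraction step can be illustrated with a small fitting sketch. The instrument software and fitting routine actually used are not specified above, so the Python code below is an assumption-laden stand-in: it grid-searches the midpoint (Tm) and slope of a Boltzmann sigmoid and, for each candidate pair, solves the two baselines by ordinary least squares, since the model is linear in them. The melt curve is synthetic, with a hypothetical Tm of 52.3 °C. A ligand-induced shift would then be reported as ΔTm = Tm(with ligand) − Tm(without ligand).

```python
import math
from typing import Sequence, Tuple


def boltzmann(t: float, f_min: float, f_max: float, tm: float, slope: float) -> float:
    """Boltzmann sigmoid: signal rises from f_min to f_max with its midpoint at tm."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - t) / slope))


def fit_tm(temps: Sequence[float], signal: Sequence[float],
           tm_grid: Sequence[float], slope_grid: Sequence[float]) -> Tuple[float, float]:
    """Return the (tm, slope) pair on the grid that minimizes the squared error.

    For each candidate pair the model is linear in the baseline and amplitude,
    which are obtained by ordinary least squares.
    """
    n = len(temps)
    y_mean = sum(signal) / n
    best_sse, best_tm, best_slope = float("inf"), float("nan"), float("nan")
    for tm in tm_grid:
        for slope in slope_grid:
            s = [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]
            s_mean = sum(s) / n
            sxx = sum((x - s_mean) ** 2 for x in s)
            if sxx == 0.0:
                continue  # sigmoid is flat over the scanned range
            amp = sum((x - s_mean) * (y - y_mean) for x, y in zip(s, signal)) / sxx
            f_min = y_mean - amp * s_mean
            sse = sum((y - (f_min + amp * x)) ** 2 for x, y in zip(s, signal))
            if sse < best_sse:
                best_sse, best_tm, best_slope = sse, tm, slope
    return best_tm, best_slope


# Synthetic, noise-free melt curve (hypothetical parameters).
temps = [25.0 + i for i in range(71)]               # 25-95 °C in 1 °C steps
curve = [boltzmann(t, 1000.0, 9000.0, 52.3, 1.5) for t in temps]
tm_grid = [45.0 + 0.1 * i for i in range(151)]      # 45.0-60.0 °C
slope_grid = [0.5 + 0.25 * i for i in range(11)]    # 0.5-3.0 °C
tm, slope = fit_tm(temps, curve, tm_grid, slope_grid)
print(f"Tm = {tm:.1f} °C, slope = {slope:.2f}")
```

A grid search is used only to keep the sketch dependency-free; a nonlinear least-squares routine would normally be preferred for real, noisy data, and the simple model ignores the post-transition fluorescence decrease often seen in DSF curves.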

2.5.3 Isothermal titration calorimetry (ITC)

PRDM9 (195-415), PRDM9 (195-385), PRDM7 (195-392) and PRDM7-S357Y (195-392) were each dialyzed for 16 hours at 4 °C in buffer (20 mM Tris pH 7.5, 150 mM NaCl and 2 mM TCEP) containing 200 μM SAM. MRK-740 was diluted from DMSO in the same SAM-containing buffer, the DMSO concentration was matched in the buffered protein solution, and the pH values of the compound and protein solutions were confirmed to be within 0.1 pH units of each other. ITC was performed by titrating 350 μM MRK-740 into 35 μM protein. A binding curve was generated from 24 × 2 μl titrations following an initial 0.5 μl injection that was omitted from the fitting analysis. The reported KD and n values are the averages of three experiments and are accompanied by the standard deviation of the measurements. The data were acquired on a Nano ITC from TA Instruments at 25 °C and fitted with an independent-binding-site model using NanoAnalyze software (v3.7.0).
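An independent-site (1:1) model relates the total protein and ligand concentrations to the amount of complex through an exact quadratic solution of the mass-action law, and the Wiseman c-value (n·[P]/KD) governs how sigmoidal the resulting isotherm is. The Python sketch below shows only that underlying binding relationship, not NanoAnalyze's implementation, and the KD of 0.5 µM is a hypothetical value chosen to illustrate the 35 µM protein concentration used here.

```python
import math


def complex_concentration(p_tot: float, l_tot: float, kd: float) -> float:
    """[PL] for a single-site (1:1) model, from the exact quadratic solution.

    All arguments must use the same concentration unit (e.g. µM).
    """
    b = p_tot + l_tot + kd
    return (b - math.sqrt(b * b - 4.0 * p_tot * l_tot)) / 2.0


def fraction_bound(p_tot: float, l_tot: float, kd: float) -> float:
    """Fraction of protein present as the protein-ligand complex."""
    return complex_concentration(p_tot, l_tot, kd) / p_tot


def wiseman_c(p_tot: float, kd: float, n: float = 1.0) -> float:
    """Wiseman c-value, n * [P] / KD, which governs the shape of the isotherm."""
    return n * p_tot / kd


# Hypothetical KD of 0.5 µM with 35 µM protein, at several ligand:protein ratios.
for ratio in (0.25, 0.5, 1.0, 2.0):
    print(ratio, round(fraction_bound(35.0, ratio * 35.0, 0.5), 3))
print("c =", wiseman_c(35.0, 0.5))
```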

2.5.4 NMR spectroscopy

NMR spectra were collected at 25 °C on a Bruker Avance spectrometer operating at 800 MHz and equipped with a TCI cryoprobe. PRDM9 (195-385) protein was buffered in 20 mM Tris pH 7.3, 150 mM NaCl, 2 mM TCEP and 20 µM ZnCl2 containing 6.25% D2O. TROSY titrations were performed similarly to previously described experiments [144], with 190 μM 15N-labeled PRDM9 in the presence of specified molar equivalents of each titrant (histone H3 (1-11) peptide, SAM and MRK-740), as indicated in the respective figures. 1H,15N-TROSY spectra were collected at 298 K in 3 mm sample tubes. Each spectrum was acquired with 8 scans and acquisition times of 42.6 ms and 21.9 ms for the 1H and 15N dimensions, respectively. Processing and analysis were carried out with TopSpin 3.5 (Bruker BioSpin).

2.5.5 Crystallization and structure determination

Purified PRDM9 (residues 195-385) was concentrated to 10.9 mg·mL−1 and incubated for 1 hour at 4 °C with 2.5 mM SAM and 2.5 mM MRK-740. Ligand-bound protein crystals were obtained after 3 days from a 1:1 mixture of protein and crystallization buffer by the sitting-drop vapor-diffusion method at 18 °C, using 0.1 M BisTris pH 6.0, 0.2 M NH4OAc and 24.5% PEG3350 as the crystallization buffer. Single crystals were harvested in a cryoprotectant solution consisting of crystallization buffer supplemented with 20% (v/v) glycerol and flash frozen in liquid nitrogen. The data set was collected at the Advanced Photon Source (APS) at Argonne National Laboratory (Argonne, Illinois, USA) on beamline 24-ID-E at a wavelength of 0.979180 Å using an ADSC Quantum 315 CCD detector at 100 K. Data were processed and scaled with XDS [145] and Aimless [146]. The structure was solved by molecular replacement with the program PhaserMR [147], using a trimmed structure of the human PRDM9 apo-enzyme (PDB accession code 4IJD, chain A, residues 201-354) as the search model. The stereochemical restraints for MRK-740 were generated using the program JLigand v1.0.4036. The structural models were refined using REFMAC5 [148] and manually checked with COOT [149]. The Ramachandran statistics were 96.5%, 3.0% and 0.5% for favored, allowed and outlier regions, respectively. Images were generated using PyMOL (The PyMOL Molecular Graphics System, v2.2.0, Schrödinger, LLC.). Data collection and refinement statistics are shown in Table 2.

2.5.6 Methyltransferase inhibitor survey

To quantify the interaction between small-molecule inhibitors and SAM or SAH, atoms at their interface (distance < 4 Å) were tallied. Briefly, all SAM- or SAH-containing protein structures (n = 1311) in the Protein Data Bank were analyzed (date of acquisition: 2019-06-01). An in-house Python3 script was used in PyMOL (The PyMOL Molecular Graphics System, v2.2.0, Schrödinger, LLC.) to model hydrogens onto all structures and to tally any ligand atom within 4.0 Å of SAM or SAH, and vice versa. Next, ligands lacking ring structures or identified as ions or crystallization buffer components were removed, and the resulting list of ligands was manually curated to identify the 97 unique inhibitor–SAM/SAH pairs.
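The core of the tally is a pairwise distance check in both directions: which ligand atoms lie within 4.0 Å of any cofactor atom, and which cofactor atoms lie within 4.0 Å of any ligand atom. The in-house PyMOL script is not reproduced here; the plain-Python sketch below only illustrates the counting logic, using a hypothetical atom-record layout and made-up coordinates, and it skips the hydrogen-modelling and ligand-filtering steps described above.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical atom record: (atom name, x, y, z), coordinates in Å.
Atom = Tuple[str, float, float, float]


def contact_tally(ligand: List[Atom], cofactor: List[Atom],
                  cutoff: float = 4.0) -> Dict[str, int]:
    """Count ligand atoms within `cutoff` Å of any cofactor atom, and vice versa."""
    ligand_hits = set()
    cofactor_hits = set()
    for i, a in enumerate(ligand):
        for j, b in enumerate(cofactor):
            if math.dist(a[1:], b[1:]) < cutoff:
                ligand_hits.add(i)     # indices avoid clashes between repeated atom names
                cofactor_hits.add(j)
    return {"ligand_atoms": len(ligand_hits), "cofactor_atoms": len(cofactor_hits)}


# Made-up coordinates for illustration only.
inhibitor = [("C1", 0.0, 0.0, 0.0), ("C2", 1.5, 0.0, 0.0), ("N1", 9.0, 0.0, 0.0)]
sam = [("CE", 3.5, 0.0, 0.0), ("SD", 5.2, 0.0, 0.0)]
print(contact_tally(inhibitor, sam))
```

The all-pairs loop is quadratic in atom count, which is adequate for a ligand and a cofactor; for whole structures, a spatial index or PyMOL's own distance-based selection operators would be the more practical choice.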

2.5.7 TCGA data analysis

TCGA clinical information and PRDM9 transcript expression datasets were obtained from the FireBrowse resource (http://firebrowse.org/) for all patients with matched healthy and tumor tissue samples. Analysis was performed using the R programming language (version 4.0) [37]. PRDM9 gene expression was analyzed as RSEM-normalized mRNA counts, and RSEM values plus 1 were used when logarithmic transformations were applied. Tests for statistical significance of expression differences were performed using the exactRankTests (v0.8-31) and stats (v4.0.0) packages. Survival analyses and significance testing were performed using the survival (v3.2-3), survminer (v0.4.8) and stats (v4.0.0) packages. All plots were generated using the ggplot2 (v3.3.2) and survminer (v0.4.8) packages.


Chapter 3

Methyltransferase inhibitors that exploit the bound cofactor.

3.1 Authorship attribution statement

This chapter contains an excerpt from a review article that I coauthored with Drs. Renato Ferreira de Freitas and Matthieu Schapira [150]. Dr. Ferreira de Freitas and I contributed equally to this review article and wrote separate, complementary sections. Dr. Schapira developed the concept of this review article and provided guidance throughout the analysis and writing process. In this chapter, the section of the original publication titled “Neurotransmitter methylation” was omitted, so that the chapter focuses exclusively on protein methyltransferase inhibitors. I have also written “Introduction” and “Conclusion” sections to provide the reader with context for my section of the review article.

3.2 Introduction

Methyltransferases are enzymes that catalyze the transfer of a methyl (CH3) group from S-adenosylmethionine (SAM) to a substrate molecule. Methyltransferases bind the polar reaction cofactor SAM and have weaker affinity for the less polar reaction by-product, S-adenosyl-L-homocysteine (SAH) [151]. In humans, protein methylation is catalyzed by two classes of enzymes that possess either a Rossmann-fold or a SET-fold structure. The Rossmann fold is structurally described as an extended seven-β-strand fold, in which the N-terminal β-strand folds into the middle of a β-sheet, forming the SAM-binding interface directly adjacent to the substrate-binding region that is formed at the C-terminus [152] [153]. The SET fold is composed of small, distinct β-sheets that are threaded by a C-terminal 'pseudoknot' structure that forms the adjacent cofactor and substrate binding sites, which are linked by a narrow reaction channel [154]. Two common strategies employed by protein methyltransferase (PMT) inhibitors are to compete for either the cofactor or the substrate binding site. Cofactor-competitive inhibitors block an enzyme’s cofactor from binding by occupying the cofactor binding site. Cofactor-competitive PMT inhibitors have shown clinical success in cancer trials targeting enzymes such as DOT1L and EZH2 [155] [26]; however, the design and development of enzyme-specific cofactor-competitive inhibitors can be hindered by the structural conservation of the cofactor binding pocket across different classes of methyltransferases. Additionally, the high intracellular concentration and polar chemical nature of SAM constitute a significant challenge for the design of cell-permeable and selective SAM-competitive inhibitors [25]. In contrast, substrate-competitive inhibitors, which block the substrate binding site, target more chemically diverse regions on the target protein that are specific to the various methyltransferase substrates. In some cases, the presence of SAM or SAH has been shown to enhance the binding of substrate-competitive inhibitors. This phenomenon is referred to as cofactor-dependent inhibition (also known as cofactor-uncompetitive inhibition). Understanding the structural bases of this phenomenon may inform the optimization of methyltransferase inhibitors through structure-guided, rational design. To this end, we surveyed the Protein Data Bank (PDB) and analyzed all methyltransferase protein structures in which SAM or SAH is bound in close proximity to a substrate-competitive inhibitor. Here, we summarized the various cofactor-inhibitor interactions and reviewed the accompanying publications to reveal insights into the known mechanisms of substrate-competitive, cofactor-dependent methyltransferase inhibition.

3.3 Results

3.3.1 Exploiting the bound cofactor

PMT substrate binding pockets are more structurally diverse and less polar than the SAM-binding pockets, and several chemical probes and compounds in clinical trials compete with the substrate rather than SAM. Interestingly, the presence of SAM or SAH can be required for substrate-competitive inhibitors to bind their target enzymes, either via direct interactions between the cofactor and the inhibitor, or via cofactor-dependent allosteric stabilization of the substrate binding pocket. This phenomenon is illustrated by examples of the various substrate recognition mechanisms used by methyltransferase enzymes. For instance, cofactor binding can impose structural rigidity on the disordered substrate binding pocket of Rossmann proteins, as observed in the X-ray crystal structures of the SAM-bound and apo forms of protein arginine N-methyltransferase 4 (PRMT4), which revealed a disorder-to-order transition of the substrate binding site induced by SAM [156] [157] (Figure 19A). The post-SET domain of lysine methyltransferases can also require cofactor binding to fold properly and generate a substrate binding groove lined by the I-SET and post-SET domains [25] [154]. For example, comparing the apo and SAM-bound forms of SETD7 suggests that the cofactor can stabilize residues along the substrate binding pocket [158] [159] (Figure 19B). Moreover, a comparison of the human apo-PRDM9 and mouse holo-PRDM9 structures clearly demonstrated the structural contributions of SAH to the catalytically competent conformation of PRDM9 [101] (Figure 19C).

Figure 19. Cofactor binding induces structural rearrangements in methyltransferase domains. Cartoon representations highlighting the mobile structural elements of cofactor-bound (top, blue) and unbound (bottom, purple) conformations for (A) PRMT4, (B) SETD7 and (C) PRDM9. PDB IDs are 3B3F, 3B3J, 1N6C, 1H3I, 4C1Q and 4IJD, respectively. Adapted from [150].


To investigate the role of the methyltransferase cofactor in the binding of substrate-competitive inhibitors, we conducted a survey of all 102 methyltransferase structures in which a small-molecule inhibitor bound within 4 Å of either SAM or SAH. To examine cofactor contributions to inhibitor binding, we measured the contact surface areas of each inhibitor with SAM/SAH and with the target protein, and calculated the percentage of the interface formed with the cofactor (Figure 20). Cofactor-inhibitor interfaces can represent up to 20% of the total contact area of the inhibitor in Rossmann-fold enzymes, while this percentage is generally lower in SET-domain PMTs, with the exception of PRDM9, where 24% of the inhibitor interface is with SAM. A detailed inspection of these structures revealed a diverse collection of inhibitor functional groups that interact with the cofactor (Figure 20, middle and right panels). In the following sections, we review the structural chemistry of substrate-competitive inhibitors that depend on the presence of the cofactor for binding.
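The percentage plotted in Figure 20 is the share of an inhibitor's total contact surface that is formed with the cofactor rather than with the protein. A minimal Python sketch of that ratio is shown below; the contact areas in the example are hypothetical values chosen to give the 24% reported for PRDM9, not measured areas.

```python
def cofactor_interface_pct(area_cofactor: float, area_protein: float) -> float:
    """Percentage of an inhibitor's contact surface that is formed with SAM/SAH.

    area_cofactor -- inhibitor surface area in contact with SAM or SAH (Å^2)
    area_protein  -- inhibitor surface area in contact with the protein (Å^2)
    """
    total = area_cofactor + area_protein
    if total <= 0:
        raise ValueError("total contact area must be positive")
    return 100.0 * area_cofactor / total


print(cofactor_interface_pct(120.0, 380.0))  # hypothetical areas, gives 24.0
```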


Figure 20. SAM or SAH interface bound by methyltransferase inhibitors. Percentage of the inhibitor’s interface interacting with SAM/SAH rather than with the protein (left panel). Inhibitors of Rossmann-fold (middle panel) and SET-fold (right panel) methyltransferases. Color-coding indicates inhibitor chemistry involved in SAM and SAH binding. Exemplary methyltransferase inhibitors are labeled with their target indicated in parentheses. Adapted from [150].


3.3.2 PRMT5

Substrate-competitive Rossmann methyltransferase inhibitors have successfully employed aromatic, amide and composite moieties to promote binding by forming stabilizing interactions with SAM and/or SAH (Figure 21A). These compounds can possess high affinity for their target enzyme, such as the type II arginine methyltransferase PRMT5, which catalyzes the monomethylation and symmetric dimethylation of arginine sidechains. One example of such high-affinity binding comes from the substrate-competitive, SAM-dependent PRMT5 inhibitor EPZ015666 (KD < 1 nM) [142]. Interestingly, Chan-Penebre et al. demonstrated that EPZ015666 possessed a much weaker affinity for PRMT5 when the SPR assays used to measure binding were performed in the presence of SAH rather than SAM (KD = 171 nM) [142]. Cocrystal structures of the SAM-PRMT5-EPZ015666 and SAH-PRMT5-EPZ015666 complexes illustrate how the phenyl ring of EPZ015666 forms an important cation-π interaction with the partially charged sulfonium methyl group of SAM that is absent in SAH (Figure 21A). By comparing the affinities of EPZ015666 for SAM- and SAH-bound PRMT5, the authors calculated that the cation-π interaction contributed ~12.5 kJ·mol−1 of binding energy, demonstrating the specific importance of SAM for inhibition [142].
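Converting an affinity ratio into a binding free-energy difference uses ΔΔG = RT ln(KD,weak / KD,tight). The Python sketch below applies this relationship to the KD values quoted above, assuming a temperature of 25 °C. Because the SAM-state KD is only an upper bound (< 1 nM), the result is a lower-bound estimate; it is of the same magnitude as the ~12.5 kJ·mol−1 reported, and the exact published figure depends on the measured affinities and temperature used by the original authors.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def ddg_from_kd_ratio(kd_weak: float, kd_tight: float, temp_k: float = 298.15) -> float:
    """Binding free-energy difference (kJ/mol) between two states: RT ln(KD_weak / KD_tight)."""
    return R_KJ * temp_k * math.log(kd_weak / kd_tight)


# KD (nM) with SAH-bound PRMT5 versus the < 1 nM upper bound with SAM-bound PRMT5.
estimate = ddg_from_kd_ratio(171.0, 1.0)
print(f"{estimate:.1f} kJ/mol")  # 12.7 kJ/mol
```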


Figure 21. Substrate-competitive methyltransferase inhibitors interact with SAM and SAH. Stick representations of inhibitor-cofactor pairs with spheres indicating van der Waals radii of interacting atoms for (A) EPZ015666-SAM (PRMT5, 4X61), (B) GSK3368715-SAH (PRMT1, 6NT2), (C) E72-SAH (EHMT1, 3MO5), (D) PFI-2-SAM (SETD7, 4JLG), (E) BAY-598-SAM (SMYD2, 5ARG), (F) A-196-SAM (SUV420H1, 5CPR) and (G) MRK-740-SAM (PRDM9, 6NM4). (H) MRK-740 locks the PRDM9 post-SET helix in an “open”, inactive conformation (purple helix) and blocks the “closed”, active conformation (blue helix, 4C1Q). Protein and PDB code are indicated in parentheses. Adapted from [150].


3.3.3 Type I PRMTs

Type I PRMTs are Rossmann-fold enzymes that catalyze the asymmetric dimethylation of a single nitrogen on the guanidinium moiety of arginine sidechains. Type I PRMT inhibitors that contain an aliphatic amine can function as unreactive substrate mimetics. For example, GSK3368715 (EPZ019997) is a potent and bioactive pan-type I PRMT inhibitor currently in a phase 1 clinical trial in patients with solid tumors and diffuse large B-cell lymphoma (ClinicalTrials.gov ID: NCT03666988). GSK3368715 is highly potent against all type I PRMT enzymes with the exception of PRMT4 (IC50 values for PRMT1, PRMT3, PRMT4, PRMT6 and PRMT8 of 3.1, 48, 1148, 5.7 and 1.7 nM, respectively) [160]. Kinetic analysis of the inhibitory mechanism, performed by increasing either the substrate or SAM concentration in enzymatic activity assays, indicated that GSK3368715 is a substrate-competitive and SAM-dependent (also known as SAM-uncompetitive) inhibitor [160]. A cocrystal structure of the PRMT1-SAH-GSK3368715 complex demonstrated that the methylamino group of GSK3368715 projects through the substrate arginine site towards the sulfur atom of SAH (Figure 21B) [160]. Other type I PRMT inhibitors such as TP-064 and MS023 also position an aliphatic amine within the arginine binding pocket, but instead exhibit SAM-independent (also known as SAM-noncompetitive) modes of inhibition, as determined by enzymatic activity assays [161] [162]. Although increasing SAM or peptide concentrations has no effect on the catalytic IC50 value of TP-064, surface plasmon resonance (SPR) measurements revealed that TP-064 bound tightly to PRMT4 (KD = 7.1 ± 1.8 nM), but only in the presence of either SAM or SAH. Taken together, these findings exemplify the structural contribution of SAM or SAH to the stabilization of the substrate pocket, which appears to be required for inhibition [157] [162].
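The kinetic experiments above classify inhibitors by how their IC50 responds when one substrate (peptide or SAM) is increased: it rises for a competitive inhibitor, falls for an uncompetitive one, and stays constant for a pure noncompetitive one. The Python sketch below encodes these textbook single-substrate relationships (Cheng-Prusoff-type equations). Methyltransferases are bisubstrate enzymes, so this is a simplified illustration of the direction of the IC50 shifts only, and the Ki and Km values used are hypothetical.

```python
def ic50(ki: float, conc: float, km: float, mode: str) -> float:
    """IC50 in a simplified single-substrate model as the varied substrate changes.

    mode -- "competitive", "uncompetitive" or "noncompetitive" with respect to
            the substrate (or cofactor) present at concentration `conc`.
    For the uncompetitive case, `ki` is the dissociation constant from the
    enzyme-substrate complex; for the noncompetitive case, Ki = Ki' is assumed.
    """
    if mode == "competitive":
        return ki * (1.0 + conc / km)   # rises as the substrate increases
    if mode == "uncompetitive":
        return ki * (1.0 + km / conc)   # falls as the substrate increases
    if mode == "noncompetitive":
        return ki                       # independent of the substrate
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical Ki = 1 and Km = 1 (arbitrary units), substrate at 0.1x, 1x and 10x Km.
for fold_km in (0.1, 1.0, 10.0):
    print(fold_km, [round(ic50(1.0, fold_km, 1.0, m), 2)
                    for m in ("competitive", "uncompetitive", "noncompetitive")])
```

In these terms, a substrate-competitive, SAM-uncompetitive inhibitor such as GSK3368715 is expected to lose potency as the peptide concentration rises but to gain potency as the SAM concentration rises.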


3.3.4 EHMT1/2 (G9a/GLP)

Aliphatic amines have also been employed by substrate-mimetic inhibitors of SET-domain-containing lysine methyltransferases. Chang et al. (2010) explored this idea by incorporating a lysine mimetic into a previously reported inhibitor (BIX-01294) of the histone H3 lysine 9 methyltransferases EHMT1 and EHMT2 (also known as GLP and G9a, respectively) [163]. Structural elucidation of EHMT1-SAH in complex with the modified inhibitor (E72) revealed that its primary amine is only 4.2 Å from the SAH sulfur atom, within a distance compatible with catalysis when EHMT1 is bound by SAM (Figure 21C). Indeed, the authors demonstrated that overnight incubation produced mono-, di-, and trimethylated E72, and found that this slow methylation activity led to an improved IC50 value compared with the original BIX-01294 molecule [163].

Slow methylation of E72 highlights an interesting avenue for methyltransferase inhibition that exploits the natural function of SAM. In contrast to E72, inhibitor methylation was not reported for the EHMT1/2 inhibitor UNC0224, which possesses a terminal dimethylamino moiety [164]. A cocrystal structure of the EHMT2-SAH-UNC0224 complex revealed that the tertiary amine approaches the sulfur atom of SAH within a catalytically amenable distance, similar to the structure of E72 bound to EHMT1. However, methylating this tertiary amine would be analogous to a third methylation step, and EHMT2 lacks intrinsic trimethyltransferase activity, which could explain why inhibitor methylation was not reported [164] [35].

3.3.5 SETD7, SMYD2, and SUV420H1/2

Pyrrolidines are cyclic tertiary amines used in several unreactive substrate-mimetic SET-domain inhibitors that directly interact with the bound cofactor. For example, PFI-2 is a potent and cell-active SETD7 inhibitor that exhibits a substrate-competitive, SAM-uncompetitive mode of inhibition [141]. A cocrystal structure of SETD7 bound to SAM and PFI-2 demonstrated that the pyrrolidine group of the inhibitor extends through the substrate lysine channel and makes a hydrophobic interaction with the departing methyl group of SAM (Figure 21D). Importantly, affinity measurements by SPR confirmed that PFI-2 only binds to SETD7 in the presence of SAM [141].

BAY-598 is a peptide-competitive, SAM-uncompetitive inhibitor of SMYD2 that interacts with SAM via a chloro-substituted phenyl moiety [165]. A structure of the SMYD2-SAM-BAY-598 ternary complex revealed a direct contact between the 3-chloro substituent and the departing methyl group of SAM, resulting in both hydrophobic and weak electrostatic contributions to binding (Figure 21E). Hydrogen bonds can also form with the cofactor’s departing methyl group. For example, the cocrystal structure of SMYD2 in complex with SAM and AZ505, a substrate-competitive, SAM-uncompetitive SMYD2 inhibitor, showed that the carbonyl oxygen of the benzoxazinone moiety is 2.8 Å away from the sulfonium methyl group, which can act as a hydrogen-bond donor [166] [21]. Stabilizing amide-sulfonium interactions have also been reported for other SMYD-family inhibitors such as EPZ030456 and an AZ505 analog called A-893 [167] [168].

By forgoing the nitrogen atom present in the amino and pyrrolidine moieties of substrate mimetics, several inhibitors have used the enhanced aliphatic character of cycloalkyl or simple alkyl moieties to bind within the substrate channel [169] [170] [171]. One example is the cyclopentane group of the SUV420H1/2 inhibitor A-196 [170]. Similar to the previously mentioned pyrrolidine inhibitor PFI-2, the cyclopentane group is positioned within 3.2 Å of the departing methyl group of SAM, making a hydrophobic interaction (Figure 21F). The authors used methyltransferase activity assays to assess the mechanism of inhibition and found that the IC50 value of A-196 remained constant with increasing SAM concentrations, implying that A-196 is noncompetitive with the cofactor. Interestingly, isothermal titration calorimetry (ITC) affinity measurements demonstrated that the KD of A-196 decreases from 74.8 nM to 27.8 nM in the presence of SAM. The increased affinity of A-196 for SUV420H1/2 in the presence of SAM is attributed to the dual effect of interacting with SAM and stabilizing the substrate-binding fold of the protein, as observed in other SET domains and Rossmann domains [157] [170] [15].
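The mechanistic labels used throughout this section (substrate-competitive, SAM-uncompetitive, noncompetitive) rest on how the IC50 shifts as a substrate or cofactor is titrated. As a hedged illustration, assuming rapid-equilibrium kinetics, a single varied substrate S (standing in for SAM here) and idealized inhibition modes, the textbook relationships can be sketched as follows. The function name `ic50` and the Km and Ki values in the example are hypothetical and are not taken from the cited studies.

```python
"""Illustrative IC50 dependence on the concentration of a varied substrate
or cofactor S for idealized, rapid-equilibrium inhibition modes."""


def ic50(mode: str, ki: float, s: float, km: float) -> float:
    """Expected IC50 for a given inhibition mode relative to S.

    ki, s and km must share the same concentration units."""
    if mode == "competitive":
        # Cheng-Prusoff: inhibitor and S compete for the same site
        return ki * (1.0 + s / km)
    if mode == "uncompetitive":
        # inhibitor binds only the S-bound enzyme, so potency rises with [S]
        return ki * (1.0 + km / s)
    if mode == "noncompetitive":
        # pure noncompetitive (Ki = Ki'): IC50 is independent of [S]
        return ki
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    km_sam = 1.0  # hypothetical Km for SAM (uM)
    ki = 0.05     # hypothetical Ki for the inhibitor (uM)
    for sam in (0.5, 1.0, 5.0, 20.0):
        row = {m: round(ic50(m, ki, sam, km_sam), 4)
               for m in ("competitive", "uncompetitive", "noncompetitive")}
        print(sam, row)
```

Under these assumptions, an IC50 that stays flat as SAM increases, as reported for A-196, matches the noncompetitive case, whereas a SAM-uncompetitive inhibitor would become more potent (lower IC50) as SAM rises.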

3.3.6 PRDM9

Our structural survey of methyltransferase-cofactor-inhibitor complexes revealed that SET-domain inhibitors were generally larger than Rossmann methyltransferase inhibitors but possessed a smaller relative interface with either SAM or SAH compared to the protein (Figure 20). The only outlier to this trend was the PRDM9 inhibitor MRK-740, where SAM directly contributes 24% of the total inhibitor binding interface (Figure 20C). PRDM9 is one of seven PRDM family members with reported methyltransferase activity. PRDMs share the SET-domain fold but have only 20-30% amino acid sequence identity with other SET-domain enzymes [35]. MRK-740 is a PRDM9-specific, first-in-class PRDM inhibitor with a unique mode of binding compared to all other SET-fold methyltransferase inhibitors [102]. Enzyme assays demonstrated that MRK-740 is a substrate-competitive, SAM-uncompetitive inhibitor, and a cocrystal structure of the PR/SET domain of PRDM9 in a ternary complex with SAM and MRK-740 revealed that MRK-740 makes multiple interactions with SAM and PRDM9, driven by aromatic and hydrophobic substituents on the inhibitor (Figure 21G). MRK-740 locks the enzyme into an inactive, "opened" conformation that prevents completion of the substrate pocket, while simultaneously occupying the lysine binding cavity (Figure 21H). The extensive interface between MRK-740 and SAM highlights the essential structural contributions provided by the cofactor. Whether such an extensive inhibitor-cofactor interaction is only possible within the PRDM subfamily remains an open question.

3.3.7 Neurotransmitter methylation

3.3.7.1 COMT

In addition to PMTs, the design of inhibitors targeting methyltransferases that act on neurotransmitter molecules is an important area of medical research. Catechol-O-methyltransferase (COMT) is one of several enzymes that degrade catecholamine neurotransmitters (dopamine, epinephrine and norepinephrine), which it achieves by methylating a catechol hydroxyl oxygen using the donor molecule SAM. The COMT-mediated catecholamine methylation reaction proceeds stepwise, with the binding of SAM, then the secondary cofactor Mg2+ ion and finally the catecholamine substrate [172]. Nitrocatechols are a class of unreactive, substrate-mimetic COMT inhibitors prescribed to treat symptoms of Parkinson's disease, which exert their pharmacological effects by blocking the COMT-mediated degradation of dopamine [173]. Early structural elucidation of COMT with a small (200.11 Da), non-reactive, first-generation, substrate-mimetic inhibitor called OR-486 (3,5-dinitrocatechol) provided an exemplary depiction of the cofactor's contribution to the inhibitor binding interface (Figure 22A). This co-crystal structure of rat COMT in complex with a Mg2+ ion, SAM and OR-486 revealed a detailed network of electrostatic interactions between the positive charges on the Mg2+ ion and the sulfonium of SAM and the negative partial charges of the catechol hydroxyl groups [174]. A recent study examined high-resolution structures of COMT with OR-486 and either SAM or the transition state analog sinefungin, and found that only a positively charged sinefungin amine could account for the tight binding of OR-486 to COMT, highlighting the importance of the charged sulfonium group in SAM [175]. Not surprisingly, OR-486 and other nitrocatechol-based inhibitors bind COMT competitively with respect to the catechol substrate and uncompetitively with respect to SAM [176].

Interestingly, OR-486 only occupies a small portion of the available substrate binding pocket on COMT (Figure 22B). Our analysis of cofactor-inhibitor interfaces indicated that SAM comprises 20% of the OR-486 binding-site interface, which is the highest proportion for any Rossmann methyltransferase inhibitor that we could find (Figure 20, middle panel). A more recently developed COMT inhibitor, BIA 3-335, is based on the OR-486 scaffold and exploits the available space within the substrate pocket for improved potency and specificity [177] [178]. This strategy may be important for pharmacological optimization, as Parkinson's disease drugs such as opicapone, tolcapone and entacapone, which are all based on the OR-486 scaffold, possess chemical substituents reminiscent of BIA 3-335 [179].


Figure 22. Substrate competitive methyltransferase inhibitors interact with SAM and SAH for neurotransmitter methylation. Stick representations of inhibitor-cofactor pairs with spheres indicating van der Waals radii of interacting atoms for A OR-486-SAM alone and B bound to COMT (1vid), and C SK&F29661-SAH alone and D bound to PNMT (1hnn). Protein and PDB code are indicated in parentheses.


3.3.7.2 PNMT

Phenylethanolamine N-methyltransferase (PNMT) is a Rossmann-fold methyltransferase responsible for the neuronal synthesis of epinephrine, which it achieves by methylating the primary amine of norepinephrine using the donor molecule SAM. An inhibitor of PNMT methyltransferase activity called SK&F 29661 potently and selectively blocks PNMT from binding to norepinephrine, but it cannot cross the blood-brain barrier, which limits its utility as a chemical probe or drug [180]. However, SK&F 29661 still offers valuable insights as a tool compound to study PNMT inhibition. A cocrystal structure of PNMT bound to SK&F 29661 revealed that SK&F 29661 has a composite interface with SAH, composed of an unreactive, substrate-mimetic amino group positioned 7 Å from the SAH sulfur and tethered to a sulfonamide-substituted benzene that packs along the homocysteine portion of SAH [181] (Figure 22C). The structure also demonstrated that this small (212.27 Da) substrate-mimetic inhibitor only occupies a portion of the available substrate pocket, as with OR-486 binding to COMT (Figure 22D). Unlike OR-486, which relies on SAM for 20% of its binding interface, we found that SAH provides only 10% of the binding interface for SK&F 29661 (Figure 20, middle panel). Interestingly, a subsequent study reported the development of larger compounds based on the SK&F 29661 scaffold that could bind within the extended substrate pocket, resulting in larger perturbations of the active site [182]. This highlights how Rossmann methyltransferase inhibitors can be chemically optimized by building away from the substrate-inhibitor interface towards unoccupied sites on the protein.


3.4 Discussion

Here we surveyed publicly available methyltransferase structures bound by SAM or SAH in close proximity to a substrate-competitive inhibitor and examined the various inhibitor-cofactor interactions. We found that SAM enhances the binding of certain substrate-competitive inhibitors and, in some instances, is essential for inhibitor binding. SAM often provides stabilizing interactions with the inhibitor via the positively charged methyl-sulfonium or the aliphatic nature of the departing methyl group. For instance, the methyl-sulfonium cation stabilizes the PRMT5 inhibitor EPZ015666 and the COMT inhibitor OR-486 via cation-π and salt-bridge interactions, respectively. Additionally, the aliphatic departing methyl group of SAM forms stabilizing hydrophobic interactions with the SETD7 inhibitor PFI-2, the SUV420H1/2 inhibitor A-196 and the EHMT1 inhibitor BIX-01294. Quite remarkably, converting the aliphatic and inert substrate-mimetic group of the EHMT1 inhibitor BIX-01294 into a reactive primary amine (EHMT1 inhibitor E72) was shown to improve the inhibitory effect of the compound. It is conceivable that this mode of inhibitor optimization could yield similar results in analogous systems. Further examples of cofactor-enhanced binding are the type I PRMT inhibitor GSK3368715 and the PRMT4-specific inhibitor TP-064, where cofactor-induced stabilization of the substrate binding site enhances the affinity of these substrate-mimetic inhibitors for their targets. An extreme version of this cofactor-enhanced binding is exemplified by the PRDM9 inhibitor MRK-740, which only binds the SAM-bound enzyme. To an unprecedented degree, SAM contributes 24% of the total inhibitor-interaction interface, indicating a unique inhibition strategy that may be specific to the dynamic nature of methyltransferases that possess a PR/SET fold.
Evidence for inhibitor optimization by building away from the substrate-inhibitor interface towards unoccupied sites on the protein was observed in second-generation COMT and PNMT inhibitors based on OR-486 and SK&F 29661, respectively. It is possible that an overreliance on the cofactor interface may lead to off-target effects among methyltransferases within the same family, although the high specificity of MRK-740 for PRDM9 indicates that this may not always hold true. In summary, our analysis and review provide mechanistic insights that could inform the structure-guided, rational design of new substrate-competitive, cofactor-dependent methyltransferase inhibitors.


3.5 Methods

To quantify the interaction between small-molecule inhibitors and SAM or SAH, we calculated the binding interfaces between SAM or SAH and the inhibitor, as well as between the protein and the inhibitor. Briefly, all SAM- or SAH-containing protein structures (n = 1397) were extracted from the Protein Data Bank (date of acquisition: 2019-09-25) and filtered to include only structures with a ligand <4 Å from either SAM or SAH. Ligands matching the BioLiP list of biologically irrelevant ligands [183] were automatically removed, and the remaining structures were manually curated to ensure that only structures with inhibitor compounds were retained for further analysis (n = 102). An in-house Python 3 script was used in PyMOL (The PyMOL Molecular Graphics System, v2.2.0, Schrödinger, LLC.) to model hydrogens onto all structures and calculate the total potential solvent accessible surface area (SASA) of the protein, SAM or SAH, and the ligand, as well as the joint SASA of the protein-inhibitor complex and of the SAM- or SAH-inhibitor complex. The binding interface was calculated as $(x + i - x_i)/2$, where $x$ and $i$ are the SASA of SAM, SAH or protein and of the inhibitor, respectively, and $x_i$ is the SASA of the corresponding complex. The cofactor interface percentage was calculated as $100 \cdot s_i/(s_i + p_i)$, with $s_i$ being the SAM- or SAH-inhibitor interface and $p_i$ being the protein-inhibitor interface. Figures of protein structures were generated using ICM-Browser (Molsoft, San Diego).
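The two interface formulas in these Methods can be expressed as a minimal Python sketch, assuming the SASA values (in Å²) have already been computed, for example with PyMOL's `cmd.get_area` after enabling the `dot_solvent` setting. This is not the in-house script; the function names and the SASA numbers in the example are hypothetical and do not come from any structure in the survey.

```python
"""Sketch of the interface metrics, assuming precomputed SASA values (A^2)."""


def interface_area(sasa_x: float, sasa_i: float, sasa_xi: float) -> float:
    """Buried interface between partner x and inhibitor i: (x + i - x_i) / 2."""
    return (sasa_x + sasa_i - sasa_xi) / 2.0


def cofactor_interface_percent(si: float, pi: float) -> float:
    """Share of the inhibitor interface contributed by SAM/SAH: 100*si/(si+pi)."""
    total = si + pi
    if total <= 0:
        raise ValueError("total interface must be positive")
    return 100.0 * si / total


if __name__ == "__main__":
    # Hypothetical SASA values (A^2), not taken from any real structure
    si = interface_area(sasa_x=500.0, sasa_i=700.0, sasa_xi=1020.0)   # cofactor-inhibitor
    pi = interface_area(sasa_x=9000.0, sasa_i=700.0, sasa_xi=8680.0)  # protein-inhibitor
    print(si, pi, round(cofactor_interface_percent(si, pi), 1))
```

Halving the summed SASA loss counts the buried area once per interface rather than once per partner, so $s_i$ and $p_i$ are directly comparable before taking the percentage.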


Chapter 4

Interrogating PR/SET domains for the ability to bind SAH

4.1 Authorship attribution.

This chapter contains unpublished work, and I am responsible for the overall generation and interpretation of the data included herein, with the following contributions from others: The F-SAH compound was synthesized and kindly provided by Carlos Zepeda, Ontario Institute for Cancer Research (OICR). Shili Duan provided guidance and technical assistance in protein production. Additional proteins were kindly provided by Taraneh Hajian, Elisa Gibson and Dr. Masoud Vedadi from the SGC Toronto. Dr. Scott Houliston advised on and supported all NMR experiments, and Drs. Houliston and Cheryl Arrowsmith aided in data analysis and provided guidance throughout the study. This project was prematurely halted due to a catastrophic freezer malfunction during the COVID-19 shutdown.

4.2 Introduction

Lysine methyltransferases (KMTs) are a group of epigenetic regulators currently being investigated in clinical and pharmaceutical studies for the treatment of certain cancers [28] [5] [26] [29]. The majority of KMT proteins possess a domain with a SET fold [15]. The PRDM protein family is characterized by the presence of a PRDI-BF1 and RIZ1 (PR/SET) domain, which is a subgroup of the SET domain (Figure 2). KMT activity has currently been reported for some PRDMs (Table 1), while other PRDMs are thought to be inactive pseudoenzymes. Since PRDMs are less represented in the academic literature than canonical SET proteins (Figure 5), it is unclear whether these PRDMs are truly pseudoenzymes or whether their KMT activity has simply yet to be characterized. Distinguishing which PRDMs possess KMT activity would further our understanding of lysine epigenetics and pseudoenzyme-driven biology [4] [184].

All domains with a SET fold interact with SAM through a conserved hydrogen-bonding network split across two distinct domain regions [96]. These SAM-binding residues are well conserved in canonical SET domains; however, the corresponding residues in PR/SET domains are less conserved (Figure 3, H-binding clusters at Regions 1 and 2). The only holo-enzyme structure of a PR/SET domain is of mouse PRDM9 bound to a histone H3 peptide and SAH [101]. The human and mouse PRDM9 PR/SET domains possess many of the highly conserved SAM-binding residues found in canonical SET domains. In the absence of structural evidence, it is unclear whether, and how many of, the PR/SET domains can bind SAM. Therefore, detecting cofactor binding in PR/SET domains would aid in distinguishing whether certain PRDMs are enzymes or pseudoenzymes.

Here I investigate cofactor binding to understand which PR/SET domains may have the ability to catalyze the KMT reaction. Using 19F nuclear magnetic resonance (NMR) spectroscopy, we measured the binding response of the majority of the human PR/SET domains to a fluorinated analog of the methyltransferase by-product S-adenosylhomocysteine (SAH), 2-fluoro-SAH (F-SAH). 19F-NMR is a sensitive and versatile method to study protein-ligand interactions [185]. Using the F-SAH binding response of the PRDM9 PR/SET domain as a positive control for the screening assays, we identified several PR/SET domains that bind F-SAH and therefore likely bind SAM. Additionally, we detected several PR/SET domains that show no evidence of binding to F-SAH, which suggests a lower affinity for SAM, or potentially no affinity. This dataset can be used as a starting point for the future discovery of KMT activity in PRDM proteins.


4.3 Results

4.3.1 Establishment of the F-SAH binding assay

We used 19F nuclear magnetic resonance (NMR) spectroscopy to assess whether PR/SET domains can bind the methyltransferase enzyme cofactor. Fluorinated SAH was chosen rather than fluorinated SAM because SAH and its chemical precursors are more stable across a wide pH range and were therefore more suited to chemical synthesis. The chemical structure of 2-fluoro-SAH (F-SAH) shows that SAH is fluorinated at the 2-position of the adenine ring (Figure 23a). Using our structure of the human PRDM9 PR/SET domain bound to SAM, we generated a theoretical model of F-SAH bound to PRDM9 (Figure 23b). We observed that the fluorine atom projects toward the solvent and is therefore unlikely to alter the interaction with the protein. Nevertheless, because the 19F chemical shift is highly sensitive to any change in its microenvironment, we expected to see perturbation and/or broadening of the 19F resonance when F-SAH binds to a PR/SET domain.


Figure 23. Characterization of F-SAH binding to PRDM9. (a) Chemical structure of 2-fluoro-SAH (F-SAH) and (b) model of F-SAH bound to PRDM9 based on a SAM-bound structure (PDB ID: 6nm4). Colour coding corresponds to (a). The effects of F-SAH binding to PRDM9 were measured by 19F-NMR. (c-d) Spectra of 16 µM F-SAH alone and with 1X and 5X molar equivalents of PRDM9 display a concentration-dependent decrease in the unbound F-SAH peak and the emergence of a bound F-SAH peak.


First, we characterized F-SAH alone by measuring a 1D 19F spectrum, which gave rise to a single peak at ~-52.4 ppm. Next, we monitored F-SAH binding to PRDM9, which is the most extensively characterized methyltransferase among all PR/SET domain-containing proteins [79] [82] [102] [80]. When we measured the F-SAH NMR signal alone and in the presence of 1- and 5-fold molar equivalents of the PRDM9 PR/SET domain, we detected a concentration-dependent decrease in the intensity of the unbound F-SAH 19F resonance (Figure 23c-d). Moreover, we observed the emergence of a second 19F signal (~1045 Hz up-field, at -54.3 ppm) that arises from protein-bound F-SAH. As these results were reproducible using two separate PRDM9 protein preparations, we selected a 5-fold molar excess of PR/SET domain over F-SAH for further experiments.
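Shift differences in this chapter are quoted both in Hz and in ppm. As a small worked example, assuming a 1H frequency of 600.13 MHz and the IUPAC 19F frequency ratio Ξ ≈ 94.094% (CFCl3 reference) to obtain the 19F observe frequency, a shift difference in Hz converts to ppm by dividing by the observe frequency in MHz. The constant value, the 600.13 MHz figure and the helper names are assumptions for illustration only.

```python
"""Convert a 19F chemical-shift difference between Hz and ppm."""

# Approximate IUPAC frequency ratio for 19F relative to 1H (CFCl3 reference)
XI_19F = 0.94094


def f19_frequency_mhz(proton_mhz: float) -> float:
    """Approximate 19F observe frequency (MHz) on a magnet of given 1H frequency."""
    return proton_mhz * XI_19F


def hz_to_ppm(delta_hz: float, observe_mhz: float) -> float:
    """A frequency difference of delta_hz expressed in ppm of the observe frequency."""
    return delta_hz / observe_mhz


if __name__ == "__main__":
    obs = f19_frequency_mhz(600.13)  # assumed 1H frequency of the 600 MHz magnet
    print(round(obs, 1), round(hz_to_ppm(1045.0, obs), 2))
```

With these assumptions the 19F observe frequency is about 564.7 MHz, so a ~1045 Hz up-field shift corresponds to roughly 1.85 ppm.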

4.3.2 Screening human PR/SET domains with the F-SAH binding assay

We performed F-SAH binding assays with PR/SET domains from 13 of the 19 human PRDM family members. The PR/SET domains of PRDM 1, 2, 4, 5, 6, 11, 12, 14 and 17 were provided by the SGC Toronto, and I purified PRDM 3, 7, 9 and 16. I was also able to purify PR/SET protein constructs for PRDM 8, 10, 13 and 15; however, these constructs were unstable in our 19F-NMR buffer during dialysis (see Methods) and were therefore not suitable for further study. We collected 19F-NMR spectra of F-SAH with 5-fold molar equivalents of each of the 13 human PR/SET domains (Figure 24a). The relative intensities of the unbound and bound F-SAH peaks varied considerably among the different PR/SET proteins, indicating variable binding affinities across the panel of PRDM proteins. To quantify binding for each PRDM PR/SET domain, we normalized the integrals of the unbound and bound peaks for each protein to the unbound peak of F-SAH alone; that is, we divided the integral of each peak by the integral of the peak for F-SAH alone. We then plotted 1 minus the normalized unbound integral against the normalized bound integral for each PR/SET domain to differentiate each protein's affinity for F-SAH (Figure 24b). PR/SET datapoints located in the upper-right region of the plot displayed strong binding to F-SAH, while datapoints at the origin indicate no evidence of binding. We used the F-SAH signal in the presence of PRDM9 as an internal control to distinguish a strong binding response. PRDM 4 and 5 were plotted near PRDM9, indicating a strong binding response. Additionally, PRDM 14, 12, 7, 11 and 17 exhibited a very strong binding response. We were unable to detect any binding response for F-SAH in the presence of PRDM 3, 1 or 16 under our assay conditions, though we did detect a weak binding response for PRDM 2 and 6. Taken together, these data provide evidence to help prioritize further assays to identify methyltransferase activity among PRDM family members.
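The normalization and plotting scheme described in this section can be sketched as follows, assuming the peak integrals have already been corrected to the internal trifluoroacetate reference (see Methods). Each sample maps to the pair (1 - unbound/alone, bound/alone), so F-SAH alone falls at the origin. The function name and the example integrals are hypothetical and are not measured values.

```python
"""Map 19F peak integrals onto the (1 - unbound, bound) binding plot."""
from typing import Dict, Tuple


def binding_coordinates(unbound: float, bound: float,
                        free_ref: float) -> Tuple[float, float]:
    """Return (1 - unbound/free_ref, bound/free_ref).

    free_ref is the F-SAH peak integral of the F-SAH-alone sample, so the
    free ligand sits at (0, 0); larger values of both coordinates indicate
    that more of the ligand signal has moved into the bound peak."""
    if free_ref <= 0:
        raise ValueError("reference integral must be positive")
    return 1.0 - unbound / free_ref, bound / free_ref


if __name__ == "__main__":
    ref = 1.0  # F-SAH alone (already reference-normalized)
    samples: Dict[str, Tuple[float, float]] = {
        "control": (0.40, 0.35),     # hypothetical integrals, not measured data
        "no_binding": (1.00, 0.00),
    }
    for name, (unb, bnd) in samples.items():
        x, y = binding_coordinates(unb, bnd, ref)
        print(name, round(x, 2), round(y, 2))
```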


Figure 24. Screening for F-SAH binders among PR/SET domains. (a) 19F-NMR spectra showing 16 µM F-SAH in the presence of 80 µM of the indicated PR/SET domains. (b) Plots of 19F peak integrals for F-SAH in the presence of individual PR/SET domains. All 19F peaks were normalized to the peak integral of F-SAH alone in buffer, and the normalized integrals of the Bound and 1 − Unbound peaks relative to F-SAH alone were plotted (i.e. F-SAH alone would be located at position [0,0]).

4.4 Discussion

4.4.1 Using F-SAH to probe KMT domains

Here we used 19F-NMR to investigate the potential for 13 of the 19 human PR/SET domains to bind a fluorinated analog of the methylation reaction by-product SAH. Using our F-SAH binding assays with PR/SET domains, we identified 7 domains (PRDM 4, 5, 7, 11, 12, 14 and 17) that showed a binding response similar to, or stronger than, that of PRDM9. Among these 'binding-competent' domains and the PRDM9 control, only PRDM 9 and 7 have reported KMT activity with accompanying enzyme kinetics data [79] [82]. Interestingly, PRDM7 showed stronger binding to F-SAH than PRDM9 (Figure 24). PRDM7 possesses much weaker KMT activity than PRDM9 towards an H3K4me2 substrate peptide, with a 100-fold lower rate of catalysis (kcat = 1.9 × 10² h⁻¹ vs 1.9 × 10⁴ h⁻¹ for PRDM 7 and 9, respectively) [82] [79]. These catalytic rate differences are mostly attributed to the distinct capacities of the catalytic residues (Y357 and S357 in PRDM 9 and 7, respectively) to facilitate methyl transfer. Based on our data, another contributing factor could be that PRDM7 has a stronger affinity for the reaction by-product SAH, which inhibits catalysis. In our F-SAH binding assays, PRDM 2 and 6 showed weak, but detectable, binding compared to PRDM9 (Figure 24). KMT activities towards H3K9 and H4K20 have been reported for PRDM 2 and 6, respectively [78] [86] [91]. This indicates that even a weak binding response for F-SAH can be associated with known active KMTs. Finally, in all cases where we observed F-SAH binding, we measured a consistent up-field shift (~1040 Hz) in the F-SAH 19F resonance (Figure 24a). This indicates a common binding mode, with a highly similar microenvironment formed in the bound state across the family.
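The suggestion that tighter SAH binding could contribute to the lower apparent activity of PRDM7 can be illustrated with a simple kinetic model. This sketch assumes that SAH acts as a classical competitive inhibitor with respect to SAM under Michaelis-Menten conditions; all parameter values are hypothetical and are not fitted to PRDM7 or PRDM9 data. It models only inhibition by free SAH in solution; slow release of tightly bound SAH from the enzyme, which could lower kcat directly, is not captured.

```python
"""Illustrative effect of SAH product inhibition on a SAM-dependent rate,
assuming SAH is a simple competitive inhibitor with respect to SAM."""


def rate_with_product(kcat: float, e_total: float, sam: float, km_sam: float,
                      sah: float, ki_sah: float) -> float:
    """Michaelis-Menten rate with competitive product inhibition.

    v = kcat*E*[SAM] / (Km*(1 + [SAH]/Ki) + [SAM])
    All concentrations must share units; the rate has units of kcat*E."""
    apparent_km = km_sam * (1.0 + sah / ki_sah)
    return kcat * e_total * sam / (apparent_km + sam)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the trend
    for ki in (100.0, 10.0, 1.0):  # tighter SAH binding -> smaller Ki (uM)
        v = rate_with_product(kcat=1.0, e_total=1.0, sam=20.0, km_sam=20.0,
                              sah=5.0, ki_sah=ki)
        print(ki, round(v, 3))
```

With these illustrative numbers, lowering the SAH Ki from 100 µM to 1 µM at fixed SAM and SAH reduces the rate by roughly 3.4-fold.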


We also demonstrated the utility of F-SAH as a synthetic chemical probe for methyltransferase proteins. Our approach is comparable to a previous study that reported the binding of the MLL1 SET domain to a SAM analog conjugated to a fluorescent tag [186]. By comparison, F-SAH benefits from the small size of fluorine, which is positioned on the adenine ring in an orientation that is unlikely to interfere with binding (Figure 23a-b). Additionally, 19F-NMR is capable of detecting low-affinity protein-ligand interactions, such as those targeted in fragment-based screening [185]. We surveyed two publicly available databases that quantify protein-ligand interaction data (Table 6). We found that SET-fold containing proteins bind SAH with dissociation constants (KD) ranging from the high-nanomolar to the micromolar range, with a similar range for reported half-maximal inhibition (IC50) by SAH. Only RBCMT had affinity data for both SAH and SAM, showing that SAM binds RBCMT with ~70-fold higher affinity than SAH [22]. The authors reported conserved binding modes in crystal structures of RBCMT-SAM and RBCMT-SAH, implying that the difference in affinity could not be attributed to the binding conformations but may instead reflect a higher entropic penalty for bound SAH, which possesses additional bond rotational freedom compared to SAM [22]. Taken together, we predict that PRDM9 binds F-SAH more weakly than SAM, i.e. with a KD above the reported SAM affinity of 18.6 µM [79].
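The ~70-fold difference in RBCMT affinity for SAM and SAH (Table 6) can also be expressed as a binding free-energy difference, ΔΔG = RT ln(KD,SAH / KD,SAM). The short sketch below assumes 298.15 K and simple 1:1 binding; it does not separate enthalpic from entropic contributions, which would require additional data such as ITC measurements. The function names are illustrative.

```python
"""Fold-difference in KD and the corresponding binding free-energy difference."""
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fold_difference(kd_weak: float, kd_tight: float) -> float:
    """Ratio of the weaker to the tighter dissociation constant."""
    return kd_weak / kd_tight


def ddg_kcal(kd_weak: float, kd_tight: float, temp_k: float = 298.15) -> float:
    """Binding free-energy difference RT ln(KD_weak / KD_tight), in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_weak / kd_tight)


if __name__ == "__main__":
    kd_sah, kd_sam = 21.2, 0.29  # RBCMT values from Table 6 (uM)
    print(round(fold_difference(kd_sah, kd_sam), 1),
          round(ddg_kcal(kd_sah, kd_sam), 2))
```

With the Table 6 values this gives about a 73-fold difference, or roughly 2.5 kcal/mol in favour of SAM.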


Table 6. Known dissociation constant (KD) and half-maximal inhibition (IC50) values of SAM and SAH for SET domain proteins. Binding affinity data available from BindingDB (www.bindingdb.org) and BindingMOAD (www.bindingmoad.org/) were accessed from the RCSB PDB (www.rcsb.org). Note that BindingDB reports a large range of IC50 values for EZH2.

Protein | Organism | Ligand | Reported value (µM) | Data source
EHMT1 | Homo sapiens | SAH | 2.3 | BindingDB
EHMT2 | Homo sapiens | SAH | KD 0.57; IC50 2.0 | BindingDB
EZH2 | Homo sapiens | SAH | IC50 0.263-16.6 | BindingDB
KMT2A | Homo sapiens | SAH | 2.3 | BindingDB
Kmt5b | Mus musculus | SAM | 11.2 | BindingMOAD
Kmt5c | Mus musculus | SAH | KD 17.2; IC50 10 | BindingDB
PRDM9 | Homo sapiens | SAM | KD 18.6 | [79]
RBCMT | Pisum sativum | SAM | KD 0.29 | BindingMOAD
RBCMT | Pisum sativum | SAH | KD 21.2 | BindingMOAD
SETD7 | Homo sapiens | SAH | 30 | BindingDB
mSMYD2 | Mus musculus | SAH | 0.18 | BindingDB
SMYD2 | Homo sapiens | SAH | 0.18 | BindingDB


4.5 Interpreting the absence of binding evidence

We found that PRDM 1, 3 and 16 do not bind F-SAH under our assay conditions, which suggests that they are unable to bind SAM. PRDM1 (initially identified as Blimp-1) is widely understood to lack KMT activity and instead functions as a gene repressor by recruiting specific co-repressor proteins to PRDM1-binding sites throughout the genome [187]. PRDM1 does affect histone methylation, but it does so through interactions with the KMT protein G9a/EHMT2 [84], the arginine methyltransferase PRMT5 [85] and the lysine demethylase LSD1 [188]. Whether the PR/SET domain of PRDM1 functions as a "reader domain" for specific histone tail modifications is undetermined; however, our data suggest that SAH or SAM is unlikely to play a role in this ability.

We found that the PR/SET domains of PRDM 3 and 16 do not bind F-SAH under our assay conditions. However, this does not exclude the possibility that larger or differently generated constructs may be able to bind SAM or SAH. For instance, longer or full-length PRDM 3 and 16 protein constructs were reported to possess KMT activity when purified from HeLa cells and mouse fibroblasts or from Sf9 insect cells [87] [95]. Our PRDM 3 and 16 constructs were purified from E. coli and each contained the PR/SET domain along with the first proximal C-terminal zinc-finger motif (ZnF1), similar to previous structural and enzymatic characterization studies of the PR/SET domain and ZnF1 of PRDM9 [79] [101]. It is unclear whether the additional regions outside the PRDM 3 and 16 PR/SET domains and ZnF1 could stabilize the cofactor binding site in the PR/SET domain. Interestingly, a previously solved NMR solution structure of PRDM16 demonstrated the absence of internal β-strands that are present in crystal and solution structures of all other PR/SET domains, which could destabilize the cofactor and substrate binding sites (Figure 4d). Furthermore, it is unclear whether additional factors present in eukaryotic expression systems enable cofactor binding or enzymatic activity.


4.5.1 Future analysis of PRDMs

Completing the F-SAH binding screen with the untested members of the PRDM family (PRDM 8, 10, 13, 15, FOG1 and FOG2) would provide valuable data. Additionally, assaying F-SAH binding in competition with unlabeled SAM could qualitatively delineate the relative affinities of the methyltransferase reaction cofactor and by-product, providing further evidence of potential enzymatic function. We detected F-SAH binding for PRDM 4, 5, 11, 12, 14 and 17, and we could not find any studies reporting KMT activity for these proteins. Intriguingly, PRDM 4, 5, 11, 12 and 14 all possess the highly conserved "catalytic tyrosine" (Figure 3), which further supports potential KMT activity in these proteins. Importantly, several studies have attributed oncogenic roles to PRDM 4, 12 and 14 in specific cancers [45]. Future discovery of enzymatic function in any of these PR/SET domains could provide a novel, actionable target for oncological drug discovery.


4.6 Methods

4.6.1 Protein production

PRDM9 (195-385) and PRDM7 (195-392) with an N-terminal 6xHis-tag and TEV protease site were expressed in Escherichia coli BL21 (DE3) codon plus cells from a pET28-MHL vector. PRDM3 (69-235) and PRDM16 (73-256) with an N-terminal 6xHis-tag and TEV protease site were expressed in Escherichia coli BL21 (DE3) codon plus cells from a pET15b/MHL vector. Cells were grown at 37°C in M9 minimal medium in the presence of 50 µg/ml kanamycin (PRDM 7 and 9) or 100 µg/ml ampicillin (PRDM 3 and 16) to an OD600 of 0.8, induced with isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 0.5 mM, and incubated overnight at 15°C. Cells were harvested by centrifugation at 7,000 rpm and cell pellets were stored at -80°C. Cells were lysed on ice in lysis buffer (20 mM Tris pH 7.5, 300 mM NaCl, 10 mM imidazole, 50 µM ZnSO4, 5 mM β-mercaptoethanol, 0.5 mM TCEP, 0.1% Triton X-100, 2.5% glycerol, 1 mM PMSF, 2 mM benzamidine and Roche cOmplete EDTA-free protease inhibitor cocktail tablet) using a probe sonicator. The crude extract was cleared by ultracentrifugation for 50 min at 50,000 × g, and the supernatant was incubated with TALON Metal Affinity Resin (Takara) at 4°C for 120 min with agitation. Resin was washed in wash buffer (lysis buffer with 0.01% Triton X-100 and lacking protease inhibitors), and bound proteins were eluted using elution buffer (wash buffer with 400 mM imidazole), with elution monitored by Bradford assay. Protein was dialyzed overnight at 4°C against TEV dialysis buffer (20 mM Tris pH 7.5, 300 mM NaCl, 20 µM ZnCl2, 2.5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP) and incubated with in-house-produced His-tagged TEV protease at a 1:20 mass ratio. 0.5 mM CHAPS was added to the TEV dialysis buffer for PRDM 9 and 7, but not for PRDM 3 and 16.
TEV and uncleaved proteins were removed using TALON resin, and soluble protein was loaded onto a Superdex 75 Increase 10/300 GL column (GE Healthcare) equilibrated with 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 2.5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP at a flow rate of 0.7 ml/min. Fractions containing PRDM9 protein were pooled and dialyzed into low-salt buffer (20 mM Tris-HCl pH 8.5, 100 mM NaCl, 5% glycerol, 5 mM β-mercaptoethanol and 0.5 mM TCEP). PRDM9 protein was further purified by anion-exchange chromatography on two tandemly joined 5 ml HiTrap™ DEAE FF (GE Healthcare) columns along a linear gradient up to 1.0 M NaCl. Purified PRDM9 protein was dialyzed into a final buffer containing 20 mM Tris pH 7.5, 150 mM NaCl and 2 mM TCEP. The proteins for PRDM1 (38-223), PRDM2 (2-160), PRDM4 (390-540), PRDM5 (2-226), PRDM6 (194-405), PRDM11 (79-314), PRDM12 (60-229), PRDM14 (212-422) and PRDM17 (1-205) were purified from E. coli expression and kindly provided by Taraneh Hajian, Elisa Gibson and Dr. Masoud Vedadi from the SGC Toronto.

4.6.2 F-SAH binding by 19F NMR

F-SAH was synthesized and kindly provided by Carlos Zepeda, Ontario Institute for Cancer Research (OICR). All PR/SET domain-containing proteins were dialyzed overnight at 4°C into 19F-NMR buffer (20 mM Tris pH 7.3, 150 mM NaCl, 2 mM TCEP and 20 µM ZnCl2) also containing 16 µM sodium trifluoroacetate. The concentration of each protein was determined from its absorbance at 280 nm using the theoretical extinction coefficient calculated with ProtParam [189]. 100 mM F-SAH in d6-DMSO was diluted to 16 µM F-SAH by adding the dialyzed protein solution to a final protein concentration of 80 µM and then topping off with 19F-NMR buffer to a final volume of 500 µl. For the F-SAH alone sample, only 19F-NMR buffer was added. The solutions were mixed in 1.5 ml polypropylene tubes and then transferred to 5 mm glass NMR tubes.
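Protein concentrations were derived from A280 and ProtParam extinction coefficients, and each sample combined protein, F-SAH and buffer to 500 µl. A minimal sketch of that arithmetic follows, assuming the Beer-Lambert law with ε in M⁻¹ cm⁻¹. The absorbance reading, extinction coefficient, 0.1 cm path length and the 1 mM intermediate F-SAH dilution used in the example are all hypothetical; the protocol itself starts from a 100 mM stock in d6-DMSO.

```python
"""Protein concentration from A280 (Beer-Lambert) and volumes for one NMR sample."""


def conc_from_a280(a280: float, ext_coeff_m: float, path_cm: float = 1.0) -> float:
    """Molar concentration c = A / (epsilon * l), with epsilon in M^-1 cm^-1."""
    return a280 / (ext_coeff_m * path_cm)


def mix_volumes(final_ul: float, final_protein_um: float, protein_stock_um: float,
                final_ligand_um: float, ligand_stock_um: float) -> dict:
    """Volumes (uL) of protein stock, ligand stock and buffer for one sample."""
    v_protein = final_ul * final_protein_um / protein_stock_um
    v_ligand = final_ul * final_ligand_um / ligand_stock_um
    v_buffer = final_ul - v_protein - v_ligand
    if v_buffer < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    return {"protein": v_protein, "ligand": v_ligand, "buffer": v_buffer}


if __name__ == "__main__":
    # Hypothetical reading on a 0.1 cm path; epsilon is a made-up value
    c_um = conc_from_a280(a280=0.6, ext_coeff_m=15000.0, path_cm=0.1) * 1e6
    print(round(c_um, 1))
    # 80 uM protein and 16 uM F-SAH (5-fold excess) from a hypothetical 1 mM ligand stock
    print(mix_volumes(500.0, 80.0, round(c_um, 1), 16.0, 1000.0))
```

For an 80 µM target from a 400 µM protein stock and a 16 µM target from a 1 mM ligand stock, this yields 100 µl protein, 8 µl ligand and 392 µl buffer, giving the 5-fold molar excess of protein used in the screen.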

1D 19F spectra were acquired at 298 K on a Bruker Avance III spectrometer operating at 600 MHz and equipped with a QCI cryoprobe with an independent 19F detection coil. Each spectrum was acquired with 3072 scans and an acquisition time of 577 ms, and processed by applying an exponential window function (LB = 20). No difference was observed in the F-SAH 19F line shape when spectra were acquired with or without proton decoupling; therefore, all spectra were acquired without proton decoupling. The spectral window spanned -37.89 ppm to -88.09 ppm, and 19F peaks were identified at -75.53 ppm, -52.40 ppm and -54.21 ppm for trifluoroacetate, unbound F-SAH and bound F-SAH, respectively. Processing and analysis were carried out with TopSpin 3.5 (Bruker BioSpin). To control for pipetting errors that could result in differences in F-SAH signal intensities, the integral of the trifluoroacetate 19F peak of each PR/SET-containing sample was set equal to the integral of the trifluoroacetate 19F peak of the sample with F-SAH alone. To normalize F-SAH peaks, the integral of each F-SAH peak was divided by the integral of the F-SAH peak from the sample with F-SAH alone (i.e. integral of F-SAH alone = 1).
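The two-step correction described in this paragraph (scaling to the trifluoroacetate internal standard, then normalizing to F-SAH alone) can be sketched as below. This is not the TopSpin workflow used here; it assumes the integrals have been exported as plain numbers, and the dictionary keys, function names and example values are hypothetical.

```python
"""Scale 19F integrals to the TFA internal standard, then normalize F-SAH
peaks to the F-SAH-alone sample."""


def scale_to_reference(integrals: dict, tfa_sample: float,
                       tfa_reference: float) -> dict:
    """Rescale all integrals so the sample's TFA integral matches the reference."""
    factor = tfa_reference / tfa_sample
    return {peak: value * factor for peak, value in integrals.items()}


def normalize_fsah(integrals: dict, fsah_alone: float) -> dict:
    """Divide each F-SAH peak integral by the F-SAH-alone integral (alone = 1)."""
    return {peak: value / fsah_alone for peak, value in integrals.items()}


if __name__ == "__main__":
    # Hypothetical raw integrals (arbitrary units)
    alone = {"tfa": 2.0, "free": 1.0}
    sample = {"free": 0.45, "bound": 0.40}
    scaled = scale_to_reference(sample, tfa_sample=1.8, tfa_reference=alone["tfa"])
    print(normalize_fsah(scaled, fsah_alone=alone["free"]))
```

Because the TFA concentration is the same in every sample, any deviation of its integral reflects pipetting or acquisition differences, which the scaling factor removes before the F-SAH peaks are compared.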

Chapter 5

Direct interaction between the PRDM3 and PRDM16 tumour suppressors and the NuRD chromatin remodelling complex

5.1 Authorship attribution

This chapter is an unabridged reproduction of a publication for which I was the primary author [190]. All figures and text are copied directly, with minor textual edits to ensure continuity with the accompanying thesis chapters (e.g. using ‘PR/SET’ rather than ‘PR’, as in the original publication). I am responsible for the generation and interpretation of the data included herein, with the following contributions from others: Dr. Evelyne Lima-Fernandes and Harshika Jain, in collaboration with Dr. Edyta Marcon from the laboratory of Dr. Jack Greenblatt, performed the mass spectrometry experiments and produced the raw data. RBBP4 protein expression was performed by Ashley Hutchinson and Dr. Alma Seitova. Ashley Hutchinson and I purified the RBBP4 protein, and Shili Duan provided guidance during the production of the PRDM3 and PRDM16 proteins. Dr. Lima-Fernandes performed all immunoprecipitation and western blot assays. Elizabeth Henderson and Pavel Savitsky from the laboratory of Dr. Panagis Filippakopoulos performed all cellular interaction assays by LacO/LacR chromatin immobilization. Drs. Levon Halabelian and Mani Ravichandran advised on protein crystallization; Dr. Halabelian acquired the synchrotron data and aided in structure determination. Drs. Lima-Fernandes and Cheryl Arrowsmith conceptualized this project and provided guidance throughout the study.

5.2 Introduction

The MDS1 and EVI1 complex locus (MECOM) encodes two isoform subgroups through the use of alternative transcription start sites (TSS) preceding either the MDS1 or the EVI1 locus [41]. Either the MDS1-containing PRDM3 isoforms or the MDS1-lacking EVI1 isoforms can be produced, with additional isoforms arising from alternative splicing. PRDM3 belongs to the PRDM family of transcription factors, characterized by an N-terminal PR/SET domain followed by an array of C2H2 zinc finger motifs, whereas EVI1 possesses the zinc fingers but is N-terminally truncated and lacks the PR/SET domain (ΔPR/SET). PRDM16 (also known as MEL1) is a closely related paralog of PRDM3 that shares 53% sequence identity with the N-terminus of PRDM3; as with PRDM3, this N-terminal region can be omitted through expression of a ΔPR/SET PRDM16 isoform [191].

MECOM and PRDM16 gene expression have been observed across many tissue types, and both genes are implicated in haematopoietic development [192] [31]. The PRDM3 isoform is critical for maintaining long-term hematopoietic stem cell function, while the EVI1 isoform has an essential role in hematopoiesis; however, MECOM expression declines after hematopoiesis [193]. PRDM16 is preferentially expressed by haematopoietic and neuronal stem cells and attenuates reactive oxygen species (ROS)-related stress by promoting HGF (Hepatocyte Growth Factor) gene expression [194] [195]. Additionally, PRDM16 is a key determinant of brown adipose tissue identity, suppressing white adipose tissue genes while independently activating brown adipose tissue genes [74] [196] [197]. Proteins that drive key developmental pathways are frequently dysregulated in cancer, so it is not surprising that both PRDM3 and PRDM16 are directly linked to various aspects of oncogenic transformation.

A Yin-Yang analogy describes the isoform imbalance observed with some PRDM proteins, which function as either tumor suppressors or oncogenes depending on the retention of the PR/SET domain [32]. For example, EVI1 is a potent oncogene associated with transformation and proliferation in multiple leukemias, while expression of PRDM3 is frequently abrogated, and a low PRDM3/EVI1 expression ratio predicts an extremely poor prognosis for acute myeloid leukemia (AML) patients [41] [198] [199] [200] [201]. Additionally, in solid tumors such as ovarian and hepatocellular carcinomas, EVI1 overexpression has been shown to drive oncogenesis and progression, while in colon cancer EVI1 was shown to be critical for metastasis [202] [203] [71]. Aberrant MECOM isoform expression can arise from 3q26 genomic rearrangements that impart an imbalance between EVI1 and PRDM3 isoforms and, with it, poor patient survival in AML [201] [204]. Rearrangements may arise from retroviral insertions between the MDS1 and EVI1 loci, which interrupt normal PRDM3 transcription and lead to EVI1 overexpression [38] [205] [198]. Some leukemia patients lack an altered 3q26 karyotype and instead overexpress the EVI1 isoform through activation by the mixed-lineage leukemia (MLL) chimeric genes MLL-ENL or MLL-AF9 [206]. PRDM16 resembles MECOM in that lentiviral-induced genomic alterations lead to depletion of full-length PRDM16 and higher levels of the N-terminally truncated ΔPR/SET PRDM16 isoform, while full-length PRDM16 may also function as a tumor suppressor in leukemias [191] [205] [77] [207] [95]. Together, these consistent pathologies, along with the high sequence conservation between PRDM3 and PRDM16, suggest that a molecular property exclusive to both full-length isoforms may repress certain aspects of tumor formation and/or progression.

The N-terminal PR/SET domain belongs to a distinct class of SET domains, some of which are described as lacking intrinsic lysine methyltransferase (KMT) activity. Reported KMT activity for the PRDM3 and PRDM16 PR/SET domains includes weak mono-methyltransferase activity on lysine 9 of histone H3 (H3K9me1), which occurs in the cytosol [87]. Additionally, PRDM16 has been reported to methylate lysine 9 and lysine 4 of histone H3 in separate studies [56] [95]. Interestingly, a key catalytic tyrosine residue present in the robustly active KMT enzyme PRDM9, as well as in all other demonstrably enzymatic SET domains, is absent from PRDM3 and PRDM16, suggesting a potential alternative function for these N-terminal domains [81] [82]. The C-terminal zinc finger motifs of PRDM3 and PRDM16 cluster into two separate domains and mediate specific interactions with DNA. In PRDM3, the N-terminal zinc fingers bind a GATA-like motif and the C-terminal zinc fingers bind an ETS-like motif [208] [209]. ChIP-seq analysis of EVI1 binding sites in SKOV3 ovarian carcinoma cells demonstrated enrichment at myeloid leukemia genes [210], while an analysis across a panel of AML cell lines found that EVI1 binding leads to deregulation of genes involved in apoptosis, differentiation and proliferation [211].

Interestingly, while PRDM16 can localize to the same DNA-binding sites as PRDM3/EVI1 through its zinc finger domains, ChIP-seq analysis suggests that PRDM16 can be recruited indirectly to chromatin in brown adipose tissue via interactions with DNA-binding partners, including C/EBPβ and PPARγ, rather than by direct binding to DNA [77] [197]. Both PRDM3 and PRDM16 bind to C-terminal binding protein (CtBP) through canonical PLDLS CtBP-binding sites located between the 2 zinc finger clusters, which can promote cellular growth by repressing transcription downstream of transforming growth factor-β (TGF-β) signaling [212] [213]. Additionally, the EVI1 isoform has been reported to form homo-oligomers capable of enhanced CtBP binding [214]. While it is well established that the zinc finger motifs direct genomic localization, it remains unclear if the N-terminal amino acids that are exclusive to full-length PRDM3 and PRDM16 contribute to biologically relevant protein-protein interactions.

The NuRD chromatin remodeling complex is an essential epigenetic regulator of developmental genes. In haematopoietic stem cells, NuRD regulates the expression of genetic pathways critical for proliferation and differentiation, while perturbation of NuRD signaling is associated with cancer and premature aging [215]. The NuRD complex possesses ATP-dependent chromatin-remodeling and histone deacetylase activities conferred by the CHD3 and CHD4 (chromodomain/helicase/DNA-binding) proteins and the HDAC1 and HDAC2 (histone deacetylase) proteins, respectively [216]. Structural studies suggest that the core NuRD complex contains the HDAC1 and HDAC2 proteins, the MTA1 and MTA2 (metastasis-associated) proteins, and the RBBP4 and RBBP7 proteins [217] [216]. Additional complex members include the CHD3 and CHD4 proteins, the MBD2 and MBD3 (methyl-CpG-binding domain) proteins and the GATAD2A and GATAD2B (GATA zinc finger domain containing) proteins [216]. The NuRD complex contains multiple RBBP4/7 subunits, which scaffold between histone H3 tails and MTA1/2 subunits [216] [218] [217]. Several transcription factors, including SALL1, FOG1, PHF6 and BCL11A, are known to bind RBBP4/7 in the NuRD complex by competing for the histone H3-binding interface [219] [220] [221] [222]. Interestingly, FOG1 forms a strong interaction with RBBP4 and has a secondary interaction with the MTA proteins within the NuRD complex [223] [220]. A previous immunoprecipitation-mass spectrometry (IP-MS) screen of EVI1 interactions identified the NuRD complex members RBBP4, HDAC1, HDAC2 and CHD4 as potential interactors [224]. Additionally, a yeast two-hybrid screen found that EVI1 interacts with the MBD3b protein [225], but this interaction was not observed in the IP-MS screen [224]. It remains unclear how EVI1 interacts with the NuRD complex and whether the N-terminal residues of PRDM3 contribute to these interactions.

In this study, we find that the RNA expression profiles of MECOM transcripts across a panel of solid tumors support the Yin-Yang hypothesis of a cancer-specific imbalance between full-length PRDM3 and the ΔPR/SET isoform known as EVI1. Using IP-MS proteomics experiments to compare the interactomes of the full-length and ΔPR/SET isoforms of PRDM3 and PRDM16, we determine that NuRD complex members are significantly enriched for both full-length proteins compared to their ΔPR/SET counterparts. Through biophysical characterization and cellular co-localization analysis, we identify an interaction between the N-terminal residues of full-length PRDM3 and PRDM16 and RBBP4, and we present crystal structures of PRDM3 and PRDM16 peptides (residues 1-12) bound to RBBP4 perpendicular to the electronegative β-propeller axis, a site typically occupied by the histone H3 tail. Together, these data provide a molecular and structural framework for understanding how full-length PRDM3 and PRDM16 may regulate epigenetic machinery, a capacity that is lost in some cancers.

5.3 Results

5.3.1 PRDM3 is encoded by the MECOM gene and is depleted in solid tumors

Altered MECOM gene expression is commonly associated with initiation and aggressiveness in a variety of hematologic cancers [201]. To assess whether perturbations of MECOM expression are common in other cancer types, we examined MECOM gene expression in a variety of solid tumor types using RNA-seq data from The Cancer Genome Atlas (TCGA). MECOM gene expression was defined as the total transcripts detected from any part of the MECOM locus. First, total MECOM gene expression was compared between autologous healthy and tumor tissue samples from cancer studies with >10 matched sample pairs (Figure 25A, Table 7). Pairwise comparisons between autologous samples revealed statistically significant decreases in MECOM expression in renal (KIRC and KIRP), lung (LUSC and LUAD), prostate (PRAD) and breast (BRCA) carcinomas, as well as increased MECOM expression in liver (LIHC) and thyroid (THCA) carcinomas, compared to matched healthy tissue (Figure 25A, Table 8).
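The healthy-versus-tumor comparison reported in Figure 25A and Table 8 uses the Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test). As a rough illustration of what that statistic measures, the following dependency-free Python sketch ranks the pooled expression values, computes U for one group and derives a two-sided p-value from the tie-corrected normal approximation. It is an illustrative sketch, not the code used for this analysis; in practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used. The expression values in the usage lines are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical MECOM expression values for five healthy and five tumor samples.
healthy = [5.1, 4.8, 6.0, 5.5, 5.9]
tumor = [3.2, 4.1, 2.9, 3.8, 4.4]
print(mann_whitney_u(healthy, tumor))
```

With very small groups the normal approximation is crude; it is better suited to group sizes such as the 19 to 109 matched samples listed in Table 7.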

Figure 25. Full-length PRDM3 depletion is prevalent in MECOM-deficient solid tumors. (A) MECOM gene expression analysis in TCGA studies for patient-matched healthy (yellow) and tumor (blue) tissue samples. See Tables 7 and 8 for details. (*** q-value < 10⁻⁹, ** q-value < 10⁻⁴, * q-value < 10⁻³, Wilcoxon rank-sum test, Bonferroni correction) (B) Protein domain diagram of MECOM transcripts, which either contain the PR/SET domain (referred to as PRDM3, in green) or lack it (referred to as MDS1, in purple, and EVI1, in orange). Note: uc003ffn.3, uc011bpk.1 and uc003ffi.3 encode an identical amino acid sequence and were grouped as uc003ffn.3*. (C) Principal component analysis of MECOM isoform expression levels from RNA-seq data comparing patient-matched healthy and tumor tissue samples. (D) Principal component 2 (PC2) loading values of each MECOM isoform. Color coding of the ten protein-encoding transcripts corresponds to (B), while uc003ffo.1 is a non-coding transcript, shown in grey.

Table 7. TCGA patient information. The number of patient-matched healthy and tumor tissue samples from TCGA studies with >10 matched samples. (Related to Figure 25)

TCGA study | Description | Matched samples
BLCA | Bladder urothelial carcinoma | 109
KIRC | Kidney renal clear cell carcinoma | 72
THCA | Thyroid carcinoma | 59
LUAD | Lung adenocarcinoma | 58
PRAD | Prostate adenocarcinoma | 52
LUSC | Lung squamous cell carcinoma | 51
LIHC | Liver hepatocellular carcinoma | 50
HNSC | Head and neck squamous cell carcinoma | 43
STES | Stomach and esophageal carcinoma | 43
COADREAD | Colorectal adenocarcinoma | 32
KIRP | Kidney renal papillary cell carcinoma | 32
KICH | Kidney chromophobe | 25
BRCA | Breast invasive carcinoma | 19

Table 8. Statistical analysis of differential MECOM gene expression. Multiple hypothesis testing was addressed by Bonferroni correction (q-value) of p-values calculated with the Mann-Whitney U test. (Related to Figure 25)

TCGA study | MECOM loss | p-value | q-value
KIRC | TRUE | 2.32 × 10⁻¹³ | 3.48 × 10⁻¹²
BRCA | TRUE | 7.59 × 10⁻¹² | 1.14 × 10⁻¹⁰
KIRP | TRUE | 4.66 × 10⁻¹⁰ | 6.99 × 10⁻⁹
PRAD | TRUE | 4.20 × 10⁻⁸ | 6.30 × 10⁻⁷
LUSC | TRUE | 1.15 × 10⁻⁶ | 1.72 × 10⁻⁵
LIHC | FALSE | 6.69 × 10⁻⁶ | 1.00 × 10⁻⁴
LUAD | TRUE | 6.98 × 10⁻⁶ | 1.05 × 10⁻⁴
THCA | FALSE | 1.42 × 10⁻⁵ | 2.14 × 10⁻⁴
KICH | TRUE | 4.18 × 10⁻³ | 6.26 × 10⁻²
HNSC | TRUE | 4.60 × 10⁻³ | 6.89 × 10⁻²
COADREAD | TRUE | 3.79 × 10⁻¹ | 1.00
BLCA | FALSE | 4.41 × 10⁻¹ | 1.00
STES | TRUE | 9.29 × 10⁻¹ | 1.00
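The q-values in Table 8 agree, to within rounding of the reported p-values, with multiplying each p-value by 15 and capping the result at 1. A minimal Python sketch of that Bonferroni adjustment, with the number of tests passed in explicitly rather than inferred:

```python
def bonferroni(p_values, n_tests):
    """Bonferroni-adjusted values: q = min(1, p * n_tests) for each raw p-value."""
    return [min(1.0, p * n_tests) for p in p_values]


# Raw p-values for KIRC, BRCA and COADREAD, transcribed from Table 8.
table8_p = [2.32e-13, 7.59e-12, 3.79e-1]
print(bonferroni(table8_p, n_tests=15))
```

The cap at 1 explains why the three least significant studies (COADREAD, BLCA and STES) all show q = 1.00.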

To examine the features underlying altered MECOM gene expression, we compared the expression of each individual transcript arising from the MECOM locus across healthy and tumor tissue samples. A total of 10 protein-encoding MECOM transcripts are annotated in the TCGA data (Figure 25B). The full-length PRDM3-coding transcript (uc011bpj.1) encodes a protein that possesses an N-terminal PR/SET domain followed by two zinc finger (ZnF) arrays. Two shorter PRDM3-coding transcripts (uc010hwn.2 and uc003ffl.2) encode intact PR/SET domains but harbor deletions in the C-terminal half of the protein. The six EVI1-coding transcripts lack the PR/SET domain (ΔPR/SET) but retain some or all of the ZnF arrays. One transcript derived from the MDS1 locus (uc011bpl.1) encodes a short protein truncated in the middle of the PR/SET domain, and an additional non-coding transcript (uc003ffo.1) brings the total number of MECOM-derived transcripts to 11. Principal component analysis (PCA) was used to examine the expression of the 11 MECOM transcripts, contrasting healthy and tumor tissues from the TCGA studies that had a significant decrease in MECOM gene expression. The expression levels of all 11 MECOM transcripts in each patient's tumor and normal tissue were used to compute principal component (PC) values, and PC1 and PC2 values were plotted to explore potential clustering of tumor and healthy tissue samples. Interestingly, PC2 separated most healthy from most tumor tissue samples: the majority of healthy tissue samples were located along the negative PC2 axis, while most tumor samples localized on the positive axis (Figure 25C).
To assess which MECOM transcripts accounted for the distinction between healthy and tumor samples within PC2, we examined the PC loading plot, which indicated that the expression of full-length PRDM3 (uc011bpj.1) was the largest contributor for samples appearing along the negative PC2 axis (Figure 25D). Taken together, these findings suggest that expression of the full-length PRDM3 transcript is a major contributor to distinguishing healthy from tumor tissue in renal, lung, prostate and breast carcinomas.
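The principal component analysis above can be sketched without external libraries. This illustrative Python implementation builds the covariance matrix of centred data and extracts PC1 and PC2 by power iteration with deflation; it assumes rows are samples and columns are the 11 transcript expression values. Whether expression was log-transformed or scaled before PCA is not stated here, so any such preprocessing is left to the caller, and the toy matrix at the end is hypothetical.

```python
import math
import random


def centre(rows):
    """Subtract each column (transcript) mean from every row (sample)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(x):
    """Sample covariance matrix of already-centred data."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(c, iterations=1000, seed=0):
    """Power iteration: repeatedly apply c to a random start vector and renormalise."""
    p = len(c)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def pca_two_components(rows):
    """Return (PC variances, PC loading vectors, per-sample (PC1, PC2) scores)."""
    x = centre(rows)
    c = covariance(x)
    lam1, v1 = leading_eigenpair(c)
    # Deflate: remove the PC1 direction so power iteration converges to PC2.
    p = len(c)
    c2 = [[c[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    lam2, v2 = leading_eigenpair(c2)
    scores = [(sum(r[j] * v1[j] for j in range(p)), sum(r[j] * v2[j] for j in range(p)))
              for r in x]
    return (lam1, lam2), (v1, v2), scores


# Hypothetical toy matrix: 6 samples x 3 transcripts.
toy = [[10, 0, 0], [-10, 0, 0], [0, 3, 0], [0, -3, 0], [0, 0, 0.1], [0, 0, -0.1]]
(var_pc1, var_pc2), (load_pc1, load_pc2), scores = pca_two_components(toy)
print(var_pc1, var_pc2, load_pc2)
```

Eigenvector signs are arbitrary, so whether healthy samples fall on the negative or positive side of PC2 depends on the solver; the separation itself, and the relative size of each transcript's PC2 loading, carry the information. Loadings here are unit eigenvector components; some tools scale them by the square root of the eigenvalue.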

5.3.2 N-termini of Prdm3 and Prdm16 interact with the NuRD complex

Despite evidence for the tumour-suppressive properties of full-length PRDM3 compared to the N-terminally truncated, oncogenic EVI1, the function of the N-terminus remains unclear. Full-length PRDM3 and its closely related paralog PRDM16 possess an ~80-residue unstructured region preceding the PR/SET domain, and both this region and the PR/SET domain are absent from the N-terminally truncated isoforms (referred to as ΔPR/SET). Although the PR/SET domains of PRDM3 and PRDM16 have been associated with weak lysine methyltransferase activity in the cytoplasm [87], a role in mediating protein-protein interactions has not been described. To investigate the potential role of this region, we performed immunoprecipitation coupled to mass spectrometry (IP-MS) using full-length mouse Prdm3 and Prdm16 proteins and their respective ΔPR/SET isoforms (Figure 26A). C-terminally GFP-tagged constructs of the full-length and ΔPR/SET isoforms were individually expressed in T47D ductal carcinoma cells, and high-affinity prey proteins were purified using a stringent, high-salt immunoprecipitation protocol and then identified by mass spectrometry.

Figure 26. The N-termini of full-length PRDM3 and PRDM16 isoforms function as protein-protein interaction scaffolds. (A) Protein domain diagram of full-length (left) and ΔPR/SET (right) PRDM3 and PRDM16 isoforms used for co-immunoprecipitation with mass spectrometry. (B-C) Prey proteins associated with PRDM3 (B) and PRDM16 (C) by co-immunoprecipitation with the full-length (left) and ΔPR/SET (right) bait. Prey proteins scoring with a Bayesian false discovery rate ≤ 1% are ranked by decreasing odds ratio relative to the GFP control co-immunoprecipitation. Prey proteins shared between full-length and ΔPR/SET isoforms are indicated by color, and peptide count relative to the GFP control is indicated by size. NuRD complex members are outlined.

Gene ontology (GO) terms were obtained from the lists of high-confidence protein-protein interactions using DAVID (version 6.8) to identify enriched “cellular component” ontologies, and terms passing a Benjamini FDR cut-off of < 0.001 were compared. As expected, all bait-prey interaction sets were enriched for terms associated with nuclear localization, while the ΔPR/SET isoforms of both Prdm3 and Prdm16 were also associated with extranuclear terms such as “cytosol” and “extracellular exosome” (Figure 27). Interestingly, only the full-length isoforms were enriched for “NuRD complex”, in agreement with our IP-MS data (Figure 27). Indeed, examining the highest-ranked bait-prey interactions of the full-length and ΔPR/SET isoforms of Prdm3 and Prdm16, we found that NuRD complex members were the most confident interactors of the full-length isoforms, while only RBBP4 and CHD4 were pulled down by the ΔPR/SET isoforms (Figure 26B and C).
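DAVID reports a “Benjamini” column of multiple-testing-adjusted p-values, which we take here to denote the Benjamini-Hochberg step-up procedure. A minimal Python sketch of that adjustment, applied to hypothetical raw p-values; it is not DAVID's own code, and DAVID's raw enrichment p-values (which come from its own modified Fisher exact test) are not reproduced.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO cellular-component terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Terms would then be retained when their adjusted value falls below the 0.001 cut-off used above.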

Figure 27. The NuRD complex is enriched among full-length PRDM3 and PRDM16 prey. Gene ontology analysis of the cellular component terms associated with full-length and ΔPR/SET PRDM3 and PRDM16 isoforms. Heat map colour indicates the Benjamini-corrected p-value calculated by DAVID v6.8. (Related to Figure 26)

To compare NuRD complex member associations between the full-length and ΔPR/SET isoforms, we plotted the spectral count fold change over the GFP-control immunoprecipitation for Prdm3 and Prdm16 (Figure 28A and B). NuRD complex members that were significantly called for both full-length and ΔPR/SET isoforms (e.g. RBBP4) showed a four-fold peptide enrichment for the full-length over the ΔPR/SET isoforms, while most NuRD complex members detected for the full-length isoforms fell below the significance threshold or were undetected for the ΔPR/SET bait proteins (Figure 28A and B). We performed co-immunoprecipitation with western blot analysis to directly contrast the interactions of the full-length and ΔPR/SET Prdm3 and Prdm16 isoforms with members of the core NuRD complex. Western blot analysis confirmed that the full-length Prdm3 and Prdm16 isoforms associated with RBBP4, MTA1 and HDAC2, while associations with the ΔPR/SET isoforms were not detected (Figure 28C). To further validate this interaction for the endogenous human proteins, we performed reciprocal co-immunoprecipitations in high-EVI1-expressing T47D cells and high-PRDM3-expressing MCF7 cells. Western blot analysis after co-immunoprecipitation of endogenous RBBP4 demonstrated that endogenous human PRDM3 protein associated with RBBP4, while EVI1 was not detected (Figure 29). These findings indicate that the N-terminal region of full-length PRDM3 and PRDM16 is a key mediator of the NuRD complex interaction.
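Because Figure 28 plots fold changes on a log2 scale, a four-fold enrichment corresponds to a difference of 2 log2 units. A minimal Python sketch of a spectral-count fold change over the GFP control; the pseudocount of 1 (used to avoid dividing by, or taking the log of, zero) and the counts themselves are hypothetical, not parameters taken from this analysis.

```python
import math


def log2_fold_change(bait_spectral_counts, control_spectral_counts, pseudocount=1.0):
    """log2 ratio of bait to GFP-control spectral counts; the pseudocount avoids division by zero."""
    return math.log2((bait_spectral_counts + pseudocount) /
                     (control_spectral_counts + pseudocount))


# Hypothetical spectral counts for one prey protein with each bait and the GFP control.
full_length = log2_fold_change(39, 1)   # log2(40 / 2)
delta_pr_set = log2_fold_change(9, 1)   # log2(10 / 2)
print(full_length - delta_pr_set)       # about 2 log2 units, i.e. a four-fold difference
```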

Figure 28. NuRD complex members are enriched interactors of full-length PRDM3 and PRDM16. (A-B) Spectral count fold change above control (log2) of the NuRD complex members that co-immunoprecipitated with (A) PRDM3 and (B) PRDM16, comparing full-length (x-axis) and ΔPR/SET (y-axis) bait. (C) Immunoblotting of specific NuRD complex members following co-immunoprecipitation with full-length and ΔPR/SET isoforms of PRDM3 and PRDM16. 10% protein input is shown in the right-hand panel. GFP empty vector was used as a control (Ctrl). Blots shown are representative of 3 experiments. Dr. Evelyne Lima-Fernandes generated panel C.

Figure 29. Endogenous full-length PRDM3, but not EVI1, co-immunoprecipitates with RBBP4. Immunoblotting of PRDM3 and EVI1 following co-immunoprecipitation with RBBP4 from (A) T47D cells and (B) MCF7 cells. 5% protein input is shown. (Related to Figure 28) Dr. Evelyne Lima-Fernandes generated this figure.

5.3.3 N-termini of PRDM3 and PRDM16 bind directly to RBBP4

We hypothesized that the residues mediating the interaction with NuRD would be conserved between PRDM3 and PRDM16. As expected, the PR/SET domains are highly conserved among both the mouse and human proteins, as are two short sequence regions N-terminal to the PR/SET domains (Figure 30A). Interestingly, the first 12 residues of both proteins also closely resemble the N-terminal residues of histone H3 (Figure 31A), which have been shown to interact with Nurf55, the Drosophila homolog of RBBP4 [218]. To assess the potential roles of the 12-residue peptide regions versus the PR/SET domains in NuRD complex interactions, we purified proteins comprising the first ~200 residues of human PRDM3 and PRDM16 with and without these 12 residues and performed in vitro pull-down assays with purified recombinant RBBP4, which had the highest likelihood of interaction calculated by SAINTexpress for both full-length PRDM3 and PRDM16 among all bait-prey interactions in our IP-MS datasets (Figure 26B and C). Recombinant RBBP4 associated with the first ~200 N-terminal residues of PRDM3 and PRDM16 in vitro only when the first 12 amino acids were also present (Figure 30B).
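The resemblance between the PRDM3 N-terminus and histone H3 can be illustrated with a simple ungapped identity calculation. In this Python sketch, PRDM3 residues 1-10 are taken from the residues named in the crystal structure analysis (Met1, Arg2, Ser3, Lys4, Gly5, Arg6, Ala7, Arg8, Lys9, Leu10), and histone H3 residues 1-10 are the standard H3 N-terminal sequence, numbered without the initiator methionine. The position-by-position alignment (Met1 of PRDM3 opposite Ala1 of H3, as implied by the shared Arg2 and Lys4) is an assumption, residues 11-12 are omitted, and this is not the alignment method used for Figure 31A.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped, position-by-position identity (%) over two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


PRDM3_1_10 = "MRSKGRARKL"  # residues named in the RBBP4 co-crystal structure description
H3_1_10 = "ARTKQTARKS"     # histone H3 residues 1-10 (initiator Met not counted)
print(percent_identity(PRDM3_1_10, H3_1_10))
```

Under this alignment, five of the ten positions (Arg2, Lys4, Ala7, Arg8 and Lys9) are identical, giving 50% identity.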

Figure 30. The first 12 amino acids of full-length PRDM3 and PRDM16 enable RBBP4 binding. (A) Sequence alignment of human and mouse amino acids exclusive to the full-length PRDM3 and PRDM16 constructs. The highly similar first 12 amino acids are enclosed in purple and the PR/SET domain is enclosed in orange. (B) SDS-PAGE separated proteins from in vitro pull-down assay of RBBP4 with human PRDM3 and PRDM16 constructs from panel A ±1-12 amino acids. (Related to Figure 32)

Figure 31. The RBBP4-histone H3 interaction is disrupted by K4 methylation. (A) Sequence alignment of the first 12 amino acids of PRDM3, PRDM16 and H3. (B) Surface representation of the RBBP4 homolog Nurf55 bound to a histone H3 peptide (PDB ID: 2yba). Lysine 4 (purple) is localized in a specific binding groove. (C) Dissociation constant (KD) and molar stoichiometry (N) of the histone H3 K4-trimethylated peptide with RBBP4, measured by isothermal titration calorimetry. (Related to Figure 32)

We next performed peptide-binding assays by isothermal titration calorimetry (ITC) to assess the interaction between residues 1-12 of human PRDM3 and PRDM16 and full-length RBBP4. Equilibrium dissociation constants (KD), with a 1:1 binding stoichiometry between peptide and RBBP4, were calculated to be 2.96 ± 0.34 μM and 3.15 ± 0.36 μM for the PRDM3 and PRDM16 peptides, respectively (Figure 32A and B). By comparison, a histone H3 peptide (residues 1-25) bound to human RBBP4 with a KD of 1.50 ± 0.38 μM (Figure 32C). Examination of the crystal structure of a histone H3 peptide bound to Nurf55 suggested that lysine 4 of H3 (H3K4) is important for the interaction with RBBP4 (Figure 31B). As a control for peptide binding to RBBP4, we tested a 25-residue histone H3 peptide bearing a trimethylated lysine 4 (H3K4me3) and observed a 10-fold reduction in affinity (i.e. a 10-fold higher KD) relative to the unmodified peptide (Figure 31C). This suggested that the lysine conserved among histone H3, PRDM3 and PRDM16 may be an important contributor to the interaction.
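To put the ITC affinities on an energy scale, each KD can be converted to a standard binding free energy with ΔG° = RT ln(KD / 1 M), and a fold change in KD to a free-energy penalty ΔΔG = RT ln(fold change). The Python sketch below assumes a temperature of 25 °C (298.15 K), since the ITC temperature is not stated in this section; under that assumption, the 10-fold weaker binding of the H3K4me3 peptide corresponds to roughly 5.7 kJ/mol (about 1.4 kcal/mol).

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # R in kJ mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy, dG = RT ln(KD / 1 M), in kJ/mol (negative = favourable)."""
    return GAS_CONSTANT_KJ * temperature_k * math.log(kd_molar)


def free_energy_penalty(kd_fold_increase, temperature_k=298.15):
    """ddG = RT ln(KD_modified / KD_reference), in kJ/mol, for a fold increase in KD."""
    return GAS_CONSTANT_KJ * temperature_k * math.log(kd_fold_increase)


# KD values from the ITC measurements above; 25 °C is an assumed temperature.
for peptide, kd in [("PRDM3 1-12", 2.96e-6), ("PRDM16 1-12", 3.15e-6), ("H3 1-25", 1.50e-6)]:
    print(f"{peptide}: dG = {binding_free_energy(kd):.1f} kJ/mol")
print(f"10-fold higher KD (H3K4me3): ddG = {free_energy_penalty(10):.2f} kJ/mol")
```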

To investigate the importance of the 12 N-terminal residues of the full-length isoforms in a cellular context, we assessed co-localization of PRDM3 with RBBP4 using a LacO/LacR chromatin immobilization assay. U2OS cells with a stably integrated 256 LacO array were transfected with C-terminally tagged PRDM3-mCherry-LacR-NLS carrying either wild-type (WT) PRDM3 or PRDM3 lacking the 12 N-terminal residues (PRDM3ΔN12), and co-localization with co-expressed GFP-RBBP4 was assessed. PRDM3-LacR fusion proteins localized at the LacO array as single red foci (Figure 32D; Figure 33). GFP-RBBP4 co-localized in clear foci with WT PRDM3-LacR, but not with PRDM3ΔN12 or the control (Figure 32D). When co-expressed with PRDM3ΔN12, GFP-RBBP4 produced a diffuse GFP signal in both the nucleus and the cytoplasm. In some cells, weak GFP foci slightly above background GFP fluorescence were visible in PRDM3ΔN12-expressing cells, as well as in cells expressing N-terminally tagged PRDM3ΔN12 (Figure 33). Quantification of the GFP signal increase at the mCherry foci, normalized to background nuclear GFP fluorescence, showed a significant difference in signal intensity between WT and mutant PRDM3 proteins (p < 10⁻¹⁶, ANOVA/Dunnett analysis; Figure 32E). Taken together with the immunoprecipitation and affinity pull-down assays, these results show that the 12 N-terminal residues of full-length PRDM3 and PRDM16 are essential for the interaction with RBBP4 in a cellular environment.
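The foci quantification can be sketched as a per-cell ratio. In this minimal Python illustration, the "increase over background" is assumed to be the relative increase (focus - background) / background of mean GFP intensity; the exact definition used, the image-segmentation steps that yield these means and the ANOVA/Dunnett test are not reproduced, and the intensity values are hypothetical.

```python
from statistics import median


def gfp_increase_over_background(focus_mean, background_mean):
    """Relative GFP increase at an mCherry focus over the diffuse nuclear GFP background."""
    return (focus_mean - background_mean) / background_mean


def summarise(cells):
    """Median relative increase across cells given as (focus_mean, background_mean) pairs."""
    return median(gfp_increase_over_background(f, b) for f, b in cells)


# Hypothetical per-cell mean GFP intensities (arbitrary units) for each PRDM3 construct.
wt_cells = [(300.0, 100.0), (260.0, 100.0), (350.0, 120.0)]
dn12_cells = [(110.0, 100.0), (105.0, 100.0), (130.0, 110.0)]
print(summarise(wt_cells), summarise(dn12_cells))
```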

Figure 32. The first 12 residues of PRDM3/16 interact with RBBP4. (A-C) In vitro measurement of dissociation constant (KD) and molar stoichiometry (N) of (A) PRDM3, (B) PRDM16 and (C) histone H3 peptides with RBBP4 measured by isothermal titration calorimetry (experiment performed in triplicate; standard deviation is shown). (D) Cellular interaction between PRDM3 and RBBP4 measured by LacO/LacR chromatin immobilization assay. PRDM3-mCherry-LacR-NLS with wild-type (WT) PRDM3 or PRDM3 lacking the 12 N-terminal residues (ΔN12) was assessed for co-localization with GFP-RBBP4. (E) Quantification of the PRDM3 (WT or ΔN12) co-localization with RBBP4 shown in (D). The violin plots represent the GFP intensity increase over background in the mCherry foci, calculated from n = 100 cells. Elizabeth Henderson and Pavel Savitsky generated panels D and E.

Figure 33. Weak cellular co-localization of N-terminally truncated PRDM3 with RBBP4. (A) C-terminally tagged PRDM3 LacO/LacR assay: co-localization of FL-PRDM3-mCherry-LacR-NLS lacking the 12 N-terminal amino acids (ΔN12) with FL-GFP-RBBP4. Some cells clearly show no GFP signal enrichment over the mCherry foci (top), while other cells show intermediate enrichment (middle and lower panels). (B) Co-localization of N-terminally tagged mCherry-LacR-NLS-PRDM3 lacking the 12 N-terminal amino acids (ΔN12) with FL-GFP-RBBP4. As with the C-terminally tagged protein, GFP-RBBP4 co-localizes with PRDM3 in some cells; however, the intensity increase at the GFP foci is smaller than with WT PRDM3. (Related to Figure 32) Elizabeth Henderson and Pavel Savitsky generated this figure.

5.3.4 Co-crystal structure of RBBP4 with PRDM3/PRDM16 N-terminal peptide

RBBP4 helps direct the NuRD complex onto chromatin via associations with the histone H3 tail, but transcription factors can also serve as an intermediary between the NuRD complex and chromatin by competing for occupancy at the histone H3 interface on RBBP4 [218] [220] [221] [222]. To examine the structural basis of the interaction between PRDM3 and PRDM16 with RBBP4, we determined the crystal structures of full-length RBBP4 (residues 1-425) bound to human PRDM3 and PRDM16 peptides (residues 1-12) at 2.2 Å and 2.0 Å, respectively (Figure 34A and B; Table 9). The nearly identical structures of the PRDM3 and PRDM16 peptides bound to RBBP4 revealed that both peptides bind to the electronegative histone H3-binding interface of RBBP4 in an extended conformation (Figure 34, Figure 35).

Figure 34. Crystal structure of RBBP4 in complex with the PRDM3 (1-12 amino acid) peptide. (A) Stick representation of PRDM3 (1-12) peptide [Oxygen atoms (red) and nitrogen (blue)] bound to the ribbon representation of RBBP4. (B) Electrostatic surface potential representation of the binding pocket with aligned PRDM3 peptide. RBBP4 surface colour indicates electrostatic potential ranging from -7kT/e (red) to +7kT/e (blue). Electrostatic surface potentials were calculated using the Adaptive Poisson-Boltzmann Solver (APBS).

Table 9. Data for X-ray crystal structures of PRDM3/16 (1-12 aa) with RBBP4.

PDB structure | 6bw3 | 6bw4
Contents | RBBP4 and PRDM3 (1-12 aa) | RBBP4 and PRDM16 (1-12 aa)

Data collection
Space group | P 1 2₁ 1 | P 1 2₁ 1
Wavelength (Å) | 1.54178 | 0.97918
a, b, c (Å) | 75.98, 59.82, 101.77 | 76.03, 59.86, 101.84
α, β, γ (°) | 90, 94.55, 90 | 90, 94.55, 90
Resolution (Å) | 46.95 - 2.2 (2.27 - 2.2) | 19.78 - 2.0 (2.05 - 2.0)
Rmerge (%) | 0.1 (0.1) | 0.06 (0.7)
I/σ(I) | 2.44 | 2.75
Completeness (%) | 99.4 | 99.2
Redundancy | 4.1 (4.0) | 3.9 (4.1)

Refinement
Resolution (Å) | 46.95 - 2.2 | 19.78 - 2.0
Reflections | 44059 | 58442
Rwork/Rfree | 0.215/0.246 | 0.199/0.233
Wilson B factor (Å²) | 34.0 | 34.8
Protein atoms | 5841 | 5877
Water molecules | 110 | 112
Unidentified molecules | 12 | 12

r.m.s.d. values
Bond length (Å) | 0.011 | 0.012
Bond angle (°) | 1.48 | 0.95

Ramachandran values
Favoured (%) | 96.1 | 97.2
Allowed (%) | 3.5 | 2.5
Outliers (%) | 0.4 | 0.3

Values in parentheses are for highest-resolution shell. r.m.s.d. = root mean squared deviation
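For reference, the Rwork and Rfree values in Table 9 are crystallographic R-factors, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|, computed over the working reflections and over a held-out test set, respectively. A minimal Python sketch, assuming Fcalc has already been scaled to Fobs (refinement programs also fit that scale) and using hypothetical amplitudes:

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over paired structure-factor amplitudes."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by their test-set flag and compute R for each subset."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes for four reflections, the last flagged as a test-set (free) reflection.
print(r_work_r_free([10.0, 20.0, 30.0, 40.0], [9.0, 22.0, 30.0, 44.0],
                    [False, False, False, True]))
```

Because the free reflections are excluded from refinement, Rfree is normally somewhat higher than Rwork, as in both structures in Table 9.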

Figure 35. Crystal structure of RBBP4 in complex with the PRDM16 (1-12 amino acid) peptide. (A) Stick representation of PRDM16 (1-12) peptide [Oxygen (red) and nitrogen (blue)] bound to a ribbon representation of RBBP4. (B) Electrostatic surface potential representation of the binding pocket occupied with PRDM16 peptide. RBBP4 surface colour indicates electrostatic potential ranging from -7kT/e (red) to +7kT/e (blue). Electrostatic surface potentials were calculated using the Adaptive Poisson-Boltzmann Solver (APBS). (Related to Figure 34)

The peptide:RBBP4 interactions are similar to those of histone H3 on Nurf55 [218] and include salt bridges, hydrogen bonds, hydrophobic contacts and cation-π stacking (Figure 36A, Table 10). At the N-terminus, the positively charged amine groups of the Met1 residues of PRDM3 and PRDM16 form salt bridges with Asp248 and Glu231 on RBBP4 (Figure 36B). Notably, the positively charged Arg2 and Lys4 sidechains orient towards the highly electronegative RBBP4 β-propeller axis, where the Arg2 guanidinium moiety is sandwiched between the aromatic sidechains of Tyr181 and Phe321, forming stabilizing cation-π interactions, while also participating in a salt bridge and hydrogen bond network with Glu231, Arg129 and Asn277 (Figure 36B). The cationic ɛ-amino group of Lys4 forms salt bridges with Glu179 and Glu126 and hydrogen bonds with Tyr181 and Asn128, with hydrophobic packing of the aliphatic portion of the sidechain against the sidechain of Leu45 (Figure 36C). As reported for Thr3 of histone H3 on Nurf55 [218], Ser3 of PRDM3 and PRDM16 is solvent exposed and does not interact with RBBP4 directly. Weak electron density for the Arg6 and Arg8 sidechains was apparent only in some biological assemblies of PRDM3:RBBP4 and PRDM16:RBBP4, suggesting that the salt bridges of Arg6 with Asp74 and of Arg8 with Glu75 and Glu42 may not be important contributors to the peptide:RBBP4 interaction (Figure 36D). The aliphatic and cationic regions of the Lys9 sidechain form hydrophobic and polar contacts with the aliphatic and anionic regions of the Glu41 sidechain, respectively, as well as a salt bridge with Glu75 (Figure 36E). The electron density of Leu10 was insufficient to define the position of its sidechain or those of the subsequent residues (Figure 37), whereas the key interactions with RBBP4 involve residues 1 to 9 of both peptides.
To assess the validity of our structural model, we performed an in silico alanine scan of the PRDM3 peptide and calculated the difference in Gibbs binding energy between the wild type and mutant peptides (Figure 38A). The Lys4 sidechain had the largest theoretical contribution to RBBP4 binding and a K4A mutant peptide of PRDM3 (1-12) lacked detectable binding to RBBP4 as measured by ITC (Figure 38B).

Figure 36. Amino acid interactions at the PRDM3 peptide-RBBP4 interface. (A) The interface between PRDM3 (1-12) residues (tan) and RBBP4 residues (green). Interactions between PRDM3 centered at (B) arginine 2, (C) lysine 4, (D) arginine 6 and (E) lysine 9 are indicated with dashed lines. Interactions within 4 Å (purple) and 3 Å (yellow) are detailed in Table 10.

Table 10. Interaction distances at the interface of the PRDM3 N-terminal peptide and RBBP4. (Related to Figure 36)

PRDM3   RBBP4   Interaction    Distance (Å)
M1      D248    salt bridge    2.5
        D248    salt bridge    3.5
        E231    salt bridge    2.8
R2      E231    salt bridge    3.7
        R129    h-bond         4.0
        E231    salt bridge    3.0
        N277    h-bond         3.9
        F321    cation-π       ~4.5
        Y181    cation-π       ~4.6
K4      Y181    h-bond         3.4
        E179    salt bridge    2.8
        N128    h-bond         2.9
        N128    h-bond         3.5
        E126    salt bridge    2.9
G5      E395    h-bond         4.0
        H71     h-bond         3.7
R6      D74     salt bridge    3.4
A7      S73     h-bond         3.1
R8      N397    h-bond         3.0
        N397    h-bond         3.1
K9      E75     salt bridge    3.6
        E41     salt bridge    3.6

Figure 37. Electron density maps of PRDM3 and PRDM16 peptides bound to RBBP4 (related to Figure 34). The simulated annealing mFo-DFc omit maps of (A) the PRDM3 (1-12) peptide and (B) the PRDM16 (1-12) peptide, contoured at 2.5σ, are shown in purple. The peptide electron density of the final 2Fo-Fc maps for (C) the PRDM3 peptide and (D) the PRDM16 peptide, contoured at 1.0σ, is shown in orange. (Related to Figure 36)

Figure 38. A conserved lysine in PRDM3 and PRDM16 is crucial for RBBP4 binding. (A) Difference in Gibbs free energy of binding for an in silico alanine scan of the PRDM3 peptide bound to RBBP4, indicating that lysine 4 (K4) is the leading contributor to binding. (B) Isothermal titration calorimetry of the PRDM3-K4A mutant peptide binding to RBBP4 indicates that K4 is essential for peptide binding. (n.d., not detected)

Our structural elucidation of the PRDM3 and PRDM16 peptides bound to RBBP4 revealed a conserved interaction network common to other RBBP4-interacting proteins that bind perpendicular to the β-propeller axis. The N-terminal PRDM3 and PRDM16 peptides share higher sequence identity with the histone H3 peptide than do any other peptides from transcription factors known to bind RBBP4 (Figure 39C). Sequence and structural alignments of RBBP4-binding peptides at the β-propeller axis clearly demonstrate the importance of the conserved arginine-lysine pair, which makes critical contacts at the highly electronegative groove of RBBP4 (Figure 39A-C). Notably, the small polar Thr3 residue found between the arginine and lysine residues of histone H3 is replaced by Ser in PRDM3 and PRDM16, whereas this position is absent from all other previously identified binding peptides (Figure 39B and C). These small polar residues do not interact directly with RBBP4 but instead extend the peptide backbone, enabling a consistent orientation of the flanking sidechains towards RBBP4 (Figure 39B). Additionally, position 7 is an alanine in histone H3 and PRDM3/16 but a proline in all other structures, further exemplifying a peptide backbone similarity with histone H3 that is exclusive to PRDM3 and PRDM16 (Figure 39C).
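The sequence-identity comparison above can be made concrete with a simple percent-identity calculation over ungapped, pre-aligned peptides. The sketch below is a minimal Python illustration, not the alignment procedure used for Figure 39: the helper name is hypothetical, and the PRDM3 string only spells out residues Met1 to Leu10 as named in this chapter (Table 10), aligned position by position with residues 1-10 of histone H3.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two equal-length, ungapped, pre-aligned peptides."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# PRDM3 residues M1-L10 as named in Table 10, and histone H3 residues 1-10.
PRDM3_1_10 = "MRSKGRARKL"
H3_1_10 = "ARTKQTARKS"

print(percent_identity(PRDM3_1_10, H3_1_10))  # 50.0 (R2, K4, A7, R8, K9 match)
```

Identity over a short ungapped window is only a rough proxy; it ignores the structural equivalence of Ser3 and Thr3 that the alignment in Figure 39B highlights.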

Figure 39. PRDM3 and PRDM16 mimic the RBBP4-histone H3 interaction. (A) Structure alignment of all reported peptides that bind perpendicular to the RBBP4 β-propeller axis. (B) Structure alignment of arginine, threonine/serine and lysine from H3, PRDM3 and PRDM16 (top) and arginine and lysine from FOG1, PHF6, BCL11a and AEBP2 (bottom). (C) Sequence alignment of all reported peptides that bind to the RBBP4 top hole. Colours correspond to panels A and B. (D) A model illustrating a potential mechanism by which full-length PRDM3 and PRDM16 could tether the NuRD complex to chromatin and subsequently regulate transcription. Dashed line and arrows indicate potential secondary interactions between MBD3 of NuRD and the first zinc finger motifs of PRDM3 and PRDM16.

5.4 Discussion

The Yin-Yang paradigm of oncogenesis describes a situation in which an imbalance between an oncogene and a tumor suppressor encoded by the same gene gives rise to cancer [32]. The differential expression of short, oncogenic and long, tumor-suppressive isoforms has been reported for several PRDM family members, including PRDM1, PRDM2, PRDM3, PRDM5 and PRDM16 [48] [73] [49] [77] [226]. It is well established that chromosomal aberrations at the PRDM3-encoding gene, MECOM, are found in up to 10% of AML cases and are associated with poor survival outcomes [199] [200]. The expression levels of specific MECOM isoforms suggest that the N-terminal residues of PRDM3 confer a tumor suppressor function, whereas the shorter, oncogenic EVI1 isoform is overexpressed in myeloid, ovarian, liver and colon tumors and correlates with poor outcome in AML [198] [41] [199] [201] [202] [203] [71]. Indeed, our comparative analysis of MECOM isoform expression between patient-matched tumor and healthy tissues supports this view and identified a previously unreported loss of full-length PRDM3 expression as a common feature of certain solid tumor types, such as renal, lung, prostate and breast carcinomas.

To further our understanding of a potential disease mechanism, we searched for molecular characteristics unique to the full-length PRDM3 protein that may explain the Yin-Yang paradigm. We used IP-MS experiments to directly compare the potential protein-protein interactions of the full-length and ΔPR/SET isoforms of the PRDM3 and PRDM16 paralogs, and we present the molecular basis for a novel protein-protein interaction that is unique to the full-length isoforms. Specifically, the first ten N-terminal residues of full-length PRDM3 and PRDM16 directly engage the NuRD chromatin remodeling complex via the histone H3-binding interface on RBBP4. We propose a model in which this interaction recruits the NuRD complex to genomic loci specifically bound by the full-length protein isoforms of PRDM3 and PRDM16 (Figure 39D).

The full-length PRDM3 and PRDM16 proteins possess an N-terminal unstructured region of ~80 residues followed by the PR/SET domain, both of which are absent from the ΔPR/SET isoforms. Although the PR/SET domains have been reported to have weak intrinsic KMT activity, reports have been inconsistent in identifying either H3K4 or H3K9 as the substrate, in either the nucleus or the cytosol [56] [95] [87]. PRDM3 is also known to interact with established KMT enzymes such as G9a and SUV39H1 [227], which could account for apparent KMT activity in PRDM3 preparations that contain trace amounts of these robust enzymes. Moreover, both PRDM3 and PRDM16 lack a key catalytic tyrosine residue conserved in all other established SET-domain KMT enzymes (Figure 40). We therefore hypothesized that the N-terminal region missing from the oncogenic ΔPR/SET isoforms may have an alternative activity.

We focused on potential protein-protein interactions, which could account for molecular functions of both the PR/SET domain and the preceding unstructured residues in the full-length proteins. Unlike previous studies, our IP-MS experiments are the first to directly compare the potential protein-protein interactions of the full-length and ΔPR/SET isoforms of the PRDM3 and PRDM16 paralogs. We show that the previously demonstrated interactions between EVI1 and epigenetic factors such as the CtBP proteins and the CSNK2A1 and CSNK2B components of the CK2 complex [212] [224] were also found for the full-length PRDM3 protein, as well as for both the full-length and ΔPR/SET PRDM16 isoforms. Using a yeast two-hybrid screen, Spensberger et al. found that EVI1 interacts specifically with the NuRD complex member MBD3b [225]. However, a subsequent study by Bard-Chapeau et al. using proteomic IP-MS experiments in an ovarian cancer cell line showed that while RBBP4, HDAC1, HDAC2 and CHD4 were associated with EVI1, MBD3b was not detected [224]. In our proteomic IP-MS experiments, both the full-length and ΔPR/SET isoforms of PRDM3 and PRDM16 associated with some NuRD complex members, such as RBBP4 and CHD4, but only the full-length isoforms pulled down the complete NuRD complex. Furthermore, our co-IP experiments showed that any association between the ΔPR/SET isoforms and RBBP4 falls below the assay's detection threshold. Additionally, the EVI1 protein has been shown to form homo-oligomers [214], but it is unclear whether EVI1 and PRDM3 can form hetero-oligomers. A hetero-dimerization event could explain how the ΔPR/SET proteins might associate indirectly with the NuRD complex through their longer counterparts, which would be consistent with their weaker relative association with NuRD complex members.
Taken together, our proteomic and structural data suggest that the full-length PRDM3 and PRDM16 proteins form a stable complex with NuRD; we therefore hypothesize that the full-length isoforms can direct NuRD to specific genomic loci, whereas EVI1 and ΔPR/SET PRDM16 likely lack this ability owing to a weaker association.

Figure 40. PRDM3 and PRDM16 lack the conserved catalytic tyrosine observed in SET domain structures of Michaelis complexes. A sequence logo corresponding to the catalytic tyrosine (yellow box) and flanking residues observed in all Michaelis complex structures (Protein Data Bank accession codes: 5lsu, 5upd, 3ooi, 5jlb, 4ynm, 2r3a, 3bo5, 4z4p, 5f6k, 5f5e, 3hna, 5vsc, 3rq4, 3s8p, 5ccl, 4wuy, 3f9x, 5hyn, 2f69, 3qxy, 3smt and 4c1q) aligned to the corresponding residues found in the catalytically active PRDM9 enzyme, as well as in PRDM3 and PRDM16.

The NuRD chromatin remodelling complex has been implicated in both promotion and suppression of tumorigenesis, growth and metastasis [215] [228]. This paradoxical behaviour emerges from its ability to epigenetically regulate genes for either tumor suppressors or oncogenic factors depending on specific associations with genomic localization factors. Multiple studies have demonstrated that the NuRD complex can associate with oncogenic transcription factors like BCL11A via RBBP4 in triple-negative breast cancer [222], TWIST via MTA2 to drive metastasis in carcinomas [229] and NAB2 via CHD4 in prostate cancers [230]. Additionally, NuRD also associates with tumor suppressing transcription factors like PHF6 [221] and SALL1 [219] via RBBP4, as well as c-JUN via MBD3 [231]. RBBP4 and RBBP7 are integral components of the core NuRD complex, serving as a scaffold for the MTA proteins, while functioning as chromatin recognition interfaces through binding to the histone H3 tail. A previous ChIP-seq analysis of CHD4 showed that the NuRD complex localized at genomic loci marked by trimethylated lysine 4 on histone H3 (H3K4me3), which is a post-translational modification incompatible with RBBP4/7 binding [232] [218]. Interestingly, BCL11A, PHF6, SALL1 and FOG1 all interact with NuRD by competing for RBBP4/7 at the histone H3-binding interface, thereby demonstrating how NuRD localization can depend on the balance of transcription factors available within specific nuclear environments.

Given that PRDM3 and PRDM16 directly interact with NuRD via RBBP4, it is noteworthy that we did not identify any other RBBP4/7-containing complexes in our IP-MS experiments. The RBBP4 and RBBP7 paralogs share 90% sequence identity and have effectively identical histone H3-binding interfaces. Other RBBP4-binding transcription factors such as PHF6 and FOG1 have also been found to associate exclusively with NuRD [221] [220], whereas BCL11A was shown to bind RBBP4 in the NuRD, PRC2 and SIN3A complexes [222]. Structural studies have shown that when RBBP4 is part of PRC2, its histone H3-binding interface is engaged by either AEBP2 or SUZ12. While we observed that a histone H3 peptide binds RBBP4 roughly twice as tightly as the PRDM3 and PRDM16 peptides, competitive binding assays between BCL11A and histone H3 peptides suggest that BCL11A has a 3-fold greater affinity for RBBP4, which may explain why it is capable of binding multiple RBBP4-containing complexes [222]. Furthermore, the potential for multivalent interactions between PRDM3 or PRDM16 and the NuRD complex remains unexplored. Studies have shown that FOG1 binds to both the MTA proteins and RBBP4 in NuRD [220] [223], and it is likely that a region conserved between EVI1 and the ΔPR/SET PRDM16 proteins could mediate a similar secondary interaction, perhaps with MBD3b. Although we did not observe MBD3 associating with either of the ΔPR/SET proteins, it is interesting that MBD3 deletions have been implicated in cancer progression [233].
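Fold differences in affinity such as these can be translated into free-energy terms with ΔΔG = RT ln(K_D,weaker / K_D,tighter). The short Python sketch below assumes 25 °C (the temperature used for the ITC measurements in the Methods) and uses a hypothetical helper name; it illustrates the thermodynamic conversion only and is not part of the reported analysis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold_affinity(fold: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) implied by a fold difference in K_D.

    A ligand whose K_D is lower by `fold` binds more favourably by RT * ln(fold).
    """
    if fold <= 0:
        raise ValueError("fold difference must be positive")
    return R_KCAL * temperature_k * math.log(fold)


print(round(ddg_from_fold_affinity(2.0), 2))  # 0.41 kcal/mol for a 2-fold difference
print(round(ddg_from_fold_affinity(3.0), 2))  # 0.65 kcal/mol for a 3-fold difference
```

At this scale, a 2- to 3-fold affinity difference corresponds to well under 1 kcal/mol, comparable to a single weak hydrogen bond or salt bridge.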

In summary, we have identified an important function of the N-termini of PRDM3 and PRDM16, which bind directly to RBBP4 to mediate an interaction with the NuRD chromatin remodeling complex. Our data are consistent with a model in which full-length PRDM3 and PRDM16 proteins function as transcriptional co-repressors by directing NuRD to specific genomic loci. Truncation of the N-terminus dramatically decreases this interaction, which warrants further investigation into the resulting transcriptional changes and may point to a mechanism for the tumor-suppressive properties of full-length PRDMs through their interaction with RBBP4/NuRD.

5.5 Methods

5.5.1 Gene and isoform expression analysis

TCGA datasets of MECOM transcript expression were obtained from the FireBrowse resource (http://firebrowse.org/) for all patients with matched healthy and tumor tissue samples. MECOM gene expression was estimated by summing the RSEM-normalized counts of each transcript, and statistical significance was calculated with the R programming language (version 3.2) [37]. MECOM transcript expression levels from healthy and tumor samples of the TCGA studies in which MECOM expression was significantly decreased were used to perform a principal component analysis with the “prcomp” function in the R environment.
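The two computational steps described here, summing transcript-level counts into a gene-level estimate and running a PCA on transcript expression, can be sketched in Python with NumPy. This is an illustrative re-expression of the R workflow, not the code used in this thesis; the function names and matrix layouts (transcripts × samples for the summation, samples × transcripts for the PCA) are assumptions. Like R's prcomp() with its default arguments, the PCA centres each transcript but does not scale it.

```python
import numpy as np


def gene_level_expression(transcript_counts: np.ndarray) -> np.ndarray:
    """Sum normalized transcript counts (transcripts x samples) into one value per sample."""
    return transcript_counts.sum(axis=0)


def pca_scores(samples_by_features: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA via SVD of the column-centred matrix (samples x transcripts).

    Returns the sample scores on the first components and the fraction of
    total variance explained by each of those components.
    """
    centred = samples_by_features - samples_by_features.mean(axis=0)  # centre, no scaling
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # equivalent to centred @ V
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Scores from the first two components can then be plotted, coloured by tissue of origin (tumor or healthy), to see whether isoform expression separates the two groups.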

5.5.2 Co-Immunoprecipitation and mass spectrometry

T47D cells were transfected with GFP-tagged murine Prdm3 or Prdm16 plasmids using Turbofect following the manufacturer’s recommendations. 48 hours after transfection, cells were washed with ice-cold PBS and lysed in high-salt AFC buffer (10 mM Tris-HCl, pH 7.9, 420 mM NaCl, 0.1% NP-40), followed by 3 freeze-thaw cycles. After lysates were cleared by centrifugation (13,000 rpm for 20 min at 4 °C), 5% of each lysate was saved as a Western blot input control and 80% was used for the IP procedures.

For the immunoprecipitation step, the lysates were incubated with 5 µg of anti-GFP antibody (Invitrogen) overnight at 4 °C. Magnetic protein A/G beads (Dynabeads, Life Technologies) were mixed at a 1:1 ratio (10 µL of each per sample) and washed twice with low-salt AFC buffer (10 mM Tris-HCl, pH 7.9, 100 mM NaCl, 0.1% NP-40). The beads were incubated for 4 hours at 4 °C with the lysate/antibody mix and washed twice in low-salt AFC buffer. The beads were washed a third time in low-salt AFC buffer lacking NP-40 and eluted in 0.5 M ammonium hydroxide before flash freezing in liquid nitrogen.

For the MS procedure, samples were dried in a Speedvac concentrator (Eppendorf) and reconstituted in 44 µL of 50 mM NH4HCO3 and 1 µL of 100 mM TCEP-HCl. Samples were incubated at 37 °C for 1 hour with shaking and cooled to room temperature before adding 1 µL of 500 mM iodoacetamide. Samples were incubated in the dark at room temperature for 45 minutes, then digested overnight at 37 °C with 1 µg of trypsin (Promega), and the reaction was stopped with 2 µL of acetic acid. Samples were desalted using Zip-Tips (Millipore) following the manufacturer’s instructions and analyzed on an Orbitrap Velos mass spectrometer (Thermo Fisher Scientific).
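For reference, the final reduction and alkylation concentrations implied by these volumes follow from C_final = C_stock × V_added / V_total. The minimal Python sketch below assumes additive volumes (44 µL + 1 µL of TCEP, then + 1 µL of iodoacetamide) and no evaporation; the helper name is hypothetical.

```python
def final_concentration(stock_mm: float, added_ul: float, total_ul: float) -> float:
    """Final concentration (mM) after diluting `added_ul` of stock into `total_ul` in total."""
    if total_ul <= 0 or added_ul > total_ul:
        raise ValueError("total volume must be positive and at least the added volume")
    return stock_mm * added_ul / total_ul


# Reduction: 1 uL of 100 mM TCEP-HCl into 44 uL sample -> 45 uL total.
tcep_mm = final_concentration(100.0, 1.0, 44.0 + 1.0)
# Alkylation: 1 uL of 500 mM iodoacetamide into the 45 uL reduced sample -> 46 uL total.
iaa_mm = final_concentration(500.0, 1.0, 45.0 + 1.0)
print(f"TCEP ~{tcep_mm:.1f} mM, iodoacetamide ~{iaa_mm:.1f} mM")  # TCEP ~2.2 mM, iodoacetamide ~10.9 mM
```

Under these assumptions, iodoacetamide is present at roughly a five-fold molar excess over TCEP during alkylation.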

5.5.3 Co-IP validation experiment and Western blot

For the co-IP experiment on transfected T47D cells, cell lysis and immunoprecipitation were performed as described for the IP-MS experiment, except for the third wash and elution. The beads were washed a third time in low-salt AFC buffer, resuspended in 30 µL of Laemmli buffer and boiled for 5 minutes. The entire 30 µL was processed for Western blotting using antibodies against GFP (Living Colors JL-8, Clontech), RBBP4 (R&D Systems, MAB7416-SP), MTA1 (CST, 5647) and HDAC2 (CST, 5113). Endogenous co-IP experiments were performed on T47D and MCF7 cells using the same methodology, with 5 µg of RBBP4 antibody (Abcam, ab79416) for the IP, and blots were probed with a PRDM3/EVI1 antibody (ab124934).

5.5.4 Mass Spectrometry analysis

SAINTexpress (v3.6.1) [234] was used to calculate the probability that proteins identified in the immunoprecipitation and mass spectrometry experiments were significantly enriched above background contaminants, by comparing the expected and observed peptide counts between the GFP-tagged bait proteins and the GFP-only negative control (n=3). Prey proteins with a Bayesian false discovery rate (BFDR) of ≤1% were called as high-confidence protein-protein interactions (Supplementary Table 9) and sorted by odds ratio for each bait-prey interaction with the full-length and ΔPR/SET isoforms.
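The filtering and ranking step can be sketched as follows. This is a hypothetical Python illustration that assumes the SAINTexpress results have already been parsed into one record per bait-prey pair; the record fields (`bfdr`, `odds_ratio`) are placeholder names to be mapped onto the actual output columns, not a description of the SAINTexpress file format.

```python
from dataclasses import dataclass


@dataclass
class SaintHit:
    # Hypothetical record layout; map these fields from the parsed SAINTexpress output.
    bait: str
    prey: str
    bfdr: float
    odds_ratio: float


def high_confidence(hits: list[SaintHit], max_bfdr: float = 0.01) -> dict[str, list[SaintHit]]:
    """Keep interactions with BFDR <= max_bfdr, grouped by bait, sorted by odds ratio (descending)."""
    by_bait: dict[str, list[SaintHit]] = {}
    for hit in hits:
        if hit.bfdr <= max_bfdr:
            by_bait.setdefault(hit.bait, []).append(hit)
    for bait_hits in by_bait.values():
        bait_hits.sort(key=lambda h: h.odds_ratio, reverse=True)
    return by_bait
```

Grouping by bait keeps the full-length and ΔPR/SET isoforms in separate ranked lists, which makes prey that appear with only one isoform easy to spot.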

5.5.5 Cellular colocalization analysis

Full-length mouse Prdm3 cDNA (National Center for Biotechnology Information (NCBI) accession number NM_001361034) or a truncated mutant lacking the first twelve amino acids were amplified by PCR and cloned into the AgeI site of the mCherry-LacR-NLS-KpnI plasmid (Luijsterburg et al., 2012) to generate a C-terminally fused mCherry-LacR-NLS tag onto Prdm3. Full-length human RBBP4 cDNA (NCBI accession number NM_005610) was amplified by PCR and cloned into the pDONR221 vector with Gateway BP Clonase II mix and then transferred into the pCDNA6.2/N-EmGFP-DEST vector with Gateway LR Clonase II mix (ThermoFisher Scientific). Constructs encoding an N-terminal mCherry-LacR-NLS tag onto Prdm3 were cloned into the KpnI site of the same vector. All constructs were verified by sequencing.

U-2 OS cells carrying 256 repeats of a LacO array (U-2 OS LacO cells, previously described [235]) were cultured in DMEM (Thermo Fisher Scientific, cat. #31966021) supplemented with 10% fetal bovine serum (v/v, Biosera, cat. #FB-1001/500) and antibiotics (1% Penicillin-Streptomycin (Thermo Fisher Scientific, cat. #15070063) and 1 µg/mL puromycin (Thermo Fisher Scientific, cat. #A1113803)) at 37°C and 10% CO2.

U-2 OS LacO cells were washed with phosphate-buffered saline (PBS), trypsinized using 0.25% trypsin-EDTA (Thermo Fisher Scientific, cat. #25200056) for 3 min at room temperature and seeded in 24-well cell culture plates (30,000 cells in 1 mL media per well). After incubating for 20 h at 37°C and 10% CO2, cells were transfected with plasmids using the FuGENE6 reagent (Promega, cat. #E2311) according to the manufacturer’s instructions (100 ng per plasmid in 20 μL OptiMem (Thermo Fisher Scientific, cat. #31985070) with 1 μL FuGENE6 per well), then incubated for 24 h at 37°C and 10% CO2. Transfected cells were then washed with PBS, trypsinized, and re-plated onto 8-well glass imaging chamber dishes (Thermo Fisher Scientific, cat. #155411PK) in 300 µL media. Cells were incubated for 20-24 h at 37°C and 5% CO2 before fixing. Cells were washed twice with PBS before fixing at room temperature with 4% formaldehyde (300 µL in PBS) for 10 min, then washed twice with PBS and subsequently incubated with 0.2% Triton X-100 (300 µL in PBS) for 5 min at room temperature. Cells were washed three times with PBS, then incubated with a DAPI solution (300 µL of 1 µg/mL from a 1 mg/mL stock; Thermo Fisher Scientific, cat. #62248) for 2 min at room temperature. Cells were washed twice with PBS, and glycerol buffer (300 µL, 90% glycerol, 10% 20 mM Tris-HCl pH 8.0) was then added to the cells. Cells were stored at room temperature, protected from light, and were imaged within one week of fixing.

Cells were imaged using a Zeiss LSM 710 scan-head (Zeiss GmbH, Jena, Germany) coupled to an inverted Zeiss Axio Observer Z1 microscope equipped with a high-numerical-aperture (N.A. 1.40) 63× oil immersion objective (Zeiss GmbH, Jena, Germany). A 488 nm excitation laser and a 494-542 nm emission filter were used to detect GFP fluorescence. A 594 nm excitation laser and a 598-700 nm emission filter were used to detect mCherry fluorescence. Quantification was carried out using ImageJ software. Co-localization enrichment was calculated using the following formula:

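$$
\text{Enrichment}\ (\%) = \frac{\dfrac{\mathrm{Intensity}(\mathrm{GFP_{foci}})}{\mathrm{Area}(\mathrm{GFP_{foci}})} - \dfrac{\mathrm{Intensity}(\mathrm{GFP_{Nucleus}} - \mathrm{GFP_{foci}})}{\mathrm{Area}(\mathrm{GFP_{Nucleus}} - \mathrm{GFP_{foci}})}}{\max\left(\mathrm{GFP_{Intensity}}/\mathrm{area}\right)} \times 100
$$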
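The enrichment calculation above compares the GFP intensity per unit area inside the mCherry-LacR focus with that of the rest of the nucleus (the nucleus minus the focus) and expresses the difference as a percentage of a normalization term. The minimal Python sketch below assumes that per-cell intensities and areas have already been measured in ImageJ and that the normalization term is the maximum GFP intensity per unit area across the quantified cells. This interpretation of the max(GFP Intensity) term is an assumption, not a confirmed detail, and the function name is hypothetical.

```python
def colocalization_enrichment(
    foci_intensity: float,
    foci_area: float,
    nucleus_intensity: float,
    nucleus_area: float,
    max_density: float,
) -> float:
    """Enrichment (%) of GFP signal at the LacO/mCherry-LacR focus over the rest of the nucleus.

    The focus is subtracted from the whole nucleus so that the background region excludes it.
    `max_density` is the normalization term (assumed: maximum GFP intensity per unit area).
    """
    background_intensity = nucleus_intensity - foci_intensity
    background_area = nucleus_area - foci_area
    if foci_area <= 0 or background_area <= 0 or max_density <= 0:
        raise ValueError("areas and normalization term must be positive")
    focus_density = foci_intensity / foci_area
    background_density = background_intensity / background_area
    return (focus_density - background_density) / max_density * 100.0
```

A value near zero indicates no GFP-RBBP4 recruitment to the tethered Prdm3 focus, while positive values indicate enrichment relative to the nucleoplasm.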

5.5.6 Protein expression and purification

Full-length human RBBP4 (residues 1–425) was cloned into the baculovirus expression vector pFBOH-LIC (GenBank EF456740) encoding an N-terminal His6 tag and a tobacco etch virus protease cleavage site. The protein was expressed in Sf9 cells infected with baculovirus using the Bac-to-Bac expression system (Invitrogen). The harvested cells were lysed by rotation in lysis buffer containing 50 mM HEPES pH 7.5, 300 mM NaCl, 5% glycerol, 0.3% NP-40, protease inhibitor cocktail (Roche) and 1000 U of benzonase, followed by centrifugation. The clarified lysate was loaded onto cobalt resin equilibrated with lysis buffer and incubated for 1.5 hours with agitation at 4 °C. The beads were washed with 50 column volumes of 50 mM HEPES pH 7.5, 300 mM NaCl, 5% glycerol, 10 mM imidazole. The bound protein was eluted using 50 mM HEPES pH 7.5, 300 mM NaCl, 5% glycerol, 300 mM imidazole. To cleave the His-tag, the eluted protein was dialyzed into dialysis buffer (50 mM HEPES pH 7.5, 300 mM NaCl, 5% glycerol, 2 mM CaCl2) with 500 U of thrombin overnight at 4 °C with agitation. The cleaved protein was purified by ion exchange chromatography using a Source S column on an AKTA Explorer (GE Healthcare) with a linear, buffered salt gradient between buffer A (20 mM HEPES, 2.5% glycerol) and buffer B (20 mM HEPES, 2.5% glycerol, 1 M NaCl). Fully homogeneous protein was dialyzed back into dialysis buffer.

5.5.7 Structural determination

Cleaved RBBP4 protein was concentrated to 10.0 mg/ml and combined in a 1:1.2 molar ratio with synthetic human PRDM3 or PRDM16 peptide (1-12; Genscript) at 4 °C. The complexes were crystallized in sitting drops by vapour diffusion from a 1:1 mixture of protein solution and crystallization buffer (20% [PRDM3] or 22% [PRDM16] PEG3350, 0.2 M sodium malonate pH 7, 0.1 M BisTris pH 6). Protein crystals were cryoprotected in well solution containing 33% glycerol in mother liquor. Diffraction data for RBBP4:PRDM3 were collected at 100 K using a Rigaku FR-E Superbright rotating anode home source with a Rigaku SATURN A200 detector at a wavelength of 1.54178 Å, while data for RBBP4:PRDM16 were collected at 100 K at the Advanced Photon Source synchrotron, beamline 24-ID-E, with an ADSC QUANTUM 315 detector at a wavelength of 0.92819 Å. All data were processed and scaled with XDS [145] and Aimless [146]. Initial phases were estimated by molecular replacement using PhaserMR [147] with a known structure of RBBP4 (PDB 4r7a). The structural models were refined using REFMAC5 [148] and manually checked with COOT [149]. Each asymmetric unit contained two binary peptide:RBBP4 complexes with globally identical quaternary architecture. The highly similar PRDM3:RBBP4 and PRDM16:RBBP4 asymmetric units superimposed with a root mean square deviation (r.m.s.d.) of 0.13 Å over 696 Cα atoms, while the PRDM3:RBBP4 and PRDM16:RBBP4 biological assemblies superimposed onto their paired, non-crystallographic symmetry mates with r.m.s.d. values of 0.16 Å and 0.18 Å, respectively. The full β-propeller structure of RBBP4, encompassing all seven WD40 repeats and the N-terminal α-helix, was modeled into the electron density; as in other deposited structures, residues 1-3, 89-112, 356-359 and 411-425 could not be modeled owing to inadequate electron density at these predicted disordered positions. Electrostatic surface potentials were calculated using the Adaptive Poisson-Boltzmann Solver (APBS) plug-in in PyMOL [236]. Images were generated using PyMOL (The PyMOL Molecular Graphics System, v1.7.4, Schrödinger, LLC.).
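The r.m.s.d. values above describe Cα deviations after optimal superposition. To illustrate what that number measures, the following Python sketch implements the standard Kabsch superposition on two matched N × 3 coordinate arrays. It is not the tool used to obtain the values reported here (the superposition program is not specified), it assumes the atoms are already paired one-to-one, and it performs no outlier rejection; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal rigid-body superposition of p onto q."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two matched N x 3 coordinate arrays")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 cross-covariance matrix
    u, _s, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))
```

For the comparisons in this section, p and q would be the paired Cα coordinates (e.g. 696 atoms for the two asymmetric units), giving a value in Å.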

5.5.8 Isothermal titration calorimetry (ITC)

For ITC measurements of full-length human RBBP4 (residues 1–425) interactions with PRDM3 and PRDM16 peptides (1-12) and histone H3 peptides (1-25), RBBP4 was dialyzed into a buffer containing 20mM Hepes, 150mM NaCl and 2.5 % glycerol. Peptides were suspended in the same buffer to a final concentration of 0.50 mM. A preliminary peptide injection of 0.06 μl was followed by subsequent 2-μl injections into the sample cell containing 167 μl of 50 μM RBBP4. The reported KD and n values are based on the average from three experiments and are accompanied by the standard deviation of the measurements. The data were acquired on a Nano ITC from TA Instruments at 25 °C and fitted with an independent-binding site model using NanoAnalyze software (v3.7.0).
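Fitting an independent-binding-site model rests on the mass-balance relation between total binding sites, total ligand and K_D. The Python sketch below solves that quadratic exactly for the concentration of occupied sites and computes the Wiseman c parameter (c = n[M]/K_D), which is commonly used to judge whether an isotherm is well defined. It is a simplified illustration, not the NanoAnalyze fitting routine: it ignores injection dilution and heat terms, and the function names are hypothetical.

```python
import math


def bound_complex(sites_total: float, ligand_total: float, kd: float) -> float:
    """Concentration of occupied sites for a single class of independent sites.

    Exact solution of the mass-balance quadratic; all arguments share one unit (e.g. uM).
    `sites_total` is n times the macromolecule concentration.
    """
    b = sites_total + ligand_total + kd
    return (b - math.sqrt(b * b - 4.0 * sites_total * ligand_total)) / 2.0


def c_value(n: float, macromolecule: float, kd: float) -> float:
    """Wiseman c parameter, c = n[M] / K_D; values of roughly 1 to 1000 are commonly cited as usable."""
    return n * macromolecule / kd
```

For example, with 50 µM RBBP4 in the cell and n = 1, a hypothetical K_D of 5 µM would give c = 10, within the commonly cited window.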

5.5.9 In silico alanine scan

Changes in theoretical binding free energy upon alanine mutation of each residue of the PRDM3 peptide bound to RBBP4 (PDB: 6bw3) were calculated using ICM-Pro (Molsoft) [237]. Briefly, the protein complex was processed to repair missing sidechains and add hydrogens, followed by energy minimization to relieve possible atomic clashes. The Gibbs free energy of binding (ΔG) was calculated for the wild-type PRDM3 peptide and all 10 alanine scan mutants, and the change in binding free energy was determined as ΔΔG = ΔG(Mut) − ΔG(WT).
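The ΔΔG bookkeeping is simple enough to sketch directly. The Python example below assumes the usual sign convention in which favourable binding gives a negative ΔG, so a positive ΔΔG means the alanine substitution weakens binding, and ranks mutants by ΔΔG. The function name and input dictionary are hypothetical; the ΔG values themselves would come from ICM-Pro.

```python
def alanine_scan_ddg(dg_wild_type: float, dg_mutants: dict[str, float]) -> list[tuple[str, float]]:
    """Return (mutation, ddG) pairs with ddG = dG(mutant) - dG(wild type), largest first.

    With favourable binding energies negative, a positive ddG marks a side chain
    whose removal weakens binding, i.e. a residue that contributes to the interface.
    """
    ddg = {name: dg_mut - dg_wild_type for name, dg_mut in dg_mutants.items()}
    return sorted(ddg.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list is the residue predicted to contribute most to binding, which is how a single hotspot such as Lys4 can be singled out for experimental follow-up by ITC.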

Chapter 6 Characterization of BTRC/FBXW11 binders

6.1 Authorship attribution statement.

This chapter contains the unpublished work completed during a MITACS-funded internship at Boehringer Ingelheim in Vienna, Austria. I am responsible for the generation and presentation of all data included in this chapter, with help from the following people: Drs. Leonhard Geist and Moriz Mayer generated the fluorine compound library and provided technical support for the NMR assays. BTRC-SKP1 protein for the NMR assays was kindly provided by Taraneh Hajian in Dr. Masoud Vedadi’s group at SGC. Technical support for protein crystallization was provided by Barbara Muellauer, Alexander Weiss-Puxbaum and Dr. Jark Boettcher. Technical support for the SPR assay was provided by Maximilian Scharnweber. This project was conceptualized by group leaders from the SGC Toronto and Boehringer Ingelheim. Drs. Cheryl Arrowsmith and Jark Boettcher advised the design of this study and Dr. Jark Boettcher guided and supervised the study while in Vienna.

6.2 Preamble

This is to inform the reader that this chapter does not directly support the hypothesis or objective of this thesis, which pertain to the structural and functional elucidation of PRDM proteins. Rather, it complements the reader’s understanding of the WDR-domain-containing proteins examined in Chapter 4 and explores a new direction related to the chemical probe development introduced in Chapters 2 and 3. As I demonstrated previously, WDR-domain proteins act as protein interaction scaffolds. WDR proteins comprise one of the largest groups of biologically diverse proteins in humans and have become attractive targets for pharmacological development [238]. Several WDR domains are associated with E3 ligases, including BTRC and FBXW11, which are involved in marking proteins for proteasome-mediated degradation. The discovery of chemical matter that binds to E3 ligases is an ongoing pursuit that is critically important for the development of a new class of chemical probes and drugs called PROTACs [239]. Therefore, I submit this auxiliary thesis chapter to complement key themes pertaining to chemical probe development and the structural biology of WDR domains that were discussed in the preceding chapters.

6.3 Introduction

Inhibiting a protein’s biochemical function is a common approach employed by many pharmacological agents, including antagonists, monoclonal antibodies, enzyme inhibitors and disruptors of protein-protein interactions. These inhibitory molecules apply an occupancy-based strategy that is prone to shortcomings, which newer and developing inhibition strategies can address [reviewed in [240], [241]]. For instance, cells can adapt to the presence of an occupancy-based inhibitor by overexpressing the target, while inhibitor concentrations in stoichiometric excess of the target protein can cause undesirable off-target effects. Mutations or post-translational modifications of a target protein may abolish the inhibitor’s ability to bind, while overexpression of a target’s natural ligand may outcompete the inhibitor for its binding site. Furthermore, inhibitors of a protein’s enzymatic activity do not necessarily block its role as an interaction scaffold, or vice versa. In contrast, pharmacological agents that eliminate the target protein may overcome some, or all, of these limitations.

Targeted protein degradation represents a paradigm for the active removal of a disease-relevant target. Here, a bivalent molecule tethers a targeted protein of interest (POI) to a component of an endogenous protein degradation pathway. To degrade the POI, molecules such as proteolysis targeting chimeras (PROTACs) and immunomodulatory imide drugs (IMIDs) use non-native E3 ubiquitin ligase recruitment and subsequent proteasome targeting, while autophagy-targeting chimeras (AUTACs), autophagosome-tethering compounds (ATTECs) and lysosome-targeting chimeras (LYTACs) employ various shuttling pathways for lysosome targeting [reviewed in [242]]. The ubiquitin–proteasome pathway is the principal route for reducing the abundance of intracellular proteins. In this pathway, ubiquitin is covalently transferred from an E1 ubiquitin-activating enzyme to an E2 conjugating enzyme and then ligated (either directly or indirectly) to a substrate that is recognized by an E3 ubiquitin ligase. The carboxy terminus of the initial ubiquitin forms an isopeptide bond with the amine of a lysine side chain on the target protein, and subsequent ubiquitination produces a polyubiquitin adduct that is recognized by the 26S proteasome. PROTACs are bivalent molecular tethers that hijack this pathway by simultaneously binding to a specific E3 ligase and a POI. This PROTAC-dependent ternary complex triggers polyubiquitination of the POI and consequent proteasome-mediated degradation (Figure 41a).

Figure 41: Targeting the BTRC and FBXW11 paralogs for PROTAC handles. (a) Steps of PROTAC mediated degradation of a protein of interest (POI). (b) Surface representation of the BTRC-containing SCF-complex (pdbID: 6ttu). (c) Structure representation of the BTRC WDR domain highlighting substrate binding pocket (pdbID: 1p22). (d) BTRC, FBXW11, FBXW7 amino acid sequence alignment for residues within the substrate binding pocket.

PROTACs possess certain advantages and extended applications beyond typical occupancy-based inhibitor molecules [reviewed in [240], [241] [239]]. First, PROTACs function catalytically, being recycled after polyubiquitination of each POI, and are therefore effective at sub-stoichiometric doses, unlike most inhibitors. This helps to limit off-target binding events, and any off-target ternary complexes that do form are less likely to adopt the catalytically competent orientations necessary for polyubiquitination. Next, PROTACs remove the entire protein rather than inhibit a single function or domain. The benefits of eliminating both the enzymatic and non-enzymatic functions of a POI are exemplified by a focal adhesion kinase (FAK) degrader, PROTAC-3. Compared with the kinase inhibitor defactinib, which is currently in clinical trials for a diverse group of cancers, PROTAC-3 was more effective at blocking migration and invasion of a prostate cancer cell line [243]. These advantages may allow PROTACs to overcome drug resistance in cancers, as described for B-cell malignancies harboring an ibrutinib-resistant C481S mutation in Bruton's tyrosine kinase (BTK) that are susceptible to a BTK-targeting PROTAC [244] [245]. Additionally, the antagonist enzalutamide was shown to be less effective than its PROTAC derivative, ARCC-4, across several cellular models of prostate cancer proliferation and drug resistance [246]. Lastly, PROTACs, along with other forms of targeted protein degradation, have greatly expanded the scope of actionable disease-relevant proteins by enabling targeting of the "undruggable" proteome. For example, many disease-relevant proteins, such as oncogenic transcription factors or scaffolding proteins, lack an active site but possess binding pockets that may be amenable to PROTAC development [247] [239].

The first published PROTAC molecule consisted of an IκBα phosphopeptide that bound to the WDR domain of BTRC in the Skp1-Cullin-F-box (SCF) E3 ligase complex. By linking this peptide to a small-molecule binder of methionine aminopeptidase-2 (MetAP-2), this PROTAC, named Protac-1, induced ubiquitination and subsequent degradation of MetAP-2, which is not a physiological substrate of the SCF complex [248]. Peptide-based PROTACs owe their poor cell penetration to their high molecular weight, and their low potency and short cellular half-life to the lability of peptide bonds. The next generation of PROTACs used small-molecule “handles” to bind E3 ligases, which were then linked to a ligand of the target protein. The first cell-permeable, small-molecule-based PROTAC consisted of an androgen receptor ligand (a selective androgen receptor modulator, SARM) linked by polyethylene glycol to an MDM2 ligand (nutlin) [249]. Subsequently, several PROTACs have been developed that use MDM2, VHL, CRBN, IAP, KEAP1, RNF4 and RNF114 as E3 ligases to degrade various target proteins [241]. The early success of small-molecule PROTAC development has spurred billions of dollars of investment in biotech and pharmaceutical companies in recent years [250]. The first orally available PROTAC drug, ARV-110, which targets the androgen receptor, is currently being investigated in a phase 1 clinical trial for patients with metastatic castration-resistant prostate cancer (ClinicalTrials.gov ID: NCT03888612).

PROTAC development is complicated by the structural complexity of the ternary complex that is central to PROTAC activity. Compatibility between a particular E3 ligase and the desired POI is not guaranteed. This requires iterative testing and optimization of various combinations of E3-binding and target-protein-binding moieties (“chemical handles”) with variable linker chemistries. The PROTAC MZ1 exemplifies this. It combines the pan-BET bromodomain inhibitor JQ1 with a binder of the von Hippel-Lindau (VHL) E3 ligase. Although MZ1 binds to the bromodomains of BRD2, BRD3 and BRD4 with similar affinities, it selectively induced the removal of BRD4 over BRD2 and BRD3 [251]. Subsequent structural and biophysical studies demonstrated that MZ1 promotes favourable intermolecular interactions that drive cooperative formation of the BRD4-MZ1-VHL ternary complex [252]. Additional biological considerations for PROTAC design include ensuring that the E3 ligase and POI share the same expression profile. Another consideration is choosing an E3 ligase that is resilient against disruptive mutations or loss of expression. Currently, PROTAC molecules make use of only ~1% of the >600 human E3 ligases (e.g. MDM2, VHL, CRBN, IAP, KEAP1, RNF4 and RNF114). Because the development of a functional PROTAC molecule requires matching a POI to a compatible E3 ligase, further PROTAC discovery is limited by the number of available drug-like chemical ligands of E3s (E3 handles).
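Ternary-complex cooperativity of the kind described for MZ1 is commonly summarized by a cooperativity factor α. This is the ratio of the PROTAC's dissociation constant for one protein alone (binary KD) to its dissociation constant for that protein when the other partner is already bound (ternary KD). The sketch below computes α and labels its sign. The KD values are hypothetical placeholders and are not measurements reported for MZ1 or in this thesis.

```python
def cooperativity(kd_binary: float, kd_ternary: float) -> float:
    """Cooperativity factor alpha = KD(binary) / KD(ternary).

    alpha > 1: the pre-bound partner strengthens PROTAC binding (positive cooperativity);
    alpha < 1: negative cooperativity; alpha == 1: no cooperativity.
    """
    if kd_binary <= 0 or kd_ternary <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_binary / kd_ternary


def describe(alpha: float) -> str:
    """Translate alpha into a qualitative label."""
    if alpha > 1:
        return "positive"
    if alpha < 1:
        return "negative"
    return "none"


# Hypothetical KD values in molar units (illustration only)
alpha = cooperativity(kd_binary=60e-9, kd_ternary=3e-9)
print(f"alpha = {alpha:.1f} ({describe(alpha)} cooperativity)")
```

A large positive α means the ternary complex is more stable than the binary affinities alone would predict. This is one way a PROTAC can discriminate between targets that it binds with similar binary affinity.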

Discovery of additional E3 ligase handles would promote further PROTAC development by expanding the diversity of available ligases that may be compatible with a POI. An ideal E3 ligase would have high activity and expression, and would possess some mechanism that limits the development of drug resistance in cells and tumors. Two E3 ligases that fit these criteria are the paralogs BTRC and FBXW11. BTRC and FBXW11 are interchangeable substrate-recognition components of the SCF (Skp1-cullin-F-box protein) E3 ligase complex. Structural elucidation of BTRC complexes has shown that they possess an N-terminal F-box domain and a C-terminal WDR domain. The F-box domain interacts directly with SKP1 and forms the core of the SCF complex. The WDR domain is responsible for substrate recruitment through recognition of the phosphoserine-containing substrate destruction motif (or degron) D-pS-G-X-X-pS [253] [254] (Figure 41b-c). A proof-of-principle study showed that BTRC is amenable to a peptide-based PROTAC [248]. A recent study also reported several small-molecule enhancers of substrate recognition (referred to as “molecular glues”) [255]. These molecular glues are distinct from PROTAC molecules, as they enhance the affinity of the SCF complex for its physiological substrate even when the substrate lacks a required phosphoserine modification. Together, these studies demonstrate that BTRC is amenable to small-molecule ligand screening and PROTAC discovery.

Like BTRC, FBXW11 regulates the ubiquitination of phosphorylated substrates. Importantly, the WDR domains of the two paralogs share 91% amino acid sequence identity. Furthermore, their substrate binding pockets share 100% sequence identity, whereas the closely related FBXW7 shares only 35% sequence identity among these residues (Figure 41c-d). This raises the intriguing possibility of small-molecule ligands that selectively bind the substrate recognition pockets of both BTRC and FBXW11. Co-opting both proteins could hinder drug resistance. A resistance mutation that evolves at the protein-ligand interface of one paralog would still leave the other paralog available to engage the PROTAC. Therefore, a PROTAC that recruits both BTRC and FBXW11 may be more effective than one that recruits either alone. Here we provide experimental evidence to support the discovery of small-molecule ligands for the WDR domains of BTRC and FBXW11 to be used as E3 ligase handles for the development of novel PROTAC molecules. We present (1) a fragment library screen to identify binders of the substrate binding region of BTRC and (2) a novel structural and biophysical characterization of FBXW11 with a di-phosphorylated β-catenin peptide.
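Percent identity values such as the 91%, 100% and 35% figures above come from comparing aligned sequences column by column. The sketch below shows one common definition, in which identical residues are divided by the number of ungapped aligned columns, optionally restricted to a chosen set of columns such as pocket residues. Other tools use different denominators, so the exact convention behind the reported numbers is an assumption here. The toy sequences are hypothetical and are not the real BTRC, FBXW11 or FBXW7 sequences.

```python
def percent_identity(seq_a: str, seq_b: str, positions=None) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped. If `positions` is
    given, only those 0-based alignment columns are compared (e.g. pocket residues).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = range(len(seq_a)) if positions is None else positions
    compared = [
        (seq_a[i], seq_b[i])
        for i in columns
        if seq_a[i] != "-" and seq_b[i] != "-"
    ]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned fragments (illustration only)
a = "RYSSGRDR"
b = "RYSSGRDR"
c = "RYTAGKDQ"
pocket = [0, 1, 4, 6]  # hypothetical "pocket" columns
print(percent_identity(a, b), percent_identity(a, c), percent_identity(a, c, pocket))
```

Restricting the comparison to pocket columns can give a much higher identity than the whole-domain value. For BTRC and FBXW11, this is the situation the text describes.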


6.4 Results

6.4.1 Identifying BTRC substrate pocket binders

The BTRC-SKP1 complex has been structurally characterized, and small molecules that bind cooperatively with the substrate peptide have been identified [254] [253] [255]. To identify small molecules that bind to BTRC and compete for the substrate binding pocket, we screened fluorine-containing compounds using nuclear magnetic resonance (NMR) spectroscopy. We then validated hits by reassessing binding in the presence of a di-phosphorylated β-catenin peptide (Figure 42a). Libraries containing CF3 and CF groups were split into pools of 6 to 16 compounds, and 19F NMR signals were collected in the presence and absence of BTRC-SKP1 using 100 ms and 500 ms relaxation delays. 19F NMR signals that showed decreased intensity in the presence of BTRC-SKP1 were considered hits (Figure 42b-c). Next, the 48 initial hits were individually validated by reassessing binding to BTRC-SKP1. Of these, 39 compounds showed reproducible 19F NMR signal decreases, while the remaining 9 compounds were likely false positives. Lastly, we assessed whether the decreased signal intensity could be reversed by adding the di-phosphorylated β-catenin substrate peptide, which would indicate that the compound binds within the substrate binding pocket (Figure 42d). We found that 18 compounds showed peptide-competitive binding, while 21 compounds likely bind to BTRC-SKP1 outside the substrate binding pocket. In total, from a library of 2156 fluorine-containing fragments, we identified 18 peptide-competitive hits for further characterization.
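The screening funnel above applies two decisions to each fragment peak. The first is whether its 19F intensity drops when BTRC-SKP1 is added. The second is whether the lost intensity returns when the competing phosphopeptide is added. The sketch below encodes that logic. The numeric cut-offs `HIT_RATIO` and `RECOVERY_FRACTION`, the `FragmentSignal` record and the intensities are all hypothetical, because the text does not state the thresholds that were used. The last lines only recompute hit rates from the counts reported in the text.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; no numeric thresholds are stated in the text.
HIT_RATIO = 0.7          # binder if the peak keeps < 70% of its reference intensity
RECOVERY_FRACTION = 0.5  # peptide-competitive if >= 50% of the lost intensity returns


@dataclass
class FragmentSignal:
    name: str
    i_ref: float      # 19F peak intensity of the fragment alone (reference spectrum)
    i_protein: float  # same peak with BTRC-SKP1 present
    i_peptide: float  # same peak with BTRC-SKP1 plus the di-phosphorylated peptide


def classify(sig: FragmentSignal) -> str:
    """Assign a fragment to one of the three outcomes of the screening funnel."""
    if sig.i_protein / sig.i_ref >= HIT_RATIO:
        return "non-binder"
    lost = sig.i_ref - sig.i_protein                     # intensity lost on protein binding
    recovered = (sig.i_peptide - sig.i_protein) / lost   # fraction restored by the peptide
    return "peptide-competitive" if recovered >= RECOVERY_FRACTION else "binds elsewhere"


# Hypothetical peak intensities (arbitrary units)
signals = [
    FragmentSignal("frag-1", 100.0, 40.0, 90.0),
    FragmentSignal("frag-2", 100.0, 45.0, 50.0),
    FragmentSignal("frag-3", 100.0, 95.0, 96.0),
]
for s in signals:
    print(s.name, classify(s))

# Funnel counts reported in the text
screened, validated, competitive = 2156, 39, 18
print(f"validated hit rate: {validated / screened:.1%}")
print(f"peptide-competitive hit rate: {competitive / screened:.2%}")
```

In practice, each decision would be checked at both relaxation delays. Borderline cases would also be re-examined individually, as in the validation step described above.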


Figure 42: Finding BTRC binders from a 19F library screen. (a) Flow chart of the 19F library screen, hit validation and binding site characterization. CF3 and CF libraries contained compounds with 6 to 18 heavy atoms (not including fluorine). (b) Example spectra of a hit from a pool of compound fragments in the CF3 library screen, with (c) a zoomed-in view of the hit peaks. (d) Example spectra of hit validation with peptide displacement. NMR spectra with 500 ms and 100 ms relaxation delays are shown at the top and bottom, respectively. Overlaid spectra are offset by 0.2 ppm for visual clarity.


6.4.2 Structural characterization of FBXW11

Given the high sequence identity between BTRC and FBXW11, we sought to produce an FBXW11 protein construct amenable to structural and biophysical assays. This construct could then be used in tandem with the established BTRC construct to find common small-molecule binders. Attempts to express and purify three WDR domain constructs of FBXW11 (residues 211-542, 216-542 and 219-542) were unsuccessful. However, an FBXW11-SKP1 complex with construct boundaries corresponding to a previously solved BTRC-SKP1 structure was successfully purified. The FBXW11-SKP1 complex crystallized under multiple conditions, both in the presence and in the absence of the di-phosphorylated pSer33/pSer37 β-catenin peptide. All protein crystals diffracted only to ~6 Å resolution, except for one crystal containing FBXW11-SKP1 and the pSer33/pSer37 β-catenin peptide that diffracted to 2.5 Å (Figure 43a, Table 11). This confirmed that the peptide-bound FBXW11-SKP1 complex is amenable to structural characterization. It also suggests that a similar methodology may serve as a foundation for characterization with small-molecule binders.

We determined the structure of FBXW11-SKP1 in complex with a pSer33/pSer37 β-catenin peptide to understand the substrate binding pocket in complex with its native ligand. Unlike any of the previously solved crystal structures of BTRC-SKP1 (PDBID: 1p22, 6m90, 6m91, 6m92, 6m93 and 6m94), our structure of the FBXW11-SKP1-peptide complex was solved with 3 biological assemblies in the asymmetric unit (Figure 43b). As expected, the quaternary structure of the FBXW11-SKP1-peptide complex matched all previously observed BTRC-SKP1-peptide complexes. The pSer33/pSer37 β-catenin peptide is bound at the central "donut hole" of the FBXW11 WDR domain. This domain is connected by a linker α-helix to the F-box domain, which forms a complex with SKP1 (Figure 43c).


Figure 43: FBXW11-SKP1 in complex with a pSer33/pSer37 β-catenin peptide. (a) The protein crystal containing FBXW11-SKP1 and peptide, grown in a 0.4 µl drop. (b) The three biological assemblies in the asymmetric unit, each shown in a distinct colour. (c) Labeled components of a biological assembly containing FBXW11, SKP1 and β-catenin. The F-box domain, WDR domain and linker helix of FBXW11 are indicated. (d-e) The FBXW11-SKP1-peptide biological assemblies superimposed on SKP1, shown with a 90° horizontal rotation.


Table 11: Crystallographic data collection and refinement statistics for the FBXW11-SKP1 complex (PDBID: 6wnx).

Data collection
Space group: P 1 2₁ 1
Wavelength (Å): 1.00004
Cell dimensions a, b, c (Å): 119.7, 81.9, 133.2
Cell dimensions α, β, γ (°): 90, 92.9, 90
Resolution (Å): 66.5-2.50 (2.59-2.50)
Rmerge: 0.223 (1.48)
<I/σ(I)>: 1.44 (at 2.50 Å)
CC1/2: 0.984 (0.289)
Completeness (%): 62
Redundancy: 6.4

Refinement
Resolution (Å): 2.50
No. reflections: 55225 (245)
Rwork / Rfree: 0.206 / 0.251
Wilson B factor (Å²): 40.6
No. atoms, protein: 13336
No. atoms, ligand/ion: 38
No. atoms, water: 277
B-factor (Å²), average: 60.1
B-factor (Å²), macromolecules: 60.64
B-factor (Å²), ligands: 64.25
R.m.s. deviations, bond lengths (Å): 0.014
R.m.s. deviations, bond angles (°): 1.93

Values in parentheses are for highest-resolution shell.


In SKP1-Cullin-F-box (SCF) E3 ubiquitin ligase complexes, both BTRC and FBXW11 bind indirectly to Cullin-1 via SKP1. To determine whether the individual assemblies showed any structural diversity, we superimposed the three biological assemblies on SKP1. The SKP1 molecules aligned closely, but FBXW11 and the β-catenin peptide diverged progressively with increasing distance from the F-box domain (Figure 43d-e). The linker α-helices of FBXW11 spanned 11.0 Å at their C-terminal end (H115) (Figure 43d). Because the divergence increased with distance from the alignment, we expected the bound peptides to be displaced even further. Indeed, the interatomic displacements of the phosphate atoms conjugated to S33 were 19.1, 17.9 and 5.5 Å, while those of the phosphate atoms on S37 were 12.9, 12.8 and 5.8 Å (Figure 43e). These findings indicate that FBXW11 adopts multiple conformations under crystallization conditions, which may reflect conformational dynamics at the substrate binding interface within the SCF complex.
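With three superimposed assemblies, each equivalent atom gives three pairwise distances (A-B, A-C and B-C). This explains why three displacement values are reported for each phosphate. The sketch below computes those pairwise distances once the assemblies have already been aligned on SKP1. The superposition step itself is not shown, and the coordinates are hypothetical rather than taken from the deposited structure.

```python
import math
from itertools import combinations


def pairwise_displacements(coords):
    """Distances (Å) between equivalent atoms in already-superimposed assemblies.

    `coords` maps an assembly label to the (x, y, z) position of the same atom
    (for example the phosphorus of pS33) after all assemblies were aligned on SKP1.
    """
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


# Hypothetical coordinates (Å) for one equivalent atom in assemblies A, B and C
p_s33 = {"A": (0.0, 0.0, 0.0), "B": (3.0, 4.0, 0.0), "C": (0.0, 0.0, 12.0)}
for (a, b), d in pairwise_displacements(p_s33).items():
    print(f"{a}-{b}: {d:.1f} Å")
```

Because the distances are only meaningful in a common frame, the choice of superposition anchor (here SKP1, as in the text) determines which part of the complex appears to move.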

On the β-catenin protein, serine 37 and then serine 33 are sequentially phosphorylated by glycogen synthase kinase 3 (GSK3). This enables the protein-protein interaction with the SCF complex that leads to β-catenin ubiquitination and degradation. We examined the interface between FBXW11 and the pS33/pS37 β-catenin peptide to identify how phosphorylation of β-catenin promotes binding. The pS33/pS37 β-catenin peptide forms a network of hydrogen bonds, salt bridges and hydrophobic interactions with FBXW11 (Figure 44a, Table 12). Notably, half of the 20 interactions that we identified involved either pS33 or pS37. Additionally, pS33 makes 2 more hydrogen bonds than pS37, suggesting that phosphorylation of S33 makes a greater contribution to the protein-protein interaction. To test this, we used surface plasmon resonance (SPR) to measure the affinity of FBXW11-SKP1 for β-catenin peptides modified with either pS33/pS37, pS33/S37 or S33/pS37. As expected, the pS33/pS37 peptide had the smallest dissociation constant (KD), 1.8 ± 0.2 µM (mean ± standard deviation) (Figure 44b). Phosphorylation of S33 alone accounted for the majority of this affinity (KD of 2.9 ± 0.3 µM) (Figure 44c), while the KD of the peptide phosphorylated only on S37 could not be determined (Figure 44d). Our finding that phosphorylation of S33 is more important than phosphorylation of S37 for stabilizing the protein-protein interaction is consistent with other reports regarding BTRC and β-catenin [255]. This suggests that small molecules that bind to the S33 binding site of BTRC and FBXW11 could compete with this native interaction and hijack the E3 ligase to target non-native substrates. This contrasts with the “molecular glues”, which enhance the affinity of BTRC for mono-phosphorylated β-catenin [255]. Interestingly, all the reported molecular glue compounds bind in the highly conserved substrate binding pocket (Figure 41d), at positions that are perfectly conserved between BTRC and FBXW11 [255]. Therefore, the molecular glue compounds likely exert the same effect on FBXW11 as they do on BTRC.
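One way to put the two measured KD values on a common scale is to convert them to standard binding free energies, ΔG° = RT ln(KD/c°). This uses a 1 M standard state and T = 293.15 K, matching the 20 °C at which the SPR experiments were run. The sketch below does this for the pS33/pS37 and pS33/S37 peptides. Reading the ratio of the two free energies as the "fraction of affinity retained" is a simplification used for illustration, because binding free energies are not strictly additive. The S33/pS37 peptide is omitted because its KD could not be determined.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 293.15) -> float:
    """Standard binding free energy dG = RT ln(KD / 1 M), in kJ/mol."""
    return R * temp_k * math.log(kd_molar)


dg_double = binding_free_energy(1.8e-6)  # pS33/pS37 peptide, KD = 1.8 uM
dg_s33 = binding_free_energy(2.9e-6)     # pS33/S37 peptide, KD = 2.9 uM
ddg = dg_s33 - dg_double                 # penalty for removing the S37 phosphate

print(f"dG(pS33/pS37) = {dg_double:.2f} kJ/mol")
print(f"dG(pS33/S37)  = {dg_s33:.2f} kJ/mol")
print(f"ddG = {ddg:.2f} kJ/mol; ratio = {dg_s33 / dg_double:.0%}")
```

On this scale, removing the S37 phosphate costs only about 1 kJ/mol of the roughly 32 kJ/mol binding free energy. This is consistent with the conclusion that S33 phosphorylation carries most of the interaction.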


Figure 44. Binding characterization of the pSer33/pSer37 β-catenin peptide on FBXW11. (a) Protein-peptide interactions; hydrogen bonds and salt bridges are represented as green and yellow dashed lines, respectively. Interaction distances are listed in Table 12. (b-d) SPR sensorgrams (top) and binding curves (bottom) for sensor-immobilized FBXW11-SKP1 and the β-catenin peptide (residues LDSGIHSGA) with (b) pS33/pS37, (c) pS33/S37 and (d) S33/pS37 serine phosphorylation. Mean ± standard deviation KD values from 3 replicates are shown.


Table 12. Protein-peptide interactions between FBXW11 and β-catenin. Related to Figure 44a. Values in parentheses indicate distances between heavy atoms in Å. Water-mediated hydrogen bonds (H-bonds) between protein and peptide are indicated with a W between the two distances.

Hydrogen Salt Hydrophobic bonds bridges interactions L31 D32 Y348(2.31), R381(3.90) R334(3.67) Y131(3.03), Y131(2.81), pS33 S169(2.58), S185(2.64), R145(4.39) S169(2.78W2.80) G34 R334(4.03) I35 A294(3.86), L332(3.96) H36 N254(2.92), N254(2.97) A252(3.45) G292(2.80), S308(2.63), pS37 R291(3.92) R270(3.77W2.84) G38 G268(3.27) A39


6.5 Discussion

The generation of novel E3 handles will facilitate the discovery and development of PROTACs. Attenuation of PROTAC function due to mutation or silencing of the participating E3 ligase could be mitigated by using two E3 ligases simultaneously. BTRC and FBXW11 are highly similar E3 ligase paralogs with identical substrate binding pockets (Figure 41d). Here we presented data that support the overall goal of discovering a small-molecule binder of both BTRC and FBXW11 for use in future PROTAC molecules. Specifically, we generated an FBXW11 protein construct suitable for screening small-molecule binders. We have also begun a screening campaign using a fragment library to identify BTRC binders.

The initial focus of our study was the development of an FBXW11 protein construct that could enable structure-based compound screening methods such as NMR or X-ray crystallography. We focused on the ~37 kDa WDR domain, which would be suitable for library screening by heteronuclear single quantum coherence (HSQC) spectroscopy and potentially for compound soaking into protein crystals. In parallel with work at the SGC Toronto that focused on the WDR domain of BTRC, we sought to express and purify 3 different constructs encoding the WDR domain of FBXW11 (residues 211-542, 216-542 and 219-542). We were unable to purify any of the FBXW11 WDR constructs after expression in either E. coli cells or a baculovirus-insect cell expression system. For expression in E. coli, we varied the induction method (IPTG induction and autoinduction media) and the growth temperature (15 °C to 22 °C). Additionally, we assessed protein expression with N-terminal affinity tags, including a polyhistidine tag (6xHis-tag), a Myc-tag and a maltose-binding protein tag (MBP-tag). 6xHis-tagged WDR proteins expressed better in E. coli cells than Myc- or MBP-tagged proteins. However, screening of buffers, pH ranges, salt concentrations and surfactants during elution of the 6xHis-tagged WDR constructs did not yield soluble protein. We assessed cellular lysates to determine whether the WDR protein was forming inclusion bodies, but we were unable to detect enrichment of any of the affinity tags by Western blot analysis. Therefore, to express and purify FBXW11 protein for further study, we designed FBXW11 and SKP1 constructs corresponding to a crystal structure of BTRC and SKP1 [254].


The purified FBXW11-SKP1 complex was stable at concentrations suitable for protein crystallization (~12 mg/mL), and crystal growth was screened in 480 crystallization conditions. The FBXW11-SKP1 complex crystallized in 3 unique conditions in the absence of peptide, and in 15 other conditions in the presence of the pSer33/pSer37 β-catenin peptide. All of the apo crystals diffracted X-rays only to low resolution (>6 Å). Future optimization of the crystal growth conditions could nevertheless yield crystals that diffract to near-atomic resolution (<2.0 Å), suitable for fragment screening. We solved the crystal structure of FBXW11-SKP1 in complex with a pSer33/pSer37 peptide and found that crystals grown under the same conditions reproducibly diffracted to the same resolution (~2.5 Å). Examination of the crystal packing showed that the peptides in the asymmetric unit are located near solvent channels, and may therefore be amenable to compound-induced displacement. Our collaborators at the SGC Toronto had previously identified a group of small molecules capable of displacing a pSer33/pSer37 peptide from BTRC, FBXW11 and FBXW7. Soaking FBXW11-SKP1-peptide crystals with the most potent of these compounds did not displace any peptide electron density or produce any new density on FBXW11. Follow-up studies that co-crystallize FBXW11-SKP1 with peptide-competitive compounds in the absence of peptide may be more successful in determining small molecule-bound FBXW11 structures.

In our 19F fragment library screening campaign, we identified 18 hits that appeared to interact with BTRC in a peptide-competitive manner. These hits therefore behave differently from another class of CF3-containing compounds, which enhance the affinity of an S37A mutant β-catenin for BTRC [255]. Previous structural elucidation of these “molecular glue” compounds demonstrated that they enhance binding by providing an additional stabilizing interface between BTRC and either WT or S37A β-catenin.

Our SPR peptide affinity measurements indicate that phosphorylation of S33 on β-catenin is more important than phosphorylation of S37, in agreement with previous findings [255]. Therefore, compounds that bind to FBXW11 near the interface with pS33 of β-catenin would likely provide a better starting point for optimization of a peptide-competitive binder. Further validation of the 18 hits from the 19F library screen against BTRC and FBXW11 using a mono-phosphorylated pS33/S37 peptide could provide important information in this regard. Structural information will also be necessary to progress with rational compound design. In addition to optimizing the crystallization protocol for apo FBXW11-SKP1 as mentioned above, selective isotope labeling for NMR may provide a useful platform for further structure-based screening. One possibility is selective carbon-13 and nitrogen-15 labeling of tyrosine residues ([13C,15N]-Tyr). This has previously been achieved for ubiquitin with high incorporation (>96%) in a baculovirus-insect cell expression system [256]. Tyrosine is the fourth least abundant amino acid encoded in our FBXW11 construct, which contains 12 tyrosine residues, while the corresponding BTRC construct contains 13. Of these tyrosine residues, only 2 lie within 14.0 Å of the bound β-catenin peptide, and only one (Y271 in BTRC; Y131 in FBXW11) comes within 6.0 Å of pS33, at ~2.0 Å. Therefore, although we were unable to purify an FBXW11 WDR construct suitable for HSQC-based screening, selective tyrosine labeling could provide a useful avenue to screen for E3 ligase handles in the peptide-binding sites of FBXW11-SKP1 and BTRC-SKP1.
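A statement such as "tyrosine is the fourth least abundant amino acid" follows from counting residue types in the construct sequence and ranking the counts. The sketch below shows that calculation. It assumes that only residue types actually present in the sequence are ranked, since the text does not say how absent amino acids were treated. It also assumes that tied counts share the lower rank. The toy sequence is hypothetical and is not the FBXW11 or BTRC construct.

```python
from collections import Counter


def residue_counts(seq: str) -> Counter:
    """Count each one-letter amino-acid code in a protein sequence."""
    return Counter(seq.upper())


def abundance_rank(seq: str, residue: str) -> int:
    """1-based rank of `residue` among residue types present in `seq`,
    ordered from least to most abundant (ties share the lower rank)."""
    counts = residue_counts(seq)
    if residue not in counts:
        raise ValueError(f"{residue} does not occur in the sequence")
    n = counts[residue]
    return 1 + sum(1 for c in counts.values() if c < n)


# Hypothetical toy sequence (illustration only)
toy = "WCCYYYHHHHAAAAA"
print(residue_counts(toy)["Y"], abundance_rank(toy, "Y"))
```

A low-abundance residue with only a few members near the binding site keeps an HSQC spectrum sparse, which is the practical appeal of selective tyrosine labeling.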


6.6 Methods

6.6.1 Protein production

FBXW11 (residues 114-542) with an N-terminal 6xHis tag and TEV cleavage site, and a double-deletion mutant of SKP1 (residues 1-37, 44-70 and 82-163), were cloned into pET28-MHL donor plasmids. The plasmids were each transformed into DH10Bac competent E. coli cells (Invitrogen), and recombinant bacmid DNA was purified using a Plasmid Mini Kit (QIAGEN). Sf9 cells grown in serum-containing Insect-Xpress medium (BioWhittaker) were used to generate P1 and P2 viral stocks. Hi5 insect cells (2 × 10⁹ cells/L) were grown in serum-free Insect-Xpress medium (BioWhittaker) supplemented with penicillin, streptomycin, Fungizone and L-glutamine. These cells were co-infected with 3 mL of each P2 virus (FBXW11 and SKP1) per 1 L of culture and grown with shaking at 27 °C for 48 hours, by which time viability had dropped to 70-80%. Insect cells were pelleted and frozen at -80 °C until protein purification. Cells were resuspended in lysis buffer (20 mM NaH2PO4 pH 8.0, 150 mM NaCl, 10% glycerol, 0.5% Triton X-100, 2 mM TCEP, 50 U/mL Benzonase nuclease and protease inhibitor cocktail (cOmplete, Roche)) [80 mL buffer per 1 L culture] by stirring with a magnetic stir bar on ice for 60 minutes. Cells were lysed with a Dounce homogenizer, and the lysate was clarified by centrifugation (40-60 min, 33,000 × g). The lysate was incubated with Ni-NTA agarose excel (QIAGEN) for 2 hours at 4 °C [2 mL bead bed volume per 1 L culture]. Beads were washed with >10 column volumes of wash buffer (20 mM NaH2PO4 pH 8.0, 150 mM NaCl, 10% glycerol, 10 mM imidazole, 2 mM TCEP), and protein was eluted with elution buffer (wash buffer + 250 mM imidazole [pH 8]). The eluate was dialyzed into Tris buffer (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM TCEP, 10% glycerol) overnight at 4 °C. The protein solution was concentrated, applied to a MonoQ 10/100 column (GE) and eluted with a linear salt gradient into high-salt buffer (20 mM Tris pH 8.0, 1 M NaCl, 2 mM TCEP, 10% glycerol).
Fractions containing FBXW11-SKP1 complexes were concentrated and further purified by size-exclusion chromatography (Superdex 200 HiLoad 16/600, GE) into final buffer (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM TCEP, 10% glycerol). The BTRC-SKP1 complex was produced using a similar baculovirus expression system and was kindly provided by Taraneh Hajian and Dr. Masoud Vedadi from the SGC Toronto.

6.6.2 19F-NMR library screen

BTRC-SKP1 was buffered in D2O containing Tris pH 7.5 and 100 mM NaCl. NMR spectra were recorded on a Bruker Avance III HD 600 MHz spectrometer equipped with a cryogenically cooled 5 mm TCI probe (TCI 600 H&F-C/N-D). NMR samples contained mixtures of 6 to 16 19F-containing fragments, at a final concentration of 50 µM each for CF3-containing fragments or 25 µM each for CF-containing fragments. Each fragment was also recorded individually for follow-up hit assignment. Each fragment pool was recorded without protein (reference spectrum) and with 1.0 µM protein. Fragment hits were individually validated with protein at the initial concentrations, with and without 10 µM peptide (Ac-LDpSGIHpSGA-NH2). A 19F T2 filter experiment was used, entailing a CPMG scheme (τCP – 180° – τCP) with τCP = 0.005 s. Each sample was recorded twice, with two different total CPMG filter lengths (500 ms and 100 ms). Each spectrum was acquired with 128 scans and an acquisition time of 480 ms. Processing and analysis were carried out with TopSpin 3.5 (Bruker BioSpin).
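The T2 filter works because a fragment that binds a large protein relaxes faster, so its 19F signal decays more over the longer CPMG filter. The sketch below assumes single-exponential decay, I(t) = I0·exp(-t/T2). Under that assumption, peak intensities measured after two filter lengths give an apparent T2 = (t_long - t_short) / ln(I_short / I_long). The intensities are hypothetical, and treating the 100 ms and 500 ms values as total filter lengths follows the description above.

```python
import math


def apparent_t2(i_short: float, i_long: float, t_short: float, t_long: float) -> float:
    """Apparent T2 (s) from peak intensities after two CPMG filter lengths.

    Assumes single-exponential decay I(t) = I0 * exp(-t / T2).
    """
    if not (t_long > t_short and i_short > i_long > 0):
        raise ValueError("need t_long > t_short and a decaying, positive signal")
    return (t_long - t_short) / math.log(i_short / i_long)


# Hypothetical intensities for one fragment peak with 0.1 s and 0.5 s filters
t2_free = apparent_t2(100.0, 90.0, 0.1, 0.5)   # fragment alone
t2_bound = apparent_t2(80.0, 40.0, 0.1, 0.5)   # fragment with protein
print(f"apparent T2: {t2_free:.2f} s alone, {t2_bound:.2f} s with protein")
```

Comparing the two filter lengths within one sample separates protein-induced relaxation from simple differences in fragment concentration. A concentration difference would scale both intensities equally and leave the ratio unchanged.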

6.6.3 Crystallization and structure determination

FBXW11-SKP1 (190 µM) was incubated with 950 µM di-phosphorylated β-catenin peptide (Ac-LDpSGIHpSGA-NH2) for 30 minutes at 4 °C. A hexagonal crystal was obtained by sitting-drop vapour diffusion, mixing 0.2 µl of protein solution (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM TCEP, 10% glycerol) with 0.2 µl of reservoir solution (120 mM ethylene glycol, 100 mM Morpheus buffer 1 at pH 6.5 and 30% GOL_P4K [Hampton]) at 20 °C. After 3 days of growth, the crystal was flash-frozen in liquid nitrogen. Diffraction data were collected at the Swiss Light Source (Paul Scherrer Institute) synchrotron on beamline X06SA. Data were recorded at a wavelength of 1.00004 Å and 100 K on a DECTRIS EIGER X 16M detector, and the crystal diffracted to 2.5 Å. Data were processed and scaled with XDS [145] and Aimless [146]. The crystal belonged to space group P 1 2₁ 1 and contained three biological assemblies per asymmetric unit. The structure was solved by molecular replacement with the program PhaserMR [147]. A previously solved BTRC-SKP1 structure (PDBID 1p22), trimmed at differing residues, was used as the search model. The structural models were refined using REFMAC5 [148] and manually checked with COOT [149]. The Ramachandran statistics were 91%, 8% and 1% for favored, allowed and outlier regions, respectively. Images were generated using PyMOL (The PyMOL Molecular Graphics System, v2.2.0, Schrödinger, LLC.). Data collection and refinement statistics are given in Table 11.

6.6.4 Surface plasmon resonance

Peptide binding assays were performed using a Biacore T200 (GE Health Sciences). His-tagged FBXW11-SKP1 (3000 response units (RU)) was immobilized on flow cells of an SA chip by Ni-conjugation followed by lysine coupling, as per the manufacturer’s protocol. An empty flow cell was used for reference subtraction. All interaction experiments were performed at 20 °C in 10 mM HEPES pH 8.0, 150 mM NaCl, 2 mM TCEP, 0.005% Tween, 1% DMSO. Sensorgrams were generated and analysed using Biacore T200 Evaluation Software. Dissociation constants were calculated from binding curves fitted with a 1:1 interaction model.
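For a 1:1 interaction, the steady-state response follows Req = Rmax·C / (KD + C). The sketch below fits KD and Rmax to such a binding curve by least squares. It is not the Biacore evaluation software's algorithm, and the text does not say whether KD values came from steady-state or kinetic fits. The steady-state form is therefore an assumption here. For each trial KD, the best Rmax has a closed-form linear least-squares solution, so only KD needs a one-dimensional search over a log-spaced grid. The grid range and the synthetic, noise-free data (KD = 2 µM, Rmax = 50 RU) are hypothetical.

```python
def langmuir(conc: float, kd: float, rmax: float) -> float:
    """Steady-state response for 1:1 binding: Req = Rmax * C / (KD + C)."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs, responses, kd_grid):
    """Least-squares fit of (KD, Rmax); returns (kd, rmax, sse).

    For a fixed KD the model is linear in Rmax, so Rmax has a closed form;
    KD is then chosen by scanning `kd_grid`.
    """
    best = None
    for kd in kd_grid:
        f = [c / (kd + c) for c in concs]
        rmax = sum(fi * r for fi, r in zip(f, responses)) / sum(fi * fi for fi in f)
        sse = sum((r - rmax * fi) ** 2 for fi, r in zip(f, responses))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best


# Hypothetical log-spaced KD search range: 10 nM to 1 mM
kd_grid = [10 ** (-8 + 5 * i / 2000) for i in range(2001)]

# Synthetic, noise-free binding curve: 0.25 to 32 uM peptide
concs = [0.25e-6 * 2 ** i for i in range(8)]
responses = [langmuir(c, 2e-6, 50.0) for c in concs]

kd, rmax, _ = fit_steady_state(concs, responses, kd_grid)
print(f"KD = {kd * 1e6:.2f} uM, Rmax = {rmax:.1f} RU")
```

The concentration series should bracket KD (here from about 0.1 to 16 times KD). Without that range, Rmax and KD become strongly correlated and the fitted KD is poorly defined. This is one reason a KD may be reported as undetermined for a weak binder such as the S33/pS37 peptide.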


Reference

[1] S. Demirci, N. Uchida and J. Tisdale, "Gene therapy for sickle cell disease: An update.," Cytotherapy, vol. 20, no. 7, pp. 899-910, 2018.

[2] J. Butterfield, K. Hege, R. Herzog and R. Kaczmarek, "A Molecular Revolution in the Treatment of Hemophilia.," Mol Ther, vol. 28, no. 4, pp. 997-1015, 2020.

[3] J. Shendure, G. Findlay and M. Snyder, "Genomic Medicine-Progress, Pitfalls, and Promise.," Cell, vol. 177, no. 1, pp. 45-57, 2019.

[4] C. Arrowsmith, C. Bountra, P. Fish, K. Lee and M. Schapira, "Epigenetic protein families: a new frontier for drug discovery.," Nat Rev Drug Discov, vol. 11, no. 5, pp. 384-400, 2012.

[5] S. Ilango, B. Paital, P. Jayachandran, P. Padma and R. Nirmaladevi, "Epigenetic alterations in cancer.," Front Biosci (Landmark Ed), vol. 25, no. 1, pp. 1058-109, 2020.

[6] K. Murray, "The occurrence of epsilon-n-methyl lysine in histones.," Biochemistry, vol. 3, no. 1, pp. 10-5, 1964.

[7] B. Strahl and C. Allis, "The language of covalent histone modifications.," Nature, vol. 403, no. 6765, pp. 41-5, 2000.

[8] S. Rea, F. Eisenhaber, D. O'Carroll, B. Strahl, Z. Sun, M. Schmid, S. Opravil, K. Mechtler, C. Ponting, C. Allis and T. Jenuwein, "Regulation of chromatin structure by site-specific histone H3 methyltransferases.," Nature, vol. 406, no. 6796, pp. 593-9, 2000.

[9] K. Hyun, J. Jeon, K. Park and J. Kim, "Writing, erasing and reading histone lysine methylations.," Exp Mol Med, vol. 49, no. 4, p. e324, 2017.

[10] E. Blanco, M. González-Ramírez, A. Alcaine-Colet, S. Aranda and L. Di Croce, "The Bivalent Genome: Characterization, Structure, and Regulation," Trends Genet, vol. 36, no. 2, pp. 118-31, 2020.

[11] R. Ambler and M. Rees, "Epsilon-N-Methyl-lysine in bacterial flagellar protein.," Nature, vol. 184, no. 1, pp. 56-7, 1959.

[12] J. Huang, L. Perez-Burgos, B. Placek, R. Sengupta, M. Richter, J. Dorsey, S. Kubicek, S. Opravil, T. Jenuwein and S. Berger, "Repression of p53 activity by Smyd2-mediated methylation.," Nature, vol. 444, no. 7119, pp. 629-32, 2006.

[13] E. Greer and Y. Shi, "Histone methylation: a dynamic mark in health, disease and inheritance.," Nat Rev Genet, vol. 13, no. 5, pp. 343-57, 2012.

[14] M. Luo, "Chemical and Biochemical Perspectives of Protein Lysine Methylation," Chem Rev, vol. 118, no. 14, pp. 6656-705, 2018.


[15] M. Schapira, "Structural Chemistry of Human SET Domain Protein Methyltransferases.," Curr Chem Genomics, vol. 5, no. S1, pp. 85-94, 2011.

[16] P. Laurino, Á. Tóth-Petróczy, R. Meana-Pañeda, W. Lin, D. Truhlar and D. Tawfik, "An Ancient Fingerprint Indicates the Common Ancestry of Rossmann-Fold Enzymes Utilizing Different Ribose-Based Cofactors.," PLoS Biol, vol. 14, no. 3, p. e1002396, 2016.

[17] Y. Shi, F. Lan, C. Matson, P. Mulligan, J. Whetstine, P. Cole, R. Casero and Y. Shi, "Histone demethylation mediated by the nuclear amine oxidase homolog LSD1.," Cell, vol. 119, no. 7, pp. 941-53, 2004.

[18] H. Kaniskan, M. Martini and J. Jin, "Inhibitors of Protein Methyltransferases and Demethylases.," Chem Rev, vol. 118, no. 3, pp. 989-1068, 2018.

[19] L. Aravind and L. Iyer, "Provenance of SET-domain histone methyltransferases through duplication of a simple structural unit.," Cell Cycle, vol. 2, no. 4, pp. 369-76, 2003.

[20] R. Fick, G. Kroner, B. Nepal, R. Magnani, S. Horowitz, R. Houtz, S. Scheiner and R. Trievel, "Sulfur-Oxygen Chalcogen Bonding Mediates AdoMet Recognition in the Lysine Methyltransferase SET7/9.," ACS Chem Biol, vol. 11, no. 3, pp. 748-54, 2016.

[21] S. Horowitz, J. Yesselman, H. Al-Hashimi and R. Trievel, "Direct evidence for methyl group coordination by carbon-oxygen hydrogen bonds in the lysine methyltransferase SET7/9.," J Biol Chem, vol. 286, no. 21, pp. 18658-63, 2011.

[22] J. Couture, G. Hauk, M. Thompson, G. Blackburn and R. Trievel, "Catalytic roles for carbon-oxygen hydrogen bonding in SET domain lysine methyltransferases.," J Biol Chem, vol. 281, no. 28, pp. 19280-7, 2006.

[23] P. Hu and Y. Zhang, "Catalytic mechanism and product specificity of the histone lysine methyltransferase SET7/9: an ab initio QM/MM-FE study with multiple initial structures.," J Am Chem Soc, vol. 128, no. 4, pp. 1272-8, 2006.

[24] G. Schotta, R. Sengupta, S. Kubicek, S. Malin, M. Kauer, E. Callén, A. Celeste, M. Pagani, S. Opravil, I. De La Rosa-Velazquez, A. Espejo, M. Bedford, A. Nussenzweig, M. Busslinger and T. Jenuwein, "A chromatin-wide transition to H4K20 monomethylation impairs genome integrity and programmed DNA rearrangements in the mouse.," Genes Dev, vol. 22, no. 15, pp. 2048-61, 2008.

[25] M. Schapira, "Chemical Inhibition of Protein Methyltransferases.," Cell Chem Biol, vol. 23, no. 9, pp. 1067-76, 2016.

[26] S. Hoy, "Tazemetostat: First Approval.," Drugs, vol. 80, no. 5, pp. 513-21, 2020.

[27] S. Knutson, N. Warholic, T. Wigle, C. Klaus, C. Allain, A. Raimondi, M. Porte Scott, R. Chesworth, M. Moyer, R. Copeland, V. Richon, R. Pollock, K. Kuntz and H. Keilhack, "Durable tumor regression in genetically altered malignant rhabdoid tumors by inhibition of methyltransferase EZH2.," Proc Natl Acad Sci U S A, vol. 110, no. 19, pp. 7922-7, 2013.

[28] H. Rugo, I. Jacobs, S. Sharma, F. Scappaticci, T. Paul, K. Jensen-Pergakes and G. Malouf, "The Promise for Histone Methyltransferase Inhibitors for Epigenetic Therapy in Clinical Oncology: A Narrative Review.," Adv Ther, vol. 37, no. 7, pp. 3059-82, 2020.

[29] D. Dilworth and D. Barsyte-Lovejoy, "Targeting protein methylation: from chemical tools to precision medicines.," Cell Mol Life Sci, vol. 76, no. 15, pp. 2967-85, 2019.

[30] C. Fog, G. Galli and A. Lund, "PRDM proteins: important players in differentiation and disease.," Bioessays, vol. 34, no. 1, pp. 50-60, 2012.

[31] T. Hohenauer and A. Moore, "The Prdm family: expanding roles in stem cells and development.," Development, vol. 139, no. 13, pp. 2267-82, 2012.

[32] S. Mzoughi, Y. Tan, D. Low and E. Guccione, "The role of PRDMs in cancer: one family, two sides.," Curr Opin Genet Dev., vol. 36, pp. 83-91, 2016.

[33] M. Vervoort, D. Meulemeester, J. Béhague and P. Kerner, "Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.," Mol Biol Evol, vol. 33, no. 3, pp. 679-96, 2016.

[34] I. Fumasoni, N. Meani, D. Rambaldi, G. Scafetta, M. Alcalay and F. D. Ciccarelli, "Family expansion and gene rearrangements contributed to the functional speciation of PRDM genes in vertebrates.," BMC Evol Biol, vol. 7, p. 187, 2007.

[35] H. Wu, J. Min, V. Lunin, T. Antoshenko, L. Dombrovski, H. Zeng, A. Allali-Hassani, V. Campagna-Slater, M. Vedadi, C. Arrowsmith, A. Plotnikov and M. Schapira, "Structural biology of human H3K9 methyltransferases.," PLoS One, vol. 5, no. 1, p. e8570, 2010.

[36] S. Huang, G. Shao and L. Liu, "The PR domain of the Rb-binding zinc finger protein RIZ1 is a protein binding interface and is related to the SET domain functioning in chromatin-mediated gene expression.," J Biol Chem, vol. 273, no. 26, pp. 15933-9, 1998.

[37] R Core Team, "R: A language and environment for statistical computing.," R Foundation for Statistical Computing, Vienna, Austria., 2014. [Online]. Available: http://www.R-project.org/.

[38] K. Morishita, D. Parker, M. Mucenski, N. Jenkins, N. Copeland and J. Ihle, "Retroviral activation of a novel gene encoding a zinc finger protein in IL-3-dependent myeloid leukemia cell lines.," Cell, vol. 54, no. 6, pp. 831-40, 1988.

[39] A. Keller and T. Maniatis, "Identification and characterization of a novel repressor of beta-interferon gene expression.," Genes Dev, vol. 5, no. 5, pp. 868-79, 1991.

[40] I. Buyse, G. Shao and S. Huang, "The retinoblastoma protein binds to RIZ, a zinc-finger protein that shares an epitope with the adenovirus E1A protein.," Proc Natl Acad Sci U S A, vol. 92, no. 10, pp. 4467-71, 1995.

[41] S. Fears, C. Mathieu, N. Zeleznik-Le, S. Huang, J. Rowley and G. Nucifora, "Intergenic splicing of MDS1 and EVI1 occurs in normal tissues as well as in myeloid leukemia and produces a new member of the PR domain family.," Proc Natl Acad Sci U S A, vol. 93, no. 4, pp. 1642-7, 1996.

[42] X. Yang and S. Huang, "PFM1 (PRDM4), a new member of the PR-domain family, maps to a tumor suppressor locus on human chromosome 12q23-q24.1.," Genomics, vol. 61, no. 3, pp. 319-25, 1999.

[43] M. Clifton, B. Westman, S. Thong, M. O'Connell, M. Webster, N. Shepherd, K. Quinlan, M. Crossley, G. Blobel and J. Mackay, "The identification and structure of an N-terminal PR domain show that FOG1 is a member of the PRDM family of proteins.," PLoS One, vol. 9, no. 8, p. e106011, 2014.

[44] D. Zannino and C. Sagerström, "An emerging role for prdm family genes in dorsoventral patterning of the vertebrate nervous system.," Neural Dev, vol. 10, p. 24, 2015.

[45] A. Casamassimi, M. Rienzo, E. Di Zazzo, A. Sorrentino, D. Fiore, M. Proto, B. Moncharmont, P. Gazzerro, M. Bifulco and C. Abbondanza, "Multifaceted Role of PRDM Proteins in Human Cancer.," Int J Mol Sci., vol. 21, no. 7, p. E2648, 2020.

[46] A. Shaffer, K. Lin, T. Kuo, X. Yu, E. Hurt, A. Rosenwald, J. Giltnane, L. Yang, H. Zhao, K. Calame and L. Staudt, "Blimp-1 orchestrates plasma cell differentiation by extinguishing the mature B cell gene expression program.," Immunity, vol. 17, no. 1, pp. 51-62, 2002.

[47] M. Shapiro-Shelef, K. Lin, L. McHeyzer-Williams, J. Liao, M. McHeyzer-Williams and K. Calame, "Blimp-1 is required for the formation of immunoglobulin secreting plasma cells and pre-plasma memory B cells.," Immunity, vol. 19, no. 4, pp. 607-20, 2003.

[48] L. Pasqualucci, M. Compagno, J. Houldsworth, S. Monti, A. Grunn, S. Nandula, J. Aster, V. Murty, M. Shipp and R. Dalla-Favera, "Inactivation of the PRDM1/BLIMP1 gene in diffuse large B cell lymphoma.," J Exp Med, vol. 203, no. 2, pp. 331-7, 2006.

[49] Q. Deng and S. Huang, "PRDM5 is silenced in human cancers and has growth suppressive activities.," Oncogene, vol. 23, no. 28, pp. 4903-10, 2004.

[50] N. Tsuneyoshi, T. Sumi, H. Onda, H. Nojima, N. Nakatsuji and H. Suemori, "PRDM14 suppresses expression of differentiation marker genes in human embryonic stem cells.," Biochem Biophys Res Commun, vol. 367, no. 4, pp. 899-905, 2008.

[51] N. Nishikawa, M. Toyota, H. Suzuki, T. Honma, T. Fujikane, T. Ohmura, T. Nishidate, M. Ohe-Toyota, R. Maruyama, T. Sonoda, Y. Sasaki, T. Urano, K. Imai, K. Hirata and T. Tokino, "Gene amplification and overexpression of PRDM14 in breast cancers.," Cancer Res, vol. 67, no. 20, pp. 9649-57, 2007.

[52] E. Dettman, S. Simko, B. Ayanga, B. Carofino, J. Margolin, H. Morse III and M. Justice, "Prdm14 initiates lymphoblastic leukemia after expanding a population of cells resembling common lymphoid progenitors.," Oncogene, vol. 30, no. 25, pp. 2859-73, 2011.

[53] T. Zhang, L. Meng, W. Dong, H. Shen, S. Zhang, Q. Liu and J. Du, "High expression of PRDM14 correlates with cell differentiation and is a novel prognostic marker in resected non-small cell lung cancer.," Med Oncol, vol. 30, no. 3, p. e605, 2013.

[54] J. Gell, J. Zhao, D. Chen, T. Hunt and A. Clark, "PRDM14 is expressed in germ cell tumors with constitutive overexpression altering human germline differentiation and proliferation.," Stem Cell Res, vol. 27, pp. 46-56, 2018.

[55] Y. Chen, M. Auer-Grumbach, S. Matsukawa, M. Zitzelsberger, A. Themistocleous, T. Strom, C. Samara, A. Moore, L. Cho, G. Young, C. Weiss, M. Schabhüttl, R. Stucka, A. Schmid, Y. Parman, L. Graul-Neumann, W. Heinritz, E. Passarge, R. Watson et al., "Transcriptional regulator PRDM12 is essential for human pain perception.," Nat Genet, vol. 47, no. 7, pp. 803-8, 2015.

[56] J. Hanotel, N. Bessodes, A. Thélie, M. Hedderich, K. Parain, B. Van Driessche, K. de O. Brandão, S. Kricha, M. Jorgensen, A. Grapin-Botton, P. Serup, C. Van Lint, M. Perron, T. Pieler, K. Henningfeld and E. Bellefroid, "The Prdm13 histone methyltransferase encoding gene is a Ptf1a-Rbpj downstream target that suppresses glutamatergic and promotes GABAergic neuronal fate in the dorsal neural tube.," Dev Biol, vol. 386, no. 2, pp. 340-57, 2014.

[57] A. Sorrentino, A. Federico, M. Rienzo, P. Gazzerro, M. Bifulco, A. Ciccodicola, A. Casamassimi and C. Abbondanza, "PR/SET Domain Family and Cancer: Novel Insights from the Cancer Genome Atlas.," Int J Mol Sci, vol. 19, no. 10, p. e3250, 2018.

[58] F. Baudat, J. Buard, C. Grey, A. Fledel-Alon, C. Ober, M. Przeworski, G. Coop and B. de Massy, "PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice.," Science, vol. 327, no. 5967, pp. 836-40, 2010.

[59] S. Myers, R. Bowden, A. Tumian, R. Bontrop, C. Freeman, T. MacFie, G. McVean and P. Donnelly, "Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination.," Science, vol. 327, no. 5967, pp. 876-9, 2010.

[60] L. Ségurel, E. Leffler and M. Przeworski, "The case of the fickle fingers: how the PRDM9 zinc finger protein specifies meiotic recombination hotspots in humans.," PLoS Biol, vol. 9, no. 12, p. e1001211, 2011.

[61] K. Hayashi, K. Yoshida and Y. Matsui, "A histone H3 methyltransferase controls epigenetic events required for meiotic prophase.," Nature, vol. 438, no. 7066, pp. 374-8, 2005.

[62] P. Oliver, L. Goodstadt, J. Bayes, Z. Birtle, K. Roach, N. Phadnis, S. Beatson, G. Lunter, H. Malik and C. Ponting, "Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa.," PLoS Genet, vol. 5, no. 12, p. e1000753, 2009.

[63] A. Houle, H. Gibling, F. Lamaze, H. Edgington, D. Soave, M. Fave, M. Agbessi, V. Bruat, L. Stein and P. Awadalla, "Aberrant PRDM9 expression impacts the pan-cancer genomic landscape.," Genome Res, vol. 28, no. 11, pp. 1611-1620, 2018.

[64] J. Feichtinger, I. Aldeailej, R. Anderson, M. Almutairi, A. Almatrafi, N. Alsiwiehri, K. Griffiths, N. Stuart, J. Wakeman, L. Larcombe and R. McFarlane, "Meta-analysis of clinical data using human meiotic genes identifies a novel cohort of highly restricted cancer-specific marker genes.," Oncotarget, vol. 3, no. 8, pp. 843-53, 2012.

[65] J. Hussin, D. Sinnett, F. Casals, Y. Idaghdour, V. Bruat, V. Saillour, J. Healy, J. Grenier, T. de Malliard, S. Busche, J. Spinella, M. Larivière, G. Gibson, A. Andersson, L. Holmfeldt, J. Ma, L. Wei, J. Zhang, G. Andelfinger, J. Downing et al., "Rare allelic forms of PRDM9 associated with childhood leukemogenesis.," Genome Res, vol. 23, no. 3, pp. 419-30, 2013.

[66] E. Woodward, M. Olsson, B. Johansson and K. Paulsson, "Allelic variants of PRDM9 associated with high hyperdiploid childhood acute lymphoblastic leukaemia.," Br J Haematol, vol. 166, no. 6, pp. 947-9, 2014.

[67] Y. Zhang, S. Stehling-Sun, K. Lezon-Geyda, S. Juneja, L. Coillard, G. Chatterjee, C. Wuertzer, F. Camargo and A. Perkins, "PR-domain-containing Mds1-Evi1 is critical for long-term hematopoietic stem cell function.," Blood, vol. 118, no. 14, pp. 3853-61, 2011.

[68] K. Tokita, K. Maki and K. Mitani, "RUNX1/EVI1, which blocks myeloid differentiation, inhibits CCAAT-enhancer binding protein alpha function.," Cancer Sci, vol. 98, no. 11, pp. 1752-7, 2007.

[69] C. Bartholomew and J. Ihle, "Retroviral insertions 90 kilobases proximal to the Evi-1 myeloid transforming gene activate transcription from the normal promoter.," Mol Cell Biol, vol. 11, no. 4, pp. 1820-8, 1991.

[70] S. Barjesteh van Waalwijk van Doorn-Khosrovani, C. Erpelinck, W. van Putten, P. Valk, S. van der Poel-van de Luytgaarde, R. Hack, R. Slater, E. Smit, H. Beverloo, G. Verhoef, L. Verdonck, G. Ossenkoppele, P. Sonneveld, G. de Greef et al., "High EVI1 expression predicts poor survival in acute myeloid leukemia: a study of 319 de novo AML patients.," Blood, vol. 101, no. 3, pp. 837-45, 2003.

[71] K. Nayak, I. Sajitha, T. Kumar and S. Chakraborty, "Ecotropic viral integration site 1 promotes metastasis independent of epithelial mesenchymal transition in colon cancer cells.," Cell Death Dis, vol. 9, no. 2, p. 18, 2018.

[72] L. Liu, G. Shao, G. Steele-Perkins and S. Huang, "The retinoblastoma interacting zinc finger gene RIZ produces a PR domain-lacking product through an internal promoter.," J Biol Chem, vol. 272, no. 5, pp. 2984-91, 1997.

[73] L. He, J. Yu, L. Liu, I. Buyse, M. Wang, Q. Yang, A. Nakagawara, G. Brodeur, Y. Shi and S. Huang, "RIZ1, but not the alternative RIZ2 product of the same gene, is underexpressed in breast cancer, and forced RIZ1 expression causes G2-M cell cycle arrest and/or apoptosis.," Cancer Res, vol. 58, no. 19, pp. 4238-44, 1998.

[74] P. Seale, S. Kajimura, W. Yang, S. Chin, L. Rohas, M. Uldry, G. Tavernier, D. Langin and B. Spiegelman, "Transcriptional control of brown fat determination by PRDM16.," Cell Metab, vol. 6, no. 1, pp. 38-54, 2007.

[75] A. Arndt, S. Schafer, J. Drenckhahn, M. Sabeh, E. Plovie, A. Caliebe, E. Klopocki, G. Musso, A. Werdich, H. Kalwa, M. Heinig, R. Padera, K. Wassilew, J. Bluhm, C. Harnack, J. Martitz, P. Barton, M. Greutmann, F. Berger, N. Hubner et al., "Fine mapping of the 1p36 deletion syndrome identifies mutation of PRDM16 as a cause of cardiomyopathy.," Am J Hum Genet, vol. 93, no. 1, pp. 67-77, 2013.

[76] Z. Xiao, M. Zhang, X. Liu, Y. Zhang, L. Yang and Y. Hao, "MEL1S, not MEL1, is overexpressed in myelodysplastic syndromes patients with t(1;3)(p36;q21).," Leuk Res, vol. 30, no. 3, pp. 332-4, 2006.

[77] I. Nishikata, H. Sasaki, M. Iga, Y. Tateno, S. Imayoshi, N. Asou, T. Nakamura and K. Morishita, "A novel EVI1 gene family, MEL1, lacking a PR domain (MEL1S) is expressed mainly in t(1;3)(p36;q21)-positive AML and blocks G-CSF-induced myeloid differentiation.," Blood, vol. 102, no. 9, pp. 3323-32, 2003.

[78] K. Kim, L. Geng and S. Huang, "Inactivation of a histone methyltransferase by mutations in human cancers.," Cancer Res, vol. 63, no. 22, pp. 7619-23, 2003.

[79] M. Eram, S. Bustos, E. Lima-Fernandes, A. Siarheyeva, G. Senisterra, T. Hajian, I. Chau, S. Duan, H. Wu, L. Dombrovski, M. Schapira, C. Arrowsmith and M. Vedadi, "Trimethylation of histone H3 lysine 36 by human methyltransferase PRDM9 protein.," J Biol Chem, vol. 289, no. 17, pp. 12177-88, 2014.

[80] N. Powers, E. Parvanov, C. Baker, M. Walker, P. Petkov and K. Paigen, "The Meiotic Recombination Activator PRDM9 Trimethylates Both H3K36 and H3K4 at Recombination Hotspots In Vivo.," PLoS Genet, vol. 12, no. 6, p. e1006146, 2016.

[81] X. Koh-Stenta, J. Joy, A. Poulsen, R. Li, Y. Tan, Y. Shim, J. Min, L. Wu, A. Ngo, J. Peng, W. Seetoh, J. Cao, J. Wee, P. Kwek, A. Hung, U. Lakshmanan, H. Flotow, E. Guccione and J. Hill, "Characterization of the histone methyltransferase PRDM9 using biochemical, biophysical and chemical biology techniques.," Biochem J, vol. 461, no. 2, pp. 323-34, 2014.

[82] L. Blazer, E. Lima-Fernandes, E. Gibson, M. Eram, P. Loppnau, C. Arrowsmith, M. Schapira and M. Vedadi, "PR Domain-containing Protein 7 (PRDM7) Is a Histone 3 Lysine 4 Trimethyltransferase.," J Biol Chem, vol. 291, no. 26, pp. 13509-19, 2016.

[83] H. Wu, H. Zeng, A. Dong, F. Li, H. He, G. Senisterra, A. Seitova, S. Duan, P. Brown, M. Vedadi, C. Arrowsmith and M. Schapira, "Structure of the catalytic domain of EZH2 reveals conformational plasticity in cofactor and substrate binding sites and explains oncogenic mutations.," PLoS One, vol. 8, no. 12, p. e83737, 2013.

[84] I. Gyory, J. Wu, G. Fejér, E. Seto and K. Wright, "PRDI-BF1 recruits the histone H3 methyltransferase G9a in transcriptional silencing.," Nat Immunol, vol. 5, no. 3, pp. 299-308, 2004.

[85] K. Ancelin, U. Lange, P. Hajkova, R. Schneider, A. Bannister, T. Kouzarides and M. Surani, "Blimp1 associates with Prmt5 and directs histone arginine methylation in mouse germ cells.," Nat Cell Biol, vol. 8, no. 6, pp. 623-30, 2006.

[86] L. Congdon, J. Sims, C. Tuzon and J. Rice, "The PR-Set7 binding domain of Riz1 is required for the H4K20me1-H3K9me1 trans-tail 'histone code' and Riz1 tumor suppressor function.," Nucleic Acids Res, vol. 42, no. 6, pp. 3580-9, 2014.

[87] I. Pinheiro, R. Margueron, N. Shukeir, M. Eisold, C. Fritzsch, F. Richter, G. Mittler, C. Genoud, S. Goyama, M. Kurokawa, J. Son, D. Reinberg, M. Lachner and T. Jenuwein, "Prdm3 and Prdm16 are H3K9me1 methyltransferases required for mammalian heterochromatin integrity.," Cell, vol. 150, no. 5, pp. 948-60, 2012.

[88] A. Yoshimi, S. Goyama, N. Watanabe-Okochi, Y. Yoshiki, Y. Nannya, E. Nitta, S. Arai, T. Sato, M. Shimabe, M. Nakagawa, Y. Imai, T. Kitamura and M. Kurokawa, "Evi1 represses PTEN expression and activates PI3K/AKT/mTOR via interactions with polycomb proteins.," Blood, vol. 117, no. 13, pp. 3617-28, 2011.

[89] A. Chittka, J. Nitarska, U. Grazini and W. Richardson, "Transcription factor positive regulatory domain 4 (PRDM4) recruits protein arginine methyltransferase 5 (PRMT5) to mediate histone arginine methylation and control neural stem cell proliferation and differentiation.," J Biol Chem, vol. 287, no. 51, pp. 42995-3006, 2012.

[90] Z. Duan, R. Person, H. Lee, S. Huang, J. Donadieu, R. Badolato, H. Grimes, T. Papayannopoulou and M. Horwitz, "Epigenetic regulation of protein-coding and microRNA genes by the Gfi1-interacting tumor suppressor PRDM5.," Mol Cell Biol, vol. 27, no. 19, pp. 6889-902, 2007.

[91] Y. Wu, J. Ferguson III, H. Wang, R. Kelley, R. Ren, H. McDonough, J. Meeker, P. Charles, H. Wang and C. Patterson, "PRDM6 is enriched in vascular precursors during development and inhibits endothelial cell proliferation, survival, and differentiation.," J Mol Cell Cardiol, vol. 44, no. 1, pp. 47-58, 2008.

[92] G. Eom, K. Kim, S. Kim, H. Kee, J. Kim, H. Jin, J. Kim, J. Kim, N. Choe, K. Kim, J. Lee, H. Kook, N. Kim and S. Seo, "Histone methyltransferase PRDM8 regulates mouse testis steroidogenesis.," Biochem Biophys Res Commun, vol. 388, no. 1, pp. 131-6, 2009.

[93] C. Yang and Y. Shinkai, "Prdm12 is induced by retinoic acid and exhibits anti-proliferative properties through the cell cycle modulation of P19 embryonic carcinoma cells.," Cell Struct Funct, vol. 38, no. 2, pp. 197-206, 2013.

[94] M. Yamaji, J. Ueda, K. Hayashi, H. Ohta, Y. Yabuta, K. Kurimoto, R. Nakato, Y. Yamada, K. Shirahige and M. Saitou, "PRDM14 ensures naive pluripotency through dual regulation of signaling and epigenetic pathways in mouse embryonic stem cells.," Cell Stem Cell, vol. 12, no. 3, pp. 368-82, 2013.

[95] B. Zhou, J. Wang, S. Lee, J. Xiong, N. Bhanu, Q. Guo, P. Ma, Y. Sun, R. Rao, B. Garcia, J. Hess and Y. Dou, "PRDM16 Suppresses MLL1r Leukemia via Intrinsic Histone Methyltransferase Activity.," Mol Cell, vol. 62, no. 2, pp. 222-36, 2016.

[96] V. Campagna-Slater, M. Mok, K. Nguyen, M. Feher, R. Najmanovich and M. Schapira, "Structural chemistry of the histone methyltransferases cofactor binding site.," J Chem Inf Model, vol. 51, no. 3, pp. 612-23, 2011.

[97] R. Trievel, B. Beach, L. Dirk, R. Houtz and J. Hurley, "Structure and catalytic mechanism of a SET domain protein methyltransferase.," Cell, vol. 111, no. 1, pp. 91-103, 2002.

[98] S. Mas-Y-Mas, M. Barbon, C. Teyssier, H. Déméné, J. Carvalho, L. Bird, A. Lebedev, J. Fattori, M. Schubert, C. Dumas, W. Bourguet and A. le Maire, "The Human Mixed Lineage Leukemia 5 (MLL5), a Sequentially and Structurally Divergent SET Domain-Containing Protein with No Intrinsic Catalytic Activity.," PLoS One., vol. 11, no. 11, p. e0165139, 2016.

[99] A. Osipovich, R. Gangula, P. Vianna and M. Magnuson, "Setd5 is essential for mammalian development and the co-transcriptional regulation of histone acetylation.," Development, vol. 143, no. 24, pp. 4595-4607, 2016.

[100] S. Yu, M. Kim, S. Park, B. Yoo, K. Kim and Y. Jang, "SET domain-containing protein 5 is required for expression of primordial germ cell specification-associated genes in murine embryonic stem cells.," Cell Biochem Funct, vol. 35, no. 5, pp. 247-53, 2017.

[101] H. Wu, N. Mathioudakis, B. Diagouraga, A. Dong, L. Dombrovski, F. Baudat, S. Cusack, B. de Massy and J. Kadlec, "Molecular basis for the regulation of the H3K4 methyltransferase activity of PRDM9.," Cell Rep, vol. 5, no. 1, pp. 13-20, 2013.

[102] A. Allali-Hassani, M. Szewczyk, D. Ivanochko, S. Organ, J. Bok, J. Ho, F. Gay, F. Li, L. Blazer, M. Eram, L. Halabelian, D. Dilworth, G. Luciani, E. Lima-Fernandes, Q. Wu, P. Loppnau, N. Palmer, S. Talib, P. Brown, M. Schapira et al., "Discovery of a chemical probe for PRDM9.," Nat Commun, vol. 10, no. 1, p. 5759, 2019.

[103] M. Schapira and C. Arrowsmith, "Methyltransferase inhibitors for modulation of the epigenome and beyond.," Curr Opin Chem Biol, vol. 33, pp. 81-7, 2016.

[104] A. Edwards, R. Isserlin, G. Bader, S. Frye, T. Willson and F. Yu, "Too many roads not taken.," Nature, vol. 470, no. 7333, pp. 163-5, 2011.

[105] O. Fedorov, S. Müller and S. Knapp, "The (un)targeted cancer kinome.," Nat Chem Biol, vol. 6, no. 3, pp. 166-9, 2010.

[106] R. Roskoski Jr., "Properties of FDA-approved small molecule protein kinase inhibitors.," Pharmacol Res, vol. 144, no. 1, pp. 19-50, 2019.

[107] K. Bhullar, N. Lagarón, E. McGowan, I. Parmar, A. Jha, B. Hubbard and H. Rupasinghe, "Kinase-targeted cancer therapies: progress, challenges and future directions.," Mol Cancer, vol. 17, no. 1, p. 48, 2018.

[108] F. Ferguson and N. Gray, "Kinase inhibitors: the road ahead.," Nat Rev Drug Discov., vol. 17, no. 5, pp. 353-77, 2018.

[109] E. Jacoby, G. Tresadern, S. Bembenek, B. Wroblowski, C. Buyck, J. Neefs, D. Rassokhin, A. Poncelet, J. Hunt and H. van Vlijmen, "Extending kinome coverage by analysis of kinase inhibitor broad profiling data.," Drug Discov Today, vol. 20, no. 6, pp. 652-8, 2015.

[110] H. Lightfoot, F. Goldberg and J. Sedelmeier, "Evolution of Small Molecule Kinase Drugs.," ACS Med Chem Lett, vol. 10, no. 2, pp. 153-60, 2018.

[111] S. Degorce, O. Tavana, E. Banks, C. Crafter, L. Gingipalli, D. Kouvchinov, Y. Mao, F. Pachl, A. Solanki, V. Valge-Archer, B. Yang and S. Edmondson, "Discovery of Proteolysis-Targeting Chimera Molecules that Selectively Degrade the IRAK3 Pseudokinase.," J Med Chem, vol. 63, no. 18, pp. 10460-73, 2020.

[112] O. Mihola, Z. Trachtulec, C. Vlcek, J. Schimenti and J. Forejt, "A mouse speciation gene encodes a meiotic histone H3 methyltransferase.," Science, vol. 323, no. 5912, pp. 373-5, 2009.

[113] E. Parvanov, P. Petkov and K. Paigen, "Prdm9 controls activation of mammalian recombination hotspots," Science, vol. 327, no. 5967, p. 835, 2010.

[114] K. Paigen and P. Petkov, "PRDM9 and Its Role in Genetic Recombination.," Trends Genet, vol. 34, no. 4, pp. 291-300, 2018.

[115] G. Margolin, P. Khil, J. Kim, M. Bellani and R. Camerini-Otero, "Integrated transcriptome analysis of mouse spermatogenesis.," BMC Genomics, vol. 15, p. 39, 2014.

[116] Y. Soh, J. Junker, M. Gill, J. Mueller, A. van Oudenaarden and D. Page, "A Gene Regulatory Program for Meiotic Prophase in the Fetal Ovary.," PLoS Genet, vol. 11, no. 9, p. e1005531, 2015.

[117] J. Thomas, R. Emerson and J. Shendure, "Extraordinary molecular evolution in the PRDM9 fertility gene.," PLoS One, vol. 4, no. 12, p. e8505, 2009.

[118] A. Patel, J. Horton, G. Wilson, X. Zhang and X. Cheng, "Structural basis for human PRDM9 action at recombination hot spots," Genes Dev, vol. 30, no. 3, pp. 257-65, 2016.

[119] B. Davies, E. Hatton, N. Altemose, J. Hussin, F. Pratto, G. Zhang, A. Hinch, D. Moralli, D. Biggs, R. Diaz, C. Preece, R. Li, E. Bitoun, K. Brick, C. Green, R. Camerini-Otero, S. Myers and P. Donnelly, "Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice.," Nature, vol. 530, no. 7589, pp. 171-6, 2016.

[120] R. Urrutia, "KRAB-containing zinc-finger repressor proteins.," Genome Biol, vol. 4, no. 10, p. 231, 2003.

[121] S. Iyengar and P. Farnham, "KAP1 protein: an enigmatic master regulator of the genome.," J Biol Chem, vol. 286, no. 30, pp. 26267-76, 2011.

[122] F. Lim, M. Soulez, D. Koczan, H. Thiesen and J. Knight, "A KRAB-related domain and a novel transcription repression domain in proteins encoded by SSX genes that are disrupted in human sarcomas.," Oncogene, vol. 17, no. 1, pp. 2013-8, 1998.

[123] Y. Imai, F. Baudat, M. Taillepierre, M. Stanzione, A. Toth and B. de Massy, "The PRDM9 KRAB domain is required for meiosis and involved in protein interactions.," Chromosoma, vol. 126, no. 6, pp. 681-95, 2017.

[124] S. Thibault-Sennett, Q. Yu, F. Smagulova, J. Cloutier, K. Brick, R. Camerini-Otero and G. Petukhova, "Interrogating the Functions of PRDM9 Domains in Meiosis.," Genetics, vol. 209, no. 2, pp. 475-87, 2018.

[125] B. Diagouraga, J. Clément, L. Duret, J. Kadlec, B. de Massy and F. Baudat, "PRDM9 Methyltransferase Activity Is Essential for Meiotic DNA Double-Strand Break Formation at Its Binding Sites.," Mol Cell, vol. 69, no. 5, pp. 853-65, 2018.

[126] C. Grey, J. Clément, J. Buard, B. Leblanc, I. Gut, M. Gut, L. Duret and B. de Massy, "In vivo binding of PRDM9 reveals interactions with noncanonical genomic sites.," Genome Res, vol. 27, no. 4, pp. 580-90, 2017.

[127] K. Brick, F. Smagulova, P. Khil, R. Camerini-Otero and G. Petukhova, "Genetic recombination is directed away from functional genomic elements in mice.," Nature, vol. 485, no. 7400, pp. 642-5, 2012.

[128] C. Grey, F. Baudat and B. de Massy, "PRDM9, a driver of the genetic map.," PLoS Genet, vol. 14, no. 8, p. e1007479, 2018.

[129] T. Miyamoto, E. Koh, N. Sakugawa, H. Sato, H. Hayashi, M. Namiki and K. Sengoku, "Two single nucleotide polymorphisms in PRDM9 (MEISETZ) gene may be a genetic risk factor for Japanese patients with azoospermia by meiotic arrest.," J Assist Reprod Genet, vol. 25, no. 11, pp. 533-7, 2008.

[130] S. Irie, A. Tsujimura, Y. Miyagawa, T. Ueda, Y. Matsuoka, Y. Matsui, A. Okuyama, Y. Nishimune and H. Tanaka, "Single-nucleotide polymorphisms of the PRDM9 (MEISETZ) gene in patients with nonobstructive azoospermia.," J Androl, vol. 30, no. 4, pp. 426-31, 2009.

[131] V. Narasimhan, K. Hunt, D. Mason, C. Baker, K. Karczewski, M. Barnes, A. Barnett, C. Bates, S. Bellary, N. Bockett, K. Giorda, C. Griffiths, H. Hemingway, Z. Jia, M. Kelly, H. Khawaja, M. Lek, S. McCarthy, R. McEachan, A. O'Donnell-Luria et al., "Health and population effects of rare gene knockouts in adult humans with related parents.," Science, vol. 352, no. 6284, pp. 474-7, 2016.

[132] A. Patel, X. Zhang, R. Blumenthal and X. Cheng, "Structural basis of human PR/SET domain 9 (PRDM9) allele C-specific recognition of its cognate DNA sequence.," J Biol Chem, vol. 292, no. 39, pp. 15994-16002, 2017.

[133] I. Berg, R. Neumann, K. Lam, S. Sarbajna, L. Odenthal-Hesse, C. May and A. Jeffreys, "PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans.," Nat Genet, vol. 42, no. 10, pp. 859-63, 2010.

[134] F. Pratto, K. Brick, P. Khil, F. Smagulova, G. Petukhova and R. Camerini-Otero, "DNA recombination. Recombination initiation maps of individual human genomes.," Science, vol. 346, no. 6211, p. 1256442, 2014.

[135] M. Hillmer, A. Summerer, V. Mautner, J. Högel, D. Cooper and H. Kehrer-Sawatzki, "Consideration of the haplotype diversity at nonallelic hotspots improves the precision of rearrangement breakpoint identification.," Hum Mutat, vol. 38, no. 12, pp. 1711-22, 2017.

[136] T. Oliver, C. Middlebrooks, A. Harden, N. Scott, B. Johnson, J. Jones, C. Walker, C. Wilkerson, S. Saffold, A. Akinseye, T. Smith, E. Feingold and S. Sherman, "Variation in the Zinc Finger of PRDM9 is Associated with the Absence of Recombination along Nondisjoined Chromosomes 21 of Maternal Origin.," J Down Syndr Chromosom Abnorm, vol. 2, no. 2, p. e115, 2016.

[137] B. Ding, L. Yan, Y. Zhang, Z. Wang, Y. Zhang, D. Xia, Z. Ye and H. Xu, "Analysis of the role of mutations in the KMT2D histone lysine methyltransferase in bladder cancer.," FEBS Open Bio, vol. 9, no. 4, pp. 693-706, 2019.

[138] S. Scheer, S. Ackloo, T. Medina, M. Schapira, F. Li, J. Ward, A. Lewis, J. Northrop, P. Richardson, H. Kaniskan, Y. Shen, J. Liu, D. Smil, D. McLeod, C. Zepeda-Velazquez, M. Luo, J. Jin, D. Barsyte-Lovejoy, K. Huber, D. De Carvalho et al., "A chemical biology toolbox to study protein methyltransferases and epigenetic signaling.," Nat Commun, vol. 10, no. 1, p. 19, 2019.

[139] R. Ferreira de Freitas and M. Schapira, "A systematic analysis of atomic protein-ligand interactions in the PDB.," Medchemcomm, vol. 8, no. 10, pp. 1970-81, 2017.

[140] S. Salentin, S. Schreiber, V. Haupt, M. Adasme and M. Schroeder, "PLIP: fully automated protein-ligand interaction profiler.," Nucleic Acids Res, vol. 43, no. 1, pp. 443-7, 2015.

[141] D. Barsyte-Lovejoy, F. Li, M. Oudhoff, J. Tatlock, A. Dong, H. Zeng, H. Wu, S. Freeman, M. Schapira, G. Senisterra, E. Kuznetsova, R. Marcellus, A. Allali-Hassani, S. Kennedy, J. Lambert, A. Couzens, A. Aman, A. Gingras et al., "(R)-PFI-2 is a potent and selective inhibitor of SETD7 methyltransferase activity in cells.," Proc Natl Acad Sci U S A, vol. 111, no. 35, pp. 12853-8, 2014.

[142] E. Chan-Penebre, K. Kuplast, C. Majer, P. Boriack-Sjodin, T. Wigle, L. Johnston, N. Rioux, M. Munchhof, L. Jin, S. Jacques, K. West, T. Lingaraj, K. Stickland, S. Ribich, A. Raimondi, M. Scott, N. Waters, R. Pollock, J. Smith et al., "A selective inhibitor of PRMT5 with in vivo and in vitro potency in MCL models.," Nat Chem Biol, vol. 11, no. 6, pp. 432-7, 2015.

[143] F. Niesen, H. Berglund and M. Vedadi, "The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability.," Nat Protoc, vol. 2, pp. 2212-21, 2007.

[144] S. Smits, T. Meyer, A. Mueller, N. van Os, M. Stoldt, D. Willbold, L. Schmitt and M. Grieshaber, "Insights into the mechanism of ligand binding to octopine dehydrogenase from Pecten maximus by NMR and crystallography.," PLoS One, vol. 5, no. 8, p. e12312, 2010.

[145] W. Kabsch, "XDS," Acta Crystallogr D Biol Crystallogr, vol. 66, no. 2, pp. 125-32, 2010.

[146] P. Evans and G. Murshudov, "How good are my data and what is the resolution?," Acta Crystallogr D Biol Crystallogr., vol. 69, no. 7, pp. 1204-14, 2013.

[147] A. McCoy, R. Grosse-Kunstleve, P. Adams, M. Winn, L. Storoni and R. Read, "Phaser crystallographic software.," J Appl Crystallogr, vol. 40, pp. 658-74, 2007.

[148] G. Murshudov, P. Skubák, A. Lebedev, N. Pannu, R. Steiner, R. Nicholls, M. Winn, F. Long and A. Vagin, "REFMAC5 for the refinement of macromolecular crystal structures.," Acta Crystallogr D Biol Crystallogr., vol. 67, no. 4, pp. 355-67, 2011.

[149] P. Emsley, B. Lohkamp, W. Scott and K. Cowtan, "Features and development of Coot.," Acta Crystallogr D Biol Crystallogr., vol. 66, no. 4, pp. 486-501, 2010.

[150] R. Ferreira de Freitas, D. Ivanochko and M. Schapira, "Methyltransferase Inhibitors: Competing with, or Exploiting the Bound Cofactor.," Molecules, vol. 24, no. 24, p. 4492, 2019.

[151] V. Richon, D. Johnston, C. Sneeringer, L. Jin, C. Majer, K. Elliston, L. Jerva, M. Scott and R. Copeland, "Chemogenetic analysis of human protein methyltransferases.," Chem Biol Drug Des, vol. 78, no. 2, pp. 199-210, 2011.

[152] X. Cheng, S. Kumar, J. Posfai, J. Pflugrath and R. Roberts, "Crystal structure of the HhaI DNA methyltransferase complexed with S-adenosyl-L-methionine.," Cell, vol. 74, no. 2, pp. 299-307, 1993.

[153] P. Kozbial and A. Mushegian, "Natural history of S-adenosylmethionine-binding proteins.," BMC Struct Biol, vol. 5, no. 1, p. 19, 2005.

[154] H. Schubert, R. Blumenthal and X. Cheng, "Many paths to methyltransfer: a chronicle of convergence.," Trends Biochem Sci, vol. 28, no. 6, pp. 329-35, 2003.

[155] E. Stein, G. Garcia-Manero, D. Rizzieri, R. Tibes, J. Berdeja, M. Savona, M. Jongen-Lavrenic, J. Altman, B. Thomson, S. Blakemore, S. Daigle, N. Waters, A. Suttle, A. Clawson, R. Pollock, A. Krivtsov, S. Armstrong, J. DiMartino et al., "The DOT1L inhibitor pinometostat reduces H3K79 methylation and has modest clinical activity in adult acute leukemia.," Blood, vol. 131, no. 24, pp. 2661-9, 2018.

[156] M. Schapira and R. Ferreira de Freitas, "Structural biology and chemistry of protein arginine methyltransferases.," Medchemcomm, vol. 5, no. 12, pp. 1779-88, 2014.

[157] N. Troffer-Charlier, V. Cura, P. Hassenboehler, D. Moras and J. Cavarelli, "Functional insights from structures of coactivator-associated arginine methyltransferase 1 domains.," EMBO J, vol. 26, no. 20, pp. 4391-401, 2007.

[158] T. Kwon, J. Chang, E. Kwak, C. Lee, A. Joachimiak, Y. Kim, J. Lee and Y. Cho, "Mechanism of histone lysine methyl transfer revealed by the structure of SET7/9-AdoMet.," EMBO J, vol. 22, no. 2, pp. 292-303, 2003.

[159] B. Xiao, C. Jing, J. Wilson, P. Walker, N. Vasisht, G. Kelly, S. Howell, I. Taylor, G. Blackburn and S. Gamblin, "Structure and catalytic mechanism of the human histone methyltransferase SET7/9.," Nature, vol. 421, no. 6923, pp. 652-6, 2003.

[160] A. Fedoriw, S. Rajapurkar, S. O'Brien, S. Gerhart, L. Mitchell, N. Adams, N. Rioux, T. Lingaraj, S. Ribich, M. Pappalardi, N. Shah, J. Laraio, Y. Liu, M. Butticello, C. Carpenter, C. Creasy, S. Korenchuk, M. McCabe, C. McHugh et al., "Anti-tumor Activity of the Type I PRMT Inhibitor, GSK3368715, Synergizes with PRMT5 Inhibition through MTAP Loss.," Cancer Cell, vol. 36, no. 1, pp. 100-14, 2019.

[161] M. Eram, Y. Shen, M. Szewczyk, H. Wu, G. Senisterra, F. Li, K. Butler, H. Kaniskan, B. Speed, C. Dela Seña, A. Dong, H. Zeng, M. Schapira, P. Brown, C. Arrowsmith, D. Barsyte-Lovejoy, J. Liu, M. Vedadi and J. Jin, "A Potent, Selective, and Cell-Active Inhibitor of Human Type I Protein Arginine Methyltransferases.," ACS Chem Biol, vol. 11, no. 3, pp. 772-81, 2016.

[162] K. Nakayama, M. Szewczyk, C. Dela Sena, H. Wu, A. Dong, H. Zeng, F. Li, R. de Freitas, M. Eram, M. Schapira, Y. Baba, M. Kunitomo, D. Cary, M. Tawada, A. Ohashi, Y. Imaeda, K. Saikatendu, C. Grimshaw, M. Vedadi et al., "TP-064, a potent and selective small molecule inhibitor of PRMT4 for multiple myeloma.," Oncotarget, vol. 9, no. 26, pp. 18480-93, 2018.

[163] Y. Chang, T. Ganesh, J. Horton, A. Spannhoff, J. Liu, A. Sun, X. Zhang, M. Bedford, Y. Shinkai, J. Snyder and X. Cheng, "Adding a lysine mimic in the design of potent inhibitors of histone lysine methyltransferases.," J Mol Biol, vol. 400, no. 1, pp. 1-7, 2010.

[164] F. Liu, X. Chen, A. Allali-Hassani, A. Quinn, G. Wasney, A. Dong, D. Barsyte, I. Kozieradzki, G. Senisterra, I. Chau, A. Siarheyeva, D. Kireev, A. Jadhav, J. Herold, S. Frye, C. Arrowsmith, P. Brown, A. Simeonov, M. Vedadi and J. Jin, "Discovery of a 2,4-diamino-7-aminoalkoxyquinazoline as a potent and selective inhibitor of histone lysine methyltransferase G9a.," J Med Chem, vol. 52, no. 24, pp. 7950-3, 2009.

[165] E. Eggert, R. Hillig, S. Koehr, D. Stöckigt, J. Weiske, N. Barak, J. Mowat, T. Brumby, C. Christ, A. Ter Laak, T. Lang, A. Fernandez-Montalvan, V. Badock, H. Weinmann, I. Hartung, D. Barsyte-Lovejoy, M. Szewczyk, S. Kennedy, F. Li et al., "Discovery and Characterization of a Highly Potent and Selective Aminopyrazoline-Based in Vivo Probe (BAY-598) for the Protein Lysine Methyltransferase SMYD2.," J Med Chem, vol. 59, no. 10, pp. 4578-600, 2016.

[166] A. Ferguson, N. Larsen, T. Howard, H. Pollard, I. Green, C. Grande, T. Cheung, R. Garcia-Arenas, S. Cowen, J. Wu, R. Godin, H. Chen and N. Keen, "Structural basis of substrate methylation and inhibition of SMYD2.," Structure, vol. 19, no. 9, pp. 1262-73, 2011.

[167] L. Mitchell, P. Boriack-Sjodin, S. Smith, M. Thomenius, N. Rioux, M. Munchhof, J. Mills, C. Klaus, J. Totman, T. Riera, A. Raimondi, S. Jacques, K. West, M. Foley, N. Waters, K. Kuntz, T. Wigle, M. Scott, R. Copeland, J. Smith and R. Chesworth, "Novel Oxindole Sulfonamides and Sulfamides: EPZ031686, the First Orally Bioavailable Small Molecule SMYD3 Inhibitor.," ACS Med Chem Lett, vol. 7, no. 2, pp. 134-8, 2015.

[168] R. Sweis, Z. Wang, M. Algire, C. Arrowsmith, P. Brown, G. Chiang, J. Guo, C. Jakob, S. Kennedy, F. Li, D. Maag, B. Shaw, N. Soni, M. Vedadi and W. Pappano, "Discovery of A-893, A New Cell-Active Benzoxazinone Inhibitor of Lysine Methyltransferase SMYD2.," ACS Med Chem Lett, vol. 6, no. 6, pp. 695-700, 2015.

[169] M. Thomenius, J. Totman, D. Harvey, L. Mitchell, T. Riera, K. Cosmopoulos, A. Grassian, C. Klaus, M. Foley, E. Admirand, H. Jahic, C. Majer, T. Wigle, S. Jacques, J. Gureasko, D. Brach, T. Lingaraj, K. West, S. Smith, N. Rioux et al., "Small molecule inhibitors and CRISPR/Cas9 mutagenesis demonstrate that SMYD2 and SMYD3 activity are dispensable for autonomous cancer cell proliferation.," PLoS One, vol. 13, no. 6, p. e0197372, 2018.

[170] K. Bromberg, T. Mitchell, A. Upadhyay, C. Jakob, M. Jhala, K. Comess, L. Lasko, C. Li, C. Tuzon, Y. Dai, F. Li, M. Eram, A. Nuber, N. Soni, V. Manaves, M. Algire, R. Sweis, M. Torrent, G. Schotta, C. Sun, M. Michaelides et al., "The SUV4-20 inhibitor A-196 verifies a role for epigenetics in genomic integrity.," Nat Chem Biol, vol. 13, no. 3, pp. 317-24, 2017.

[171] C. Huang, S. Liew, G. Lin, A. Poulsen, M. Ang, B. Chia, S. Chew, Z. Kwek, J. Wee, E. Ong, P. Retna, N. Baburajendran, R. Li, W. Yu, X. Koh-Stenta, A. Ngo, S. Manesh, J. Fulwood, Z. Ke, H. Chung, S. Sepramaniam, X. Chew, N. Dinie et al., "Discovery of Irreversible Inhibitors Targeting Histone Methyltransferase, SMYD3.," ACS Med Chem Lett, vol. 10, no. 6, pp. 978-84, 2019.

[172] D. Tsao, L. Diatchenko and N. Dokholyan, "Structural mechanism of S-adenosyl methionine binding to catechol O-methyltransferase.," PLoS One, vol. 6, no. 8, p. e24287, 2011.

[173] P. Martínez-Martín and C. O'Brien, "Extending levodopa action: COMT inhibition.," Neurology, vol. 50, no. 6, pp. 27-32, 1998.

[174] J. Vidgren, L. Svensson and A. Liljas, "Crystal structure of catechol O-methyltransferase.," Nature, vol. 368, no. 6469, pp. 354-8, 1994.

[175] S. Czarnota, L. Johannissen, N. Baxter, F. Rummel, A. Wilson, M. Cliff, C. Levy, N. Scrutton, J. Waltho and S. Hay, "Equatorial Active Site Compaction and Electrostatic Reorganization in Catechol-O-methyltransferase.," ACS Catal, vol. 9, no. 5, pp. 4394-401, 2019.

[176] E. Schultz and E. Nissinen, "Inhibition of rat liver and duodenum soluble catechol-O-methyltransferase by a tight-binding inhibitor OR-462.," Biochem Pharmacol, vol. 38, no. 22, pp. 3953-6, 1989.

[177] M. Bonifácio, M. Archer, M. Rodrigues, P. Matias, D. Learmonth, M. Carrondo and P. Soares-da-Silva, "Kinetics and crystal structure of catechol-o-methyltransferase complex with co-substrate and a novel inhibitor with potential therapeutic application.," Mol Pharmacol, vol. 62, no. 4, pp. 795-805, 2002.

[178] D. Learmonth, M. Bonifácio and P. Soares-da-Silva, "Synthesis and biological evaluation of a novel series of 'ortho-nitrated' inhibitors of catechol-O-methyltransferase.," J Med Chem, vol. 48, no. 25, pp. 8070-8, 2005.

[179] M. Bonifácio, L. Torrão, A. Loureiro, P. Palma, L. Wright and P. Soares-da-Silva, "Pharmacological profile of opicapone, a third-generation nitrocatechol catechol-O-methyl transferase inhibitor, in the rat.," Br J Pharmacol, vol. 172, no. 7, pp. 1739-52, 2015.

[180] J. Stolk, G. Vantini, B. Perry, R. Guchhait and D. U'Prichard, "Assessment of the functional role of brain adrenergic neurons: chronic effects of phenylethanolamine N-methyltransferase inhibitors and alpha adrenergic receptor antagonists on brain norepinephrine metabolism.," J Pharmacol Exp Ther, vol. 230, no. 3, pp. 577-86, 1984.

[181] J. Martin, J. Begun, M. McLeish, J. Caine and G. Grunewald, "Getting the adrenaline going: crystal structure of the adrenaline-synthesizing enzyme PNMT.," Structure, vol. 9, no. 10, pp. 977-85, 2001.

[182] C. Gee, N. Drinkwater, J. Tyndall, G. Grunewald, Q. Wu, M. McLeish and J. Martin, "Enzyme adaptation to inhibitor binding: a cryptic binding site in phenylethanolamine N-methyltransferase.," J Med Chem, vol. 50, no. 20, pp. 4845-53, 2007.

[183] J. Yang, A. Roy and Y. Zhang, "BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions.," Nucleic Acids Res, vol. 41, pp. D1096-103, 2013.

[184] A. Ribeiro, S. Das, N. Dawson, R. Zaru, S. Orchard, J. Thornton, C. Orengo, E. Zeqiraj, J. Murphy and P. Eyers, "Emerging concepts in pseudoenzyme classification, evolution, and signaling.," Sci Signal, vol. 12, no. 594, p. eaat9797, 2019.

[185] C. Dalvit and A. Vulpetti, "Ligand-Based Fluorine NMR Screening: Principles and Applications in Drug Discovery Projects.," J Med Chem, vol. 62, no. 5, pp. 2218-44, 2019.

[186] Y. Luan, L. Blazer, H. Hu, T. Hajian, J. Zhang, H. Wu, S. Houliston, C. Arrowsmith, M. Vedadi and Y. Zheng, "Design of a fluorescent ligand targeting the S-adenosylmethionine binding site of the histone methyltransferase MLL1," Org Biomol Chem, vol. 14, no. 2, pp. 631-8, 2016.

[187] M. Minnich, H. Tagoh, P. Bönelt, E. Axelsson, M. Fischer, B. Cebolla, A. Tarakhovsky, S. Nutt, M. Jaritz and M. Busslinger, "Multifunctional role of the transcription factor Blimp-1 in coordinating plasma cell differentiation.," Nat Immunol, vol. 17, no. 3, pp. 331-43, 2016.

[188] S. Su, H. Ying, Y. Chiu, F. Lin, M. Chen and K. Lin, "Involvement of histone demethylase LSD1 in Blimp-1-mediated gene repression during plasma cell differentiation.," Mol Cell Biol, vol. 29, no. 6, pp. 1421-31, 2009.

[189] E. Gasteiger, C. Hoogland, A. Gattiker, S. Duvaud, M. Wilkins, R. Appel and A. Bairoch, "Protein Identification and Analysis Tools on the ExPASy Server," in The Proteomics Protocols Handbook, J. M. Walker, Ed., Humana Press, 2005, pp. 571-607.

[190] D. Ivanochko, L. Halabelian, E. Henderson, P. Savitsky, H. Jain, E. Marcon, S. Duan, A. Hutchinson, A. Seitova, D. Barsyte-Lovejoy, P. Filippakopoulos, J. Greenblatt, E. Lima-Fernandes and C. Arrowsmith, "Direct interaction between the PRDM3 and PRDM16 tumor suppressors and the NuRD chromatin remodeling complex," Nucleic Acids Res, vol. 47, no. 3, pp. 1225-38, 2019.

[191] M. Yoshida, K. Nosaka, J. Yasunaga, I. Nishikata, K. Morishita and M. Matsuoka, "Aberrant expression of the MEL1S gene identified in association with hypomethylation in adult T-cell leukemia cells.," Blood, vol. 103, no. 7, pp. 2753-60, 2004.

[192] L. Fagerberg, B. Hallström, P. Oksvold, C. Kampf, D. Djureinovic, J. Odeberg, M. Habuka, S. Tahmasebpoor, A. Danielsson, K. Edlund, A. Asplund, E. Sjöstedt, E. Lundberg, C. Szigyarto, M. Skogs, J. Takanen, H. Berling, H. Tegel, J. Mulder et al., "Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics.," Mol Cell Proteomics, vol. 13, no. 2, pp. 397-406, 2014.

[193] Y. Zhang, S. Stehling-Sun, K. Lezon-Geyda, S. Juneja, L. Coillard, G. Chatterjee, C. Wuertzer, F. Camargo and A. Perkins, "PR-domain-containing Mds1-Evi1 is critical for long-term hematopoietic stem cell function.," Blood, vol. 118, no. 14, pp. 3853-61, 2011.

[194] S. Chuikov, B. Levi, M. Smith and S. Morrison, "Prdm16 promotes stem cell maintenance in multiple tissues, partly by regulating oxidative stress.," Nat Cell Biol, vol. 12, no. 10, pp. 999-1006, 2010.

[195] F. Aguilo, S. Avagyan, A. Labar, A. Sevilla, D. Lee, P. Kumar, I. Lemischka, B. Zhou and H. Snoeck, "Prdm16 is a physiologic regulator of hematopoietic stem cells.," Blood, vol. 117, no. 19, pp. 5057-66, 2011.

[196] L. Huang, D. Pan, Q. Chen, L. Zhu, J. Ou, M. Wabitsch and Y. Wang, "Transcription factor Hlx controls a systematic switch from white to brown fat through Prdm16-mediated co-activation.," Nat Commun, vol. 8, no. 1, p. 68, 2017.

[197] M. Harms, H. Lim, Y. Ho, S. Shapira, J. Ishibashi, S. Rajakumari, D. Steger, M. Lazar, K. Won and P. Seale, "PRDM16 binds MED1 and controls chromatin architecture to determine a brown fat transcriptional program.," Genes Dev, vol. 29, no. 3, pp. 298-307, 2015.

[198] K. Morishita, E. Parganas, C. William, M. Whittaker, H. Drabkin, J. Oval, R. Taetle, M. Valentine and J. Ihle, "Activation of EVI1 gene expression in human acute myelogenous leukemias by translocations spanning 300-400 kilobases on chromosome band 3q26.," Proc Natl Acad Sci U S A, vol. 89, no. 9, pp. 3937-41, 1992.

[199] S. Lugthart, E. van Drunen, Y. van Norden, A. van Hoven, C. Erpelinck, P. Valk, H. Beverloo, B. Löwenberg and R. Delwel, "High EVI1 levels predict adverse outcome in acute myeloid leukemia: prevalence of EVI1 overexpression and chromosome 3q26 abnormalities underestimated.," Blood, vol. 111, no. 8, pp. 4329-37, 2008.

[200] S. Gröschel, S. Lugthart, R. Schlenk, P. Valk, K. Eiwen, C. Goudswaard, W. van Putten, S. Kayser, L. Verdonck, M. Lübbert, G. Ossenkoppele, U. Germing, I. Schmidt-Wolf, B. Schlegelberger, J. Krauter, A. Ganser, H. Döhner, B. Löwenberg et al., "High EVI1 expression predicts outcome in younger adult patients with acute myeloid leukemia and is associated with distinct cytogenetic abnormalities.," J Clin Oncol, vol. 28, no. 12, pp. 2101-7, 2010.

[201] C. Baldazzi, S. Luatti, E. Zuffa, C. Papayannidis, E. Ottaviani, G. Marzocchi, G. Ameli, M. Bardi, L. Bonaldi, R. Paolini, C. Gurrieri, G. Rigolin, A. Cuneo, G. Martinelli, M. Cavo and N. Testoni, "Complex chromosomal rearrangements leading to MECOM overexpression are recurrent in myeloid malignancies with various 3q abnormalities.," Genes Chromosomes Cancer, vol. 55, no. 4, pp. 375-88, 2016.

[202] D. Brooks, S. Woodward, F. Thompson, B. Dos Santos, M. Russell, J. Yang, X. Guan, J. Trent, D. Alberts and R. Taetle, "Expression of the zinc finger gene EVI-1 in ovarian and other cancers.," Br J Cancer, vol. 74, no. 10, pp. 1518-25, 1996.

[203] K. Yasui, C. Konishi, Y. Gen, M. Endo, O. Dohi, A. Tomie, T. Kitaichi, N. Yamada, N. Iwai, T. Nishikawa, K. Yamaguchi, M. Moriguchi, Y. Sumida, H. Mitsuyoshi, S. Tanaka, S. Arii and Y. Itoh, "EVI1, a target gene for amplification at 3q26, antagonizes transforming growth factor-β-mediated growth inhibition in hepatocellular carcinoma.," Cancer Sci, vol. 106, no. 7, pp. 929-37, 2015.

[204] K. Haas, M. Kundi, W. Sperr, H. Esterbauer, W. Ludwig, R. Ratei, E. Koller, H. Gruener, C. Sauerland, C. Fonatsch, P. Valent and R. Wieser, "Expression and prognostic significance of different mRNA 5'-end variants of the oncogene EVI1 in 266 patients with de novo AML: EVI1 and MDS1/EVI1 overexpression both predict short remission duration.," Genes Chromosomes Cancer, vol. 47, no. 4, pp. 288-98, 2008.

[205] Y. Du, N. Jenkins and N. Copeland, "Insertional mutagenesis identifies genes that promote the immortalization of primary bone marrow progenitor cells.," Blood, vol. 106, no. 12, pp. 3932-9, 2005.

[206] S. Arai, A. Yoshimi, M. Shimabe, M. Ichikawa, M. Nakagawa, Y. Imai, S. Goyama and M. Kurokawa, "Evi-1 is a transcriptional target of mixed-lineage leukemia oncoproteins in hematopoietic stem cells.," Blood, vol. 117, no. 23, pp. 6304-14, 2011.

[207] G. Yamato, H. Yamaguchi, H. Handa, N. Shiba, M. Kawamura, S. Wakita, K. Inokuchi, Y. Hara, K. Ohki, J. Okubo, M. Park, M. Sotomatsu, H. Arakawa and Y. Hayashi, "Clinical features and prognostic impact of PRDM16 expression in adult acute myeloid leukemia.," Genes Chromosomes Cancer, vol. 56, no. 11, pp. 800-9, 2017.

[208] R. Delwel, T. Funabiki, B. Kreider, K. Morishita and J. Ihle, "Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA.," Mol Cell Biol, vol. 13, no. 7, pp. 4291-300, 1993.

[209] T. Funabiki, B. Kreider and J. Ihle, "The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG.," Oncogene, vol. 9, no. 6, pp. 1575-81, 1994.

[210] E. Bard-Chapeau, J. Jeyakani, C. Kok, J. Muller, B. Chua, J. Gunaratne, A. Batagov, P. Jenjaroenpun, V. Kuznetsov, C. Wei, R. D'Andrea, G. Bourque, N. Jenkins and N. Copeland, "Ecotopic viral integration site 1 (EVI1) regulates multiple cellular processes important for cancer and is a synergistic partner for FOS protein in invasive tumors.," Proc Natl Acad Sci U S A, vol. 109, no. 6, pp. 2168-73, 2012.

[211] C. Glass, C. Wuertzer, X. Cui, Y. Bi, R. Davuluri, Y. Xiao, M. Wilson, K. Owens, Y. Zhang and A. Perkins, "Global Identification of EVI1 Target Genes in Acute Myeloid Leukemia.," PLoS One, vol. 8, no. 6, p. e67134, 2013.

[212] K. Izutsu, M. Kurokawa, Y. Imai, K. Maki, K. Mitani and H. Hirai, "The corepressor CtBP interacts with Evi-1 to repress transforming growth factor beta signaling.," Blood, vol. 97, no. 9, pp. 2815-22, 2001.

[213] S. Kajimura, P. Seale, T. Tomaru, H. Erdjument-Bromage, M. Cooper, J. Ruas, S. Chin, P. Tempst, M. Lazar and B. Spiegelman, "Regulation of the brown and white fat gene programs through a PRDM16/CtBP transcriptional complex.," Genes Dev, vol. 22, no. 10, pp. 1397-409, 2008.

[214] E. Nitta, K. Izutsu, Y. Yamaguchi, Y. Imai, S. Ogawa, S. Chiba, M. Kurokawa and H. Hirai, "Oligomerization of Evi-1 regulated by the PR domain contributes to recruitment of corepressor CtBP.," Oncogene, vol. 24, no. 40, pp. 6156-73, 2005.

[215] J. Basta and M. Rauchman, "The nucleosome remodeling and deacetylase complex in development and disease.," Transl Res, vol. 165, no. 1, pp. 36-47, 2015.

[216] M. Torrado, J. Low, A. Silva, J. Schmidberger, M. Sana, M. Sharifi Tabar, M. Isilak, C. Winning, C. Kwong, M. Bedward, M. Sperlazza, D. J. Williams, N. Shepherd and J. Mackay, "Refinement of the subunit interaction network within the nucleosome remodelling and deacetylase (NuRD) complex.," FEBS J, vol. 284, no. 24, pp. 4216-32, 2017.

[217] C. Millard, N. Varma, A. Saleh, K. Morris, P. Watson, A. Bottrill, L. Fairall, C. Smith and J. Schwabe, "The structure of the core NuRD repression complex provides insights into its interaction with chromatin.," Elife, vol. 5, p. e13941, 2016.

[218] F. Schmitges, A. Prusty, M. Faty, A. Stützer, G. Lingaraju, J. Aiwazian, R. Sack, D. Hess, L. Li, S. Zhou, R. Bunker, U. Wirth, T. Bouwmeester, A. Bauer, N. Ly-Hartig, K. Zhao, H. Chan, J. Gu, H. Gut, W. Fischle, J. Müller and N. Thomä, "Histone methylation by PRC2 is inhibited by active chromatin marks.," Mol Cell, vol. 42, no. 3, pp. 330-41, 2011.

[219] S. Lauberth and M. Rauchman, "A conserved 12-amino acid motif in Sall1 recruits the nucleosome remodeling and deacetylase corepressor complex.," J Biol Chem, vol. 281, no. 33, pp. 23922-31, 2006.

[220] S. Lejon, S. Thong, A. Murthy, S. AlQarni, N. Murzina, G. Blobel, E. Laue and J. Mackay, "Insights into association of the NuRD complex with FOG-1 from the crystal structure of an RbAp48·FOG-1 complex.," J Biol Chem, vol. 286, no. 2, pp. 1196-203, 2011.

[221] Z. Liu, F. Li, B. Zhang, S. Li, J. Wu and Y. Shi, "Structural basis of plant homeodomain finger 6 (PHF6) recognition by the retinoblastoma binding protein 4 (RBBP4) component of the nucleosome remodeling and deacetylase (NuRD) complex.," J Biol Chem, vol. 290, no. 10, pp. 6630-8, 2015.

[222] R. Moody, M. Lo, J. Meagher, C. Lin, N. Stevers, S. Tinsley, I. Jung, A. Matvekas, J. Stuckey and D. Sun, "Probing the interaction between the histone methyltransferase/deacetylase subunit RBBP4/7 and the transcription factor BCL11A in epigenetic complexes.," J Biol Chem, vol. 293, no. 6, pp. 2125-36, 2018.

[223] W. Hong, M. Nakazawa, Y. Chen, R. Kori, C. Vakoc, C. Rakowski and G. Blobel, "FOG-1 recruits the NuRD repressor complex to mediate transcriptional repression by GATA-1.," EMBO J, vol. 24, no. 13, pp. 2367-78, 2005.

[224] E. Bard-Chapeau, J. Gunaratne, P. Kumar, B. Chua, J. Muller, F. Bard, W. Blackstock, N. Copeland and N. Jenkins, "EVI1 oncoprotein interacts with a large and complex network of proteins and integrates signals through protein phosphorylation.," Proc Natl Acad Sci U S A, vol. 110, no. 31, pp. e2885-94, 2013.

[225] D. Spensberger, M. Vermeulen, X. Le Guezennec, R. Beekman, A. van Hoven, E. Bindels, H. Stunnenberg and R. Delwel, "Myeloid transforming protein Evi1 interacts with methyl-CpG binding domain protein 3 and inhibits in vitro histone deacetylation by Mbd3/Mi-2/NuRD.," Biochemistry, vol. 47, no. 24, pp. 6418-26, 2008.

[226] S. Zhu, Y. Xu, M. Song, G. Chen, H. Wang, Y. Zhao, Z. Wang and F. Li, "PRDM16 is associated with evasion of apoptosis by prostatic cancer cells according to RNA interference screening.," Mol Med Rep, vol. 14, no. 4, pp. 3357-61, 2016.

[227] S. Goyama, E. Nitta, T. Yoshino, S. Kako, N. Watanabe-Okochi, M. Shimabe, Y. Imai, K. Takahashi and M. Kurokawa, "EVI-1 interacts with histone methyltransferases SUV39H1 and G9a for transcriptional repression and bone marrow immortalization.," Leukemia, vol. 24, no. 1, pp. 81-8, 2010.

[228] A. Lai and P. Wade, "Cancer biology and NuRD: a multifaceted chromatin remodelling complex.," Nat Rev Cancer, vol. 11, no. 8, pp. 588-96, 2011.

[229] J. Fu, L. Qin, T. He, J. Qin, J. Hong, J. Wong, L. Liao and J. Xu, "The TWIST/Mi2/NuRD protein complex and its essential role in cancer metastasis.," Cell Res, vol. 21, no. 2, pp. 275-89, 2011.

[230] R. Srinivasan, G. Mager, R. Ward, J. Mayer and J. Svaren, "NAB2 represses transcription by interacting with the CHD4 subunit of the nucleosome remodeling and deacetylase (NuRD) complex.," J Biol Chem, vol. 281, no. 22, pp. 15129-37, 2006.

[231] C. Aguilera, K. Nakagawa, R. Sancho, A. Chakraborty, B. Hendrich and A. Behrens, "c-Jun N-terminal phosphorylation antagonises recruitment of the Mbd3/NuRD repressor complex.," Nature, vol. 469, no. 7329, pp. 231-5, 2011.

[232] N. Reynolds, M. Salmon-Divon, H. Dvinge, A. Hynes-Allen, G. Balasooriya, D. Leaford, A. Behrens, P. Bertone and B. Hendrich, "NuRD-mediated deacetylation of H3K27 facilitates recruitment of Polycomb Repressive Complex 2 to direct gene repression.," EMBO J, vol. 31, no. 3, pp. 593-605, 2012.

[233] S. Zhao, M. Choi, J. Overton, S. Bellone, D. Roque, E. Cocco, F. Guzzo, D. English, J. Varughese, S. Gasparrini, I. Bortolomai, N. Buza, P. Hui, M. Abu-Khalaf, A. Ravaggi, E. Bignotti, E. Bandiera, C. Romani, P. Todeschini, R. Tassi, L. Zanotti et al., "Landscape of somatic single-nucleotide and copy-number mutations in uterine serous carcinoma.," Proc Natl Acad Sci U S A, vol. 110, no. 8, pp. 2916-21, 2013.

[234] G. Teo, G. Liu, J. Zhang, A. Nesvizhskii, A. Gingras and H. Choi, "SAINTexpress: improvements and additional features in Significance Analysis of INTeractome software.," J Proteomics, vol. 100, pp. 37-43, 2014.

[235] M. Luijsterburg, M. Lindh, K. Acs, M. Vrouwe, A. Pines, H. van Attikum, L. Mullenders and N. Dantuma, "DDB2 promotes chromatin decondensation at UV-induced DNA damage.," J Cell Biol, vol. 197, pp. 267-81, 2012.

[236] N. Baker, D. Sept, S. Joseph, M. Holst and J. McCammon, "Electrostatics of nanosystems: application to microtubules and the ribosome.," Proc Natl Acad Sci U S A, vol. 98, pp. 10037-41, 2001.

[237] R. Abagyan, M. Totrov and D. Kuznetsov, "ICM - a new method for protein modeling and design. Applications to docking and structure prediction from the distorted native conformation," J Comput Chem, vol. 15, pp. 488-506, 1994.

[238] M. Schapira, M. Tyers, M. Torrent and C. Arrowsmith, "WD40 repeat domain proteins: a novel target class?," Nat Rev Drug Discov, vol. 16, no. 11, pp. 773-86, 2017.

[239] M. Schapira, M. Calabrese, A. Bullock and C. Crews, "Targeted protein degradation: expanding the toolbox.," Nat Rev Drug Discov, vol. 18, no. 12, pp. 949-63, 2019.

[240] P. Cromm and C. Crews, "Targeted Protein Degradation: from Chemical Biology to Drug Discovery.," Cell Chem Biol, vol. 24, no. 9, pp. 1181-90, 2017.

[241] S. Paiva and C. Crews, "Targeted protein degradation: elements of PROTAC design.," Curr Opin Chem Biol, vol. 50, pp. 111-9, 2019.

[242] Y. Ding, Y. Fei and B. Lu, "Emerging New Concepts of Degrader Technologies.," Trends Pharmacol Sci, vol. 41, no. 7, pp. 464-74, 2020.

[243] P. Cromm, K. Samarasinghe, J. Hines and C. Crews, "Addressing Kinase-Independent Functions of Fak via PROTAC-Mediated Degradation.," J Am Chem Soc, vol. 140, no. 49, pp. 17019-26, 2018.

[244] Y. Sun, N. Ding, Y. Song, Z. Yang, W. Liu, J. Zhu and Y. Rao, "Degradation of Bruton's tyrosine kinase mutants by PROTACs for potential treatment of ibrutinib-resistant non-Hodgkin lymphomas.," Leukemia, vol. 33, no. 8, pp. 2105-10, 2019.

[245] A. Buhimschi, H. Armstrong, M. Toure, S. Jaime-Figueroa, T. Chen, A. Lehman, J. Woyach, A. Johnson, J. Byrd and C. Crews, "Targeting the C481S Ibrutinib-Resistance Mutation in Bruton's Tyrosine Kinase Using PROTAC-Mediated Degradation.," Biochemistry, vol. 57, no. 26, pp. 3564-75, 2018.

[246] J. Salami, S. Alabi, R. Willard, N. Vitale, J. Wang, H. Dong, M. Jin, D. McDonnell, A. Crew, T. Neklesa and C. Crews, "Androgen receptor degradation by the proteolysis-targeting chimera ARCC-4 outperforms enzalutamide in cellular models of prostate cancer drug resistance.," Commun Biol, vol. 1, p. 100, 2018.

[247] A. Cipriano, G. Sbardella and A. Ciulli, "Targeting epigenetic reader domains by chemical biology.," Curr Opin Chem Biol, vol. 57, pp. 82-94, 2020.

[248] K. Sakamoto, K. Kim, A. Kumagai, F. Mercurio, C. Crews and R. Deshaies, "Protacs: chimeric molecules that target proteins to the Skp1-Cullin-F box complex for ubiquitination and degradation.," Proc Natl Acad Sci U S A, vol. 98, no. 15, pp. 8554-9, 2001.

[249] A. Schneekloth, M. Pucheault, H. Tae and C. Crews, "Targeted intracellular protein degradation induced by a small molecule: En route to chemical proteomics.," Bioorg Med Chem Lett, vol. 18, no. 22, pp. 5904-8, 2008.

[250] M. Scudellari, "Protein-slaying drugs could be the next blockbuster therapies.," Nature, vol. 567, no. 7748, pp. 298-300, 2019.

[251] M. Zengerle, K. Chan and A. Ciulli, "Selective Small Molecule Induced Degradation of the BET Bromodomain Protein BRD4.," ACS Chem Biol, vol. 10, no. 8, pp. 1770-7, 2015.

[252] M. Gadd, A. Testa, X. Lucas, K. Chan, W. Chen, D. Lamont, M. Zengerle and A. Ciulli, "Structural basis of PROTAC cooperative recognition for selective protein degradation.," Nat Chem Biol, vol. 13, no. 5, pp. 514-21, 2017.

[253] K. Baek, D. Krist, J. Prabu, S. Hill, M. Klügel, L. Neumaier, S. von Gronau, G. Kleiger and B. Schulman, "NEDD8 nucleates a multivalent cullin-RING-UBE2D ubiquitin ligation assembly.," Nature, vol. 578, no. 7795, pp. 461-6, 2020.

[254] G. Wu, G. Xu, B. Schulman, P. Jeffrey, J. Harper and N. Pavletich, "Structure of a beta-TrCP1-Skp1-beta-catenin complex: destruction motif binding and lysine specificity of the SCF(beta-TrCP1) ubiquitin ligase.," Mol Cell, vol. 11, no. 6, pp. 1445-56, 2003.

[255] K. Simonetta, J. Taygerly, K. Boyle, S. Basham, C. Padovani, Y. Lou, T. Cummins, S. Yung, S. von Soly, F. Kayser, J. Kuriyan, M. Rape, M. Cardozo, M. Gallop, N. Bence, P. Barsanti and A. Saha, "Prospective discovery of small molecule enhancers of an E3 ligase-substrate interaction.," Nat Commun, vol. 10, no. 1, p. 1402, 2019.

[256] Y. Zhang, H. Wei, D. Xie, D. Calambur, A. Douglas, M. Gao, F. Marsilio, W. Metzler, N. Szapiel, P. Zhang, M. Witmer, L. Mueller and D. Hedin, "An improved protocol for amino acid type-selective isotope labeling in insect cells.," J Biomol NMR, vol. 68, no. 4, pp. 237-47, 2017.