Structural Insights into Human Epigenetic Regulation from the Proteins SETD7 and Brd3

James Woodmansey

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

School of Life and Environmental Sciences

Faculty of Science

The University of Sydney

2021


Declaration

The work described in this thesis was performed between February 2016 and November 2019 in the School of Life and Environmental Sciences (formerly the School of Molecular Bioscience) at the University of Sydney. All experiments were conducted by the author unless otherwise specified. This work has not been submitted, in part or in full, for the purpose of obtaining a higher degree at any other institution.

James Nicolas Woodmansey

February 2021


Table of Contents

Declaration

Acknowledgements

Abstract

List of Abbreviations

List of Figures and Tables

1. Introduction

1.1 The Packaging of DNA

1.2 The Burgeoning Field of Epigenetics

1.2.1 Protein methylation

1.2.2 Protein acetylation

1.3 Epigenetic Readers and Writers

1.3.1 SET domains

1.3.2 Bromodomains and bromodomain inhibitors

1.3.3 The unknowns of epigenetics

1.4 Many Diseases have Epigenetic Causes or Solutions

1.5 Protein Engineering in an Epigenetic Context

1.6 Fragment Based Drug Design

1.7 Outline of this Thesis

2. Materials and Methods

2.1 Materials

2.1.1 Chemicals and reagents

2.1.2 Plasmids and oligonucleotides

2.1.3 Bacterial strains and culture media

2.2 Molecular Biology

2.2.1 Polymerase chain reaction (PCR)

2.2.2 Vector ligation

2.2.3 E. coli transformation

2.2.4 Plasmid purification

2.2.5 Dideoxy sequencing

2.3 Bacterial Protein Expression and Purification

2.3.1 Overexpression by IPTG induction

2.3.2 Overexpression by autoinduction

2.3.3 Expression of isotopically labelled proteins

2.3.4 Bacterial cell lysis

2.3.5 Affinity chromatography

2.3.6 Dialysis and concentration of proteins

2.3.7 Cleavage of affinity tags

2.3.8 Purification and refolding of insoluble histones

2.3.9 Ion exchange chromatography

2.3.10 Size exclusion chromatography

2.4 Mammalian Cell Culture

2.4.1 Cell lines and culture conditions

2.4.2 Transient plasmid transfection

2.4.3 Cell harvesting and lysis

2.4.4 Formaldehyde crosslinking

2.5 Gel Electrophoresis

2.5.1 Agarose gel electrophoresis

2.5.2 SDS-PAGE

2.5.3 Western blot

2.5.4 Native PAGE

2.6 Mass Spectrometry

2.7 RNA Quantification

2.7.1 RNA extraction from mammalian cells

2.7.2 Reverse transcription

2.7.3 Quantitative polymerase chain reaction

2.8 Quantification of Histone Post-Translational Modifications

2.8.1 Chromatin immunoprecipitation

2.8.2 In vitro histone methyltransferase assay

2.8.3 Autoradiography

2.9 Crystal Screening

2.10 Nuclear Magnetic Resonance (NMR)

2.11 Molecular Docking

3. Design, Expression and Characterisation of Split Epigenetic Enzyme Constructs

3.1 Introduction

3.2 Enzyme Selection

3.2.1 Construction of human epigenetic enzyme datasheet

3.2.2 Selection of enzymes for proof-of-concept experiments

3.3 SETD7 Methyltransferase Domain Assays

3.3.1 Expression and purification of the SETD7 methyltransferase domain

3.3.2 Expression and purification of Histone H3.3

3.3.3 Histone methylation assays

3.4 Split Construct Design and Expression

3.4.1 Design of split SETD7-leucine zipper constructs

3.4.2 Production of split green fluorescent protein control constructs

3.4.3 Expression, purification and testing of split SETD7 constructs

3.5 Discussion

4. Observation of Epigenetic Modifications in Cultured Human Cells

4.1 Introduction

4.2 Transfection of Cultured Cells

4.2.1 Choice of vectors

4.3 Detection of Modifications by Reverse Transcription Quantitative Polymerase Chain Reaction

4.4 Detection of Modifications by Chromatin Immunoprecipitation

4.5 Transfection Protocol Troubleshooting

4.5.1 Confirmation of enzyme expression

4.5.2 Investigating transfection efficiency

4.5.3 Effect of puromycin concentration on cell selection

4.6 Discussion

5. Inhibition of the Brd3 Extraterminal Domain

5.1 Introduction

5.2 Expression and Purification of the Brd3 ET Domain

5.3 Crystallising the Brd3 ET Domain

5.3.1 Design, expression and purification of truncated Brd3 ET constructs

5.3.2 Crystallisation of truncated Brd3 ET constructs

5.3.3 Aiding Brd3 ET crystallisation with a heterogeneous nucleating agent

5.4 Binding of Small Molecule Inhibitors to Brd3 ET: Investigation via NMR

5.4.2 Analysis of the binding mode of MFP-7210 to Brd3 ET via molecular docking

Concluding Remarks

References

Appendices

Appendix 1: Sequences Used for Protein Expression

Appendix 2: RT-qPCR Primers

Appendix 3: Human Epigenetic Enzymes

Appendix 4: SETD7 Amino Acid Conservation Input Sequences

Appendix 5: RT-qPCR Raw Data

Appendix 6: FOG dCas9 Verification Sequencing

Acknowledgements

First and foremost, a huge thank you to my supervisors, Professor Jacqui Matthews and Professor Joel Mackay. Jacqui — thank you so much for mentoring me over so many years and so many projects, for always being there to bounce ideas off and offer up more when I ran out, and for being endlessly knowledgeable, insightful and patient. Joel — thank you for coming to the rescue whenever a project collapsed, giving me direction and sharing some of your endless depth of knowledge with me. Thank you also to my auxiliary supervisor Doctor Ann Kwan for helping me navigate my PhD and providing copious amounts of advice and moral support.

Thank you to Doctor Lorna Wilkinson-White and Doctor Jason Low for guiding my research and teaching me so many lab skills and techniques over the years. Almost everything I’ve learned about the practical side of biochemistry has come from one of you. Thank you to everyone in the G08 Structural Biology group for creating such a welcoming and supportive environment, always helping and listening, and basically being my second family for five years.

Thank you to my honorary sisters Taylor and Fiona, without whom I would probably have gone crazy long ago. Taylor, your friendship and support are the stuff of legends. You kept me grounded and stable, and helped me mature as a scientist and as a person. Fi, even from halfway across the world you always managed to keep my spirits up and be there to talk when I needed it.

And of course, thank you to Vania for loving me and standing by my side through all of this. You are and always will be my rock. Your love, your advice, and your endless compassion and capacity to help others fuel me. I couldn’t have reached this point without you.


Abstract

Epigenetics—the study of the layer of gene regulation not accounted for by the DNA sequence itself—is a burgeoning field increasingly tied to diseases such as cancer and Alzheimer’s disease in humans. A prevailing avenue of current research is the so-called histone code—the large number of posttranslational modifications (PTMs) made to the histone proteins around which DNA is wrapped. These PTMs act in a highly specific and often synergistic manner to promote or repress transcription of DNA proximal to the histone. The histone code is created and managed by families of enzymes that are either readers or writers of histone PTMs. Dysregulation of the activity of these proteins can cause disease.

The effects of the majority of histone PTMs are currently unknown. A method to cause targeted, specific epigenetic modifications in human cells would be a great asset for investigating the effects of each histone modification and for observing phenotypes resulting from the editing of histone PTMs at specific loci. Work presented in this thesis attempted to develop such a method using a dCas9 targeting system with a histone lysine methyltransferase, SETD7. Non-specific histone modification was achieved in solution, and attempts were made to test these reagents in mammalian cells. However, constructs designed for increased specificity did not cause histone PTMs either in solution or in mammalian cells.

Work was also undertaken to aid the development of inhibitory drug candidates for the extraterminal (ET) protein-protein interaction domain of Brd3, a clinically significant histone PTM reader that is dysregulated in numerous cancers. Candidate small molecule inhibitors were investigated by nuclear magnetic resonance spectroscopy (NMR). NMR binding data showed that the drug candidates are tight binders of the Brd3 ET domain, and molecular docking demonstrated that at least one candidate might act as a competitive inhibitor of the native ligand. Overall, this work provides progress towards both the study of histone regulation and the treatment of its dysregulation.


List of Abbreviations

6xHis	Hexahistidine tag
AGC	Automatic gain control
ACN	Acetonitrile
AdoMet	S-adenosylmethionine
ADME	Absorption, distribution, metabolism and excretion
bp	Base pairs
BSA	Bovine serum albumin
Cas9	CRISPR associated protein 9
cDNA	Complementary deoxyribonucleic acid
ChIP	Chromatin immunoprecipitation
COSY	Correlated spectroscopy
CPM	Counts per minute
CRISPR	Clustered regularly interspaced short palindromic repeats
dCas9	Deactivated CRISPR associated protein 9
DMEM	Dulbecco’s modified Eagle’s medium
DMSO	Dimethyl sulfoxide
DNA	Deoxyribonucleic acid
Dnmt3a	DNA methyltransferase 3 alpha
dNTPs	Deoxyribonucleotide triphosphates
DPM	Disintegrations per minute
DSS	Sodium trimethylsilylpropanesulfonate
DTT	Dithiothreitol
E. coli	Escherichia coli
EDTA	Ethylenediaminetetraacetic acid
ET	Extraterminal
ETS	Extraterminal short
ETSR	Extraterminal shorter
FBDD	Fragment based drug design
GFP	Green fluorescent protein
gRNA	Guide ribonucleic acid
GST	Glutathione-S-transferase
H1	Histone 1
H2A	Histone 2A
H2B	Histone 2B
H3	Histone 3
H3.3	Histone 3.3
H4	Histone 4
HDAC	Histone deacetylase
HEPES	4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HPLC	High performance liquid chromatography
HRP	Horseradish peroxidase
HRV	Human rhinovirus
HSQC	Heteronuclear single quantum coherence
IPTG	Isopropyl β-D-thiogalactopyranoside
Kd	Dissociation constant
LB	Luria broth
MBP	Maltose binding protein
MES	2-(N-morpholino)ethanesulfonic acid
MQW	MilliQ water
MRI	Magnetic resonance imaging
mRNA	Messenger ribonucleic acid
MS	Mass spectrometry
NCE	Normalised collision energy
NMR	Nuclear magnetic resonance
NOE	Nuclear Overhauser effect
NOESY	Nuclear Overhauser effect spectroscopy
NTA	Nitrilotriacetic acid
NuRD	Nucleosome remodelling and deacetylation
OD	Optical density
ORS	Off-rate screening
PAGE	Polyacrylamide gel electrophoresis
PAM	Protospacer adjacent motif
PAMPA	Parallel artificial membrane permeability assay
PBS	Phosphate buffered saline
PCR	Polymerase chain reaction
PEI	Polyethylenimine
PET	Positron emission tomography
PIPES	1,4-Piperazinediethanesulfonic acid
PMSF	Phenylmethylsulfonyl fluoride
PRC2	Polycomb repressive complex 2
P-TEFb	Positive transcription elongation factor complex
PTM	Post-translational modification
qPCR	Quantitative polymerase chain reaction
REFiL	Rapid Elaboration of Fragments into Leads
RNA	Ribonucleic acid
RNAi	Ribonucleic acid interference
RPM	Revolutions per minute
RT	Room temperature
RT-qPCR	Reverse transcription quantitative polymerase chain reaction
SAR	Structure-activity relationship
SDS	Sodium dodecyl sulfate
SDS-PAGE	Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SPR	Surface plasmon resonance
STD	Saturation transfer difference
TAE	Tris/acetic acid/EDTA buffer
TCEP	Tris(2-carboxyethyl)phosphine
TOCSY	Total correlation spectroscopy
Tris	Tris(hydroxymethyl)aminomethane
TY	Tryptone yeast broth
UV	Ultraviolet


List of Figures and Tables

Figure 1.1: DNA packaging into chromosomes

Figure 1.2: Availability of histone ‘tails’ in the assembled nucleosome

Figure 1.3: Sample histone modifications

Figure 1.4: SET domain structure

Figure 1.5: Bromodomain structure

Figure 1.6: Schematic of BET protein structure

Table 1.1: Examples of diseases with epigenetic causes

Table 1.2: BET Inhibitors in clinical trials

Figure 1.7: Native Cas9 function

Figure 1.8: Design of a split enzyme dCas9 system for effecting epigenetic change

Figure 1.9: Advantage of fragment-based drug design over high throughput screening

Table 2.1: Chemicals and Reagents

Table 2.2: Bacterial culture media

Table 2.3: PCR mixture components

Table 2.4: Restriction enzyme and expression plasmid usage for each gene of interest

Table 2.5: Lysis buffers used for protein purification

Figure 3.1: Excerpt from the compiled epigenetic enzyme datasheet

Figure 3.2: Structure of human SETD7 bound to the cofactor

Figure 3.3: Structure of human KMT2D bound to the cofactor

Figure 3.4: Structure of KAT8 bound to the cofactor

Figure 3.5: SDS-PAGE analysis of trial expression of SETD7 methyltransferase domain

Figure 3.6: Mascot analysis of candidate SETD7 overexpression band

Figure 3.7: SDS-PAGE analysis of repeat purification of SETD7 methyltransferase domain

Figure 3.8: SDS-PAGE analysis of trial histone 3.3 expression

Figure 3.9: SDS-PAGE analysis of histone H3.3 expression and purification

Figure 3.10: Initial SETD7 activity assay via scintillation counting

Figure 3.11: Effect of various wash steps on background count in tritiated H3 scintillation counting

Figure 3.12: Effect of NaOH wash concentration on background count in scintillation counting

Figure 3.13: Reversing loading order to determine the effect of loading order on tritiated H3 scintillation counting

Figure 3.14: Determining the time dependent step of the SETD7 activity assay

Figure 3.15: SETD7 activity assay via scintillation counting with freshly prepared SETD7 and Histone H3.3

Figure 3.16: SETD7 activity assay via autoradiography

Figure 3.17: Cut sites, secondary structure, amino acid conservation and notable features of SETD7 (110–366)

Figure 3.18: The five proposed split domains of SETD7 shown on its structure

Figure 3.19: Split SETD7 construct design

Figure 3.20: Design of split GFP leucine zipper constructs

Figure 3.21: Trial co-expression of split GFP-leucine zipper constructs

Figure 3.22: Fluorescence of split GFP-leucine zipper constructs

Figure 3.23: Clear native PAGE analysis of split GFP overexpression and purification

Figure 3.24: Activity of split SETD7 lysate via scintillation counting

Figure 3.25: SDS-PAGE analysis of purification of split SETD7 constructs. Samples were taken following elution from Ni-NTA resin

Figure 3.26: Activity of purified split SETD7 pairs via scintillation counting

Figure 4.1: Mammalian methyltransferase expression workflow

Figure 4.2: FOG-dCas9 as an inhibitor of HER2 transcription

Figure 4.3: RT-qPCR of HER2 in FOG1-dCas9 transfected cells

Figure 4.4: RT-qPCR of HER2 in FOG1-dCas9 transfected cells

Figure 4.5: Unknown composition of a supposed FOG1-dCas9 plasmid sample

Figure 4.6: RT-qPCR of HER2 in KRAB-dCas9 or SETD7-dCas9 transfected cells

Figure 4.7: RT-qPCR of HER2 in KRAB-dCas9, SETD7-dCas9, KAT8-dCas9 or KMT2D-dCas9 transfected cells

Figure 4.8: RT-qPCR of HER2 in KRAB-dCas9, SETD7-dCas9, KAT8-dCas9 or KMT2D-dCas9 transfected cells

Figure 4.9: RT-qPCR of HER2 in FOG1-dCas9 transfected cells

Figure 4.10: qPCR of immunoprecipitated chromatin in FOG1-dCas9 transfected cells

Figure 4.11: Anti-FLAG Western blot of FOG1-dCas9, dCas9, KRAB-dCas9, and SETD7-dCas9

Figure 4.12: Fluorescence microscopy of GFP-transfected HEK293-FT cells

Figure 4.13: Survival of puromycin-resistant and puromycin-sensitive cells following treatment with varying concentrations of puromycin

Figure 4.14: Map of the iCas9 KRAB plasmid

Figure 5.1: Leading small molecule inhibitor candidates for the Brd3 ET domain

Figure 5.2: SDS-PAGE analysis of small-scale expression of the Brd3 ET domain

Figure 5.3: Crystallisation of the Brd3 ET domain using commercial crystallisation screens

Figure 5.4: Design of less flexible truncation constructs of Brd3 ET

Figure 5.5: SDS-PAGE analysis of trial expression of truncated Brd3 ET constructs

Figure 5.6: Crystallisation of Brd3 ET truncation using commercial crystallisation screens

Figure 5.7: Crystallisation of the Brd3 ET domain and its truncations using a heterogeneous nucleating agent

Figure 5.8: ¹⁵N-HSQC titrations of Brd3 ET binding to small molecule inhibitors

Figure 5.9: Binding affinity of Brd3 ET for small molecule inhibitors

Figure 5.10: Changes in chemical environment of Brd3 ET residues following small molecule inhibitor binding, judged from changes in signals in ¹⁵N-HSQC spectra

Figure 5.11: Structural data of the binding mode of small molecule inhibitors of Brd3 ET derived from HSQC titration

Table 5.1: Intermolecular NOEs used for Brd3 ET:MFP-7210 docking

Figure 5.12: MFP-7210 nomenclature

Figure 5.13: Structural data of the binding mode of small molecule inhibitors of Brd3 ET derived from HSQC titration

Figure 5.14: Molecular docking of the Brd3 ET:MFP-7210 complex compared to NMR solution structure of Brd3 ET bound to a native CHD4-derived ligand

Figure 5.15: Major and minor orientations of MFP-7210 determined by molecular docking of the Brd3 ET:MFP-7210 complex

Figure 5.16: NOEs identified between Brd3 ET and MFP-7210 by NMR spectroscopy, shown on the lowest energy structure of the Brd3 ET:MFP-7210 complex as determined by molecular docking

Figure 5.17: Hydrophobicity of the Brd3 ET:CHD4 and Brd3 ET:MFP-7210 complexes

1. Introduction

All biological processes depend on the translation of messenger RNA (mRNA) into proteins that carry out specific tasks. An essential step preceding translation is the regulated transcription of DNA into mRNA. As almost all eukaryotic cells contain the entire genome of the organism, transcription must be precisely regulated in every cell type at each stage of its life cycle, both to differentiate cells and to maintain normal cell function. A deeper understanding of the mechanisms and regulation of DNA transcription will allow us to observe and influence an enormous number of cellular events and responses. Furthermore, detailed knowledge of how transcriptional regulation can go wrong may be key to the treatment of many diseases and genetic disorders. The work presented in this thesis contributes to the rapidly growing body of knowledge on transcriptional regulation, with a focus on the development of tools to artificially regulate transcription in mammalian cells.

1.1 The Packaging of DNA

The nucleus of a human cell contains a very large amount of genomic DNA: on average, roughly 7 billion base pairs (bp) [1, 2], corresponding to a total length of almost 3 m of DNA per cell [3]. To fit into nuclei that are often no more than 10 µm in diameter [1], this DNA must be compacted into a highly organised structure known as chromatin (Figure 1.1).


Figure 1.1: DNA packaging into chromosomes. Double-stranded DNA is wrapped around histone octamers to create nucleosomes. Nucleosomes assemble into a coiled chromatin fibre. Coils of chromatin make up each chromosome.

Of particular importance in DNA packaging is the nucleosome, which consists of an octamer of histone proteins (two copies each of histones H2A, H2B, H3 and H4) around which the DNA strand is wrapped approximately 1.7 times [4, 5]. Additionally, the linker histone H1 binds to both the nucleosome and the encircling DNA, promoting nucleosome stability while also encouraging compaction of the DNA strand [5]. Each nucleosome contains approximately 150 bp of DNA [5]. The nucleosome is the basic structural unit of DNA packaging, but it is also a key part of the transcriptional regulatory apparatus [5-7]. The spooling and movement of DNA around and between nucleosomes determines the accessibility of that DNA to the transcriptional machinery (including transcription factors, chromatin modifiers and RNA polymerases), and therefore the frequency with which it is transcribed [5, 7]. These processes are controlled by epigenetic markers, most of which are post-translational modifications of the histones themselves [8].
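To give a sense of scale, the figures quoted in this section can be combined in a short back-of-envelope calculation. The sketch below is not taken from the cited sources: it simply divides the quoted DNA length by the quoted nuclear diameter, and divides the genome size by the DNA content of one nucleosome to obtain an upper bound on nucleosome number. That bound rests on a deliberately unrealistic assumption, that no linker DNA lies between nucleosomes. The constant and function names are illustrative only.

```python
# Back-of-envelope scale estimates using the approximate figures quoted in Section 1.1.
# Assumption (not from the cited sources): every base pair is nucleosome-bound with no
# linker DNA, so the nucleosome count below is an upper bound, not a measured value.

GENOME_BP = 7e9             # approximate base pairs per nucleus [1, 2]
DNA_LENGTH_M = 3.0          # approximate total DNA length per cell, in metres [3]
NUCLEUS_DIAMETER_M = 10e-6  # typical maximum nuclear diameter (10 micrometres) [1]
BP_PER_NUCLEOSOME = 150     # approximate DNA wrapped per nucleosome [5]


def length_to_diameter_ratio(length_m: float, diameter_m: float) -> float:
    """Number of nuclear diameters the DNA would span if laid end to end."""
    return length_m / diameter_m


def max_nucleosomes(genome_bp: float, bp_per_nucleosome: float) -> float:
    """Upper bound on nucleosome count if all DNA were wrapped, with no linker DNA."""
    return genome_bp / bp_per_nucleosome


if __name__ == "__main__":
    ratio = length_to_diameter_ratio(DNA_LENGTH_M, NUCLEUS_DIAMETER_M)
    count = max_nucleosomes(GENOME_BP, BP_PER_NUCLEOSOME)
    print(f"DNA length / nuclear diameter: {ratio:,.0f}")
    print(f"Nucleosomes (upper bound, no linker DNA): {count:,.0f}")
```

Under these assumptions the DNA would span roughly 300,000 nuclear diameters if stretched out, and at most about 47 million nucleosomes could form; the real number is lower because linker DNA separates adjacent nucleosomes.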

1.2 The Burgeoning Field of Epigenetics

The term epigenetics refers to stable modifications of chromosomes that do not involve a change in DNA sequence [9]. In practice, these are modifications either to the DNA molecule or to the N-termini (‘tails’) of the histones around which DNA is wrapped (Figures 1.1–1.3). Modifications to DNA primarily consist of methylation of cytosine bases to produce 5-methylcytosine, which is associated with transcriptional repression [10]. This thesis is focused on the process and consequences of histone modification.

All four nucleosome-forming histones contain a highly structured alpha-helical core but have unstructured N-terminal tails that protrude from the nucleosome (Figure 1.2) [11]. These tails can be post-translationally modified in multiple ways (including but not limited to methylation, acetylation, phosphorylation, biotinylation, ubiquitination and ADP-ribosylation) [12-14], at numerous residues within the tail (Figure 1.3). An increasing number of transcriptionally relevant post-translational modifications (PTMs) to the helical histone core are also being discovered [13]. Although these are generally less well characterised, they have been shown to play roles in nucleosome assembly, telomere regulation and DNA repair, as well as in transcriptional regulation [13, 15]. The sum of these modifications to histones, and their effects on gene expression, is often referred to as ‘the histone code’.

The effects of these histone modifications can be broadly categorised as either activating or repressing transcription of DNA proximal to the nucleosome, but the mechanisms by which they cause changes in transcription are varied and often synergistic [16, 17]. For example, a modification may promote transcription by recruiting both chromatin remodelling enzymes that favour the formation of open chromatin and proteins that actively recruit RNA polymerase [16, 17].


Figure 1.2: Availability of histone ‘tails’ in the assembled nucleosome. The histone octamer is shown in ribbon format with DNA (grey) wrapped around it, forming a complete nucleosome. Protruding histone tails are indicated with arrows. PDB ID: 1KX5 [11].

Figure 1.3: Sample histone modifications. A subset of known histone modifications on the first 20 residues of histone H3. Methyl groups are shown in light grey, acetyl groups in dark grey, and phosphate groups in yellow. Figure courtesy of Jason Low, Mackay Laboratory, University of Sydney.

The first epigenetic marker (DNA methylation) was characterised in 1978 [18]. Bird and Southern established that methylation of CpG islands in DNA was heritable in Xenopus and proposed that methylation played a role in restricting the accessibility of DNA to RNA polymerase. Research in the field accelerated greatly in the 1990s with the discovery of histone modifications and their significance for gene expression [7, 16, 17, 19], although a role for histones in gene regulation had been proposed as early as 1950 [20]. Work by the Grunstein laboratory showed that histone tails play a crucial role in transcriptional regulation [21, 22]. Confirmation that histone PTMs were responsible for this came in 1996, when Taunton et al. and Brownell et al. independently showed that a mammalian histone deacetylase and a histone acetyltransferase, respectively, were homologous to known yeast transcriptional regulators [23, 24]. The sequences of the four core histones are highly conserved across eukaryotes [25], further supporting a crucial role for histones and their modifications in cellular function. Ongoing research is now focused not only on discovering new epigenetic modifications (e.g., crotonylation, tail cleavage) and their effects, but also on leveraging the histone code to treat disease, as outlined in Section 1.4 [26-30].

1.2.1 Protein methylation

The existence of a methylated lysine species in proteins from living cells was first observed in bacteria by Ambler and Rees in 1959 using ion exchange chromatography [31]. Methylated lysine was soon found in numerous other contexts, including mammalian histones in 1964 [32]. Further studies demonstrated that the methyl group was added independently of translation [33-35]. Allfrey et al. hypothesised in 1964 that histone methylation played a role in transcriptional regulation [34]. However, it was not until 1998 that functions of protein methylation were confirmed, initially by Shen et al., who demonstrated that it was responsible for the nuclear export of heterogeneous nuclear ribonucleoproteins [36]. The following year, Chen et al. and Strahl et al. both demonstrated that histone methylation is associated with active transcription [37, 38]. Numerous studies since then have confirmed this finding and shown that histone methylation is recognised by transcription factors and RNA polymerases, and therefore acts as a controller of transcriptional activation [39-43]. Knockout of numerous histone methyltransferases has been shown to be embryonic lethal in mice [44-47].

While the bulk of research on protein methylation has focused on histones, protein methyltransferases (including those known to methylate histones) have also been shown to have significant non-histone targets. As mentioned above, the first discovered function of protein methylation was in the control of nuclear export of ribonucleoproteins. The role of arginine methylation in the regulation of RNA-binding proteins has since been investigated further and found to include control of functions such as pre-mRNA splicing and RNA-protein complex stability [48, 49]. More unusually, methylation of lysine and arginine residues on the tumour suppressor protein p53 has been shown to either activate or suppress protein function in a site-specific manner [50, 51]. Further instances of protein methylation are now being discovered by proteomic analysis, suggesting that methylation plays a regulatory role for a wide variety of proteins [52, 53].

1.2.2 Protein acetylation

The acetylation of proteins was first observed in histones by Phillips in 1963 [54]. Allfrey et al. proposed a role for protein acetylation in transcriptional regulation in the same 1964 paper that hypothesised a similar function for histone methylation [34]. Once again, it was not until the advent of proteomic techniques that functions of histone acetylation were confirmed. A 1988 paper by Hebbes et al. showed that acetylated histones were associated with actively transcribed chromatin [55]. Two years later, a pair of papers reporting correlations between histone acetylation and transcriptional silencing were published [56, 57], indicating a site-specific role for histone acetylation in transcriptional regulation. Confirmation that histone acetylation played a causal role in transcriptional regulation, rather than merely correlating with it, was provided by Brownell et al. and Taunton et al. in 1996 [23, 24].

Research on protein acetylation has investigated a diverse array of non-histone targets. Mass spectrometry and pulldown-based approaches have identified over four thousand acetylated residues in more than two thousand proteins [58, 59]. Influential proteins that are dynamically acetylated include the tumour suppressor p53 and the transcription factor NF-κB [60, 61]. Most of the well-studied acetyltransferase and deacetylase enzymes, such as p300, MYST proteins, sirtuins, and the Nucleosome Remodelling and Deacetylation (NuRD) complex, are capable of acetylating or deacetylating a wide range of histone and non-histone substrates [62-66].


1.3 Epigenetic Readers and Writers

Proteins involved in maintaining and implementing the histone code can be broadly classified as either ‘readers’ or ‘writers’ [14]. Readers are proteins that recognise histone modifications and (directly or indirectly) cause a change in gene expression [14, 19]. Over 25 families of histone readers are currently known, such as chromobarrels and bromodomains (which recognise acetylated lysine) and PWWP domains (named for a proline-tryptophan-tryptophan-proline motif, which recognise methylated lysine) [19]. Writers are enzymes that add or remove histone modifications, such as SET domain methyltransferases (including EZH2) or MYST domain acetyltransferases [14]. Some regulatory complexes, such as the polycomb repressive complex, contain both reader and writer subunits [67]. The work in this thesis focuses on two histone regulatory protein families: SET domains, a family of lysine methyltransferase enzymes; and bromodomains, a widespread domain family responsible for the detection of acetylated lysine.

1.3.1 SET domains

More than 20 unique types of histone methylation have been identified in eukaryotes to date, associated with both activation and repression of transcription [68]. The vast majority of these modifications are at lysine residues, and their methylation is mediated by different SET family proteins [69]. Indeed, the SET domain is the most common histone lysine methyltransferase domain family [69, 70]. The SET domain was first characterised by Jenuwein et al. in 1998 as a feature common to three previously discovered histone methyltransferases [71]. There are currently over 100 known SET domain-containing proteins [69], and over 280 published structures of human SET domains, alone or in complex with over 170 different protein ligands [72].

The SET domain is rich in beta sheet, with a knot of beta strands forming and surrounding the active site (Figure 1.4a). A central channel accommodates the side chain of the lysine to be modified (Figure 1.4b), and a separate pocket binds S-adenosylmethionine (AdoMet), the cofactor that donates the methyl group [69, 73, 74].



Figure 1.4: SET domain structure. a) The structure of SETD7, a representative SET domain-containing enzyme, in complex with a TAF10-derived peptide substrate and the methyl-donating cofactor S-adenosylmethionine. b) The active site of SETD7, showing S-adenosylmethionine bound in its binding pocket. PDB ID: 4J83 [74].


1.3.2 Bromodomains and bromodomain inhibitors

Bromodomains are a eukaryotic protein domain family that read acetylated lysines on histones [75, 76], as well as a wide variety of other regulatory proteins such as various JAK/STAT components [77], the regulator PHIP [78], and the transcription initiating TAF proteins [79]. Bromodomains are the only known acetyllysine-binding domain family involved in gene expression regulation [76]. While the first bromodomain-containing protein was identified as a homeotic gene activator in Drosophila [80], over 60 have now been identified in humans [76], and ~1500 structures published at time of writing, alone or in complex with ~700 unique ligands [72]. Bromodomains are characterised by a conserved fold consisting of four alpha helices and two variable loops that are responsible for substrate binding (Figure 1.5) [81].

Figure 1.5: Bromodomain structure. The structure of bromodomain 1 (cyan) of Brd4, a representative bromodomain-containing protein, in complex with the N-terminal tail of histone H3 (red). Acetylated residue K14 of H3 (H3K14ac) binds to the bromodomain via the variable loops. PDB ID: 3JVK [82].

Approximately 25 unique histone lysine acetylation marks that are read by bromodomains have been identified [68]. Most well-studied histone acetylations, such as H3K9ac, H3K14ac, and H3K27ac, are associated with active transcription [68]. However, H4K20ac appears to be a repressive mark [83], and H4K16ac (and potentially other H4 acetylation marks) plays a role in the regulation of higher-order chromatin structure, and is thus indirectly responsible for both activation and repression of transcription [84]. Many bromodomain-containing proteins do not directly effect changes in gene transcription, but instead recruit effector proteins or complexes via protein-protein interaction domains [76, 85]. Significant examples include the positive transcription elongation factor complex (P-TEFb), which is recruited by Brd2 and Brd4 and is responsible for elongation of an mRNA transcript by RNA polymerase II [85], and Polycomb repressive complex 2 (PRC2), which is recruited by numerous bromodomain-containing proteins to silence gene expression at specific loci by methylating lysine 27 of histone H3 [76].

Of particular interest to this thesis is the family of bromodomain-containing proteins known as BET proteins. These are characterised by tandem N-terminal bromodomains and a C-terminal extraterminal (ET) domain (Figure 1.6) [86].

Figure 1.6: Schematic of BET protein structure. BET proteins consist of tandem N-terminal bromodomains and a C-terminal extraterminal domain.

The BET protein family consists of four proteins—Brd2, Brd3, Brd4 and BrdT [87]. Brd2, Brd3 and Brd4 are ubiquitously expressed and highly conserved in mammals, and are required for normal mammalian development—loss of function of any of the three is embryonic lethal in mice [88, 89]. BrdT expression is localised to the testes and plays a vital role in chromatin condensation in spermatogenic cells [90].

On a cellular level, BET proteins are readers of acetylated lysine, as implied by their tandem bromodomains. They read histone acetylation but also bind acetyllysine on a wide range of other protein ligands, such as transcription factors and RNA polymerases [91-93]. BET proteins can also bind transcription factors in an acetylation-independent manner, which is mediated by the ET domain [87, 91]. Brd4, and potentially other BET proteins, also show evidence of kinase activity via an as yet unknown mechanism [94]. Taken broadly, these functions all relate to transcriptional regulation, and underpin the essential role of BET proteins in the maintenance of chromatin and control of gene transcription.


1.3.3 The unknowns of epigenetics

Although an enormous body of literature exists regarding histone modifications, much of it is concentrated on a small number of modifications such as H3K4me and H3K9ac. The majority of known modifications are poorly understood beyond whether they have a gross activating or repressive effect on transcription [68]. Novel histone modifications such as citrullination and hydroxylation are being discovered at a rapid rate [95, 96]. Approximately 70% of currently identified histone-modifying enzymes have no published structural or functional data [97]. Studying the effects of individual histone modifications at a fine-grained rather than gross level is also currently difficult, requiring precise gene editing or engineered proteins (see Section 1.5). For these reasons, a generalised set of tools that allows the investigation of specific histone modifications at specific loci would be of enormous benefit to the field.

1.4 Many Diseases have Epigenetic Causes or Solutions

Several diseases and conditions, particularly cancers, have recently been found to have wholly or partially epigenetic causes (Table 1.1) [26, 76, 85, 98-111].

Many of the most studied diseases (e.g., Fragile X syndrome, Angelman/Prader-Willi syndromes, many forms of cancer) are related to DNA methylation [26, 105, 108]. Dysregulation of histone modification is strongly implicated in the occurrence of cancer [104, 106, 107, 110, 112]. For example, silencing of tumour suppressor genes by overactive PRC2 plays a causal role in tumourigenesis in prostate and breast cancer [106], and the histone kinase Pak1 promotes mitosis of breast cancer cells by initiating chromosome condensation through the phosphorylation of H3 [112]. Dysregulation of histone deacetylation in the nervous system is also known to play a role in some forms of epilepsy, which can be successfully treated using histone deacetylase (HDAC) inhibitors [113]. Neuronal histone deacetylation is now also being implicated in conditions such as Alzheimer’s disease, Huntington’s disease, and depression [114, 115].

As discussed in Section 1.3.2, the focus of drug targeting in this thesis is the BET family of bromodomain-containing proteins. BET protein dysregulation is responsible for the growth and maintenance of many cancers, such as acute myeloid leukemia [116], breast cancer [117] and myeloma [118]. Inhibition of the acetyllysine-binding bromodomains of BET proteins has therefore been investigated as a strategy for treating numerous cancers, with more than a dozen different BET inhibitors in clinical trials for the treatment of over thirty forms of cancer [85].

Table 1.1: Examples of diseases with epigenetic causes.

Epigenetic dysfunction | Disease(s) caused
Improper activation of BET proteins leading to dysregulated gene activation | Leukemia, breast cancer, lung cancer, colorectal cancer, glioblastoma, prostate cancer, NUT midline carcinoma, multiple myeloma, adenocarcinoma, lymphoma, melanoma, neuroblastoma
Deletion of region of one copy of chromosome 15 and epigenetic silencing of other copy | Angelman/Prader-Willi Syndromes
Abnormal DNA methylation on chromosome 11 | Beckwith-Wiedemann Syndrome
Mutation of MECP2 gene (responsible for neuronal gene silencing) | Rett Syndrome
Mutation of STX16 gene (responsible for DNA methylation at various loci) | Pseudohypoparathyroidism
Mutation of the FMR1 gene leading to its methylation and silencing | Fragile X syndrome
Overproduction of the histone deacetylase HDAC2 in neurons | Epilepsy


Table 1.2: BET Inhibitors in clinical trials. Adapted from Pérez-Salvia and Esteller (2017) [85].

Compound | Cancer type inhibited
JQ1 | Hematologic malignancies, lung cancer, breast cancer, prostate cancer, pancreatic cancer, colon cancer, hepatocellular cancer, glioblastoma, medulloblastoma
I-BET762 | Hematologic malignancies, NUT midline carcinoma, small cell lung cancer, non-small cell lung cancer, colorectal cancer, neuroblastoma, castration-resistant prostate cancer, triple negative breast cancer, ER positive breast cancer, multiple myeloma, prostate cancer
OTX015 | NUT midline carcinoma, triple negative breast cancer, non-small cell lung cancer, castration-resistant prostate cancer, pancreatic ductal adenocarcinoma, acute myeloid leukemia, diffuse large B-cell lymphoma, acute lymphoblastic leukemia, multiple myeloma, glioblastoma multiforme
I-BET151 | Mixed lineage leukemia, myeloma, glioblastoma, acute myeloid leukemia, melanoma
CPI203 | Multiple myeloma, pancreatic neuroendocrine tumours, mantle cell lymphoma
PFI-1 | Leukemia
FT-1101 | Acute myeloid leukemia, acute myelogenous leukemia
CPI-0610 | Lymphoma, multiple myeloma, acute leukemia, myelofibrosis
BAY1238097 | Hepatocellular carcinoma, lung cancer, NUT midline carcinoma, melanoma
INCB054329 | Lymphoma, hematologic malignancies
TEN-010 | NUT midline carcinoma, advanced solid malignancies
ZEN003694 | Metastatic castration-resistant prostate cancer
BMS-986158 | Advanced solid tumours
ABBV-075 | Breast cancer, non-small cell lung cancer, acute myeloid leukemia, multiple myeloma
GS-5829 | Metastatic castration-resistant prostate cancer, lymphoma, breast cancer
PLX51107 | Solid tumours, acute myeloid leukemia, myelodysplastic syndrome

All BET inhibitors currently being trialled are competitive inhibitors of acetyllysine binding [119]. They tend to have very broad specificities, in that they do not differentiate between Brd2, Brd3, Brd4 and BrdT, and can therefore potentially affect many different transcriptional events in a wide range of tissues [119]. These unintended transcriptional changes are known as off-target effects.

None of the drugs listed in Table 1.2 has progressed beyond phase 1 trials, which use a small (usually 20–40-person) group of human subjects to evaluate dosage and safety. Results of these trials have been mixed. Although lifespan increases of up to 8 months have been achieved, multiple forms of resistance to the treatments have been observed. In addition, the low specificity of the drugs means that they inhibit the wide-ranging roles of BET proteins through off-target effects, causing severe toxicity, and several trials have been ended early due to dangerous side effects [119]. There is enormous interest in developing more specific bromodomain inhibitors, both because of their lower potential for harmful side effects and because specific inhibitors could be used in combination to reduce the development of resistance.

A few drug candidates intentionally target both epigenetic and non-epigenetic enzymes (e.g., mycophenolic acid, which inhibits both HDACs and inosine-5’-monophosphate dehydrogenase, a non-epigenetic transcriptional regulator) and thereby provide a synergistic effect [120]. However, improved specificity is an important goal in many epigenetic therapies. For example, broad-spectrum HDAC inhibitors (even those approved for clinical use) cause severe side effects associated with chemotherapy, such as dehydration, fatigue, nausea, diarrhea, thrombocytopenia and hyperglycaemia [121, 122]. Although classical drug design pathways are starting to produce drugs with increased specificity for targeted HDAC isoforms [123], this only enables a higher dosage with similar side effects. The work in this thesis considers protein engineering as an alternative pathway to produce targeted epigenetic effects, for both research and clinical use.

1.5 Protein Engineering in an Epigenetic Context

Protein engineering refers to the creation of proteins not found in nature using any combination of rational or arbitrary design, mutation, and the combination of multiple natural proteins or peptides [124]. Protein engineering is a young field: the first proteins produced by directed evolution were created in the 1970s [125], and the first rationally designed proteins followed in the late 1980s [126]. However, it has grown rapidly thanks to our continually increasing understanding of protein structure and the expansion of computational power. The incorporation of techniques such as multiple sequence alignment [127, 128], computational structure prediction [129], error-prone PCR [130] and homology modelling [131] allows researchers to produce engineered proteins for their desired application ever more accurately. Computational power is now such that proteins can be designed de novo without reference to an existing sequence [132].

The major application of protein engineering in epigenetics to date is the use of designed transcription factors such as artificial zinc fingers [133-135] and transcription activator-like effectors (TALEs) [136, 137], both of which can be programmed to bind to specific gene sequences. Other protein engineering-based approaches that have attempted to increase the specificity of epigenetic probes and drugs include PROTACs, which are bifunctional molecules capable of targeting a specific protein of interest for degradation [138], and stable RNAi, which causes production of a small RNA molecule that interferes with the mRNA coding for the protein of interest [139]. However, these mechanisms eliminate the target protein and are susceptible to the off-target effects described in Section 1.4. Here, a method is proposed to directly target histone PTMs at specific genomic loci using a split dCas9 fusion system.


Cas9 is a bacterial protein involved in viral defence [140]. Its native function is to accept a small RNA sequence known as a guide RNA (gRNA), probe double-stranded DNA for a sequence complementary to the gRNA, and cleave this sequence (Figure 1.7) [141]. This activity has famously been harnessed in the CRISPR/Cas9 system to perform targeted gene editing [142]. However, a form of Cas9 known as deactivated Cas9 (dCas9), in which the point mutations D10A and H840A abolish activity in both endonuclease domains, retains the ability to target (but not cleave) a DNA sequence of interest when provided with a complementary gRNA [143].

Figure 1.7: Native Cas9 function. Cas9 scans double-stranded DNA containing a protospacer adjacent motif (PAM) sequence (required for binding of Cas9 to the DNA) for a sequence complementary to a short gRNA. DNA with a matching sequence is cleaved.
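To make the target-search step more concrete, the sketch below scans a DNA sequence on both strands for candidate Cas9 target sites. It assumes the commonly used Streptococcus pyogenes Cas9 (SpCas9), whose PAM is NGG located immediately 3′ of a 20-nucleotide protospacer. The guide is written as the DNA equivalent of its spacer, which matches the PAM-bearing strand and base-pairs with the opposite strand. The function names, PAM and protospacer length are assumptions for illustration rather than details of the constructs used in this thesis, and the model requires an exact match, ignoring mismatch tolerance and chromatin accessibility.

```python
import re

# Complement table; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, protospacer_len: int = 20) -> list:
    """List candidate SpCas9 sites: a protospacer immediately followed by an
    NGG PAM, searched on both strands.

    Returns (strand, start, protospacer, pam) tuples; start is the 0-based
    protospacer position within that strand's sequence read 5'->3'.
    """
    dna = dna.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for m in pattern.finditer(seq):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


def guide_targets(guide: str, dna: str) -> list:
    """Return the sites whose protospacer exactly matches the guide."""
    guide = guide.upper()
    return [s for s in find_spcas9_sites(dna, len(guide)) if s[2] == guide]


print(guide_targets("GACTGACTGACTGACTGACT", "AAAAAGACTGACTGACTGACTGACTTGGCCCCC"))
# [('+', 5, 'GACTGACTGACTGACTGACT', 'TGG')]
```

A protospacer that matches the guide but lacks an adjacent PAM is not reported, which mirrors the requirement for PAM recognition shown in Figure 1.7.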

dCas9 can be fused to an effector enzyme to produce a construct that localises to a chosen DNA sequence and allows the enzyme to carry out its native function there. This allows specific targeting of the enzyme’s function, and has been used to generate target-specific epigenetic enzymes [144-146]. However, dCas9 fusion is an imperfect approach, both because the enzyme is still active in solution and because dCas9 is somewhat prone to binding non-complementary DNA sequences [147]. Such a fusion construct is therefore still capable of causing off-target effects. To reduce these off-target effects, the Mackay group has designed a split enzyme system in which the effector enzyme is split into two fragments, each fused to dCas9. The fragments are designed to be inactive on their own, with activity restored only when they are brought into close proximity. Enzyme activity should therefore be present only when both fusion proteins are localised to the target DNA sequence, which would virtually eliminate off-target effects (Figure 1.8). Similar systems involving designed zinc finger or transcription activator-like effector proteins fused to dimeric transcription factors have already been described [148-150], but the dCas9 approach presents significant advantages, as targeting a specific gene requires only a guide RNA sequence rather than the design and production of a bespoke fusion protein for each gene. Work towards testing this dCas9-based system in vitro is presented in Chapters 3 and 4.

Figure 1.8: Design of a split enzyme dCas9 system for effecting epigenetic change. An enzyme performing the desired epigenetic modification (here an arbitrary histone deacetylase) is split into two fragments which are inactive by themselves, but able to restore activity when brought into close proximity. The fragments are fused to separate dCas9 proteins. When provided with two gRNAs for proximal gene loci, both dCas9s bind and allow resurrection of enzyme activity.
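The logic of the split system in Figure 1.8 amounts to an AND gate with a proximity condition. The toy model below is a sketch of that logic rather than a description of the actual constructs: it treats each dCas9-fragment fusion as either bound or unbound at a genomic coordinate, and the maximum spacing at which the two fragments can reassemble (`max_gap_bp`) is a hypothetical parameter, not a value determined in this work.

```python
from dataclasses import dataclass


@dataclass
class FusionBinding:
    """Binding state of one dCas9-enzyme fragment fusion."""
    bound: bool
    position_bp: int  # genomic coordinate of the bound dCas9 (hypothetical)


def split_enzyme_active(n_fusion: FusionBinding, c_fusion: FusionBinding,
                        max_gap_bp: int = 50) -> bool:
    """Return True only if both fusions are bound and close enough for the
    two inactive fragments to reassemble. max_gap_bp is an assumed value."""
    if not (n_fusion.bound and c_fusion.bound):
        # A single bound fusion (e.g. an off-target binding event) carries
        # only one inactive fragment, so no activity is expected.
        return False
    return abs(n_fusion.position_bp - c_fusion.position_bp) <= max_gap_bp


# Both fusions bound at adjacent gRNA target sites: activity restored.
print(split_enzyme_active(FusionBinding(True, 1000), FusionBinding(True, 1030)))  # True
# One fusion bound off-target, the other unbound: no activity.
print(split_enzyme_active(FusionBinding(True, 52000), FusionBinding(False, 0)))   # False
```

By contrast, a conventional single-fragment fusion corresponds to dropping the AND condition, so any individual binding event, on- or off-target, would carry activity.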

1.6 Fragment Based Drug Design

Conventional drug discovery involves the high-throughput screening of millions of candidate small molecules against the drug target of interest in order to identify molecules that display tight binding to the target [151-153]. Fragment based drug design (FBDD) reduces the requirement for a vast library of small molecules by using a smaller library of tiny (<500 Da) ‘fragment’ molecules, each of which could occupy a portion of the binding pocket of the target protein [154-156]. Although binding of any of these fragments is expected to be substantially weaker than that of larger molecules, this approach makes it easier to sample a broader chemical space with far fewer molecules. Fragment libraries are screened for binding to the target protein, and following validation, can be expanded through medicinal chemistry, and/or by the semi-rational design of larger candidate molecules that occupy more of the binding site, or by combining the most promising fragments into a single small molecule (Figure 1.9) [155]. In most cases this process also results in a drug candidate that binds more tightly than candidates discovered through high-throughput screening [154]. An additional advantage is that libraries and subsequent iterations of binders can be designed to be drug-like (e.g., in terms of molecular size, water and lipid solubility, known or predicted ligand efficiency).

Figure 1.9: Advantage of fragment-based drug design over high-throughput screening. High-throughput screening requires an extremely large library of possible drugs and the ability to screen them in parallel to discover molecules that bind the target protein. FBDD requires a much smaller library of simpler molecules, which can be screened for any binding to the target and either extended by chemistry or assembled by rational design into a complete drug.
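Ligand efficiency, one of the drug-likeness criteria mentioned above, normalises binding free energy by molecular size, which is why weakly binding fragments can still be attractive starting points. A minimal sketch follows, using the standard definition LE = −ΔG / N_heavy with ΔG = RT ln K_d (K_d in molar, 1 M standard state). The two example compounds are hypothetical and are not drawn from this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Binding free energy per heavy (non-hydrogen) atom, in kcal/mol.

    LE = -dG / N_heavy, where dG = RT ln(Kd) is negative for Kd < 1 M.
    """
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("Kd and heavy atom count must both be positive")
    delta_g = R_KCAL * temp_k * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Hypothetical fragment: weak (1 mM) but small (12 heavy atoms).
fragment_le = ligand_efficiency(1e-3, 12)
# Hypothetical elaborated lead: tight (10 nM) but large (38 heavy atoms).
lead_le = ligand_efficiency(1e-8, 38)
print(f"fragment LE = {fragment_le:.2f}, lead LE = {lead_le:.2f}")
# fragment LE = 0.34, lead LE = 0.29
```

Despite binding about 100,000-fold more weakly, the hypothetical fragment makes more efficient use of each of its atoms, which is the property FBDD tries to preserve as fragments are grown or linked.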

1.7 Outline of this Thesis

The tools used to investigate epigenetic markers, our knowledge of these markers and their effects, and the approaches available to control the consequences of dysregulated epigenetic markers all remain limited or incomplete. Chapter 3 describes investigation of the function of the histone methyltransferase SETD7, and progress towards a generalisable system for producing or probing histone modifications at a targeted locus. Chapter 4 covers attempts to produce and monitor targeted epigenetic modifications in vivo in human cells. Chapter 5 details progress made towards the creation of a drug targeting the ET domain of the acetyllysine reader Brd3, as well as characterisation of the ET domain and potential modes of small molecule binding to its ligand-binding site.

Taken as a whole, this work aimed to produce tools for examining, modifying and interacting with histone modifications and the enzymes that create, remove and read them.


2. Materials and Methods

Portions of the materials and methods outlined in this chapter have been adapted from previously published work [157] and standard Mackay/Matthews laboratory protocols. All commercially prepared kits were used according to manufacturer’s instructions unless otherwise indicated.

2.1 Materials

2.1.1 Chemicals and reagents

All reagents were of analytical grade, and solutions were prepared in MilliQ water. Suppliers of materials described in this chapter are listed in Table 2.1.

Table 2.1 Chemicals and Reagents

Chemical/Reagent | Supplier
1,4-Piperazinediethanesulfonic acid (PIPES) | Sigma-Aldrich (Castle Hill, NSW)
13C glucose | Sigma-Aldrich (Castle Hill, NSW)
15NH4Cl | Sigma-Aldrich (Castle Hill, NSW)
2-log DNA ladder | New England Biolabs (Beverly, MA)
2-mercaptoethanol | Sigma-Aldrich (Castle Hill, NSW)
Acetonitrile | Mallinckrodt (Lane Cove, NSW)
Agar | Amyl Media (Dandenong, VIC)
Agarose | Affymetrix (Santa Clara, CA)
Ampicillin | Gold Bio (St. Louis, MO)
Antibodies | Sigma-Aldrich (Castle Hill, NSW)
Bis:tris-tricine | Sigma-Aldrich (Castle Hill, NSW)
Bovine serum albumin, monomeric (BSA) | New England Biolabs (Beverly, MA)
CaCl2 | Ajax Finechem (Taren Point, NSW)
CH3COONa | Astral Scientific (Gymea, NSW)
cOmplete® Protease Inhibitor Cocktail (EDTA-free) | Roche Applied Science (Castle Hill, NSW)
Coomassie Brilliant Blue G-250 | Bio Basic (Amherst, NY)
CutSmart buffer | New England Biolabs (Beverly, MA)
Deoxycholate | Calbiochem (San Diego, CA)
Deoxyribonucleotide triphosphates (dNTPs) | Roche Applied Science (Castle Hill, NSW)
Dimethyl sulfoxide (DMSO) | Sigma-Aldrich (Castle Hill, NSW)
Dithiothreitol (DTT) | Sigma-Aldrich (Castle Hill, NSW)
DNase | Prepared in-house by Dr. Jason Low
Dulbecco’s Modified Eagle’s Medium (DMEM) | Gibco (Scoresby, VIC)
Ethanol | Chem-Supply (Gillman, SA)
Ethylenediaminetetraacetic acid (EDTA) | Ajax Finechem (Taren Point, NSW)
Formic acid | Sigma-Aldrich (Castle Hill, NSW)
gBlocks gene fragments | IDT (Boronia, VIC)
Glucose | Chem-Supply (Gillman, SA)
Glutathione Sepharose 4B resin | GE Healthcare (Parramatta, NSW)
Glycerol | Univar (Auburn, NSW)
Glycine | Sigma-Aldrich (Castle Hill, NSW)
Guanidinium HCl | Sigma-Aldrich (Castle Hill, NSW)
HCl | Univar (Auburn, NSW)
4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) | Sigma-Aldrich (Castle Hill, NSW)
HRV 3C protease | Prepared in-house by James Woodmansey and Athina Manakas (Section 2.3)
HydraGreen™ Safe DNA Dye | ACTGene Inc. (Piscataway, NJ)
IGEPAL | Sigma-Aldrich (Castle Hill, NSW)
Imidazole | Sigma-Aldrich (Castle Hill, NSW)
Isopropyl β-D-thiogalactopyranoside (IPTG) | Gold Bio (St. Louis, MO)
K2HPO4 | Chem-Supply (Gillman, SA)
K2SO4 | Univar (Auburn, NSW)
KAc | Calbiochem (San Diego, CA)
Kanamycin | Gold Bio (St. Louis, MO)
KCl | Ajax Finechem (Taren Point, NSW)
KH2PO4 | Chem-Supply (Gillman, SA)
Lactose | Sigma-Aldrich (Castle Hill, NSW)
Lysozyme | Prepared in-house by Dr. Jason Low
Mark 12 Unstained Protein Standards | Invitrogen (Mt. Waverley, VIC)
McCoy’s 5a Medium | Gibco (Scoresby, VIC)
MES SDS Running Buffer | Invitrogen (Mt. Waverley, VIC)
Methanol | Chem-Supply (Gillman, SA)
MgSO4 | Univar (Auburn, NSW)
MgCl2 | Univar (Auburn, NSW)
MQW | Millipore (Billerica, MA)
MnCl2 | Univar (Auburn, NSW)
Na2HPO4 | Ajax Finechem (Taren Point, NSW)
NaCl | Chem-Supply (Gillman, SA)
NaHCO3 | Chem-Supply (Gillman, SA)
NaOH | Chem-Supply (Gillman, SA)
NH4Cl | Ajax Finechem (Taren Point, NSW)
Ni-NTA agarose resin | Invitrogen (Mt. Waverley, VIC)
NiSO4 | Sigma-Aldrich (Castle Hill, NSW)
Nitrocellulose | GE Healthcare (Parramatta, NSW)
Paraformaldehyde | Sigma-Aldrich (Castle Hill, NSW)
Phosphate-Buffered Saline (PBS) (cell culture grade) | Gibco (Scoresby, VIC)
Peptone | Amyl Media (Dandenong, VIC)
Phenylmethylsulfonyl fluoride (PMSF) | Sigma-Aldrich (Castle Hill, NSW)
Phusion DNA polymerase | Prepared in-house by Dr. Jason Low
Polyethylenimine | Invitrogen (Mt. Waverley, VIC)
Protein A-agarose/salmon sperm DNA | Sigma-Aldrich (Castle Hill, NSW)
Proteinase K | Bioline (Alexandria, NSW)
Puromycin | Thermo Fisher Scientific (Scoresby, VIC)
RbCl | Chem-Supply (Gillman, SA)
Restriction enzymes | New England Biolabs (Beverly, MA)
Sodium dodecyl sulfate (SDS) | Amresco (Taren Point, NSW)
Thermosensitive alkaline phosphatase | New England Biolabs (Beverly, MA)
Thiamine | Sigma-Aldrich (Castle Hill, NSW)
Trace element solution | Thomas Scientific (Swedesboro, NJ)
Tris(hydroxymethyl)aminomethane (Tris) | Chem-Supply (Gillman, SA)
Triton X-100 | Sigma-Aldrich (Castle Hill, NSW)
Trypsin-EDTA solution | Gibco (Scoresby, VIC)
Urea | Astral Scientific (Gymea, NSW)
Yeast extract | Amyl Media (Dandenong, VIC)

2.1.2 Plasmids and oligonucleotides

The DNA sequence for human Brd3 ET in pGEX-6P was gifted by the Gamsjaeger Laboratory (University of Western Sydney). Sequences for dCas9 fusion constructs in the iCas9v2 vector, HER2 gRNAs in the pgRNA vector, and the pBABE-puro plasmid were gifted by the Segal Laboratory (University of California, Davis). The sequence for Histone 3.3 in pET3 was prepared by Jessica Zhong (University of Sydney). The sequence for GFP in pcDNA3 was prepared by Gabrielle McClymont (University of Sydney). Other vectors used were available in the laboratory from prior research, and other plasmids were cloned by the author as indicated.

Oligonucleotides for use as PCR templates (gBlocks) were synthesised by Integrated DNA Technologies (IDT, Baulkham Hills, NSW). Dideoxy sequencing was performed by the Australian Genome Research Facility (AGRF, Westmead, NSW).

2.1.3 Bacterial strains and culture media

For DNA cloning and amplification, the E. coli strain DH5α (supE44 ΔlacU169 [Φ80lacZΔM15] hsdR17 recA1 endA1 gyrA96 thi-1 relA1; Bethesda Research Laboratories, Gaithersburg, MD) was used.

Protein expression was carried out in E. coli BL21 (DE3) (F– ompT hsdS(rB– mB–) dcm+ Tetr gal λ(DE3) endA Hte [argU ileY leuW Camr]; Agilent Technologies, Mulgrave, Vic) where not otherwise indicated. Where specified, E. coli BL21 pLysS (F– ompT gal dcm lon hsdSB(rB– mB–) λ(DE3 [lacI lacUV5-T7p07 ind1 sam7 nin5]) [malB+]K-12(λS) pLysS[T7p20 orip15A](CmR); Promega, Alexandria, NSW) was used.

All bacterial culture media were prepared in deionised water.

Table 2.2 Bacterial culture media.

Culture medium* | Ingredients
Luria broth (LB) | 1% w/v tryptone, 0.5% w/v yeast extract, 0.5% w/v NaCl
2 × TY broth | 1.6% w/v tryptone, 1% w/v yeast extract, 0.5% w/v NaCl
Minimal media | 1× minimal media salts (see below), 30 µg/mL thiamine, 10 mM MgCl2, 1% v/v trace element solution [158], 0.1% w/v yeast extract, 0.1% w/v NH4Cl (or 15NH4Cl), 0.3% w/v glucose (or 13C glucose)
Minimal media salts | 13 g/L KH2PO4, 10 g/L K2HPO4, 9 g/L Na2HPO4, 2.4 g/L K2SO4
* Antibiotics were added as required: ampicillin (Amp+) 100 µg/mL, kanamycin (Kan+) 100 µg/mL, chloramphenicol (Cam+) 34 µg/mL.

2.2 Molecular Biology

Genes of interest were cloned or subcloned into plasmids using restriction enzyme digestion and DNA ligation of complementary sticky ends. Regions of DNA were either amplified by PCR (using primers with appropriate restriction sites) from source plasmids or digested from source plasmids (refer to Section 2.2.1). Amplified genes and vectors were digested with appropriate restriction endonucleases (refer to Table 2.4 for vector and restriction enzyme usage). Following 1 h of digestion, vectors were treated with thermosensitive alkaline phosphatase and digested for a further 30 min. DNA was isolated by agarose gel electrophoresis (Section 2.5.1). Bands corresponding to inserts and vectors were excised and purified using the Isolate II PCR and Gel Kit (Bioline, Alexandria, NSW). Ligation was performed as in Section 2.2.2. Ligated plasmids were used to transform competent E. coli DH5α cells (Section 2.2.3), and single colonies were cultured overnight in LB Amp+, Kan+, or Cam+ as appropriate for the plasmid (Table 2.2). Plasmid DNA was amplified and purified from overnight cultures (as in Section 2.2.4).

The success of cloning was confirmed by diagnostic digestion with appropriate restriction endonucleases and by dideoxy sequencing performed by the AGRF (Section 2.2.5).

Sequences for all genes used are provided in Appendix 1.

2.2.1 Polymerase chain reaction (PCR)

Table 2.3: PCR mixture components.

Component | Quantity (µL)
DNA template (20 ng/µL) | 1
Forward primer (20 µM) | 1
Reverse primer (20 µM) | 1
Phusion DNA polymerase (50 U stock) | 1
Deoxynucleotide triphosphate mix (2.5 mM) | 4
5× Phusion buffer (250 mM KCl, 12.5 mM MgSO4, 0.5% (v/v) Triton X-100, 1 mg/mL BSA, 50 mM Tris, pH 8.8) | 10
MilliQ water (MQW) | 32
Total | 50

Reaction mixtures were prepared for PCR as described in Table 2.3. PCR was performed in an Eppendorf MasterCycler Nexus Gradient Thermocycler (Merck Millipore, Bayswater, VIC). The program used consisted of:

• an initial 60 s denaturation at 98 °C;
• 35 cycles, each containing a 10 s denaturation step at 98 °C, a 20 s annealing step at 62 °C, and a 30 s extension step at 72 °C; and
• a final 60 s extension step at 72 °C.

PCR products were analysed via agarose gel electrophoresis (Section 2.5.1). Bands corresponding to the expected size of the desired products were excised and purified using the Isolate II PCR and Gel kit (Bioline, Alexandria, NSW).
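The quantities in Table 2.3 and the cycling program in this section can be checked with a short calculation. The sketch below applies C1V1 = C2V2 to obtain final reaction concentrations and sums the hold times of the program. It assumes that the 2.5 mM dNTP stock concentration refers to each nucleotide, and it ignores thermocycler ramp times, so a real run takes somewhat longer.

```python
TOTAL_UL = 50.0

# (stock concentration, unit, volume added in µL), from Table 2.3
STOCKS = {
    "DNA template": (20.0, "ng/µL", 1.0),
    "forward primer": (20.0, "µM", 1.0),
    "reverse primer": (20.0, "µM", 1.0),
    "dNTP mix": (2.5, "mM", 4.0),
    "Phusion buffer": (5.0, "×", 10.0),
}
# Components specified by volume only
OTHER_UL = {"Phusion DNA polymerase": 1.0, "MilliQ water": 32.0}


def final_concentration(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    return stock * volume_ul / total_ul


def program_duration_s(initial_s: float, cycles: int, cycle_steps_s, final_s: float) -> float:
    """Total hold time of a PCR program, excluding ramping between temperatures."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Component volumes should add up to the 50 µL reaction.
assert sum(v for _, _, v in STOCKS.values()) + sum(OTHER_UL.values()) == TOTAL_UL

for name, (stock, unit, vol) in STOCKS.items():
    print(f"{name}: {final_concentration(stock, vol):g} {unit}")

# 60 s denaturation, 35 × (10 s + 20 s + 30 s), 60 s final extension
print(program_duration_s(60, 35, (10, 20, 30), 60) / 60, "min")  # 37.0 min
```

Under these assumptions each primer is present at 0.4 µM, each dNTP at 0.2 mM and the buffer at 1×.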


2.2.2 Vector ligation

Purified PCR products or gBlocks and appropriate expression plasmids were digested with 20 U of appropriate restriction enzymes in CutSmart buffer (50 mM potassium acetate, 20 mM Tris-acetate pH 7.9, 10 mM magnesium acetate, 100 µg/mL BSA) at 37 °C for 90 min. Plasmids and restriction enzymes used are presented in Table 2.4.

Table 2.4: Restriction enzyme and expression plasmid usage for each gene of interest.

Gene | Vector | Restriction enzymes
Brd3 ET (and truncations) | pGEX-6P | BamHI/EcoRI
dCas9 fusion constructs (all) | iCas9v2 | KpnI/FseI
GFP | pcDNA3 | BamHI/EcoRI
HER2 guide RNAs | pgRNA | SpeI/HindIII
Histone 3.3 | pET3 | BamHI/EcoRI
SETD7 (bacterial use) | pET15bBE | BamHI/EcoRI
Split GFP (C-terminal constructs) | pCOLA-Duet | NdeI/KpnI
Split GFP (N-terminal construct) | pET-Duet | NcoI/EcoRI
Split SETD7 (all C-terminal constructs) | pCOLA-Duet | NdeI/KpnI
Split SETD7 (all N-terminal constructs) | pET-Duet | NcoI/EcoRI

Digestion products were purified using the Isolate II PCR and Gel kit. Purified insert (10–20 ng) and plasmid were combined in a 1:1 molar ratio and ligated using the Quick-Stick ligase kit (Bioline, Alexandria, NSW). The ligation mixture was transformed into E. coli DH5α cells (Section 2.2.3).
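The 1:1 molar ratio depends on the relative lengths of insert and vector, because equal masses of a short insert and a long vector contain very different numbers of molecules. The sketch below shows the standard calculation. The average of 650 g/mol per base pair of double-stranded DNA is a common approximation, and the example lengths and masses are hypothetical rather than those of the constructs in Table 2.4.

```python
DA_PER_BP = 650.0  # approximate average mass of one base pair of dsDNA (g/mol)


def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector: float = 1.0) -> float:
    """Insert mass (ng) giving the requested insert:vector molar ratio."""
    return vector_ng * insert_bp * insert_to_vector / vector_bp


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of linear dsDNA (ng) to picomoles."""
    return mass_ng * 1000.0 / (length_bp * DA_PER_BP)


# Hypothetical example: 100 ng of a 5000 bp vector and a 750 bp insert.
insert_ng = insert_mass_ng(100.0, 5000, 750)
print(insert_ng)  # 15.0
print(round(dsdna_pmol(insert_ng, 750) / dsdna_pmol(100.0, 5000), 6))  # 1.0
```

Because the 650 g/mol factor cancels in the ratio, the approximation affects only the absolute picomole values, not the insert mass required.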

2.2.3 E. coli transformation

Chemically competent E. coli DH5α and BL21(DE3) cells were produced in-house [159] and stored at -80 °C until needed. A 50 µL aliquot of thawed competent cells was combined with ~20 ng of plasmid DNA and 50 µL of TFB1 buffer (100 mM RbCl, 50 mM MnCl2, 30 mM KAc, 10 mM CaCl2, 15% v/v glycerol, pH 5.8) and incubated on ice for 30 min. Cells were heat shocked at 42 °C for 50 s, then returned to ice for 5 min. LB medium (200 µL) was added and the mixture incubated for a further 60 min at 37 °C. Cells were plated onto LB-agar containing 100 µg/mL ampicillin and/or 33 µg/mL kanamycin (as appropriate for the plasmids being transformed) and incubated at 37 °C for ~16 h.

2.2.4 Plasmid purification

Individual colonies from incubated E. coli DH5α plates were used to inoculate 50 mL of LB Amp+ and/or Kan+ (as appropriate for the plasmid being purified; Table 2.2). The culture was incubated at 37 °C with shaking at 150 RPM for ~16 h, and cells were pelleted by centrifugation at 5000 g for 10 min. Plasmid DNA was purified from the cells using either the Isolate II Plasmid Mini Kit (Bioline, Alexandria, NSW) or the NucleoBond Xtra Midi Kit (Macherey-Nagel, Düren, Germany).

2.2.5 Dideoxy sequencing

Purified plasmid sequences were verified using the dideoxy sequencing service provided by the Australian Genome Research Facility (AGRF, Westmead, NSW), using recommended sequencing primers for each vector.

2.3 Bacterial Protein Expression and Purification

2.3.1 Overexpression by IPTG induction

Medium (LB Amp+ and/or Kan+ as appropriate for the E. coli strain and the plasmid being expressed; Table 2.2) was inoculated with single colonies of transformed E. coli BL21(DE3) cells and incubated overnight at 37 °C with shaking at 150 RPM, in a container at least 5 × the volume of culture. This culture was used to inoculate fresh LB to an OD600 of 0.05 in a container at least 5 × the volume of culture; 2% w/v glucose was added, and the culture was further incubated at 37 °C with shaking at 150 RPM to an OD600 of 0.5–0.8. Expression was then induced with IPTG (0.1 mM, 0.4 mM, 1 mM or 10 mM) for 4 h (37 °C), 8 h (25 °C) or 16 h (18 °C). Cells were harvested by centrifugation at 5000 g (4 °C, 10 min). If not used immediately, pellets were snap-frozen in liquid N2 and stored at -20 °C.
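The dilution of the overnight culture to an OD600 of 0.05 follows from C1V1 = C2V2, and the number of doublings needed to reach the induction window follows from exponential growth. A minimal sketch is given below, assuming that OD600 is proportional to cell density, which holds only within the spectrophotometer's linear range (dense overnight cultures are usually measured after dilution). The overnight OD600 of 4.0 and the 1 L culture volume are hypothetical values.

```python
import math


def inoculum_volume_ml(od_overnight: float, final_volume_ml: float,
                       target_od: float = 0.05) -> float:
    """Volume of overnight culture to add to reach target_od in final_volume_ml."""
    if od_overnight <= target_od:
        raise ValueError("overnight culture must be denser than the target OD")
    return target_od * final_volume_ml / od_overnight


def doublings(start_od: float, end_od: float) -> float:
    """Number of population doublings between two OD600 values."""
    return math.log2(end_od / start_od)


print(inoculum_volume_ml(4.0, 1000.0))  # 12.5 (mL into a 1 L culture)
print(round(doublings(0.05, 0.5), 2), round(doublings(0.05, 0.8), 2))  # 3.32 4.0
```

Reaching the OD600 0.5–0.8 induction window from 0.05 therefore requires roughly three to four doublings, regardless of the culture volume.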

2.3.2 Overexpression by autoinduction

Medium (2 × TY Amp+ and/or Kan+, as appropriate for the E. coli strain and the plasmid being expressed; Table 2.2) was inoculated with single colonies of transformed E. coli BL21(DE3) cells and incubated overnight at 37 °C with shaking at 140 RPM, in a container at least 5 × the volume of culture. This culture was diluted 1:50 into fresh LB in a container at least 5 × the volume of culture. Studier salts (25 mM Na2HPO4, 25 mM KH2PO4, 2 mM MgSO4, 25 mM NH4Cl, 0.5% glycerol, 0.2% lactose, 0.05% glucose) [160] were added to a final concentration of 1 × to induce expression, and cultures were incubated for 6 h (37 °C), 20 h (25 °C) or 28 h (18 °C). Cells were harvested by centrifugation at 5000 g (4 °C, 10 min). If not used immediately, pellets were snap-frozen in liquid N2 and stored at -20 °C.

2.3.3 Expression of isotopically labelled proteins

LB Amp+ (Table 2.2) was inoculated with single colonies of transformed E. coli BL21(DE3) cells and incubated overnight at 37 °C with shaking at 150 RPM, in a container at least 5 × the volume of culture. This culture was used to inoculate fresh LB to an OD600 of 0.05 in a container at least 5 × the volume of culture; 2% w/v glucose was added, and the culture was further incubated at 37 °C with shaking at 150 RPM to an OD600 of 0.4–0.6. Cells were pelleted by centrifugation at 5000 g (4 °C, 10 min). Cells were washed by resuspending in minimal medium salts (500 mL per 1 L of LB culture) and pelleting again. The washing process was repeated.

Washed cells were resuspended in minimal medium (500 mL per 1 L of LB culture) containing 1 g/L 15NH4Cl, and an additional 3 g/L 13C glucose if producing double-labelled protein. Cultures were grown at 37 °C with shaking at 150 RPM for 60 min. Expression was induced with 1 mM IPTG and cultures grown at 37 °C with shaking at 150 RPM for ~16 h. Cells were harvested by centrifugation at 5000 g (4 °C, 10 min). If not used immediately, pellets were snap-frozen in liquid N2 and stored at -20 °C.

2.3.4 Bacterial cell lysis

Cell pellets were resuspended in an appropriate lysis buffer (Table 2.5; 30 mL per L of culture). Lysozyme was added to 100 µg/mL and the resuspended cells were incubated at 4 °C for 20 min with agitation. DNase was added to 10 µg/mL and the incubation repeated. Cells were lysed by 3 × freeze-thaw in liquid N2 followed by sonication on ice (Branson Sonifier® 250, 30 s bursts, 40% power cycle, microtip) until lysis was complete. Soluble and insoluble lysate fractions were separated by centrifugation at 18000 g (4 °C, 15 min).


Table 2.5: Lysis buffers used for protein purification.

Protein(s) | Lysis buffer
SETD7, split GFP (both), split SETD7 (all) | 50 mM Tris (pH 7.5), 200 mM NaCl, 5 mM DTT, 20 mM imidazole, 1× cOmplete™ Protease Inhibitor Cocktail (EDTA-free) (Roche, Millers Point, NSW)
Histone 3.3 | 50 mM Tris (pH 7.5), 100 mM NaCl, 1 mM 2-mercaptoethanol
Brd3 ET (and truncations) | 1× PBS (pH 7.5), 500 mM NaCl, 0.1% v/v IGEPAL, 1× cOmplete™ Protease Inhibitor Cocktail (EDTA-free) (Roche, Millers Point, NSW)

2.3.5 Affinity chromatography

For purification of soluble proteins containing a 6×His tag, soluble fractions were applied to Ni-NTA agarose resin that had been equilibrated with lysis buffer, and incubated for 1 h at 4 °C with agitation. The resin was washed with 5 column volumes of lysis buffer. Six elution fractions were then collected, each using 0.5 column volumes of lysis buffer containing a stepwise increasing concentration of imidazole (100–600 mM).
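Each imidazole step can be made by mixing lysis buffer (20 mM imidazole) with a high-imidazole buffer. The fraction f of high buffer satisfies 20(1 − f) + c_high·f = target. The sketch below makes three assumptions. First, the six steps are evenly spaced (100, 200, ..., 600 mM). Second, the high buffer is a hypothetical lysis buffer containing 1000 mM imidazole. Third, the 2 mL resin bed is illustrative.

```python
"""Mixing volumes for a stepwise imidazole elution (Section 2.3.5).

Assumptions (not stated in the text): the six steps are evenly spaced
(100, 200, ..., 600 mM); each step is made by mixing lysis buffer
(20 mM imidazole) with a hypothetical "high" buffer of lysis buffer
containing 1000 mM imidazole; the resin bed volume is illustrative.
"""


def mix_fraction(target_mm: float, low_mm: float, high_mm: float) -> float:
    """Fraction of high buffer f such that low*(1 - f) + high*f = target."""
    if not low_mm <= target_mm <= high_mm:
        raise ValueError("target must lie between the two buffer concentrations")
    return (target_mm - low_mm) / (high_mm - low_mm)


def step_gradient(bed_volume_ml: float, low_mm: float = 20.0, high_mm: float = 1000.0,
                  steps=(100, 200, 300, 400, 500, 600), cv_per_step: float = 0.5):
    """Return (target mM, mL lysis buffer, mL high buffer) for each elution step."""
    step_volume = bed_volume_ml * cv_per_step
    plan = []
    for target in steps:
        f = mix_fraction(target, low_mm, high_mm)
        plan.append((target, step_volume * (1 - f), step_volume * f))
    return plan


if __name__ == "__main__":
    for target, v_low, v_high in step_gradient(bed_volume_ml=2.0):
        print(f"{target} mM: {v_low:.3f} mL lysis buffer + {v_high:.3f} mL high buffer")
```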

For purification of soluble proteins containing a GST tag, soluble fractions were applied to Glutathione Sepharose 4B resin that had been equilibrated with lysis buffer, and incubated for 1 h at 4 °C with agitation. The resin was washed with 5 column volumes of lysis buffer and five elution fractions collected, each using 0.5 column volumes of lysis buffer containing 10 mM reduced glutathione.

2.3.6 Dialysis and concentration of proteins

Fractions containing proteins of interest were pooled and dialysed overnight at 4 °C into 50 mM Tris, 100 mM NaCl using SnakeSkin Pleated Dialysis Tubing (3500 MW cutoff) (Thermo Fisher Scientific, Scoresby, VIC). Following dialysis, proteins were filtered through sterile 0.2 µm syringe filters (Millipore, Bayswater, VIC) and concentrated using Vivaspin centrifugal concentrators (3000 MW cutoff) (Sartorius, Dandenong South, VIC).


2.3.7 Cleavage of affinity tags

Following affinity chromatography, the GST or 6×His tag was cleaved from the purified protein via incubation with HRV 3C protease (~1 U) at 4 °C overnight.

2.3.8 Purification and refolding of insoluble histones

Untagged histone 3.3 was purified from bacterial inclusion bodies. Following cell lysis and centrifugation as described in Section 2.3.4, the insoluble pellet was resuspended in histone lysis buffer containing 1% v/v Triton X-100 (25 mL per L of LB culture) and centrifuged (18000 g, 4 °C, 10 min). This resuspension and centrifugation was repeated once as described, then twice more using histone wash buffer without added Triton X-100. The washed pellet was resuspended in histone unfolding buffer (6 M guanidinium HCl, 20 mM CH3COONa, 1 mM DTT, pH 5.2; 10 mL per L of LB culture) and agitated at RT for 3 h. Undissolved material was pelleted by centrifugation (18000 g, 4 °C, 10 min). The soluble fraction was dialysed into SAUDE200 buffer (7 M urea, 20 mM CH3COONa, 200 mM NaCl, 5 mM 2-mercaptoethanol, 1 mM EDTA). It was then purified further via ion exchange chromatography (Section 2.3.9) and size exclusion chromatography (Section 2.3.10).

2.3.9 Ion exchange chromatography

Histone 3.3 (pI: 11.27) was purified via ion exchange chromatography. An Uno Q1 anion exchange column (Bio-Rad Laboratories, Gladesville, NSW) controlled by a BioLogic fast protein liquid chromatography (FPLC) system (Bio-Rad Laboratories, Gladesville, NSW) was equilibrated with SAUDE200 buffer (refer to Section 2.3.8), and the histone was applied. A linear gradient to 100% SAUDE600 (7 M urea, 20 mM CH3COONa, 600 mM NaCl, 5 mM 2-mercaptoethanol, 1 mM EDTA) and back to 100% SAUDE200 was run through the column over 20 column volumes at 1 mL/min. Elution fractions (1 mL) were collected, and protein-containing fractions were identified by monitoring A280. Protein-containing elution fractions were analysed via SDS-PAGE (Section 2.5.2). Fractions containing >90% pure histone 3.3 were pooled and dialysed into MilliQ water (pH 7.0).
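Because SAUDE200 and SAUDE600 differ only in NaCl, the salt concentration at any point in the gradient can be read from the percentage of SAUDE600 (%B) delivered by the FPLC. The sketch below assumes the instrument mixes the two buffers linearly, so [NaCl] = 200 + 4 × %B mM. Urea, acetate, 2-mercaptoethanol and EDTA stay constant throughout.

```python
"""NaCl concentration along the SAUDE200 -> SAUDE600 gradient (Section 2.3.9).

Assumption: the FPLC mixes the two buffers linearly, so the salt
concentration is a weighted average; the other components are identical in
both buffers and therefore stay constant.
"""

NACL_A_MM = 200.0  # SAUDE200
NACL_B_MM = 600.0  # SAUDE600


def nacl_mm(percent_b: float) -> float:
    """NaCl (mM) delivered at a given %B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("%B must be between 0 and 100")
    return NACL_A_MM + (NACL_B_MM - NACL_A_MM) * percent_b / 100.0


def percent_b_for(nacl_target_mm: float) -> float:
    """Inverse mapping: %B required to reach a given NaCl concentration."""
    return 100.0 * (nacl_target_mm - NACL_A_MM) / (NACL_B_MM - NACL_A_MM)


if __name__ == "__main__":
    for pct in (0, 25, 50, 75, 100):
        print(f"{pct:>3}% B -> {nacl_mm(pct):.0f} mM NaCl")
```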

2.3.10 Size exclusion chromatography

All proteins other than histone 3.3 were further purified via size exclusion chromatography (SEC). A Superdex® 75 preparative grade HiLoad 16/60 column (GE Healthcare, Parramatta, NSW) controlled by a BioLogic FPLC system (Bio-Rad Laboratories, Gladesville, NSW) or an ÄKTA pure system (GE Healthcare, Parramatta, NSW) was equilibrated with SEC running buffer (100 mM NaCl, 50 mM Tris, pH 7.5), and protein was applied. SEC running buffer (1.2 column volumes) was run through the column at 0.8 mL/min. Elution fractions (1 mL) were collected, and protein-containing fractions were identified by monitoring A280. Protein-containing elution fractions were analysed via SDS-PAGE (Section 2.5.2), and fractions containing pure protein were pooled.
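The run length of this SEC step follows from the column volume, the number of column volumes and the flow rate. The sketch below assumes the 16/60 designation means a 16 mm internal diameter and a 60 cm bed, which gives a geometric bed volume of about 120 mL. The manufacturer's stated column volume should be checked before relying on the result.

```python
"""Run-time arithmetic for the preparative SEC step (Section 2.3.10).

Assumption: a HiLoad 16/60 column is 16 mm in internal diameter with a
60 cm bed, so its geometric bed volume is about 120 mL.
"""
import math


def bed_volume_ml(inner_diameter_mm: float, bed_height_cm: float) -> float:
    """Cylinder volume pi * r^2 * h, with r and h in cm (1 cm^3 = 1 mL)."""
    radius_cm = inner_diameter_mm / 20.0  # mm -> cm, then halve
    return math.pi * radius_cm ** 2 * bed_height_cm


def run_plan(column_volume_ml: float, n_cv: float, flow_ml_min: float, fraction_ml: float):
    """Return (elution volume in mL, run time in min, number of fractions)."""
    volume = column_volume_ml * n_cv
    return volume, volume / flow_ml_min, math.ceil(volume / fraction_ml)


if __name__ == "__main__":
    cv = bed_volume_ml(16, 60)
    volume, minutes, n_fractions = run_plan(cv, n_cv=1.2, flow_ml_min=0.8, fraction_ml=1.0)
    print(f"CV ~ {cv:.0f} mL; 1.2 CV = {volume:.0f} mL; {minutes:.0f} min; {n_fractions} fractions")
```

With a nominal 120 mL column, 1.2 column volumes is 144 mL, which takes about 3 h at 0.8 mL/min and yields 144 one-millilitre fractions.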

2.4 Mammalian Cell Culture

2.4.1 Cell lines and culture conditions

All experiments in mammalian cell lines were performed in either HEK293-FT cells (Thermo Fisher Scientific, Scoresby, VIC) or HCT116 cells (generously provided by the Bailey lab, Centenary Institute, Sydney). Cells from both cell lines were grown in T-25 flasks at 37 °C and passaged every three days or when 90% confluency was reached, whichever occurred sooner. HEK293-FT cells were grown in Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% foetal bovine serum. HCT116 cells were grown in McCoy’s 5a Medium supplemented with 10% foetal bovine serum.

2.4.2 Transient plasmid transfection

Cells were passaged and split such that they reached 50–70% confluency at the time of transfection. Growth medium was changed immediately prior to transfection. The transfection DNA mix was prepared at a ratio of 0.25 µg per 1 mL of growth medium used to grow the cells being transfected. For the experiments reported here, this consisted of a 1:4:5 ratio of the pBABE-puro puromycin resistance plasmid, pooled HER2 gRNAs, and a dCas9 fusion construct expression vector. A 1:2.5 mixture of polyethylenimine (PEI) and growth medium was prepared to a final volume equal to 10% of the volume of growth medium used to grow the cells being transfected. The pooled transfection DNA was added to this mixture. The transfection mixture was immediately mixed and incubated for exactly 10 min before being introduced, dropwise with shaking, to the cell growth medium.

Cells were incubated at 37 °C overnight. Growth medium was exchanged and puromycin introduced to the medium to a final concentration of 3 µg/mL. Growth medium was exchanged again after a further 24 h and the puromycin concentration maintained. After a final 24 h incubation period, cells were harvested.
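All transfection quantities above scale with the volume of growth medium in the vessel. The helper `transfection_plan` below is hypothetical and does that arithmetic. It reads the "1:2.5 mixture of PEI and growth medium" as a volume ratio, which is an assumption. The 5 mL medium volume in the example is illustrative.

```python
"""Reagent amounts for the PEI transfection (Section 2.4.2).

Assumptions: the 1:2.5 PEI:growth-medium mixture is read as a volume ratio,
and the 5 mL medium volume in the example is hypothetical.
"""

DNA_UG_PER_ML_MEDIUM = 0.25          # total DNA per mL of culture medium
PLASMID_PARTS = {                    # 1:4:5 ratio from the text
    "pBABE-puro": 1,
    "HER2 gRNA pool": 4,
    "dCas9 fusion vector": 5,
}
PEI_PARTS, MEDIUM_PARTS = 1.0, 2.5   # assumed volume ratio of the PEI mixture
MIX_FRACTION_OF_MEDIUM = 0.10        # mixture volume = 10% of culture medium


def transfection_plan(medium_ml: float) -> dict:
    """Return DNA masses (ug) and PEI-mixture volumes (mL) for one vessel."""
    total_dna_ug = DNA_UG_PER_ML_MEDIUM * medium_ml
    total_parts = sum(PLASMID_PARTS.values())
    dna_ug = {name: total_dna_ug * parts / total_parts
              for name, parts in PLASMID_PARTS.items()}
    mix_ml = MIX_FRACTION_OF_MEDIUM * medium_ml
    pei_ml = mix_ml * PEI_PARTS / (PEI_PARTS + MEDIUM_PARTS)
    return {
        "total_dna_ug": total_dna_ug,
        "dna_ug": dna_ug,
        "mix_ml": mix_ml,
        "pei_ml": pei_ml,
        "medium_in_mix_ml": mix_ml - pei_ml,
    }


if __name__ == "__main__":
    plan = transfection_plan(5.0)
    for name, ug in plan["dna_ug"].items():
        print(f"{name}: {ug:.3f} ug")
    print(f"PEI {plan['pei_ml'] * 1000:.0f} uL + medium {plan['medium_in_mix_ml'] * 1000:.0f} uL")
```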


2.4.3 Cell harvesting and lysis

Growth medium was discarded, and cells were washed with an equal volume of PBS. The PBS was discarded, and trypsin-EDTA solution (0.25% w/v) was added at 20% of the growth medium volume. The growth vessel was gently tapped for 1–2 min until all cells had visibly detached. Fresh growth medium was added to quench trypsinisation. Cells were pelleted at 300 g for 3 min at room temperature, resuspended in an equal volume of PBS, and pelleted once more. If not used immediately, pellets were snap-frozen in liquid N2 and stored at -80 °C.

2.4.4 Formaldehyde crosslinking

Formaldehyde (37%) was added to the growth medium of transfected cells at 27 µL per 1 mL of medium (i.e., to a final concentration of 1% v/v). Cells were gently shaken for 10 min at RT. Glycine (1.375 M) was added at 100 µL per 1 mL of medium to quench the crosslinking reaction. Growth medium was removed, and cells washed with 2 × 10 mL cold PBS. Another 10 mL of PBS was added and cells were detached with a scraper. Cells were pelleted by centrifugation (1500 g, 4 °C, 10 min). If not used immediately, the cell pellet was snap frozen in liquid N2 and stored at -80 °C.
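The final concentrations quoted for the crosslinking and quenching steps follow from a simple dilution, C_final = C_stock × V_added / (V_existing + V_added). The sketch below checks them per 1 mL of medium. Formaldehyde ends up at about 0.97%, close to the stated 1%. Glycine ends up at about 122 mM if the 27 µL of formaldehyde is counted, or exactly 125 mM if it is ignored.

```python
"""Final concentrations in the crosslinking and quenching steps (Section 2.4.4).

Simple dilution using the volumes given per 1 mL of medium; the formaldehyde
percentage is on the same basis as the 37% stock.
"""


def final_concentration(stock: float, added_ul: float, existing_ul: float) -> float:
    """Concentration after adding added_ul of stock to existing_ul of liquid."""
    return stock * added_ul / (existing_ul + added_ul)


medium_ul = 1000.0
formaldehyde_pct = final_concentration(37.0, 27.0, medium_ul)       # percent
glycine_m = final_concentration(1.375, 100.0, medium_ul + 27.0)     # molar

print(f"formaldehyde: {formaldehyde_pct:.2f}%")   # prints 0.97%
print(f"glycine: {glycine_m * 1000:.0f} mM")      # prints 122 mM
# Ignoring the 27 uL of formaldehyde gives exactly 125 mM glycine
print(f"glycine (1.1 mL basis): {final_concentration(1.375, 100.0, medium_ul) * 1000:.0f} mM")
```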

2.5 Gel Electrophoresis

2.5.1 Agarose gel electrophoresis

HydraGreen stain (1 µL) was added to 50 mL of molten 1% w/v agarose in TAE immediately prior to pouring. Sucrose was added to samples to a concentration of 4% w/v prior to loading, and samples were run alongside a 2-log DNA ladder at 100 V for 45 min. Gels were visualised under ultraviolet light.

2.5.2 SDS-PAGE

Samples were diluted into 4× loading dye (250 mM Tris-HCl (pH 6.8), 8% (w/v) sodium dodecyl sulfate (SDS), 0.2% (w/v) bromophenol blue, 40% (v/v) glycerol, 20% (v/v) β-mercaptoethanol) prior to loading and heated at 95 °C for 10 min. They were run on 4–12% Bis-Tris acrylamide gels with Mark 12 Unstained Protein Standards at 200 V, 120 mA for 35 min. Gels were then either stained immediately or imaged via laser scanner (Typhoon FLA 9000; GE Life Sciences) prior to staining. To stain, gels were covered in a sufficient volume of 0.001% w/v Coomassie brilliant blue, heated in a microwave for 30 s, then incubated with gentle rocking overnight. Gels were destained in ROW overnight.

2.5.3 Western blot

Samples were diluted into 4x loading dye prior to loading, heated at 95 °C for 10 minutes, and run on 4–12% Bis-Tris Acrylamide gels with Novex Sharp Pre-Stained Protein Standards (Thermo Fisher Scientific, Scoresby, VIC) and MagicMark XP Western Protein Standards (Thermo Fisher Scientific, Scoresby, VIC) at 200 V, 120 mA for 35 min. Proteins were transferred to a nitrocellulose membrane in transfer buffer (Table 2.2) at 30 V, 400 mA for 1 h. The membrane was washed for 3 × 5 min in PBS-T and blocked with 5% w/v skim milk powder in PBS-T for 1 h. Following three further washes, the membrane was incubated with an appropriate mouse antibody (in this thesis, monoclonal ANTI-FLAG antibody #F3165, Sigma-Aldrich, Castle Hill, NSW) for 1 h, and washed three more times. The membrane was then incubated with HRP-conjugated goat anti-mouse secondary antibody (1:25000) (#A5278, Sigma-Aldrich, Castle Hill, NSW) for 40 min and washed three final times with PBS alone. The membrane was exposed to Western Lightning Chemiluminescent HRP Substrates (PerkinElmer, Waltham, Massachusetts) according to the manufacturer’s instructions and immediately scanned using a C-DiGit Blot Scanner (LI-COR Biosciences, Lincoln, NE).

2.5.4 Native PAGE

Glycerol was added to all samples to a final concentration of 5% v/v. Samples were run on NativePAGE 3–12% Bis-Tris Protein Gels (Thermo Fisher Scientific, Scoresby, VIC) with NativeMark unstained protein standards (Thermo Fisher Scientific, Scoresby, VIC) at 150 V, 100 mA for 90 min. Bis-Tris/Tricine buffer (100 mM) was used as both the anode and cathode buffer. Gels were imaged using a Typhoon FLA 9000 gel imager (GE Healthcare, Parramatta, NSW).

2.6 Mass Spectrometry

For LC-MS/MS, peptides were resuspended in 3% (v/v) acetonitrile, 0.1% (v/v) formic acid and loaded onto a 20 cm × 75 µm inner diameter column packed in-house with 1.9 µm C18AQ particles (Dr Maisch GmbH HPLC, Entringen, Ammerbuch, Germany) using an Easy nLC-1000 nanoHPLC (Thermo Fisher Scientific, Scoresby, VIC). Peptides were separated using a linear gradient of 5–30% Buffer B over 30 min at 300 nL/min (Buffer A = 0.1% (v/v) formic acid; Buffer B = 80% (v/v) acetonitrile, 0.1% (v/v) formic acid).

Mass analyses were performed using a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Scoresby, VIC). Each full-scan MS1 was acquired at 70,000 resolution at 200 m/z (200–2000 m/z; 3 × 10⁶ AGC target; 50 ms injection time). Following each MS1, up to the 10 most abundant precursor ions were selected for MS/MS (17,500 resolution; 5 × 10⁴ AGC target; 60 ms injection time; stepped NCE of 25, 27 and 30; 2 m/z isolation window; 1.7 × 10⁴ intensity threshold; allowable charge states of +2 to +5; dynamic exclusion of 20 s). A lock mass (m/z = 445.12002) was used during MS1 for internal recalibration in real time. Peak lists were generated using the ProteoWizard MSConvert tool and submitted to the database search program Mascot (Matrix Science, Boston, MA). The data were searched with oxidation (M), acrylamide (C) and carbamidomethyl (C) as variable modifications, using precursor-ion and product-ion mass tolerances of ±10 ppm and ±0.02 Da, respectively. Enzyme specificity was set to trypsin with up to 2 missed cleavages, and all taxonomies in the Swiss-Prot database (May 2018; 557,491 entries) were searched. A decoy database of reversed sequences was used to estimate false discovery rates. To be considered for further analysis, identified peptides had to be top-ranking and statistically significant (p < 0.05) according to the Mascot expect metric.
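Two quantities in this search are easy to reproduce by hand. The first is the mass error in parts per million, (observed − theoretical) / theoretical × 10⁶, which sets the ±10 ppm precursor window. The second is the target-decoy false discovery rate. The sketch below uses the simple decoy/target ratio for the FDR estimate. Mascot's own FDR reporting may be calculated differently, and the hit counts in the example are hypothetical.

```python
"""Mass-accuracy and target-decoy arithmetic for the Mascot search (Section 2.6).

ppm error = (observed - theoretical) / theoretical x 10^6. The FDR estimate
uses the simple decoy/target ratio; example counts are hypothetical.
"""


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(mz: float, ppm: float) -> tuple:
    """Absolute (low, high) m/z window corresponding to +/- ppm around mz."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimated FDR at a score threshold: decoy matches / target matches."""
    if n_target_hits == 0:
        return 0.0
    return n_decoy_hits / n_target_hits


if __name__ == "__main__":
    low, high = tolerance_window(445.12002, 10)  # lock-mass ion, +/- 10 ppm
    print(f"+/-10 ppm at m/z 445.12002: {low:.5f}-{high:.5f}")
    print(f"FDR with 12 decoy / 1200 target matches: {decoy_fdr(1200, 12):.1%}")
```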

2.7 RNA Quantification

2.7.1 RNA extraction from mammalian cells

RNA extraction was performed using the SV Total RNA Isolation System (Promega, Alexandria, NSW). Purified RNA was stored in MilliQ water at -20 °C.

2.7.2 Reverse transcription

Production of cDNA via reverse transcription was performed using the SuperScript IV First-Strand Synthesis Kit (Thermo Fisher Scientific, Scoresby, VIC). cDNA was stored in MilliQ water at -20 °C.

2.7.3 Quantitative polymerase chain reaction

qPCR was performed in 96-well plates. For each set of experiments, two master mixes were prepared. Each contained SYBR Green JumpStart Taq ReadyMix (Sigma-Aldrich, St Louis, MO) (2×) and 0.4 µM each of either the HER2 forward and reverse primers or the GAPDH forward and reverse primers (Appendix 2). DNA for analysis (1 µL, at either 10 ng/µL or 1 ng/µL) was added to each well, followed by 5 µL of master mix, and the volume was made up to 10 µL with MQW. Duplicates of every combination of primer target, DNA sample and DNA concentration were set up. Additionally, duplicate no-template control (NTC) reactions, in which the DNA was replaced with MQW, were set up for each primer pair. qPCR was run on a 7500 Fast Real-Time PCR System (Thermo Fisher Scientific, Scoresby, VIC). The program consisted of an initial denaturation step of 20 s at 95 °C, forty cycles of denaturation (95 °C, 3 s) and elongation (60 °C, 30 s), and a final melt curve (95 °C for 15 s, 60 °C for 60 s, 95 °C for 15 s, 60 °C for 15 s).
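The method used to convert Ct values into relative HER2 expression is not stated in this section. One common choice is the 2^−ΔΔCt (Livak) method, sketched below. It assumes GAPDH as the reference gene, near-100% primer efficiency, and technical duplicates averaged before calculation. The Ct values in the example are hypothetical.

```python
"""Relative HER2 expression by the 2^-ddCt (Livak) method (Section 2.7.3).

Assumptions: GAPDH as the reference gene, near-100% primer efficiency and
hypothetical Ct values. Technical duplicates are averaged first.
"""
from statistics import mean


def delta_ct(target_cts, reference_cts) -> float:
    """Mean target Ct minus mean reference Ct for one sample."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_target, sample_ref, control_target, control_ref) -> float:
    """2^-(dCt_sample - dCt_control)."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical duplicate Ct values (HER2, GAPDH) for a dCas9-fusion sample and a control
    fc = fold_change(sample_target=[24.1, 23.9], sample_ref=[18.0, 18.0],
                     control_target=[26.0, 26.0], control_ref=[18.0, 18.0])
    print(f"HER2 fold change vs control: {fc:.2f}")
```

A Ct two cycles earlier in the sample than in the control, after normalisation to GAPDH, corresponds to a fourfold increase.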

2.8 Quantification of Histone Post-Translational Modifications

2.8.1 Chromatin immunoprecipitation

Cross-linked cell pellets (refer to Section 2.4.4) were resuspended in 10 mL/100 million cells ChIP lysis buffer (5 mM PIPES, 85 mM KCl, 0.5% v/v IGEPAL, pH 8.0) and were incubated on ice for 10 min. Nuclei were pelleted by differential centrifugation at 1000 g for 10 min at 4 °C. The supernatant was removed and the nuclear pellet was resuspended in 1 mL nuclei lysis buffer (50 mM Tris, 10 mM EDTA, 1% v/v SDS, 1x cOmplete™ Protease Inhibitor Cocktail (EDTA-free) (Roche, Millers Point, NSW), pH 8.0) and incubated for 10 min on ice. Chromatin was sheared by sonication (Branson Sonifier® 250, 6 × 15 s bursts, 40% power cycle, stepped tip). If not used immediately, sheared chromatin was stored at -80 °C.

Sheared chromatin was centrifuged at 18000 g for 15 min at 4 °C, and the pellet was discarded. Chromatin concentration in the supernatant was determined by UV spectrophotometry (NanoDrop, Thermo Fisher Scientific, Scoresby, VIC). An aliquot of 100 µg chromatin was made for each antibody condition and diluted to a final volume of 300 µL with nuclei lysis buffer. Chromatin was precleared by addition of 50 µL of protein A-agarose/salmon sperm DNA and incubation at 4 °C with rotation for 60 min. These samples were centrifuged at 3000 g for 5 min at 4 °C, and the antibody of choice (5 µg) was added to the supernatant. Samples were incubated overnight at 4 °C with rotation.
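Preparing the 100 µg aliquots requires converting the measured concentration into a volume and making up the remainder to 300 µL with nuclei lysis buffer. The sketch below assumes the NanoDrop reading is in ng/µL. The helper name `chromatin_aliquot` and the example concentration are hypothetical.

```python
"""Volumes for 100 ug chromatin aliquots (Section 2.8.1).

Assumption: the NanoDrop reading is in ng/uL; the example concentration is
hypothetical.
"""

CHROMATIN_PER_IP_UG = 100.0
FINAL_VOLUME_UL = 300.0


def chromatin_aliquot(conc_ng_per_ul: float):
    """Return (uL chromatin, uL nuclei lysis buffer) for one antibody condition."""
    chromatin_ul = CHROMATIN_PER_IP_UG * 1000.0 / conc_ng_per_ul
    if chromatin_ul > FINAL_VOLUME_UL:
        raise ValueError("chromatin too dilute to fit 100 ug into 300 uL")
    return chromatin_ul, FINAL_VOLUME_UL - chromatin_ul


if __name__ == "__main__":
    chromatin_ul, buffer_ul = chromatin_aliquot(1250.0)
    print(f"{chromatin_ul:.1f} uL chromatin + {buffer_ul:.1f} uL nuclei lysis buffer")
```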

Protein A-agarose/salmon sperm DNA (50 µL) was added to each sample and the samples were incubated at 4 °C with rotation for 120 min. Beads and supernatant were separated via centrifugation at 3000 g for 5 min at 4 °C, and the supernatant was removed. High-salt ChIP wash buffer (50 mM HEPES, 500 mM NaCl, 1 mM EDTA, 0.1% v/v SDS, 1% v/v Triton X-100, 0.1% w/v deoxycholate, pH 7.9) was added to the beads to a final volume of 1 mL, and samples were incubated at room temperature with rotation for 10 min. Samples were centrifuged at 3000 g for 2 min at room temperature and the supernatant was removed. Addition of high-salt ChIP wash buffer, incubation and centrifugation were repeated three further times. Two further washes were performed using TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0).


Washed beads were resuspended in 300 µL of nuclei lysis buffer supplemented with 20 µg proteinase K and incubated for 120 min at 55 °C. Crosslinking was reversed by incubation overnight at 65 °C. Samples were centrifuged at 18000 g at room temperature for 5 min and the DNA-containing supernatant was transferred to a fresh microcentrifuge tube. DNA was purified using the Isolate II PCR and Gel kit (Bioline, Alexandria, NSW), and used for qPCR as described in Section 2.7.3.

2.8.2 In vitro histone methyltransferase assay

A reaction mixture consisting of 100 pmol SETD7, 100 pmol histone 3.3 and 1 µCi of 3H-labelled S-adenosyl methionine (AdoMet; 1 mCi/mL) (PerkinElmer, Waltham, Massachusetts) was combined in methyltransferase buffer (20 mM Tris, 100 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 7.5) (total reaction volume: 20 µL). Reactions were incubated at 30 °C for 30 min, then blotted onto 1 cm² squares of nitrocellulose. Nitrocellulose squares were immersed in 50 mM NaHCO3 and incubated with rocking at RT for 10 min. The NaHCO3 was removed and this wash step was repeated twice. Washed membranes were placed in glass scintillation vials, and 5 mL of Filter Count scintillation cocktail (PerkinElmer, Waltham, Massachusetts) was added. Each vial was vortexed until the nitrocellulose had completely dissolved.

Scintillation events of each vial, plus 3H and non-radioactive MQW standards, were measured using a Tri-Carb 4810 liquid scintillation counter (PerkinElmer, Waltham, Massachusetts). Scintillation events were recorded for 2 min per sample, and counts for each minute averaged and controlled for detector efficiency to determine mean disintegrations per minute (DPM).
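Converting counts to DPM amounts to averaging the per-minute counts, correcting for background and dividing by the counting efficiency. The sketch below makes two assumptions. The efficiency is derived from the 3H standard of known activity, and the non-radioactive MQW standard is subtracted as background. All numbers in the example are hypothetical.

```python
"""Converting scintillation counts to disintegrations per minute (Section 2.8.2).

Assumptions: counting efficiency comes from the 3H standard of known activity,
and the MQW (non-radioactive) standard is subtracted as background.
All numbers in the example are hypothetical.
"""
from statistics import mean


def counting_efficiency(standard_cpm: float, standard_dpm: float) -> float:
    """Fraction of disintegrations detected by the counter."""
    return standard_cpm / standard_dpm


def sample_dpm(counts_per_minute, blank_counts_per_minute, efficiency: float) -> float:
    """Mean CPM over the counting period, background-subtracted, divided by efficiency."""
    net_cpm = mean(counts_per_minute) - mean(blank_counts_per_minute)
    return net_cpm / efficiency


if __name__ == "__main__":
    eff = counting_efficiency(standard_cpm=55_000, standard_dpm=100_000)
    dpm = sample_dpm([1210, 1190], [35, 45], eff)  # two 1-min counts per vial
    print(f"efficiency {eff:.0%}; sample {dpm:.0f} DPM")
```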

2.8.3 Autoradiography

Histone methyltransferase assays were prepared as in Section 2.8.2. Instead of being blotted onto nitrocellulose, reaction mixtures were loaded onto a polyacrylamide gel as described in Section 2.5.2. The completed gel was sealed in plastic wrap and dried using a gel dryer (Bio-Rad Laboratories, Gladesville, NSW). The dried gel was placed on top of autoradiographic film (Kodak, Box Hill, VIC) in a dark room, and the assembly was sealed in a light-proof cassette. After 7 days, the film was removed in a dark room and developed using an Agfa CP1000 film processor (Radincon, Dee Why, NSW).


2.9 Crystal Screening

Crystal screening was performed in MRC 96-well 2-drop plates using the commercial screens PACT (Molecular Dimensions, Holland, OH), Index-HT (Hampton Research, Aliso Viejo, CA), PEGRx (Hampton Research, Aliso Viejo, CA) and JCSG (Molecular Dimensions, Holland, OH). Aliquots of screening solutions (10 µL) were dispensed into trays with a Freedom EVO 100 Base Liquid Handling Robot (Tecan, Port Melbourne, VIC). Purified, concentrated protein was mixed with mother liquor in either a 1:1 or 2:1 ratio (drop volume: 2 µL or 3 µL) and added to the trays using a Mosquito Crystal Liquid Handling Robot (TTP LabTech, Melbourn, UK). Sealed trays were stored at 16 °C and checked for crystal growth by light microscopy. Crystals were removed from trays with a 0.05 mm cryoloop, flash frozen in liquid N2, and analysed on the MX2 beamline at the Australian Synchrotron [161].

2.10 Nuclear Magnetic Resonance (NMR)

Protein samples for use in NMR were dialysed into NMR buffer (20 mM MES, 100 mM NaCl, 1 mM TCEP) using SnakeSkin Pleated Dialysis Tubing (3500 MW cutoff) (Thermo Fisher Scientific, Scoresby, VIC). Samples were diluted to 150 µM total volume in 5% v/v D2O and 150 µM DSS in 3 mm NMR tubes (Shigemi, Tokyo, Japan). All spectra were acquired at 25 °C on Avance III 600 or 800 MHz NMR spectrometers (Bruker, Karlsruhe, Germany) fitted with a cryogenic TCI probe head. Spectra were processed using TopSpin 3 (Bruker, Karlsruhe, Germany).

Triple resonance spectra collected for Brd3 ET:MFP-7210 complex docking calculations included 15N-HSQC, HNCACB, CBCA(CO)NH, CC(CO)NH, HBHA(CO)NH and H(C)CH-TOCSY experiments [162]. A 1H-1H NOESY and COSY were collected on MFP-7210 alone, and filtered NOESY and COSY spectra were collected on the Brd3 ET:MFP-7210 complex. Together these facilitated assignment of the MFP-7210 1H resonances both free and in the complex. A 3D ω1-13C,15N-filtered, ω3-13C(aliphatic)-edited [1H,1H]-NOESY [163] was collected on the Brd3 ET:MFP-7210 complex to generate intermolecular NOEs.

2.11 Molecular Docking

The HADDOCK webserver was used for docking calculations [164]. Intermolecular distance restraints between the ligand and protein (according to the NOEs in Table 5.1) were all set at 5 Å. The restraints were tabulated in CNS format as input for HADDOCK. Settings were as described by Mohanty et al. (2016) for weak protein-ligand complexes [165]. Briefly, molecular dynamics simulations were switched off (MD steps = 0) for both rigid body docking (it0) and the first slow cooling step of the simulated annealing process (it1). The weight given to distance restraints when scoring structures after rigid body docking was set to its maximum. This maximised the number of acceptable starting structures carried forward to the semi-flexible, slow-cooling iteration. Explicit solvent refinement was performed after the simulated annealing step. The final calculations were set to generate and score 1000 docked structures in the it0 iteration and 200 structures in the it1 iteration, and 200 structures were evaluated in the final analysis. Structures were visualised and figures rendered using PyMOL [166].
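In CNS restraint syntax, `assign sel1 sel2 d d_minus d_plus` defines bounds from d − d_minus to d + d_plus. A line ending in "5.0 5.0 0.0" therefore restrains an atom pair to 0–5 Å, which is one way to encode the 5 Å limit described above. The sketch below formats such lines. The segment IDs ("A" for protein, "B" for ligand), residue numbers and atom names are hypothetical placeholders, not the NOEs of Table 5.1.

```python
"""Writing protein-ligand NOE restraints in CNS format for HADDOCK (Section 2.11).

"assign sel1 sel2 d d_minus d_plus" gives bounds of d - d_minus to d + d_plus,
so "5.0 5.0 0.0" restrains each pair to 0-5 A. Segment IDs, residue numbers
and atom names used here are hypothetical placeholders.
"""


def cns_restraint(protein_resid: int, protein_atom: str, ligand_resid: int, ligand_atom: str,
                  upper: float = 5.0, protein_segid: str = "A", ligand_segid: str = "B") -> str:
    """One CNS assign statement with a 0 to `upper` Angstrom distance range."""
    sel1 = f"(segid {protein_segid} and resid {protein_resid} and name {protein_atom})"
    sel2 = f"(segid {ligand_segid} and resid {ligand_resid} and name {ligand_atom})"
    return f"assign {sel1} {sel2} {upper:.1f} {upper:.1f} 0.0"


def write_tbl(noes, path: str) -> None:
    """Write one assign statement per NOE tuple to a restraint (.tbl) file."""
    with open(path, "w") as handle:
        for noe in noes:
            handle.write(cns_restraint(*noe) + "\n")


if __name__ == "__main__":
    hypothetical_noes = [(10, "HN", 1, "H1"), (25, "HB*", 1, "H3")]
    for noe in hypothetical_noes:
        print(cns_restraint(*noe))
```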


3. Design, Expression and Characterisation of Split Epigenetic Enzyme Constructs

3.1 Introduction

The goal of this work was to produce split enzyme constructs that are functional in mammalian cells. However, the mammalian cell is a complicated system that can confound troubleshooting, particularly of problems in which the physical and/or structural properties of proteins and protein fragments need to be considered. Split enzyme designs were therefore first tested using bacterially expressed proteins. These aspects of the project are described in this chapter.

3.2 Enzyme Selection

Over 100 enzymes involved in epigenetic modification and maintenance have now been identified and partly characterised [167], but relatively few have been studied in detail. Research to date has focused on a small number of enzymes, such as DNA methyltransferase 3 alpha (Dnmt3a) [168-171]. A literature search was carried out to compile known information about epigenetic enzymes, providing a framework for choosing enzymes for proof-of-concept studies.

3.2.1 Construction of human epigenetic enzyme datasheet

Data were sourced from databases such as the Human Epigenetic Enzyme and Modulator Database [97], UniProt [172] and HIstome [167]. The list was intended to be broad but not necessarily comprehensive. In all, 181 enzymes were considered and categorised, as described in the following section. An excerpt from the list illustrating the type of information gathered is shown in Figure 3.1, and a full list of enzymes is available in Appendix 3.


Figure 3.1: Excerpt from the compiled epigenetic enzyme datasheet. Structure availability refers only to published 3D (quaternary) structures.

3.2.2 Selection of enzymes for proof-of-concept experiments

While there are many different characteristics of the listed enzymes that could be considered, the criteria used to identify enzymes that might be suitable for proof-of-concept work were as follows:

• The enzyme must perform an epigenetic function of known biological significance. This is necessary for future cell-based experiments, as it allows examination of a known phenotype and increases the significance and potential usefulness of any findings.
• The enzyme must function as a monomer, as the split enzyme construct approach is best suited to monomers. Some dimers could be suitable but would add complexity to the design and troubleshooting procedures.
• The enzyme must require minimal post-translational modifications (PTMs), as bacterial expression cannot reproduce most eukaryotic PTMs.
• The enzyme must possess either constitutive activity, a known mutation causing constitutive activity, or a straightforward mechanism of activation.
• The gene for the enzyme must be easily obtainable.
• Cofactor requirements for protein function must be nonexistent or minimal.
• One or more published 3D structures of the protein must exist, including the catalytic domain(s) of the enzyme. The presence of a structure informs design of the split constructs, as detailed in Section 3.4.1, and confirms that the protein can be expressed at reasonable to high levels.
• Publications detailing expression conditions and in vitro functionality assays for the enzyme must be easily obtainable.

Based on these criteria, three enzymes were selected as candidates: SETD7, KMT2D and KAT8.

Histone-lysine N-methyltransferase SETD7 is a 41 kDa histone 3 lysine 4 mono-methyltransferase. The structure (Figure 3.2) and mechanisms of SETD7 are well studied (e.g., [173, 174]), and the protein can be readily expressed in E. coli [173, 174]. Studies have also confirmed its function in human cell culture via chromatin immunoprecipitation (ChIP) and quantitative reverse transcription polymerase chain reaction (RT-qPCR) experiments. For example, upregulation of SETD7 causes methylation at loci corresponding to genes such as NF-κB and interleukins [175]. Histone 3 lysine 4 mono-methylation (H3K4me) is a mark of active transcription and prevents the binding of transcriptional repressors to chromatin [175, 176].

Figure 3.2: Structure of human SETD7 bound to the cofactor S-adenosyl methionine. From PDB ID 3CBP [173]. The methyl-donating substrate S-adenosyl methionine (AdoMet) is shown in yellow.

Histone-lysine N-methyltransferase 2D (KMT2D, also known as MLL2) is a histone 3 lysine 4 tri-methyltransferase. The full-length protein is 593 kDa in size, but the isolated 27 kDa C-terminal methyltransferase domain can be expressed in E. coli and purified with enzymatic activity intact [177]. The crystal structure of this domain has been determined (Figure 3.3) [177]. The methyltransferase domain of KMT2D has also been shown to be active in cultured cells and in vivo [178-180]. H3K4me3 is a mark of active transcription [46].


Figure 3.3: Structure of human KMT2D bound to the cofactor S-adenosyl methionine. From PDB ID 4Z4P [177]. The methyl-donating substrate AdoMet is shown in yellow.

Histone acetyltransferase KAT8 (also known as MYST1 and hMOF) is a histone 4 lysine 16 acetyltransferase. The full-length protein has a size of 52 kDa, and harbours a catalytic domain of 33 kDa that is sufficient for activity [181]. The acetyltransferase domain has been successfully expressed in E. coli [181, 182] and has been shown to be capable of acetylating histone 4 both in vitro [181] and in vivo [182]. Histone 4 lysine 16 acetylation (H4K16ac) has numerous roles, including in chromatin formation, activation of transcription, and DNA damage repair [182, 183]. The crystal structure of the catalytic domain of this protein has been determined (Figure 3.4) [184].


Figure 3.4: Structure of KAT8 bound to the cofactor acetyl coenzyme A. From PDB ID 2P8Q [184]. The acetyl-donating substrate acetyl coenzyme A is shown in yellow.

3.3 SETD7 Methyltransferase Domain Assays

After consideration, SETD7 was the enzyme chosen for expression, purification and in vitro assays, as it is the best studied of the chosen enzymes and is reportedly easily expressed and purified [173, 174]. The ultimate goal was the creation of a system for expressing a pair of split SETD7 constructs capable of reconstituting and regaining methyltransferase activity. However, before embarking on this it was important to confirm that the SETD7 methyltransferase domain could be produced as an intact recombinant construct, and exhibit methyltransferase activity.

3.3.1 Expression and purification of the SETD7 methyltransferase domain

SETD7 expression and purification were based on the protocol used by Kwon et al. [174]. The DNA sequence for the human SETD7 methyltransferase domain (residues 110–366; 28.8 kDa) was cloned into the pET15bBE vector (refer to Section 2.2). The expressed protein therefore contained an N-terminal hexahistidine (6×His) tag and an HRV-3C protease cleavage site for removal of the tag. The cloned vector was transformed into E. coli BL21(DE3) cells, and expression was induced with 0.1 mM or 1 mM IPTG. Cells were grown at 18 °C for 16 h or 22 h, 25 °C for 12 h or 16 h, or 37 °C for 4 h or 6 h (refer to Section 2.3.1). Following overexpression, cells were lysed, soluble and insoluble fractions were separated via centrifugation, and SETD7 was isolated via Ni-NTA affinity chromatography (Figure 3.5; time course data not shown) (refer to Sections 2.3.4–2.3.5).

Figure 3.5: SDS-PAGE analysis of trial expression of SETD7 methyltransferase domain. Overexpression of SETD7 was induced in E. coli BL21(DE3) by either a) 0.1 mM IPTG or b) 1 mM IPTG, and cells grown either at 18 °C for 22 h, 25 °C for 16 h, or 37 °C for 4 h. Following cell lysis, soluble and insoluble fractions were separated, and the soluble fractions applied to Ni-NTA resin, which was then washed with lysis buffer. Beads refers to the washed resin sample.

A clear overexpression band was visible under all expression conditions (Figure 3.5). This band was largely soluble and could be immobilised on Ni-NTA beads. However, the size of the protein according to the ladder was ~35 kDa, whereas the expected size of 6xHis-SETD7 is 30.8 kDa. To confirm the identity of the protein, one of the ~35 kDa bands was excised from the gel, and the protein was extracted and subjected to mass spectrometry (Figure 3.6).

Figure 3.6: Mascot analysis of candidate SETD7 overexpression band. a) A list of proteins that correspond to digested fragments, ranked by percentage coverage. SETD7 is the protein with the highest coverage, with other high-ranking proteins being expected contaminants, such as keratin from skin flakes. b) (overleaf) The proportion of SETD7 covered by the analysed fragments. The sequence corresponding to the truncated region that was targeted for expression is highlighted in yellow, and detected fragments are highlighted in red.

Mass spectrometry analysis confirmed that the dominant protein in the excised band was SETD7 with the correct C-terminus: peptides covering approximately half of its sequence were detected, although none came from the N-terminus.
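Sequence coverage is the fraction of residues spanned by at least one identified peptide. Checking coverage over a defined region shows whether a terminus is represented, which is the basis for the N- and C-terminal observations above. The sketch below computes both. The toy sequence and peptides are hypothetical placeholders, not the SETD7 data in Figure 3.6.

```python
"""Sequence coverage from identified peptides (Section 3.3.1).

The sequence and peptides below are toy placeholders, not the SETD7 data.
"""


def covered_positions(protein: str, peptides) -> list:
    """Boolean mask: True where a residue lies within any peptide occurrence."""
    mask = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                mask[i] = True
            start = protein.find(pep, start + 1)
    return mask


def coverage(protein: str, peptides) -> float:
    """Fraction of all residues covered."""
    mask = covered_positions(protein, peptides)
    return sum(mask) / len(mask)


def region_coverage(protein: str, peptides, start: int, end: int) -> float:
    """Fraction covered within residues start..end (1-based, inclusive)."""
    mask = covered_positions(protein, peptides)[start - 1:end]
    return sum(mask) / len(mask)


if __name__ == "__main__":
    toy_protein = "MSGSKLLEAGRTPDEKVYQR"
    toy_peptides = ["LLEAGR", "TPDEK"]
    print(f"overall: {coverage(toy_protein, toy_peptides):.0%}")
    print(f"residues 1-5: {region_coverage(toy_protein, toy_peptides, 1, 5):.0%}")
```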

Large-scale overexpression of SETD7 was therefore performed using the same protocol with the 0.1 mM IPTG, 25 °C condition, as this generated the highest yield of soluble protein in the trials above (Figure 3.5). Soluble SETD7 was purified via Ni-NTA affinity chromatography, anion exchange chromatography and size exclusion chromatography (SEC) (Figure 3.7).

Figure 3.7: SDS-PAGE analysis of repeat purification of SETD7 methyltransferase domain. a) Samples were taken of each purification step following elution from Ni-NTA resin, including combined peak fractions from anion exchange chromatography and size exclusion chromatography (labelled ‘anion’ and ‘sizing’, respectively). b) Size exclusion chromatography fractions from the only peak in the chromatogram with a significant A280 absorbance.

In this purification, a distinct band at the expected size for SETD7 was visible in the anion exchange elution fractions. The protein was further purified by size exclusion chromatography, which gave a single major peak. Samples collected across this peak were analysed by SDS-PAGE (Figure 3.7). The major band corresponds to SETD7, with an estimated purity of ~90%. A minor band at ~63 kDa was also present. To identify it, the band was excised from the gel, and the protein was extracted and subjected to mass spectrometry. It was identified as keratin, and the position of the band is consistent with the expected 55–65 kDa size of keratin. However, it was not possible to determine whether keratin accounted for the observed gel band (implying that it was accidentally introduced into the sample during purification) or was introduced during preparation of the sample for mass spectrometry (keratin is a notoriously common mass spectrometry contaminant [185]). The purity of the sample was deemed acceptable for the applications detailed in Section 3.3.3, so this contaminant was not investigated further.

SETD7 was subjected to 1H NMR spectroscopy to determine if the protein was folded (Figure 3.8).

Figure 3.8: 1H NMR spectrum of SETD7. The spectrum was recorded on a 110 µM sample at 25 °C in a buffer containing 20 mM Tris and 100 mM NaCl, pH 7.5.

SETD7 showed good dispersion of peaks under these conditions, particularly in the amide region and methyl region, so was judged to be folded. SETD7 was aliquoted and stored at -80 °C.

3.3.2 Expression and purification of Histone H3.3

Histone H3.3 was expressed and purified for use in assays to assess the activity of SETD7. Plasmids and protocols for expression of Xenopus laevis histones were already available within the laboratory from the work of Jessica Zhong [186], and the histone purification protocol developed for that research was also used here. Xenopus histone 3.3 (H3) is identical in sequence to human H3 due to the evolutionary conservation of histone sequences (refer to Section 1.2), so could be used unmodified for all following experiments.

Recombinant histone expression in bacteria requires purifying the histone from inclusion bodies and partially refolding in urea (refer to Section 2.3.8) [187]. Expression conditions trialled in previous work by Jessica Zhong in the Mackay Laboratory at the University of Sydney were confirmed in small-scale trials using either E. coli BL21(DE3) or E. coli BL21(DE3)pLysS cells (Figure 3.8). IPTG induction and autoinduction (Section 2.3.2) were compared at 25 °C and 37 °C.

Figure 3.8: SDS-PAGE analysis of trial histone 3.3 expression. Overexpression of H3 was induced in E. coli BL21(DE3) and E. coli BL21(DE3)pLysS cells by either 1 mM IPTG or autoinduction, and cells grown either at 25 °C for 16 h or 37 °C for 4 h. Following cell lysis, soluble and insoluble fractions were separated, and the insoluble fractions washed and prepared for gel loading.

Small-scale expression in E. coli BL21(DE3)pLysS cells produced clear bands of approximately the expected size (15 kDa). The intense bands at ~10 kDa observed for BL21(DE3) expression were not investigated further. Large-scale overexpression (1 L cultures) was carried out using autoinduction at 25 °C, as this condition produced the largest yield of H3.3 in the insoluble fraction among the conditions tested. Washed insoluble pellets were dissolved in 6 M guanidine hydrochloride, dialysed into water, and purified by cation exchange chromatography at pH 5.2.

Figure 3.9: SDS-PAGE analysis of histone H3.3 expression and purification. Overexpression of H3.3 was induced in E. coli BL21(DE3)pLysS cells by autoinduction, and cells grown at 25 °C for 16 h. Following cell lysis, soluble and insoluble fractions were separated, and the insoluble fractions washed. a) Samples were taken at every step. b) The dialysed sample was subjected to anion exchange chromatography. “Fractions” refers to elution fractions. Refer to Section 2.3.8.

A clear band was visible by SDS-PAGE analysis at 15 kDa at all stages of purification. The protein eluted from the anion exchange column following application of 400–1000 mM NaCl. A substantial proportion of the histone H3.3 precipitated prior to loading onto the anion exchange column, or remained bound to the anion exchange resin after washing with 1 M NaCl (Figure 3.9b). Fractions with a purity of >50% estimated from the Coomassie stained SDS-PAGE (fractions 3–8 in Figure 3.9b) were dialysed into water, aliquoted and stored at -80 °C.

3.3.3 Histone methylation assays

In order to assess the activity of SETD7, we sought to carry out in vitro methyltransferase assays. The three general methods of detecting H3K4me that were considered for determining the activity of SETD7 were:

• Mass spectrometry, to detect the altered mass of histone polypeptides caused by methylation;
• Detection of radiolabelled methyl groups following addition of a radiolabelled methyl donor; and
• Western blotting with a primary antibody specific for methylated histones.

The method chosen to assay SETD7 activity was scintillation counting following incorporation of tritiated AdoMet. SETD7 was incubated with histone H3.3 and tritiated AdoMet. The histone H3.3 was then immobilised on a nitrocellulose membrane, and unincorporated AdoMet was removed by repeated washing. In the negative control condition, SETD7 was omitted but every other step was unchanged. In all cases, scintillation counts were performed for two minutes per sample and normalised to a water standard (refer to Sections 2.7.2–2.7.3). This method was chosen for its high sensitivity and the relatively short time required to perform the assay.
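Results from this assay are reported below as DPM and as fold changes over the no-enzyme control. The sketch that follows shows that bookkeeping. It assumes that "normalised to a water standard" means subtracting the blank's counts from each sample, which is an interpretation and not a detail confirmed by the text. The helper names `blank_corrected` and `fold_change` are hypothetical.

```python
from statistics import mean


def blank_corrected(dpm_values, blank_dpm):
    """Subtract the counts of a blank standard (assumed here to be water) from each sample."""
    return [d - blank_dpm for d in dpm_values]


def fold_change(enzyme_dpm, control_dpm):
    """Mean +enzyme signal divided by mean no-enzyme (background) signal."""
    background = mean(control_dpm)
    if background <= 0:
        raise ValueError("mean control signal must be positive to form a ratio")
    return mean(enzyme_dpm) / background
```

A ratio against the no-enzyme control makes the dynamic range of the assay explicit. A ~2-fold ratio, as in the initial experiments, leaves little margin for distinguishing weak activity from background.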

[Figure 3.10 chart: DPM (0–2500) of Control and +Enzyme samples, replicates 1–4]

Figure 3.10: Initial SETD7 activity assay via scintillation counting. H3 and tritiated AdoMet were incubated in the presence or absence of SETD7. The incorporation of tritium into H3 was measured by scintillation counting.

The initial results shown in Figure 3.10 were promising, as the presence of SETD7 caused an increase in disintegrations per minute (DPM; the number of radioactive decay events observed within the sample each minute) (t(3) = 5.91, p < 0.005). However, the fold change above background was very low (~2-fold). Efforts were made to reduce the background count, including washing the nitrocellulose with acetone, NaOH or ‘stripping buffer’ (1.5% glycine, 1% Tween 80, 0.1% SDS, pH 2.2) (Figure 3.11). Of the washes tested, only the NaOH wash appeared to reduce the background signal. To investigate the efficacy of this wash further, the effect of different NaOH concentrations was tested (Figure 3.12).
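The reported statistic t(3) has three degrees of freedom. That is consistent with a paired t-test across four matched replicates, but the test actually used is an assumption here and is not stated in this section. The sketch below computes a paired t statistic and its degrees of freedom with the standard library. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_rel`. The helper name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(treated, control):
    """Paired t statistic for matched replicates; returns (t, degrees_of_freedom)."""
    if len(treated) != len(control):
        raise ValueError("replicates must be paired one-to-one")
    diffs = [a - b for a, b in zip(treated, control)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    # Standard error of the mean difference: sample SD of differences / sqrt(n).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

If every difference is identical, the standard error is zero and the function raises `ZeroDivisionError`. With only three or four technical replicates, the statistic is sensitive to a single outlying count.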


[Figure 3.11 chart: DPM (0–4000) of Control and +Enzyme samples for each wash condition: no wash, acetone, 0.5 M NaOH, stripping buffer]

Figure 3.11: Effect of various wash steps on background count in tritiated H3 scintillation counting. H3 and tritiated AdoMet were incubated in the presence or absence of SETD7. The nitrocellulose membrane upon which histone H3.3 was immobilised was washed with different solutions as indicated (stripping buffer: 1.5% glycine, 1% Tween 80, 0.1% SDS, pH 2.2). The incorporation of tritium into histone H3.3 was measured by scintillation counting.


[Figure 3.12: a) DPM by NaOH wash concentration (chart not reproduced); b) DPM (0–9000) plotted against sample loading order (0–60), linear fit R² = 0.8667]

Figure 3.12: Effect of NaOH wash concentration on background count in scintillation counting. a) H3 and tritiated AdoMet were incubated in the presence or absence of SETD7. The nitrocellulose membrane upon which H3 was immobilised was washed with either 0.5 M, 1 M, or 2 M NaOH. The incorporation of tritium into H3 was measured by scintillation counting. Each condition was performed in triplicate and counted over two minutes, with each minute being represented by one half of each column shown. b) Disintegrations per minute of each sample plotted against sample loading order.

The data presented in Figure 3.12 appear to show a time-dependent effect in the assay, as DPM for the series of samples was strongly negatively correlated with the order in which the samples were prepared (Figure 3.12b). This correlation was much stronger than any effect of NaOH concentration, making it difficult to draw conclusions from the dataset. To determine whether the cause was related to the scintillation counting equipment, the same samples were recounted in reverse order (Figure 3.13).
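The R² values in Figures 3.12b and 3.13b summarise how well a straight line describes DPM as a function of loading order. The sketch below shows an ordinary least-squares fit that returns the slope, intercept and R² for such a plot. It is written with the standard library only, and the function name `linear_fit_r2` is hypothetical. It is not necessarily how the plotted trendlines were generated.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

R² alone carries no sign. The negative slope, not the R² value, is what indicates that later-prepared samples gave lower counts.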

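[Figure 3.13: a) DPM by NaOH wash concentration, samples recounted in reverse order (chart not reproduced); b) DPM (0–10,000) plotted against sample loading order (0–60), linear fit R² = 0.8699]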

Figure 3.13: Reversing loading order to determine the effect of loading order on tritiated H3 scintillation counting. The samples are the same samples used in Figure 3.12. a) H3 and tritiated AdoMet were incubated in the presence or absence of SETD7. The nitrocellulose membrane upon which H3 was immobilised was washed with either 0.5 M, 1 M, or 2 M NaOH. The incorporation of tritium into H3 was measured by scintillation counting. Each condition was performed in triplicate and counted over two minutes, with each minute being represented by one half of each column shown. b) Disintegrations per minute of each sample plotted against sample loading order.

The correlation obtained after reversing the sample loading order was essentially the same as in the previous set of counts (R² = 0.8699 compared with 0.8667). This suggests that the time dependence was introduced during one or more steps prior to scintillation counting. To identify the time-dependent step(s), the experiment was repeated with one step of the assay lengthened in each condition (Figure 3.14). Two aliquots of enzyme were also tested to assess the stability of the SETD7 sample: one freshly thawed, and the other thawed four days prior and stored at 4 °C.

Figure 3.14: Determining the time-dependent step of the SETD7 activity assay. Histone H3.3 and tritiated AdoMet were incubated in the presence or absence of either freshly thawed or four-day-old SETD7. The nitrocellulose membrane upon which H3 was immobilised was allowed to sit for 5 min in either the scintillation cocktail or the first NaOH wash. Care was taken to avoid altering the duration of any other step between samples. The incorporation of tritium into H3 was measured by scintillation counting over two minutes, with each minute being represented by one half of each column shown.

The ~50% decrease in DPM of the samples left to sit in the wash buffer, relative to the samples left to sit in the scintillation cocktail, suggests that the wash step was time dependent. In subsequent tests, care was taken to ensure equal wash durations for all samples. The older SETD7 sample, which had been stored at 4 °C for four days following thawing, also showed 40–50% lower activity than the freshly thawed enzyme based on DPM. All SETD7 used from this point was freshly thawed.

Given the time dependence of SETD7 activity and of the assay in general, fresh preparations of both histone H3.3 and SETD7 were purified. Additional attention was paid to rapid purification and to the timing of sample freezing and storage to maximise enzyme activity. The same assay was repeated with freshly thawed aliquots of these new protein samples (Figure 3.15).


[Figure 3.15 chart: DPM (0–400,000) of Control and +enzyme samples, technical replicates 1–3]

Figure 3.15: SETD7 activity assay via scintillation counting with freshly prepared SETD7 and histone H3.3. Histone H3.3 and tritiated AdoMet were incubated in the presence or absence of SETD7. The incorporation of tritium into H3 was measured by scintillation counting. 1, 2 and 3 are technical replicates.

These newly repeated experiments with carefully prepared proteins showed much higher dynamic range, with an average 16-fold increase in DPM of the enzyme-exposed samples relative to the no-enzyme control samples. This provided strong evidence that SETD7 was active and able to methylate Histone H3.3 in vitro (t(2) = 2.81, p < 0.05). The experiment was repeated with freshly prepared samples, with similar results (data not shown).

For further confirmation that the tritiated AdoMet was being specifically incorporated into H3, attempts were made to use autoradiography to detect methylation of H3. Samples were subjected to SDS-PAGE, and the completed gel was dried and exposed to radiographic film (Figure 3.16).


Figure 3.16: SETD7 activity assay via autoradiography. H3 and tritiated AdoMet were incubated in the presence or absence of SETD7. Reaction mixtures were separated via SDS-PAGE and gels exposed to radiographic film. a) Representative sample of SDS-PAGE gels used. b–h) Developed radiographic film after exposure to gel. Each film represents an independent experiment, with exposure between 3 and 14 days. i) Positive control: radiographic film exposed directly to tritiated AdoMet absorbed in blotting paper.

The failure of the positive control (radiographic film exposed directly to tritiated AdoMet; Figure 3.16i) indicates that the protocol used was not fit for purpose. The dark patches on most films were attributed to light leaking through the case, not to detection of radioactive material. Given the success of scintillation counting (Figure 3.15), autoradiography was not pursued further.

3.4 Split Construct Design and Expression

Having established that we could express and purify catalytically active recombinant SETD7, the next step was to investigate split SETD7 constructs. The final system was intended to fuse the SETD7 fragments to dCas9 in order to co-localise the fragments at the locus of interest. For initial testing, however, the constructs were fused to leucine zippers instead of dCas9. This enabled testing of the recovery of methyltransferase activity following co-localisation without the need to introduce DNA into the reaction conditions.

3.4.1 Design of split SETD7-leucine zipper constructs

It was crucial for the specificity of each pair of split enzyme constructs that the individual constructs have no enzymatic activity, but that activity is reconstituted when the two constructs are brought into close proximity. Four criteria were used to choose sites within SETD7 (110–366) at which to create split points:

• Avoiding disruption of secondary structure. No cut sites were placed within regions of secondary structure, due to concerns that cutting within such a region would leave that structural element unable to form correctly. Secondary structure data were sourced from UniProt [172], which uses published structures to confirm computationally predicted secondary structure, and from manual inspection of multiple structures [173, 174, 188].
• Low amino acid conservation. Poorly conserved amino acids were assumed to be less important to the structure and function of the enzyme than conserved residues, so a cut in a poorly conserved region was considered less likely to interfere with enzyme activity when the protein was reconstituted. Amino acid conservation in SETD7 across species was determined using ConSurf [189]. The 100 most similar sequences were shortlisted via BLAST [190]. Of these, 36 sequences either had no predicted protein identification or were designated as low quality by ConSurf, so amino acid conservation was determined based on the remaining 64 species (Appendix 4). Regions and residues known to be catalytically relevant for SETD7 were confirmed using UniProt and published structural studies [173, 174, 188]. These residues and regions are marked on Figure 3.17 below.
• More than 30 residues from either terminus. This criterion aims to prevent the truncated peptide from being completely unstructured and potentially subject to proteolysis.
• No predicted activity in either truncated construct. Since the purpose of the split system was to create enzyme fragments that have no methyltransferase activity on their own and regain it only when brought into close proximity, disrupting activity with the cut was vital. As such, sites were chosen that were predicted to disrupt the active site (determined via inspection of structures) or to split important catalytic regions (as determined above) between the two fragments.
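Criteria 1–3 are mechanical enough to be screened automatically before structural inspection. The sketch below shows one such screen. The data encodings are assumptions: one secondary-structure character per residue with '-' for coil, and one ConSurf-style conservation grade per residue (ConSurf grades run from 1, most variable, to 9, most conserved). The cut-off `max_grade=4` is illustrative, and `candidate_cut_sites` and `CutSite` are hypothetical names. Criterion 4 is only partly captured: the sketch flags whether the listed catalytic residues fall on both fragments, while active-site disruption still needs inspection of the structure.

```python
from dataclasses import dataclass


@dataclass
class CutSite:
    after_residue: int       # the cut falls between this residue and the next one
    splits_catalytic: bool   # True if listed catalytic residues end up on both fragments


def candidate_cut_sites(first_residue, sec_struct, conservation, catalytic,
                        min_terminal=30, max_grade=4):
    """Screen every inter-residue position against criteria 1-3 and annotate criterion 4.

    sec_struct   -- one character per residue; '-' means no helix/strand (assumed encoding)
    conservation -- one grade per residue (1 = most variable, 9 = most conserved)
    catalytic    -- residue numbers considered catalytically relevant
    """
    n = len(sec_struct)
    if len(conservation) != n:
        raise ValueError("sec_struct and conservation must cover the same residues")
    last_residue = first_residue + n - 1
    sites = []
    for i in range(n - 1):
        left = first_residue + i   # last residue of the N-terminal fragment
        right = left + 1           # first residue of the C-terminal fragment
        # Criterion 3: more than `min_terminal` residues on each side of the cut.
        if left - first_residue + 1 <= min_terminal or last_residue - right + 1 <= min_terminal:
            continue
        # Criterion 1: neither flanking residue lies in a helix or strand.
        if sec_struct[i] != "-" or sec_struct[i + 1] != "-":
            continue
        # Criterion 2: both flanking residues are poorly conserved.
        if conservation[i] > max_grade or conservation[i + 1] > max_grade:
            continue
        # Criterion 4 (partial): are catalytic residues separated by this cut?
        splits = any(r <= left for r in catalytic) and any(r >= right for r in catalytic)
        sites.append(CutSite(left, splits))
    return sites
```

For the SETD7 (110–366) construct, `first_residue` would be 110. The surviving sites would then be shortlisted by hand, as was done for the five cut sites in Figure 3.17.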

A summary of this information is provided in Figure 3.17. Based on these criteria, five cut sites were chosen (Figures 3.17 and 3.18) and five pairs of split SETD7 constructs were designed (Figure 3.19).


Figure 3.17: Cut sites, secondary structure, amino acid conservation and notable features of SETD7 (110–366).


Figure 3.18: The five proposed split domains of SETD7 shown on its structure. SETD7 is shown here bound to AdoMet [191].

Figure 3.19: Split SETD7 construct design. The 6xHis affinity tag was attached only to the N-terminal constructs, with the idea that successfully dimerised pairs would purify together.

3.4.2 Production of split green fluorescent protein control constructs

To test the reassembly of the split SETD7 constructs in the absence of dCas9 fusions, a heterodimeric leucine zipper fusion was proposed. Leucine zippers of this type cannot homodimerise but will heterodimerise with their partner leucine zipper. Fusing each split enzyme construct to one of a pair of heterodimeric leucine zippers was therefore expected to bring the constructs together in a manner analogous to that of the final dCas9 fusion system. As shown in Figure 3.19, the zippers selected had previously been shown to allow reassembly of split green fluorescent protein (GFP) [192]. The pair of split GFP-leucine zipper fusion constructs shown in Figure 3.20 was generated and cloned based on that original design. The constructs were coexpressed in E. coli BL21(DE3) cells and purified using the SETD7 purification protocol (see Section 2.3.3), to confirm that the leucine zippers did not prevent expression or purification under these conditions (Figure 3.21).

Figure 3.20: Design of split GFP leucine zipper constructs. From Ghosh et al. (2000) [192]. Split GFP constructs are fused to the heterodimeric, antiparallel leucine zippers NZ and CZ. Dimerisation of these leucine zippers leads to reassembly of GFP, causing fluorescence.


Figure 3.21: Trial co-expression of split GFP-leucine zipper constructs. Overexpression of both GFP-leucine zipper constructs was induced in E. coli BL21(DE3) cells by either 0.1 mM IPTG, 0.4 mM IPTG, 1 mM IPTG or autoinduction, and cells grown either at 18 °C for 22 h, 25 °C for 16 h, or 37 °C for 4 h. Following cell lysis, a) soluble and b) insoluble fractions were separated and analysed by SDS-PAGE.


Bands corresponding to the three expected protein sizes (N-terminal, C-terminal, and the full-length construct formed by leucine zipper dimerisation) were present in the insoluble fractions of all overexpression conditions. Microcentrifuge tubes containing the samples were imaged under ultraviolet light to detect fluorescence (Figure 3.22).

Figure 3.22: Fluorescence of split GFP-leucine zipper constructs. Overexpression of both GFP-leucine zipper constructs was induced in E. coli BL21(DE3) by either a) 0.1 mM IPTG, b) 0.4 mM IPTG, c) 1 mM IPTG or d) autoinduction, and cells grown either at 18 °C for 22 h, or 25 °C for 16 h. Following cell lysis, the insoluble material was pelleted by centrifugation and the resulting samples were imaged under ultraviolet light. Lighter areas in the tubes are the insoluble cell debris pellets, and darker areas are soluble cell contents in buffer. When laid horizontally for imaging, the cell pellet was largely at the top of the tube.


The 18 °C 0.4 and 1 mM IPTG conditions displayed clear fluorescence in the insoluble pellet, demonstrating that functional reassembled GFP existed in the insoluble fractions of at least these expression conditions.

Overexpression under the 18 °C, 1 mM IPTG condition was repeated on a larger scale, and the protein was purified according to the protocol in Sections 2.3.4–2.3.6. Clear native PAGE was performed on the eluted fractions to determine whether reconstituted, fluorescent GFP was present (Figure 3.23).

Figure 3.23: Clear native PAGE analysis of split GFP overexpression and purification. Fractions from each stage of purification were retained and run on a clear native PAGE gel alongside intact GFP as a positive control. The completed gel was imaged fluorescently with an excitation wavelength of 500 nm.

The purification of split GFP shown in Figure 3.23 did not reveal any fluorescent species in the fractions, in contrast to the intense fluorescence of the intact GFP positive control. Overexpression and reconstitution of the split GFP constructs already appeared successful based on the crude lysate testing shown in Figure 3.22. It was therefore decided to move on to the split SETD7 constructs rather than troubleshoot this proof-of-concept system further.

3.4.3 Expression, purification and testing of split SETD7 constructs

The five pairs of split SETD7 constructs discussed in Section 3.4.1 were cloned and overexpressed in E. coli BL21(DE3) (refer to Sections 2.3.3–2.3.4). Each pair was coexpressed to encourage dimerisation of the constructs as soon as possible after translation, in case the unpaired constructs were prone to misfolding and/or aggregation. Following lysis of the overexpression cultures, soluble lysate was used as the enzyme source in the radiometric histone methylation assay (Figure 3.24) (refer to Section 2.7.3).

Figure 3.24: Activity of split SETD7 lysate via scintillation counting. H3 and tritiated AdoMet were incubated in the presence of purified SETD7, soluble lysate from SETD7 overexpression, or soluble lysate from each of the five pairs of split SETD7 constructs, and in the absence of enzyme. The incorporation of tritium into H3 was measured by scintillation counting over two minutes, with each minute being represented by one half of each column shown.

Both the purified and unpurified SETD7 positive controls showed significant 20- to 100-fold increases in DPM relative to the no-enzyme negative control. This indicates that SETD7 was folded and functional in unpurified lysate, and that the assay was valid for measuring the activity of the split SETD7 constructs. By contrast, none of the split SETD7 samples showed a significant difference in DPM relative to the no-enzyme negative controls. The SETD7 purification protocol (refer to Sections 2.3.4–2.3.6) was then used to purify the overexpressed constructs to confirm their presence in the samples. The histone methylation assay was repeated with these purified samples to rule out the possibility that contaminants were inhibiting SETD7 activity.

Figure 3.25: SDS-PAGE analysis of purification of split SETD7 constructs. Samples were taken following elution from Ni-NTA resin. The five lanes correspond to the five pairs of split SETD7 constructs shown in Figure 3.18. Red boxes denote bands of correct size to be split SETD7 constructs.

The purification analysis in Figure 3.25 suggests that split SETD7 construct pairs 1 and 2 were successfully overexpressed and purified. As only the N-terminal constructs were tagged with 6xHis, these pairs also appear to have dimerised successfully. Construct pairs 3, 4 and 5 appear either not to have been expressed or not to have been purified successfully.


Figure 3.26: Activity of purified split SETD7 pairs via scintillation counting. H3 and tritiated AdoMet were incubated in the presence of purified SETD7, purified fractions of each of the five pairs of split SETD7 constructs, and in the absence of enzyme. The incorporation of tritium into H3 was measured by scintillation counting over two minutes, with each minute being represented by one half of each column shown.

The purified split SETD7 samples also gave DPM values that were not significantly different from the no-enzyme negative control (Figure 3.26), whereas the intact SETD7 positive control showed a 20-fold DPM increase. This result suggests either that dimerisation of the split constructs was not occurring, even for the constructs that were successfully expressed and purified, or that none of the reconstituted dimers was able to recover significant activity.

3.5 Discussion

This research produced a list of known human epigenetic enzymes and their structural and functional characteristics. From this list, the histone methyltransferase SETD7 was selected as a candidate for developing a split-enzyme system for highly specific epigenetic modification.

Replicating the work of Zhang et al. [177], SETD7 was expressed recombinantly and shown to be able to methylate recombinant histone 3 in vitro (Figure 3.15). It was demonstrated that fluorescence of split GFP could be recovered following dimerisation of split GFP-leucine zipper fusions, replicating the work of Ghosh et al. [192] in the creation of antiparallel, heterodimeric leucine zippers.

Semi-rational design of split SETD7-leucine zipper fusions in the same fashion resulted in five pairs of candidate constructs. However, when expressed in E. coli only two of these pairs showed evidence of successful expression and complementation through co-purification, but none displayed histone methyltransferase activity. Notably, the only split SETD7 pairs that did appear to complement, sets 1 and 2, are the pairs in which all active site residues remain on one of the fragments, whereas they are separated between the fragments in the other three pairs (Figure 3.18). It may be that splitting the active site and surrounding fold between fragments disrupts a folding nucleus for the domain, making those fragments prone to misfolding and unable to dimerise.

The protocol used to purify the fragment pairs relies heavily on the solubility of the N-terminal fragment, as Ni-NTA affinity chromatography was used and the His-tag on which this method depends was present only on the N-terminal fragments. The larger N-terminal fragments in fragment pairs 3–5 may be less soluble, but the solubility or folding of the C-terminal fragments was difficult to establish from the purification data.

The presence and concentration of fragment pairs in the assays were inferred from the appearance of bands of the appropriate size by SDS-PAGE rather than determined more accurately. Mass spectrometry to confirm the identity of the fragments, together with better estimates of their concentration, would help rule out the possibility that the bands visible by SDS-PAGE are not the proteins of interest.

Splitting an enzyme has been shown to be a viable strategy for reducing off-target effects [193-195]. It has been successfully applied to DNA methylation, though not histone methylation, in vivo [194]. However, the general success of the strategy does not necessarily mean that it is applicable to all proteins. Many steps are available to troubleshoot the expression, purification and complementation of the SETD7 fragment pairs:


• Expression levels of the fragment pairs could be tested under a range of temperature and induction conditions to determine the conditions that produce maximum soluble expression of each pair. Insoluble expression could also be monitored, as purifying the protein from inclusion bodies and refolding it could prove a viable purification strategy. If necessary, expression of single fragments (rather than co-expression of fragment pairs) could confirm whether a single fragment is capable of expression at all, even though the resulting protein would likely be partially unfolded.
• The purification used in the above work specifically purified only the N-terminal fragments and any C-terminal fragments that had dimerised with them. Adding an affinity tag to the C-terminal fragments and purifying them directly would show which fragments had been successfully expressed and which were unable to dimerise with their partner fragment. The use of different affinity tags may also improve soluble protein yields. For example, the MBP fusion tag has been shown to enhance the solubility of recombinantly expressed proteins [196].
• While the five pairs of fragments used in this work were the only possible pairs that fit the design constraints outlined in Section 3.4.1, these constraints may not be required to produce a functional fragment pair. A further round of semi-rational fragment pair design with different constraints (e.g., allowing splits within regions of defined secondary structure) may produce a functional fragment pair.
• The length of the serine-glycine linkers connecting the fragments and leucine zippers may require alteration. A linker that is too short could prevent the fragments from orienting correctly to restore functionality. A linker that is too long could allow the fragments too much freedom of movement, preventing them from being brought together in the correct orientation.
• Fusion of the split GFP fragments described in Section 3.4.2 to the SETD7-leucine zipper constructs may also serve as a useful reporter of successful complementation, as green fluorescence would indicate that the two fragments have dimerised.

Based on the previous successes of the split enzyme system [193-195], a combination of these approaches would be likely to result in a functional split SETD7 system that could then be used for dCas9 targeting.


Although a range of possible approaches was available to pursue the production of a split-enzyme system in bacteria further, it was instead decided to take a different approach and create analogous split-enzyme systems directly in cultured human cells.


4. Observation of Epigenetic Modifications in Cultured Human Cells

4.1 Introduction

The ultimate goal of this project was to produce split enzyme systems that are functional in eukaryotic cells, including mouse and human cells. A risk associated with the bacterial expression experiments presented in Chapter 3 was that any success might not transfer to mammalian cells. This could arise from differences in folding or post-translational modification of the expressed enzymes, or from the more complex environment of the cytosol compared with an in vitro buffer. To address this risk, it was decided to test the SETD7 system described in the previous chapter in a human cell line model. Many of these experiments were performed concurrently with those presented in Chapter 3.

Work performed by collaborators in the Segal group at the University of California, Davis, had confirmed that changes in patterns of histone modifications in human cell lines are detectable by two routes. Detection can be direct, by chromatin immunoprecipitation (ChIP) using antibodies against the modification, or indirect, by reverse transcription quantitative polymerase chain reaction (RT-qPCR) to assay changes in gene expression resulting from the modification [197]. Their published experiments were conducted by transiently transfecting stable human cell lines with a mixture of five plasmids:

• A plasmid containing a gene that encodes a histone methyltransferase or histone acetyltransferase fused to catalytically dead Cas9 (dCas9). Cas9 is a bacterial enzyme that is natively involved in bacterial defence against DNA viruses [198]. It uses a 20-nucleotide RNA guide sequence (guide RNA or gRNA) to find and cleave complementary DNA sequences. dCas9 is a mutant form of Cas9 that retains the ability to bind to DNA sequences complementary to a provided gRNA but no longer has endonuclease activity [199]. dCas9 can therefore be used to target a fused protein to a specific locus or other DNA sequence with high specificity but without cutting the DNA.
• Three plasmids encoding RNA sequences complementary to the HER2 gene, to serve as gRNAs for the dCas9-enzyme fusion. HER2 encodes a growth factor receptor that is implicated in aggressive forms of breast cancer [200]. It therefore serves both as a gene of biological interest (as preventing HER2 overexpression may slow the growth of these breast cancers) and as a reporter gene for experimental success (as HER2 is constitutively expressed at low levels in the tissues from which the cell lines used are derived). Thus, changes in levels of HER2 mRNA serve as an indicator of changes in histone methylation/acetylation caused by the enzyme-dCas9 fusion construct. Each of these plasmids encoded an RNA targeted to a different region of the HER2 gene to increase the likelihood of successful dCas9 localisation to the HER2 locus [201].
• A plasmid containing a gene that encodes resistance to the antibiotic puromycin. Including this plasmid allows selection of cells that have taken up the plasmids, by adding small amounts of puromycin to the growth medium. While this is not a perfect mechanism for selecting cells that have taken up all five plasmids, the experiment can be designed to maximise uptake (Section 4.2.1).

In the Segal group’s experiments, RT-qPCR performed using mRNA extracted from transfected cells demonstrated increases or decreases in HER2 mRNA according to the histone-modifying enzyme used. ChIP performed on genomic DNA from the same cells demonstrated increased levels of methylation or acetylation around the HER2 locus. Refer to Section 4.3 for further details.

A similar workflow was set up in order to first replicate and then elaborate on these results (Figure 4.1).

Figure 4.1: Mammalian methyltransferase expression workflow.


4.2 Transfection of Cultured Cells

Two cell lines were used for this work: the human colorectal carcinoma line HCT116 and the human embryonic kidney line HEK293-FT. HCT116 cells were used by the Segal laboratory and were used here in an attempt to replicate their experiments. HEK293-FT cells are fast growing, easily transfected and well characterised in our laboratory. Because of their ease of use, it was decided to perform as many experiments as possible in HEK293-FT cells if they proved fit for purpose, but to use HCT116 cells if the experiments could not be replicated in HEK293-FT cells.

4.2.1 Choice of vectors

The transient transfection system used by the Segal lab included the pBABE-puro puromycin resistance plasmid. During transfection, pBABE-puro was transfected at a concentration roughly ten times lower than those of the other plasmids. Assuming that cells take up plasmids in rough proportion to the concentration at which they are supplied, any cell that survived the introduction of puromycin into its growth medium 24 h after transfection would most likely also have taken up the other plasmids added during transfection. Refer to Section 2.4.2 for the full transfection protocol.
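The co-selection rationale above can be made concrete with a toy simulation. The sketch below is not a model used in this work: it assumes, purely for illustration, that each cell's overall DNA uptake varies from cell to cell (drawn here from an exponential distribution) and that the number of copies of each plasmid a cell takes up is Poisson-distributed in proportion to the amount supplied. The function names, the mean uptake value and both distributions are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson-distributed count (Knuth's method; adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cotransfection(n_cells=20_000, mean_uptake=2.0,
                            other_ratios=(1.0, 1.0, 1.0, 1.0),
                            puro_ratio=0.1, seed=1):
    """Return (P(all other plasmids), P(all other plasmids | puromycin resistant)).

    Hypothetical model: each cell has an uptake scale drawn from an exponential
    distribution; copies of each plasmid are Poisson with mean scale * ratio.
    """
    rng = random.Random(seed)
    n_all = n_puro = n_puro_and_all = 0
    for _ in range(n_cells):
        scale = rng.expovariate(1.0 / mean_uptake)  # cell-to-cell variation in uptake
        has_all = all(poisson(rng, scale * r) > 0 for r in other_ratios)
        has_puro = poisson(rng, scale * puro_ratio) > 0  # pBABE-puro at ~1/10 the amount
        n_all += has_all
        n_puro += has_puro
        n_puro_and_all += has_all and has_puro
    return n_all / n_cells, n_puro_and_all / max(n_puro, 1)


p_all, p_all_given_puro = simulate_cotransfection()
print(f"P(all four other plasmids)             = {p_all:.2f}")
print(f"P(all four other plasmids | puro-res.) = {p_all_given_puro:.2f}")
```

Under these assumptions, selecting for the scarce puromycin plasmid enriches for cells that took up a lot of DNA overall, so the conditional probability (roughly 0.7 with the default parameters) exceeds the unconditional one (roughly 0.4). If uptake did not vary between cells, the plasmids would be taken up independently and the two probabilities would be equal, so the rationale depends on cell-to-cell heterogeneity in uptake.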

For the work in this thesis, the three gRNA plasmids contained sequences complementary to the same HER2 target gene as those used by the Segal group.

4.3 Detection of Modifications by Reverse Transcription Quantitative Polymerase Chain Reaction

RT-qPCR allows the detection of changes in transcription levels following an event, which in this case is histone modification. RNA is extracted from lysed cells and complementary DNA (cDNA) is synthesised using reverse transcriptase. The cDNA mixtures are then subjected to PCR using primers complementary to either the sequence of interest or an unrelated ‘housekeeping’ gene (to normalise for the number of cells used), and the abundance of the resulting DNA product is measured using an intercalating fluorescent dye. RT-qPCR was chosen as the method for detecting gene expression changes caused by histone modifications because of its sensitivity (being able to detect <10 copies of mRNA under ideal circumstances using a two-step protocol [202]) and the relative speed of the assay (~6 h total).
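Quantification in RT-qPCR rests on the relationship between the threshold cycle (Ct) and the starting amount of template. A minimal sketch of one common approach, an absolute standard curve built from a serial dilution of known copy numbers, is shown below. Whether Section 2.7 used a standard curve, and the dilution values here, are assumptions for illustration. The slope of Ct against log10(copies) gives the amplification efficiency, E = 10^(-1/slope) - 1, where a slope of about -3.32 corresponds to perfect doubling each cycle.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    """Amplification efficiency; 1.0 means the product doubles every cycle."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Interpolate the starting copy number of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series of a HER2 standard (10 to 1,000,000 copies),
# generated here for an ideal assay that doubles every cycle.
ideal_slope = -1 / math.log10(2)
standards = [10 ** k for k in range(1, 7)]
standard_cts = [38 + ideal_slope * math.log10(c) for c in standards]

slope, intercept = fit_standard_curve(standards, standard_cts)
print(f"slope = {slope:.3f}, efficiency = {efficiency(slope):.1%}")
print(f"Ct 28 -> {copies_from_ct(28, slope, intercept):.0f} copies")
```

In practice the standards would be real measurements with scatter, and efficiencies of roughly 90% to 110% are commonly considered acceptable.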

Initially, the construct transfected into cells was dCas9 fused both N- and C-terminally to a 45-amino-acid peptide derived from the N-terminus of FOG1, which can recruit the Nucleosome Remodelling and Deacetylation (NuRD) complex to histones and therefore alter the expression of associated genes (Figure 4.2a, b) [203, 204]. The Segal laboratory found that this construct inhibited HER2 mRNA production by ~80% in HCT116 cells and was more effective than other permutations of FOG1-dCas9 constructs (Figure 4.2c) [204].

Figure 4.2: FOG-dCas9 as an inhibitor of HER2 transcription. Figure 3 of O’Geen et al. (2017) [204]. Reproduced with permission. a) Schematic of the mechanisms of action of EZH2-dCas9 (not used in this thesis) and FOG-dCas9. The FOG1 peptide recruits the NuRD complex, which causes deacetylation of H3K27 and recruits polycomb repressive complex 2, which mediates trimethylation of H3K27. This results in repression of transcription at the targeted locus. b) Schematic of the various FOG1-dCas9 constructs trialled by the Segal laboratory. c) N- and C-terminally fused FOG1-dCas9 was the most potent inhibitor of HER2 transcription of the constructs in panel b, inhibiting transcription by ~80%. d) FOG1-dCas9 also caused a 5-fold enrichment of HER2 DNA associated with methylated histones, as determined by chromatin immunoprecipitation.

In an effort to replicate the results shown in Figure 4.2c, HCT116 and HEK293-FT cells were transfected with a vector encoding the FOG1-dCas9-FOG1 construct, together with the other vectors described in Section 4.2.1. The negative control condition was transfection with empty gRNA cloning vectors instead of those containing HER2-complementary gRNA sequences. RNA from the harvested cells was used for RT-qPCR as described above, using GAPDH as the reference gene. GAPDH is a commonly used reference gene for RT-qPCR, as it is a constitutively expressed ‘housekeeping’ gene with reliably stable transcription levels [205]. HER2 mRNA levels were normalised to GAPDH mRNA levels, and the normalised data presented as fold changes relative to HER2 mRNA in the HEK293-FT no-gRNA condition (Figure 4.3).

Figure 4.3: RT-qPCR of HER2 in FOG1-dCas9 transfected cells. HEK293-FT and HCT116 cells were transfected with FOG1-dCas9, pBABE-puro, and either three HER2 gRNA plasmids or an equivalent amount of empty vector. Transfected cells were selected for by 3 ng/µL puromycin, total RNA was extracted, cDNA synthesised and qPCR carried out. cDNA was normalised to the reference gene GAPDH. Fold changes in HER2 cDNA relative to the HEK293-FT no-gRNA condition were determined. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.
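The normalisation described above (HER2 relative to GAPDH, expressed relative to a control condition) is commonly performed with the comparative Ct (2^-ΔΔCt) method. The sketch below illustrates that method under its usual assumption that both amplicons amplify with close to 100% efficiency; it is not necessarily the calculation used in Section 2.7, and the Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct for one condition."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample, control):
    """2^-ddCt fold change of the target in `sample` relative to `control`.

    Each argument is a (target_cts, reference_cts) pair of technical replicates.
    Assumes ~100% amplification efficiency for both target and reference.
    """
    ddct = delta_ct(*sample) - delta_ct(*control)
    return 2 ** -ddct


# Hypothetical triplicate Cts as (HER2, GAPDH)
control = ([25.0, 25.1, 24.9], [18.0, 18.1, 17.9])  # e.g. no-gRNA condition
test = ([27.0, 27.1, 26.9], [18.0, 17.9, 18.1])     # e.g. HER2 gRNA condition
print(f"HER2 fold change = {fold_change(test, control):.2f}")
```

For comparison, the ~80% repression reported by the Segal laboratory corresponds to a fold change of 0.2, i.e. a ΔΔCt of log2(5) ≈ 2.3 cycles.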

In this experiment, there was no observable difference in HER2 mRNA levels as a result of gRNA targeting of HER2 in either cell line, suggesting that FOG1-dCas9 was not repressing HER2 transcription. The experiment was repeated twice with independently transfected cells (data not shown), with similar results. A similar experiment was then performed to investigate whether the FOG1-dCas9 construct was causing histone deacetylation at all. Transfections, harvesting and RT-qPCR were performed as described above, except that the negative control condition was transfected with dCas9 lacking the FOG1 peptide fusion, together with the HER2 gRNA vectors (Figure 4.4). The complete absence of FOG1 from the negative control transfection was intended to test whether the results seen in Figure 4.3 were due to off-target effects of FOG1 masking its HER2-specific effects.

[Figure 4.4 plot: fold change in HER2 mRNA for the HEK -FOG, HEK +FOG, HCT -FOG and HCT +FOG conditions]

Figure 4.4: RT-qPCR of HER2 in FOG1-dCas9 transfected cells. HEK293-FT and HCT116 cells were transfected with FOG1-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were selected for by addition of puromycin (3 ng/µL), total RNA was extracted, and cDNA synthesised and used for RT-qPCR. cDNA was normalised to the reference gene GAPDH. Fold changes in HER2 cDNA relative to the dCas9 condition were determined. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.

These data showed no change in HER2 mRNA abundance in the HEK293-FT cells. In the HCT116 cells, a ~2.5-fold increase in HER2 mRNA was observed. However, this change was in the opposite direction to that expected for a transcriptional repressor (refer to Figure 4.2c), and was inconsistent both with the HEK293-FT result and with the experiment presented in Figure 4.3. In neither experiment was the expected decrease in HER2 mRNA abundance observed. This experiment was repeated with freshly transfected cells (data not shown), with a similar result.

In an effort to troubleshoot this unexpected result, the vectors used for these experiments were checked via agarose gel electrophoresis. It was noted that the sample of FOG1-dCas9 vector being used for these experiments appeared to be contaminated or incorrect (Figure 4.5), despite sequencing after receipt of the plasmid confirming that it contained the construct (Appendix 6).


Figure 4.5: Unknown composition of a supposed FOG1-dCas9 plasmid sample. a) Undigested and b) linearised FOG1-dCas9 plasmid was run on a 1% agarose gel and visualised via HydraGreen staining. c) Undigested and d) linearised FOG1-dCas9 plasmid, from both our existing stocks (“old sample”) and a newly obtained sample from the Segal lab (“new sample”), was run on a 1% agarose gel and visualised via HydraGreen staining. The expected size of the plasmid was 10.0 kb.

The vector was analysed by agarose gel electrophoresis in both untreated and KpnI-treated form; KpnI should cut the vector once to form a linear plasmid. Both samples showed multiple bands (Figure 4.5a and b), indicating that the sample contained plasmids other than FOG1-dCas9. Further, none of the bands corresponded to the expected size of the FOG1-dCas9 plasmid (~10.0 kb).


A second sample of the FOG1-dCas9 plasmid was obtained from the Segal laboratory. Unfortunately, although this sample ran as a single species in both untreated and KpnI-treated form, it appeared to be much larger than 10 kb (Figure 4.5c) and to contain multiple KpnI restriction sites, as indicated by the broad band at ~4.5 kb in the digested sample, which suggests two or more fragments of similar size (Figure 4.5d).
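Interpreting such digests is easier with a predicted fragment pattern to compare against. The sketch below finds KpnI sites (GGTACC, which is palindromic, so scanning one strand suffices) in a circular sequence, including sites that span the origin, and converts the site positions into expected linear fragment sizes. The toy sequences are hypothetical; in practice the full expected plasmid sequence would be used.

```python
KPNI_SITE = "GGTACC"  # palindromic, so scanning one strand finds every site


def find_sites(seq, site=KPNI_SITE):
    """0-based start positions of `site` in a circular sequence."""
    seq = seq.upper()
    extended = seq + seq[: len(site) - 1]  # allow a site to span the origin
    return [i for i in range(len(seq)) if extended.startswith(site, i)]


def circular_fragments(length, cut_positions):
    """Linear fragment sizes (bp) from cutting a circular plasmid at the given positions.

    Site start positions can stand in for cut positions because every site is
    offset from its cut by the same amount, which leaves fragment sizes unchanged.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return []  # uncut plasmid: no linear fragments
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(frags, reverse=True)


toy = "AA" + "GGTACC" + "AAAA" + "GGTACC" + "AA"  # hypothetical 20 bp circle
sites = find_sites(toy)
print(sites, circular_fragments(len(toy), sites))

# A single site linearises the plasmid: one fragment equal to the full length.
print(circular_fragments(10_000, [1_234]))
```

A plasmid of the expected ~10 kb with a single KpnI site should therefore give one ~10 kb band after digestion, whereas two or more similar-sized fragments would point to additional sites.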

In the first instance, it was decided to use other available constructs to attempt to reproduce the Segal group’s experiments. HEK293-FT cells were transfected as in Figure 4.3, except that the dCas9 fusion partner was either KRAB (a non-histone-modifying transcriptional repressor) or SETD7 (a histone methyltransferase and transcriptional activator, as discussed in Chapter 3). In these experiments, the negative control was transfection with dCas9 alone, and normalised RT-qPCR data were presented as the fold change in HER2 mRNA abundance relative to the negative control (Figure 4.6).

Figure 4.6: RT-qPCR of HER2 in KRAB-dCas9 or SETD7-dCas9 transfected cells. HEK293-FT cells were transfected with KRAB-dCas9, SETD7-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were selected for by addition of puromycin (3 ng/µL), total RNA was extracted, and cDNA synthesised and used for RT-qPCR. cDNA was normalised to the reference gene GAPDH. Fold changes in HER2 cDNA relative to the dCas9 condition were determined. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.


Both the KRAB and SETD7 conditions in this dataset displayed small apparent increases in HER2 mRNA relative to dCas9 alone. However, KRAB is a transcriptional repressor, so a decrease in abundance was expected. The increase seen for KRAB suggests that the small increase in the SETD7 condition was probably not due to specific activation by SETD7. The experiment was repeated with fresh transfections and additional enzymes: KAT8 (a histone acetyltransferase and transcriptional activator) and KMT2D (a histone methyltransferase and transcriptional activator) were each individually fused to dCas9 and tested. Again, normalised RT-qPCR data were compared with the dCas9-only negative control condition (Figure 4.7).

[Figure 4.7 plot: fold change in HER2 mRNA relative to dCas9 for the dCas9, SETD7, KAT8, KMT2D and KRAB conditions]

Figure 4.7: RT-qPCR of HER2 in KRAB-dCas9, SETD7-dCas9, KAT8-dCas9 or KMT2D-dCas9 transfected cells. HEK293-FT cells were transfected with KRAB-dCas9, SETD7-dCas9, KAT8-dCas9, KMT2D-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were selected for by puromycin (3 ng/µL), total RNA was extracted, and cDNA synthesised and used for RT-qPCR. cDNA was normalised to the reference gene GAPDH. Fold changes in HER2 cDNA relative to the dCas9 condition were determined. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.

The three transcriptional activators (SETD7, KAT8 and KMT2D) all showed small increases in HER2 mRNA abundance. However, the repressor KRAB displayed a similar increase over the dCas9 control. The lack of difference between transcriptional activators and repressors, seen both here and in Figure 4.6, suggests that these small increases in gene expression were not meaningful, as qPCR has at least a 2-fold margin of error [206].
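Where biological replicates are available, whether a small fold change is meaningful can also be assessed statistically rather than by a fixed threshold alone. The sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom on replicate ΔCt values (HER2 Ct minus GAPDH Ct) from two conditions, alongside the log2 fold change. The replicate values are hypothetical and this test is an illustration, not the analysis performed in this chapter; a p-value could then be taken from the t distribution, for example via SciPy's `scipy.stats.ttest_ind` with `equal_var=False`, which performs the same Welch test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for a vs b."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical biological-replicate dCt values (HER2 Ct - GAPDH Ct)
dct_dcas9 = [7.1, 6.8, 7.3]
dct_fusion = [6.6, 7.0, 6.5]

t, df = welch_t(dct_fusion, dct_dcas9)
log2_fc = -(mean(dct_fusion) - mean(dct_dcas9))  # log2 fold change = -ddCt
print(f"log2 fold change = {log2_fc:.2f}, t = {t:.2f}, df = {df:.1f}")
```

With only three replicates per condition such a test has limited power, so a non-significant result does not by itself demonstrate the absence of an effect.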

A transfection time course experiment was conducted to investigate whether the unusual transcriptional activation observed for KRAB-transfected cells was transient or persistent.


Cells were transfected with either KRAB-dCas9 or dCas9 alone, and aliquots of cells harvested every 24 h following transfection. RNA was then extracted and RT-qPCR carried out to assay levels of HER2 expression (Figure 4.8).

[Figure 4.8 plot: HER2 transcripts per 10,000 copies of GAPDH in the +KRAB and -KRAB conditions at 0, 24, 48, 72 and 96 h post-transfection]

Figure 4.8: Time course of HER2 expression in KRAB-dCas9 or dCas9 transfected cells. HEK293-FT cells were transfected with KRAB-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were selected for by addition of puromycin after 24 h. Cells were harvested every 24 h for 96 h. Total RNA was extracted, and cDNA synthesised and used for RT-qPCR. HER2 cDNA was normalised to the reference gene GAPDH and is expressed as HER2 transcripts per 10,000 copies of GAPDH. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.

The relative abundance of HER2 mRNA in the +KRAB and -KRAB conditions fluctuated substantially over time. The only time point at which the +KRAB condition exhibited lower abundance was 24 h, immediately before puromycin selection began. At this point little to no difference in HER2 expression would be expected, because only about 10% of cells are expected to have taken up plasmids, so the majority of assayed cells are untransfected. Together with the preceding experiments, these data suggest that none of the dCas9 fusion constructs had a measurable effect on HER2 expression.
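The expectation that untransfected cells mask any effect can be quantified with a simple mixing model. Assuming, for illustration, that transfected and untransfected cells contribute equal amounts of HER2 and GAPDH mRNA per cell, the fold change measured in the bulk population is a weighted average of 1 (untransfected cells) and the per-cell fold change in transfected cells. The function names are hypothetical.

```python
def bulk_fold_change(fraction_transfected, per_cell_fold):
    """Fold change seen in a mixed population (equal mRNA per cell assumed)."""
    f = fraction_transfected
    return (1 - f) * 1.0 + f * per_cell_fold


def required_per_cell_fold(fraction_transfected, target_bulk_fold):
    """Per-cell fold change needed to reach a target bulk fold change, or None if impossible."""
    f = fraction_transfected
    needed = (target_bulk_fold - (1 - f)) / f
    return needed if needed >= 0 else None


# ~80% repression (fold 0.2) in ~10% of cells, as before puromycin selection
print(f"{bulk_fold_change(0.10, 0.2):.2f}")   # 0.92, i.e. an 8% apparent decrease
print(required_per_cell_fold(0.10, 0.5))      # None: a two-fold decrease is unreachable
print(f"{bulk_fold_change(1.00, 0.2):.2f}")   # 0.20 in a fully selected population
```

Under these assumptions, even complete silencing in 10% of cells would lower bulk HER2 mRNA by only 10%, well within the two-fold error margin cited above, which is why effective selection of transfected cells is essential to this assay.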

4.4 Detection of Modifications by Chromatin Immunoprecipitation

RT-qPCR as a reporter of chromatin modification relies on significant changes in gene expression, measured as levels of mRNA following transcription. It is possible that the chromatin was being modified as intended, but that those modifications had not carried through to the level of transcription. An alternative approach that provides a direct readout of chromatin modification is chromatin immunoprecipitation (ChIP). In general, this method identifies the DNA sequences that are bound by a protein of interest. Here we aimed to examine changes in histone methylation and acetylation at the HER2 locus. To achieve this, ChIP experiments were performed using histone-specific anti-methyllysine or anti-acetyllysine antibodies, and qPCR was carried out using primers for HER2 to determine whether histones near the gene had been differentially modified. In this experiment, cell transfection was carried out as in Figure 4.3 with FOG1-dCas9 (or a dCas9 control), pBABE-puro, and three HER2 gRNA plasmids. DNA and protein were crosslinked with formaldehyde prior to harvesting, and nuclei were released by lysing the cell membranes with detergent. Following separation of the nuclei from the cell lysate by differential centrifugation, the chromatin within the nuclei was sheared into ~500 bp fragments by sonication (Section 2.8.1, Figure 4.9).

Figure 4.9: Sonication of chromatin from FOG1-dCas9 and dCas9 transfected cells. HEK293-FT and HCT116 cells were transfected with either FOG1-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were selected for by puromycin after 24 h. Cells were crosslinked with formaldehyde. Chromatin was extracted from the harvested cells and sheared to an average length of 500 bp.

The chromatin shown in Figure 4.9 appeared to be appropriately sheared, with the majority of the DNA clustering around 500 bp, so histone-specific anti-acetyllysine immunoprecipitation was carried out on the chromatin samples and the enriched DNA was used as a template for qPCR (Figure 4.10). Immunoprecipitation with an antibody against H3K27me3 was used as the positive control (as the HER2 locus is a region of active low-level transcription, and therefore histones proximal to the locus should be methylated), and immunoprecipitation with immunoglobulin G from non-immunised rabbits was used as the negative control (as non-immunised rabbits should not have anti-methyllysine or anti-acetyllysine antibodies). Under these experimental conditions, the Segal lab consistently observed a five-fold enrichment of HER2 DNA in the FOG1-dCas9 condition relative to the dCas9-alone condition (Figure 4.2d).

Figure 4.10: qPCR of immunoprecipitated chromatin in FOG1-dCas9 transfected cells. Chromatin extracted from HEK293-FT cells transfected with FOG1-dCas9 or dCas9 was immunoprecipitated with antibodies against acetylated histones (test condition) or methylated histones (positive control), or with non-immune rabbit IgG (negative control). Enriched DNA was used as a template for qPCR using HER2 primers. DNA abundance was normalised to the reference gene GAPDH. Fold changes in HER2 DNA relative to the negative control condition were determined. Refer to Sections 2.4 and 2.7 for full methodological details. Raw data are presented in Appendix 5.

Enrichment was negligible in all conditions, with the largest fold change being 0.86 (Figure 4.10), i.e. below the level of the negative control. As the positive control did not show significant enrichment, it appeared that the immunoprecipitation steps did not enrich the DNA as intended. The chromatin immunoprecipitation procedure was repeated with freshly transfected cells (data not shown), but again produced no significant enrichment.
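ChIP-qPCR data are also often expressed as percent input, which compares the immunoprecipitated DNA directly with a sample of the starting chromatin, and as fold enrichment over the IgG control. The sketch below shows both calculations as standard alternatives to the GAPDH normalisation used for Figure 4.10; the Ct values and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percentage of the starting chromatin recovered by the IP at one locus.

    The input Ct is adjusted to 100% of the chromatin by subtracting
    log2(1 / input_fraction), e.g. log2(100) for a 1% input sample.
    """
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)


def fold_over_igg(ct_ip, ct_igg):
    """Enrichment of the specific antibody over the non-immune IgG control."""
    return 2 ** (ct_igg - ct_ip)


# Hypothetical HER2 Cts from one chromatin sample (input = 1% of chromatin)
ct_input, ct_antibody, ct_igg = 25.0, 28.0, 30.0
print(f"antibody: {percent_input(ct_antibody, ct_input, 0.01):.3f}% input")
print(f"IgG:      {percent_input(ct_igg, ct_input, 0.01):.3f}% input")
print(f"fold enrichment over IgG = {fold_over_igg(ct_antibody, ct_igg):.1f}")
```

A working positive-control IP would be expected to recover a clearly higher percent input than IgG; values indistinguishable from IgG indicate that the immunoprecipitation itself failed.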

4.5 Transfection Protocol Troubleshooting

Having examined the parts of the protocols that could have led to the unexpected null results in the ChIP and RT-qPCR experiments, it was decided to investigate the transfection protocol more directly. As described in Section 4.2.1, it is generally assumed that plasmids are taken up by cells in the proportion in which they are added; for these experiments, puromycin resistance should therefore indicate that a cell contains all of the added plasmids, which were added at 10-fold the level of pBABE-puro. However, absent or reduced levels of key plasmids in individual cells could have a major effect on the experiments.

4.5.1 Confirmation of enzyme expression

All of the dCas9 fusion vectors used in this experiment encoded FLAG-tagged proteins that can be detected by anti-FLAG Western blot. Cell populations were individually transfected with each of the enzyme-dCas9 fusions used in the RT-qPCR experiments (plus pBABE-puro and the three HER2 gRNA plasmids) and incubated as normal. Harvested cells were subjected to Western blot analysis to determine whether the transfected proteins were expressed (Figure 4.11).

Figure 4.11: Anti-FLAG Western blot of FOG1-dCas9, dCas9, KRAB-dCas9, and SETD7-dCas9. HEK293-FT cells were transfected with KRAB-dCas9, SETD7-dCas9, FOG1-dCas9 or dCas9, pBABE-puro, and three HER2 gRNA plasmids. Transfected cells were harvested after 96 h. The soluble lysates were analysed by SDS-PAGE and anti-FLAG Western blot.

The three enzyme-dCas9 fusion proteins displayed strong anti-FLAG bands between 150 and 250 kDa, consistent with their approximate expected molecular weights. Isolated dCas9, however, displayed only a very faint band, suggesting much lower expression in this experiment. A repeat of the experiment yielded the same result (not shown). It is unknown why dCas9 without a fusion partner would be expressed at much lower levels than the enzyme-dCas9 fusion constructs. However, as it is a non-functional protein included as a negative control, this observation was not considered a likely cause of the issues observed in the RT-qPCR and ChIP experiments.

4.5.2 Investigating transfection efficiency

To further test the transfection protocol, cells were transfected with a GFP expression vector instead of a dCas9 fusion construct so that transfected cells could be visualised directly. After 24 h, and prior to selection with puromycin, these cells were imaged by fluorescence microscopy alongside untransfected cells as a negative control, and the number of fluorescent cells was compared with the number of cells visible under bright field to estimate transfection efficiency (Figure 4.12). A transfection efficiency of 10% is considered normal for this protocol (H. O’Geen 2018, personal communication).

Figure 4.12: Fluorescence microscopy of GFP-transfected HEK293-FT cells. A GFP expression vector was transfected into HEK293-FT cells. Control cells were subjected to the transfection protocol in the absence of vector. 24 h following transfection, cells were imaged using the combined (left) and green-only (right) channels of a fluorescence microscope.

Both the GFP-transfected and untransfected cells showed similar levels of growth after 24 h, as evidenced by the bright field images (left-hand panels), but green fluorescence was observed only in the GFP-transfected cells (right-hand panels). The proportion of green-fluorescing cells was visually estimated to be between 10% and 20%, suggesting that the efficiency of the transfection protocol used was likewise between 10% and 20%. Cells were subsequently treated with puromycin (3 ng/µL) and incubated for a further 72 h.
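Because the efficiency estimate above was made by eye, it carries considerable uncertainty. If fluorescent and total cells were instead counted across a set of fields, a binomial confidence interval could be attached to the estimate. The sketch below uses the Wilson score interval; the counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives ~95% confidence)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts: 15 GFP-positive cells out of 100 counted under bright field
low, high = wilson_interval(15, 100)
print(f"transfection efficiency 15% (95% CI {low:.1%} to {high:.1%})")
```

With only 100 cells counted the interval spans roughly 9% to 23%, so counting several hundred cells across multiple fields would be needed to distinguish, for example, 10% from 20% efficiency.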

4.5.3 Effect of puromycin concentration on cell selection

Following puromycin addition, microscopy and cell counting showed that >90% of cells died. Although the ~90% of cells that had not taken up plasmids were expected to die because of their sensitivity to puromycin, in some cases no cells were detected after harvesting. To determine whether puromycin selection was too stringent, or not functioning as expected, the effect of puromycin concentration on cell survival was tested, comparing cells transfected with a puromycin resistance vector with untransfected cells, using the normal transfection protocol. Survival of the cells over time was determined by visually estimating the percentage confluence of the cells within their growth flasks (Figure 4.13).

Figure 4.13: Survival of puromycin-resistant and puromycin-sensitive cells following treatment with varying concentrations of puromycin. HEK293-FT cells were transfected with pBABE-puro or put through the transfection procedure in the absence of plasmid. 24 h following transfection, cells were exposed to the indicated concentration of puromycin (time of addition indicated by a red arrow). Percentage confluence was monitored every 24 h for 96 h.

All cultures were 50% confluent at the start of the experiment, rising to ~70% confluence 24 h after transfection or mock transfection, prior to the addition of puromycin. The addition of puromycin caused substantial cell death over time at all concentrations investigated, with higher concentrations causing more extensive and more rapid cell death. Concentrations >1 ng/µL puromycin caused complete cell death within three days of addition. However, this was true of the cells transfected with a puromycin resistance plasmid as well as the untransfected cells. Indeed, at lower concentrations, cells transfected with the puromycin resistance plasmid were actually less viable than mock-transfected cells. This observation indicated that selection of transfected cells may not have been functioning as expected in previous experiments, providing a possible explanation for why no significant differences were observed between the different cell populations in either set of experiments.

4.6 Discussion

In summary, transfection of the mammalian cell lines HEK293-FT and HCT116 with dCas9-enzyme fusion constructs was only partially successful. Expression of transfected constructs was detectable (Figure 4.11), but a lack of selection for transfected cells, apparently due to failure of the pBABE-puro plasmid to confer puromycin resistance, prevented study of successfully transfected cells in isolation (Figure 4.13).

Portions of the work presented in this chapter yielded interpretable data or clearly demonstrated which parts of a protocol were working as designed. We were able to establish that the protocol used for transfection into mammalian cells gave 10–20% transfection efficiency (Figure 4.12), and that transfected HEK293-FT cells were able to express the dCas9-enzyme fusion constructs (Figure 4.11). RT-qPCR to detect levels of mRNA within transfected cells also appeared reliable, as the control conditions displayed a high degree of consistency in HER2 and GAPDH mRNA levels (Figures 4.3, 4.4, 4.6, 4.7). However, additional troubleshooting is still required to establish a robust system for testing targeted epigenetic enzymes in mammalian cells.

It is unclear whether the iCas9 and pBABE plasmids (used to transfect cells with dCas9 fusion constructs and puromycin resistance, respectively) contained the expected sequences, as discussed in Sections 4.5.1 and 4.5.3. The observation that dCas9 fusion proteins of the expected sizes were expressed (Figure 4.11) makes it unlikely that frame-shift or nonsense mutations are present within the protein-coding sequence, or that the promoter region is disrupted. However, a missense mutation within the protein-coding sequence could still introduce an unexpected restriction site or affect the function of the resulting protein. Sequencing of the complete plasmids would confirm whether the samples used were correct or contained any mutations or unexpected DNA elements. This is particularly important for the iCas9 plasmid, as it is very large (~10 kb) and previous sequencing only verified the region surrounding the multiple cloning site (Figure 4.14). Multiple overlapping sequencing runs would be required to verify the sequence of the entire plasmid.

Figure 4.14: Map of the iCas9 KRAB plasmid.

Should any differences from the expected sequence be found in either plasmid, it would be ideal to reclone the plasmid from scratch, using an independent vector source and a newly synthesised insert gene sequence.

Regardless of the integrity of the plasmid, the pBABE-puro selection system requires further testing. At a minimum, repeating the time course experiment described in Section 4.5.3 with new plasmid, different cell lines, fresh puromycin and shorter time intervals would help identify the causes of the higher than expected levels of cell death upon addition of puromycin (Figure 4.13).

The transfection protocol as a whole is quite complex, harbouring many possible points of failure. The protocol used herein was based on a protocol successfully implemented by the Segal lab, but the complexity of mammalian systems affects reproducibility of experiments carried out in these systems [207]. In particular, the cell lines used in our work were not sourced from the same stocks as those used in the Segal lab, which may be problematic due to the high propensity for cross-contamination between different mammalian cell lines [208] and the tendency for properties of cell lines to change over time [209]. Verification of the HCT116 and HEK293-FT cell lines used for the work presented in this thesis should be carried out.

The fact that five different plasmids are used in the transfection protocol, each with precise concentration requirements, increases the likelihood of problems occurring during transfection. Ideally, the number of plasmids would be reduced, for example by encoding all three gRNA sequences on a single plasmid and/or by including the puromycin resistance gene on one of the other plasmids rather than on its own plasmid.

Finally, an important factor for the success of the experiments that was not varied in the work carried out here is construct design. As detailed in Figure 4.2, the position and number of FOG1 peptides fused to dCas9 had a significant effect on the degree of transcriptional repression observed in experiments performed by the Segal lab. Once a successful protocol for selecting for transfected cells has been established, trials of different construct designs should be carried out for each enzyme used in order to determine the optimal configuration for that enzyme.

Separately from the transfection and selection issues encountered, the ChIP protocol attempted in Section 4.4 was unsuccessful, as shown by the failure to observe enrichment of the HER2 locus in the anti-methyllysine positive control condition. As ChIP is a complicated procedure with a wide range of reported protocols, there are variations that could be attempted in the hope of obtaining more successful results. The main factor preventing these from being attempted was the cost of producing large numbers of transfected cells using the current transfection protocol. Advice provided by the Crossley lab at the University of New South Wales indicated that the creation of a stably transfected cell line has historically been a successful method of producing large numbers of transfected cells (G. Martyn 2017, personal communication). However, this would require a reduction in the number of plasmids being transfected, as stable transfection of five separate plasmids is not feasible (G. Martyn 2017, personal communication).

Despite ongoing consultation with the Segal laboratory, their methodology for testing dCas9 fusion constructs in mammalian cells proved to be unexpectedly difficult to reproduce. A visit to the laboratory to confirm any differences in techniques, cell lines or reagents used would be prudent before future work on this system is undertaken.


5. Inhibition of the Brd3 Extraterminal Domain

5.1 Introduction

Fragment based drug design (FBDD) is a rapid and economical way to produce candidate molecules that can be elaborated into drug leads targeted to a protein of interest, as discussed in Section 1.6. This chapter describes efforts to use FBDD to produce an inhibitor of the ET domain of the bromodomain-containing protein Brd3, a widely expressed eukaryotic transcriptional regulator (Section 1.4). Dysregulation of Brd3 is implicated in oncogenesis in cancers such as NUT midline carcinoma and leukaemia [210, 211]. While inhibitors of the Brd3 bromodomains have been characterised [75, 76, 85, 99, 118], the ET domain, which mediates protein-protein interactions, has not been a target for inhibition. However, work in the Mackay laboratory has found that the ET domain interacts with chromatin regulators such as CHD4, BRG1 and TAF7 and is able to recruit them to chromatin [87]. Because of these interactions, Brd3 ET was proposed as an alternative drug target to the bromodomains of Brd3. We therefore attempted to design an inhibitor of the ET domain, both to enable a synergistic drugging approach and to help combat the emergence of drug-resistance mutations by inhibiting Brd3 in multiple ways.

An FBDD screen was carried out against the Brd3 ET domain using Saturation Transfer Difference (STD)-NMR screening by Dr Lorna Wilkinson-White (Sydney Analytical, The University of Sydney). The top 60 hits from the primary screen were validated using 15N-1H HSQC NMR spectroscopy. The tightest binding fragment had a Kd of ~200 µM. A series of analogues of this fragment hit were purchased, which allowed development of a structure-activity relationship (SAR) around this molecule. These data were then used to undertake a medicinal chemistry campaign (carried out by Dr Luke Adams at the Monash Fragment Platform) using small-scale, parallel synthesis. The screening workflow used is termed Rapid Elaboration of Fragments into Leads (REFiL) [212]. Molecules derived from this approach were tested for their ability to bind Brd3 ET at the Monash Fragment Platform, using Surface Plasmon Resonance (SPR) with Off-Rate Screening (ORS). The reaction mixtures were screened without purification, under the assumption that only the products would have any discernible off-rate. Products from any reactions that showed a measurable off-rate were resynthesised, then retested by SPR by Dr Lorna Wilkinson-White. The leading candidates from this approach that were available at the commencement of my involvement in this project are presented in Figure 5.1.
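Fragment hits are commonly compared by ligand efficiency (LE), the binding free energy per heavy (non-hydrogen) atom, and SPR off-rates are often reported as dissociation half-lives. The sketch below shows both calculations at an assumed 298.15 K; the heavy-atom count and the off-rate are hypothetical values, not properties of the Brd3 ET fragment hit or of the MFP compounds.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temperature=298.15):
    """dG = RT ln(Kd) in kcal/mol (negative for favourable binding)."""
    return R_KCAL * temperature * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """LE = -dG / number of heavy atoms, in kcal/mol per heavy atom."""
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms


def dissociation_half_life(koff_per_s):
    """Half-life (s) of the complex for first-order dissociation."""
    return math.log(2) / koff_per_s


dg = binding_free_energy(200e-6)  # the ~200 uM fragment hit
print(f"dG = {dg:.2f} kcal/mol")
print(f"LE (hypothetical 15 heavy atoms) = {ligand_efficiency(200e-6, 15):.2f}")
print(f"t1/2 for a hypothetical koff of 0.1 s^-1 = {dissociation_half_life(0.1):.1f} s")
```

A ~200 µM hit corresponds to roughly -5 kcal/mol, so its LE depends strongly on fragment size; values of about 0.3 kcal/mol per heavy atom or higher are often taken as a reasonable starting point for elaboration.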

Figure 5.1: Leading small molecule inhibitor candidates for the Brd3 ET domain. a) MFP-7137, b) MFP-7208 and c) MFP-7210 were produced through successive rounds of REFiL chemistry by the Monash Fragment Platform.

In order to validate the binding of these small molecules to Brd3 ET and determine their kinetics and mode of binding, efforts were made to interrogate the Brd3 ET:MFP-7137, Brd3 ET:MFP-7208 and Brd3 ET:MFP-7210 complexes by X-ray crystallography and NMR spectroscopy.

5.2 Expression and Purification of the Brd3 ET Domain

For the work described in this thesis, the Brd3 ET domain (residues 557–644; 10.7 kDa) was purified based on the protocol developed in our laboratory by Wai et al. [87]. Briefly, Brd3 ET had been cloned into the pGEX-6P vector (refer to Section 2.2), which incorporates an N-terminal GST tag into the expressed protein. The cloned vector was transformed into E. coli BL21(DE3) cells, which were induced with 0.4 mM IPTG and grown at 25 °C for 16 h (refer to Section 2.3.1). Following overexpression, cells were lysed, soluble and insoluble fractions were separated by centrifugation, and Brd3 ET was isolated by glutathione Sepharose affinity chromatography (Figure 5.2) (refer to Sections 2.3.3–2.3.4).


Figure 5.2: SDS-PAGE analysis of small-scale expression of the Brd3 ET domain. Overexpression of Brd3 ET was induced in E. coli BL21(DE3) by 0.4 mM IPTG, and cells grown at 25 °C for 16 h. a) Following cell lysis, soluble and insoluble fractions were separated by centrifugation, and the soluble fractions applied to glutathione Sepharose resin, which was then washed with lysis buffer. Beads refers to the washed resin sample. b) Elution fractions 3–5 from glutathione affinity chromatography were incubated with HRV 3C to cleave the GST tag, then applied to size exclusion chromatography. c) Selected fractions from the peak of size exclusion chromatography were analysed via SDS-PAGE.

Soluble Brd3 ET was obtained at a yield of approximately 4 mg per L of bacterial culture with an estimated purity of >95% (from fractions 60–66 of Figure 5.2c).

5.3 Crystallising the Brd3 ET Domain

Our laboratory has already determined the structures (using NMR spectroscopy) of complexes formed by the Brd3 ET domain with peptides derived from chromatin remodelling enzymes [87]. However, while NMR methods can rapidly provide information about binding sites [213], full structure determination by NMR is time-consuming and therefore not ideal for assessing multiple fragment complexes during a lead development campaign. X-ray crystallography is better suited to structure determination in this context. Crystallisation of Brd3 ET had previously been explored by Taylor Szyszka in the Mackay laboratory, but this work was unsuccessful because precipitation of Brd3 ET could not be induced at the concentrations available. The work outlined below therefore focussed on using higher concentrations of Brd3 ET to attempt to induce precipitation and crystallisation.

Initial attempts in this thesis to crystallise the Brd3 ET domain alone used commercially available 96-well sparse matrix and systematic grid crystallisation screens, including PACT, IndexHT, JCSG, and PEGRx. Brd3 ET was used at concentrations of 32 mg/mL, 45 mg/mL, 58 mg/mL and 81 mg/mL (in 10 mM Tris, 100 mM NaCl, pH 7.5). Sitting drop vapour diffusion trays were used to set up two drops per well: one at a 1:1 ratio of protein:mother liquor, and the other at 2:1.


Figure 5.3: Crystallisation of the Brd3 ET domain using commercial crystallisation screens. Brd3 ET was concentrated to 81 mg/mL and the commercially available PACT screen (Hampton Research) was used to trial crystallisation conditions. The crystal shown formed in 100 mM bis-tris propane, 200 mM sodium sulfate, 20% w/v PEG 3350, pH 6.5.

Based on these crystallisation trials, Brd3 ET appeared to be extremely soluble. Immediate precipitation occurred in approximately 20% of drops at 32 mg/mL, 25% at 45 mg/mL, 35% at 58 mg/mL, and 70% at 81 mg/mL. However, even at 81 mg/mL, the precipitate in most conditions redissolved within a few days. After 20 days, only a single crystal had been obtained from 12 trays, covering 6 different conditions at each of the two protein:mother liquor ratios (Figure 5.3). This crystal was sent to the Australian Synchrotron and tested for diffraction by Karishma Patel (University of Sydney), but no diffraction was observed.

Ultimately, Brd3 ET crystals were intended for investigating the mode of binding in ET-small molecule complexes. Because Brd3 ET alone could not be crystallised, it was decided to attempt to crystallise these complexes directly, as the complexes might be less soluble and therefore more amenable to crystallisation. Crystallisation screens were again performed with PACT, IndexHT, JCSG, and PEGRx. Brd3 ET at concentrations of 9 mg/mL, 22 mg/mL, 34 mg/mL or 44 mg/mL was mixed with 1.2 molar equivalents of either MFP-7208 or MFP-7210 (refer to section 5.1.2). Immediate precipitation was observed in 10–30% of conditions across the different screens. However, no crystal formation was observed under any condition.
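Because protein concentrations in these trials are given in mg/mL while ligand amounts are given in molar equivalents, converting between the two is a routine step. The sketch below assumes the 10.7 kDa mass quoted in Section 5.2 for the full-length construct and ignores any dilution on mixing; the function names are illustrative only.

```python
# Hypothetical helpers for planning protein:ligand mixtures.
# mg/mL is numerically equal to g/L; dividing by a molar mass in kDa
# (1 kDa = 1000 g/mol) therefore gives mmol/L, i.e. mM.

def mg_per_ml_to_mM(conc_mg_per_ml: float, mass_kda: float) -> float:
    """Convert a protein concentration from mg/mL to mM."""
    return conc_mg_per_ml / mass_kda


def ligand_for_equivalents(protein_mM: float, equivalents: float) -> float:
    """Ligand concentration (mM) giving the requested molar equivalents."""
    return protein_mM * equivalents


BRD3_ET_KDA = 10.7  # full-length construct (residues 557-644), Section 5.2

# Concentrations used for the complex screens; dilution on mixing is ignored.
for conc in (9, 22, 34, 44):
    protein = mg_per_ml_to_mM(conc, BRD3_ET_KDA)
    ligand = ligand_for_equivalents(protein, 1.2)
    print(f"{conc} mg/mL Brd3 ET = {protein:.2f} mM; 1.2 eq ligand = {ligand:.2f} mM")
```

For example, 9 mg/mL Brd3 ET is roughly 0.84 mM, so 1.2 molar equivalents corresponds to roughly 1.01 mM ligand.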


5.3.1 Design, expression and purification of truncated Brd3 ET constructs

Given the lack of crystallisation and the apparently high solubility of the protein, the NMR structure of Brd3 ET was analysed. As shown in Figure 5.4a, both the N- and C-termini of Brd3 ET are disordered in its solution structure. These flexible regions could impede crystallisation by preventing ordered crystal packing. To address this issue, truncated constructs of Brd3 ET were designed, termed ET short (ETS, Figure 5.4b) and ET shorter (ETSR, Figure 5.4c). These truncation constructs were subcloned from the original Brd3 ET plasmid (Appendix 1), then expressed and purified in the manner outlined in Section 5.2 for the original construct (Figure 5.5). Both truncation constructs behaved similarly to the original ET construct throughout expression and purification: they were largely soluble, of the expected size (Figure 5.5a), and yielded approximately 2 mg of >90% pure protein per L of bacterial culture (Figure 5.5c, d).


Figure 5.4: Design of less flexible truncation constructs of Brd3 ET. a) Ensemble of the 20 lowest energy solution structures of Brd3 ET as determined by NMR (PDB ID 6BGH) [87], demonstrating the flexible N- and C-termini. b) “ET short” (ETS), a truncation of Brd3 ET made by removing six residues from the C-terminus in order to reduce the flexibility of the protein. c) “ET shorter” (ETSR), a further truncation of Brd3 ET made by removing eight residues from the N-terminus and six residues from the C-terminus. d) Schematic of the sequence of Brd3 ET, showing the residues removed in each truncation.

Figure 5.5: SDS-PAGE analysis of trial expression of truncated Brd3 ET constructs. Overexpression of the Brd3 ET truncations ETS and ETSR was induced in E. coli BL21(DE3) by 0.4 mM IPTG, and cells grown at 25 °C for 16 h. a) Following cell lysis, soluble and insoluble fractions were separated by centrifugation, and the soluble fractions applied to glutathione Sepharose resin, which was then washed with lysis buffer. Pooled elution fractions were incubated with HRV 3C to cleave the GST tag. b) Pooled, cleaved elution fractions were applied to size exclusion chromatography. c), d) Protein-containing elution fractions were examined for purity by SDS-PAGE.


5.3.2 Crystallisation of truncated Brd3 ET constructs

PACT and IndexHT 96-well crystallisation screens were used to screen crystallisation conditions for the truncated constructs, using ETS and ETSR each at concentrations of 11 mg/mL, 27 mg/mL and 39 mg/mL (in 10 mM Tris, 100 mM NaCl, pH 7.5). Screens were carried out with protein alone or with protein plus 1.2 molar equivalents of either MFP-7208 or MFP-7210. As above, sitting drop vapour diffusion trays were used to set up two drops per well: one at a 1:1 ratio of protein:mother liquor, and the other at 2:1. As with the full-length ET domain, these constructs appeared to be highly soluble. Depending on the 96-well plate, immediate precipitation occurred in 20–50% of wells, with an apparently higher proportion of wells showing precipitation in ETS plates than in full-length or ETSR plates.

Figure 5.6: Crystallisation of Brd3 ET truncations using commercial crystallisation screens. ETS and ETSR at concentrations of 11 mg/mL, 27 mg/mL and 39 mg/mL (in 10 mM Tris, 100 mM NaCl, pH 7.5) were used, both alone and in combination with 1.2 molar equivalents of either MFP-7208 or MFP-7210. Commercially available PACT and IndexHT screens (Hampton Research) were used to trial crystallisation conditions. Conditions that produced crystals were: a) ETS + MFP-7210 in 100 mM Bis-Tris, 50 mM calcium chloride dihydrate, 30% PEG 500, pH 6.5; b) ETS + MFP-7208 in 100 mM HEPES, 500 mM magnesium formate dihydrate, pH 7.5; c) ETSR + MFP-7210 in 100 mM MIB, 25% PEG 1500, pH 4.0; d) ETS + MFP-7210 in 100 mM MIB, 25% PEG 1500, pH 5.0.

From these screens, four conditions yielded crystals (Figure 5.6). Crystals from each of these conditions were sent to the Australian Synchrotron and tested for diffraction by Karishma Patel, but no diffraction was observed.

5.3.3 Aiding Brd3 ET crystallisation with a heterogeneous nucleating agent

Heterogeneous substances are often used in crystallisation experiments to provide a surface that encourages crystal nucleation [214, 215]. This approach was therefore tried for full-length Brd3 ET and the ETS and ETSR truncations, using finely ground dried seaweed as the nucleating agent [216]. Crystallisation of all three proteins (at the same concentrations as in Section 5.3.2) was repeated as described in Figure 5.3, with the addition of 0.5 µg of ground seaweed per µL of protein solution.

Figure 5.7: Crystallisation of the Brd3 ET domain and its truncations using a heterogeneous nucleating agent. Brd3 ET, ETS or ETSR were concentrated to ~40 mg/mL, finely ground dried seaweed was added (0.5 µg/µL) and the commercially available PACT and IndexHT screens (Hampton Research) were used to trial crystallisation conditions. The crystals shown formed in a) ETSR, 100 mM MMT, 25% PEG 1500, pH 5.0; b) ETS, 2.8 M sodium acetate trihydrate, pH 7.0.

The seaweed powder did not appear to have a significant effect on protein precipitation or crystallisation: immediate precipitation remained at ~30% and only two crystals were observed (Figure 5.7). These crystals were sent to the Australian Synchrotron and tested for diffraction by Karishma Patel, but no diffraction was observed.


5.4 Binding of Small Molecule Inhibitors to Brd3 ET: Investigation via NMR

The high levels of solubility and low levels of crystal formation detailed in Section 5.3 suggested that Brd3 ET was not amenable to X-ray crystallography. It was therefore decided to use NMR spectroscopy to investigate fragment binding.

5.4.1 Analysis of binding affinity and mode via HSQC titration

As a first step to characterising fragment binding to the ET domain, 15N-HSQC titrations were carried out to assess whether the ET:fragment interactions would be suitable for analysis by NMR methods. For example, if addition of the small molecule resulted in intermediate exchange, peaks from residues at the interaction interface would disappear upon addition of binding partner, making it possible to identify putative interaction sites, but virtually impossible to confirm details of binding including the determination of full structures.
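Whether a titration falls into slow, intermediate or fast exchange depends on how the exchange rate constant compares with the frequency difference between the free and bound signals. The sketch below illustrates that comparison. The 600 MHz observe frequency, the 0.1 ppm shift and the factor-of-ten boundaries are illustrative assumptions, not values from this work.

```python
import math


def shift_difference_rad_s(delta_ppm: float, nucleus_freq_mhz: float) -> float:
    """Convert a chemical shift difference (ppm) into angular frequency (rad/s).

    At an observe frequency of F MHz, 1 ppm corresponds to F Hz.
    """
    return 2 * math.pi * delta_ppm * nucleus_freq_mhz


def exchange_regime(k_ex: float, delta_ppm: float, nucleus_freq_mhz: float,
                    margin: float = 10.0) -> str:
    """Classify two-state exchange as slow, intermediate or fast.

    k_ex is the exchange rate constant in s^-1 (for simple 1:1 binding,
    kon*[L]free + koff). The factor-of-`margin` boundaries are a rule of thumb.
    """
    d_omega = shift_difference_rad_s(delta_ppm, nucleus_freq_mhz)
    if k_ex * margin < d_omega:
        return "slow"          # separate free and bound peaks
    if k_ex > d_omega * margin:
        return "fast"          # one population-averaged peak that moves
    return "intermediate"      # peaks broaden, often beyond detection


# Hypothetical 0.1 ppm 1H shift at 600 MHz (about 377 rad/s).
for k in (10.0, 300.0, 5000.0):
    print(k, exchange_regime(k, 0.1, 600.0))
```

With these illustrative numbers, exchange at 10 s^-1 is slow, at 300 s^-1 intermediate, and at 5000 s^-1 fast.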

An 15N-HSQC spectrum ideally reports the chemical shift of each H-N pair within a protein (for those protons that do not exchange too rapidly with solvent water). These include the backbone amide of every non-proline residue, the side chain groups of asparagine and glutamine, and some lysine/arginine side chain groups. These chemical shifts are sensitive to the local chemical environment, so addition of a binding partner typically causes chemical shift changes for residues that are at or close to the binding site [213]. If the spectrum has been assigned, the binding site can therefore be identified. The magnitude of the chemical shift changes for residues involved in binding can also be used to calculate the affinity of the protein for the ligand [213], provided that these residues are not in intermediate exchange, in which case their peaks broaden, often beyond detection. Thus, by titrating a protein sample with increasing concentrations of ligand, chemical shift changes can be observed, binding affinities calculated, and information about the mode of binding obtained.
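The binning of residues by shift magnitude used in this chapter (0.05–0.1 ppm and >0.1 ppm, with disappearing peaks noted separately) can be sketched as follows. The peak positions are hypothetical. Taking the larger of the 1H and 15N changes is one assumed reading of "in either dimension". Real peak lists would come from the assigned spectra.

```python
def classify_shift(shift_ppm: float, moderate: float = 0.05, large: float = 0.1) -> str:
    """Bin a chemical shift change using the thresholds applied in this chapter."""
    if shift_ppm > large:
        return "large"
    if shift_ppm >= moderate:
        return "moderate"
    return "small"


def map_perturbations(free: dict, bound: dict) -> dict:
    """Compare assigned HSQC peak lists for free and ligand-bound protein.

    free, bound: residue number -> (1H ppm, 15N ppm). A residue assigned in the
    free spectrum but missing from the bound one is reported as "broadened"
    (for example, because it is in intermediate exchange).
    """
    result = {}
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            result[residue] = "broadened"
            continue
        h_bound, n_bound = bound[residue]
        # assumed: the largest change in either dimension decides the bin
        change = max(abs(h_bound - h_free), abs(n_bound - n_free))
        result[residue] = classify_shift(change)
    return result


# Hypothetical peak positions, for illustration only.
free = {598: (8.00, 119.00), 614: (8.10, 120.00),
        616: (7.90, 118.00), 630: (8.30, 122.00)}
bound = {614: (8.25, 120.30), 616: (7.97, 118.02), 630: (8.31, 122.01)}
print(map_perturbations(free, bound))
```

Residues in the "moderate" and "large" bins correspond to the blue and red colouring used when mapping shifts onto the structure.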

Full-length Brd3 ET has been extensively studied by this method in our group [87], and was therefore used instead of a truncated construct so that data could be compared directly and residue assignments inferred from pre-existing assigned spectra.

Brd3 ET was titrated with up to 1.2 molar equivalents of the three small molecule inhibitor candidates MFP-7137, MFP-7208 and MFP-7210 (Figure 5.1). The spectra collected are shown in Figure 5.8. In general, the spectra were of good quality, and the patterns of shifts were quite similar across the three fragments. Spectra of Brd3 ET alone, recorded before addition of inhibitors, are comparable to those collected previously in our group, with minimal differences in peak position or visibility [87].

In each case, ~5% of peaks exhibited shifts greater than 0.1 ppm in either dimension, and another ~20% exhibited shifts between 0.05 and 0.1 ppm. While the shifting peaks were broadly similar for all three fragments, MFP-7210 produced the most shifted peaks overall. In each case, the intensity of ~5% of peaks was reduced to the point that they were not detectable when 1.2 molar equivalents of fragment had been added, likely indicating that these residues were entering intermediate exchange. The affected residues were very similar for the three compounds. Kds for each small molecule were calculated using the chemical shift changes of the residues that exhibited the most pronounced shifts (Figure 5.9) [217]. These residues were identified based on the total shift (the square of the sum of the 1H and 15N shifts) between 0 and 1.2 molar equivalents of ligand (Figure 5.10). Fits were carried out using a simple 1:1 Langmuir binding isotherm in the GraphPad Prism software package [218]. Finally, these residues were mapped onto the structure of Brd3 ET to gain insight into the mode of binding of each inhibitor (Figure 5.11).
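A global fit of this kind shares a single Kd across residues while allowing each residue its own maximal shift. The sketch below uses the one-site model with ligand depletion (the quadratic solution) named in the Figure 5.9 caption. In place of the nonlinear regression in GraphPad Prism, it uses a simple log-spaced grid search. It assumes the protein concentration stays at 43 µM throughout (no dilution correction), and the residues, shifts and Kd are synthetic.

```python
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein in complex for 1:1 binding with ligand depletion."""
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4 * p_total * l_total)) / 2
    return complex_conc / p_total


def global_fit_kd(p_total, ligand_concs, shifts_by_residue,
                  kd_min=0.1, kd_max=1e4, n_grid=2001):
    """Grid-search a shared Kd; each residue keeps its own maximal shift.

    For a fixed Kd the model is linear in each residue's maximal shift, so the
    least-squares amplitude is sum(f*y) / sum(f*f).
    """
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))  # log-spaced grid
        fracs = [fraction_bound(p_total, l, kd) for l in ligand_concs]
        denom = sum(f * f for f in fracs)
        amps = {res: sum(f * y for f, y in zip(fracs, ys)) / denom
                for res, ys in shifts_by_residue.items()}
        sse = sum((y - amps[res] * f) ** 2
                  for res, ys in shifts_by_residue.items()
                  for f, y in zip(fracs, ys))
        if best is None or sse < best[0]:
            best = (sse, kd, amps)
    return best[1], best[2]


# Synthetic, noise-free data: 43 uM protein, ligand from 0 to 1.2 equivalents.
P = 43.0
ligand = [0.0, 8.6, 17.2, 25.8, 34.4, 43.0, 51.6]
true_kd = 20.0  # hypothetical, in uM
true_amps = {"I614": 0.12, "I616": 0.08}  # hypothetical maximal shifts (ppm)
data = {res: [a * fraction_bound(P, l, true_kd) for l in ligand]
        for res, a in true_amps.items()}

kd_fit, amps_fit = global_fit_kd(P, ligand, data)
print(f"fitted Kd = {kd_fit:.1f} uM")
```

Because the data are noise-free, the grid search should recover a Kd within the grid spacing (about 0.6%) of the 20 µM used to generate them. With real data, a proper nonlinear least-squares routine would also report uncertainties.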


Figure 5.8: 15N-HSQC titrations of Brd3 ET binding to small molecule inhibitors. 43 µM 15N-labelled Brd3 ET in 10 mM Tris, 50 mM NaCl, pH 7.5 was titrated with increasing amounts of a) MFP-7137; b) MFP-7208; or c) MFP-7210, up to 1.2 molar equivalents. 15N-HSQC spectra were collected following each addition.


Figure 5.9: Binding affinity of Brd3 ET for small molecule inhibitors. Brd3 ET was titrated with up to 1.2 molar equivalents of a) MFP-7137, b) MFP-7208 or c) MFP-7210. 15N-HSQC spectra were gathered following each addition and chemical shift changes of heavily shifted residues plotted. Binding curves were globally fitted to the shifts using a one-site binding model with ligand depletion. Curves shown are best fits by sum of squares. d) Kds for each fragment were calculated using the chemical shifts from the binding curves presented in panels a–c (with total shifts calculated using the square of the sum of the 1H and 15N shifts), and compared to Kds obtained via SPR by the Monash Fragment Platform.



Figure 5.10: Changes in chemical environment of Brd3 ET residues following small molecule inhibitor binding, judged from changes in signals in 15N-HSQC spectra. Brd3 ET was titrated with up to 1.2 molar equivalents of a) MFP-7137, b) MFP-7208 or c) MFP-7210. 15N-HSQC spectra were gathered following each addition and chemical shift changes at 1.2 molar equivalents plotted. Residues were assigned based on comparison with previous assignments of Brd3 ET in complex performed by Dr Lorna Wilkinson-White.


Figure 5.11: Structural data of the binding mode of small molecule inhibitors of Brd3 ET derived from HSQC titration. Brd3 ET was titrated with up to 1.2 molar equivalents of a) MFP-7137, b) MFP-7208 or c) MFP-7210. 15N-HSQC spectra were gathered following each addition and shifted residues highlighted on the structure of Brd3 ET bound to its native peptide ligand (represented in purple). Residues that shifted between 0.05 and 0.1 ppm are shown in blue, and residues that shifted more than 0.1 ppm in red. From PDB ID 6BGH [87].

The HSQC titrations shown in Figure 5.8 indicate similar binding sites for all three small molecule inhibitors, as the patterns of shifting residues were very similar in all three titrations.

In each case, the Brd3 ET sample was not saturated with ligand by the end of the titration. Some residues remained in intermediate exchange (their peaks were not visible at higher ligand concentrations), and the affinities are not sufficiently high relative to the ligand concentrations used to drive binding to completion. Nevertheless, binding affinities could still be estimated from residues that were not in intermediate exchange (Figure 5.9).
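The degree of saturation reached at the end of each titration can be estimated from a one-site (1:1) model. The sketch below assumes 43 µM protein (as in Figure 5.8) and a 1:1 complex. The Kd values are hypothetical, chosen only to span the low micromolar range.

```python
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein bound for 1:1 binding, accounting for ligand depletion."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4 * p_total * l_total)) / (2 * p_total)


def ligand_for_saturation(p_total: float, kd: float, target_fraction: float) -> float:
    """Total ligand needed to reach a target fraction bound.

    Bound ligand = f * p_total; free ligand follows from
    Kd = [P]free[L]free / [PL], i.e. [L]free = Kd * f / (1 - f).
    """
    f = target_fraction
    return f * p_total + kd * f / (1 - f)


P = 43.0          # uM Brd3 ET, as in Figure 5.8
L_MAX = 1.2 * P   # ligand concentration at the end of each titration (uM)
for kd in (5.0, 20.0, 50.0):  # hypothetical low-micromolar Kd values
    print(f"Kd {kd:>4} uM: {fraction_bound(P, L_MAX, kd):.0%} bound at 1.2 eq; "
          f"95% bound needs {ligand_for_saturation(P, kd, 0.95) / P:.1f} eq")
```

With a Kd of 20 µM, for example, the protein would be only about 57% bound at 1.2 equivalents, and roughly 9.8 equivalents of ligand would be needed to reach 95% saturation.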

As shown in Figure 5.9d, Kds determined for all three small molecule inhibitors indicated binding in the low micromolar range, agreeing closely with the results of complementary surface plasmon resonance (SPR) measurements performed by the Monash Fragment Platform.

Overlaying highly shifted residues (i.e., those most likely to be involved in binding) on the structure of Brd3 ET bound to a native peptide ligand (derived from CHD4 [87]) indicates that all three small molecule inhibitors use a similar mode of binding, as the shifting residues are similarly located in all three cases (Figures 5.10 and 5.11). In all cases the shifted residues are mostly clustered around the protein’s native binding cleft, providing evidence that the inhibitors are binding in the same cleft and therefore should be able to effectively compete with the native ligand. However, this was not conclusive. In all three cases, there are also numerous shifted residues that are remote from the binding site, and dispersed over the entire protein structure. This is common in HSQC titrations, as proteins often undergo conformational changes during ligand binding that are not necessarily localised to the binding site [213]. The pattern of shifts alone was therefore not enough to reliably conclude the mechanism of binding, and more fine-grained information was required.

5.4.2 Analysis of the binding mode of MFP-7210 to Brd3 ET via molecular docking

As noted above, de novo structure determination of protein-fragment complexes by NMR is time-consuming. However, molecular docking algorithms can be used to computationally model how a ligand binds, and this approach can be made significantly more powerful by incorporating experimental intermolecular restraints. As the NMR solution structure of Brd3 ET in complex with a native peptide had already been determined by our group [87], molecular docking based on this existing structural information offered a faster route to the binding mode than solving the full structure of each Brd3 ET-fragment complex. The Brd3 ET:MFP-7210 complex was chosen as the focus of this work for two reasons. First, the binding mode of MFP-7208 was already being investigated by the same approach by Lorna Wilkinson-White. Second, MFP-7210 showed differences in shifted residues from both MFP-7208 and MFP-7137 in the experiments presented in Section 5.4.1.

It was decided to use the HADDOCK webserver, a molecular docking server that is able to process a large number and variety of restraints within a relatively short timeframe [164], to determine the Brd3 ET:MFP-7210 structure. The restraints to be used were intermolecular NOEs between Brd3 ET and MFP-7210, which provide distance restraints, and the list of highly shifted residues from the HSQC titrations as shown in Figures 5.10 and 5.11.

In order to generate the NOE restraints, a set of triple resonance NMR experiments was collected to allow assignment of the protein proton and carbon resonances in the Brd3 ET:MFP-7210 complex. Details of these experiments are provided in Section 2.10. Chemical shift assignments for both molecules were made using the NMRFAM-SPARKY NMR analysis package [219]. Briefly, resonance assignment involves matching peaks in the spectra generated by these experiments to atoms within the structure of the molecule, based on known features of their chemical environments. By relating correlations between peaks to the bonds connecting atoms in the structure, it is ideally possible to assign a peak to every atom. Spectra that report on through-space distances between atoms can then be used to inform calculations that determine the fold of a protein or the mode of interaction in a protein complex.

In more depth, residue assignments in the 15N-HSQC were made using the HNCACB and CBCA(CO)NH spectra, guided by existing 15N-HSQC assignments from the Brd3 ET:BRG1 15N-HSQC [162]. Cα and Cβ assignments were made using the HNCACB, CBCA(CO)NH and CC(CO)NH spectra, and further side chain carbon atoms were assigned using the CBCA(CO)NH and CC(CO)NH spectra. Side chain proton assignments were made using the HBHA(CO)NH spectrum, and confirmed using the H(C)CH TOCSY. 1H-1H NOESY and COSY spectra collected from the fragment alone were used to make the fragment proton assignments in the unbound state, and these assignments were transferred to the filtered NOESY and COSY for the ET:fragment complex in order to assign the fragment protons in the bound state. Finally, intermolecular NOEs were identified in the 3D ω1-13C, 15N filtered, ω3-13Cali edited [1H, 1H]-NOESY using the generated assignments (Table 5.1, Figure 5.11).
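The core idea behind the sequential backbone walk can be sketched as follows. Each spin system is linked to its neighbour by matching its own Cα/Cβ shifts (from the HNCACB) with the residue i-1 shifts seen by the next spin system (in the CBCA(CO)NH). This is a simplified illustration, not the NMRFAM-SPARKY workflow used here. The shifts and the 0.3/0.4 ppm tolerances are hypothetical, and real assignment also uses residue-type information and checks for conflicting links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpinSystem:
    """Carbon shifts (ppm) for one backbone amide spin system.

    ca/cb: residue i (HNCACB); ca_prev/cb_prev: residue i-1 (CBCA(CO)NH).
    cb is None for glycine.
    """
    name: str
    ca: float
    cb: Optional[float]
    ca_prev: float
    cb_prev: Optional[float]


def mismatch(candidate: SpinSystem, target: SpinSystem,
             tol_ca: float = 0.3, tol_cb: float = 0.4) -> Optional[float]:
    """Score `candidate` as the residue preceding `target`; None if incompatible."""
    d_ca = abs(candidate.ca - target.ca_prev)
    if d_ca > tol_ca:
        return None
    if (candidate.cb is None) != (target.cb_prev is None):
        return None  # glycine (no C-beta) in one list but not the other
    d_cb = 0.0 if candidate.cb is None else abs(candidate.cb - target.cb_prev)
    if d_cb > tol_cb:
        return None
    return d_ca + d_cb


def link_predecessors(systems):
    """For each spin system, pick the best-matching preceding spin system."""
    links = {}
    for target in systems:
        candidates = []
        for cand in systems:
            if cand is target:
                continue
            score = mismatch(cand, target)
            if score is not None:
                candidates.append((score, cand.name))
        # lowest mismatch wins; None means no predecessor found (e.g. chain start)
        links[target.name] = min(candidates)[1] if candidates else None
    return links


# Hypothetical spin systems resembling Ile (A), Gly (B) and Thr (C).
systems = [
    SpinSystem("A", ca=58.2, cb=38.5, ca_prev=55.0, cb_prev=30.1),
    SpinSystem("B", ca=45.3, cb=None, ca_prev=58.25, cb_prev=38.45),
    SpinSystem("C", ca=62.1, cb=69.8, ca_prev=45.28, cb_prev=None),
]
print(link_predecessors(systems))
```

Here the links recover the fragment A-B-C, which would then be mapped onto the protein sequence using the residue types suggested by the Cα/Cβ shifts.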

The quality of the spectra was generally very good and in line with previous spectra collected for Brd3 ET complexes, and no re-acquisition of data was required. Every residue could be unambiguously assigned except for ~10 residues at each terminus and residues 610–613. Difficulty in assigning these residues was expected based on NMR analysis of other Brd3 ET complexes [87]. The lack of assignments for residues 610–613 is a hindrance, as these residues lie close to the binding cleft, but sufficient NOEs (physical distance restraints) could nonetheless be identified for docking calculations to be feasible. Overall, 18 NOEs were unambiguously identified between Brd3 ET and MFP-7210. These NOEs involved atoms on five residues in the Brd3 ET domain (Table 5.1, Figure 5.12).

Table 5.1: Intermolecular NOEs used for Brd3 ET:MFP-7210 docking. Refer to Figure 5.12 for MFP-7210 nomenclature.

Brd3 ET residue/atom    MFP-7210 atom
I598 HG12               H1
I598 HG12               H7
I598 HG12               H15
I599 HG12               H8
I599 HG12               H10
I614 HG21               H15
I614 HG22               H1
I614 HG22               H16
I616 HG12               H1
I616 HG12               H6
I616 HG12               H7
I616 HG12               H10
I616 HG12               H23
T620 HG21               H1
T620 HG21               H3
T620 HG21               H5
T620 HG21               H7
T620 HG21               H11


Figure 5.12: MFP-7210 nomenclature.

Figure 5.13: Structural data on the binding mode of the small molecule inhibitor MFP-7210 to Brd3 ET, derived from intermolecular NOEs. Structural data on the complex between Brd3 ET and the small molecule inhibitor candidate MFP-7210 were gathered via NMR spectroscopy. Residues for which NOEs with MFP-7210 were identified are highlighted in red on the structure of Brd3 ET. The native peptide ligand (represented in purple) is shown to demonstrate the native mode of binding.

The HADDOCK webserver [164] was used for docking calculations, using the lowest energy solution structure of Brd3 ET [87] and the structure of MFP-7210 computed by Luke Adams from the Monash Fragment Platform (Figure 5.13). HADDOCK docking calculations were performed as described in Section 2.11. Briefly, the N- and C-termini of Brd3 ET were treated as fully flexible, as was MFP-7210. The NOEs presented in Table 5.1 were input as distance restraints with an upper bound of 5 Å (the maximum distance at which NOEs are likely to be visible). Residues around the binding interface, as determined from the list of highly shifted residues from the HSQC titration data (Section 5.4.1), were input as ambiguous interaction restraints (AIRs) with a distance of 4 Å. AIRs do not have to be satisfied by docking solutions, but are used to prioritise them. These residues were also set as semi-flexible: their bond angles could change during the calculation, but solutions in which they kept bond angles similar to the input structure were prioritised. The remaining residues were held fixed. A set of 1000 possible docked structures was generated, and the lowest energy structures were refined over two subsequent stages. Following completion of the run, the ten structures with the lowest HADDOCK scores and restraint violation energies were collected, overlaid on their structured regions and examined in PyMOL (Figures 5.14 and 5.15).
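HADDOCK reads distance restraints as CNS-style `assign` statements. The sketch below converts the NOEs in Table 5.1 into such statements, each encoding a 0–5 Å range. The segment IDs (A for Brd3 ET, B for MFP-7210) and the ligand residue number are hypothetical placeholders that would have to match the input structures. The restraint files actually used are described in Section 2.11 and may differ.

```python
# (protein residue number, protein atom, MFP-7210 atom), transcribed from Table 5.1
NOES = [
    (598, "HG12", "H1"), (598, "HG12", "H7"), (598, "HG12", "H15"),
    (599, "HG12", "H8"), (599, "HG12", "H10"),
    (614, "HG21", "H15"), (614, "HG22", "H1"), (614, "HG22", "H16"),
    (616, "HG12", "H1"), (616, "HG12", "H6"), (616, "HG12", "H7"),
    (616, "HG12", "H10"), (616, "HG12", "H23"),
    (620, "HG21", "H1"), (620, "HG21", "H3"), (620, "HG21", "H5"),
    (620, "HG21", "H7"), (620, "HG21", "H11"),
]


def noe_restraint(resid, atom, ligand_atom, upper=5.0,
                  protein_segid="A", ligand_segid="B", ligand_resid=1):
    """Build one CNS-style 'assign' line for an intermolecular NOE.

    The three numbers are distance, lower correction and upper correction,
    so 'upper upper 0.0' encodes bounds of 0 to `upper` angstroms.
    Segment IDs and the ligand residue number are hypothetical placeholders.
    """
    return (f"assign (segid {protein_segid} and resid {resid} and name {atom}) "
            f"(segid {ligand_segid} and resid {ligand_resid} and name {ligand_atom}) "
            f"{upper:.1f} {upper:.1f} 0.0")


tbl = "\n".join(noe_restraint(*noe) for noe in NOES) + "\n"
print(tbl)
```

Generating the file from a single table keeps the restraints and the reported NOE list in step if assignments are later revised.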


Figure 5.14: Molecular docking of the Brd3 ET:MFP-7210 complex compared to the NMR solution structure of Brd3 ET bound to a native CHD4-derived ligand. a), b) The ten lowest energy solutions for Brd3 ET bound to MFP-7210 (represented in purple), as determined by HADDOCK molecular docking software [164] and represented as cartoon (a; disordered N- and C-termini not shown) or surface (b) views. c), d) The lowest energy solution structure of Brd3 ET bound to its native peptide ligand, as determined by NMR spectroscopy and represented as cartoon (c) or surface (d) views. From PDB ID 6BGH [87].


Figure 5.15: Major and minor orientations of MFP-7210 determined by molecular docking of the Brd3 ET:MFP-7210 complex. a), b) Seven of the ten lowest energy solutions for Brd3 ET (disordered N- and C-termini not shown) bound to MFP-7210 (represented in purple), as determined by HADDOCK molecular docking software [164] and represented as cartoon (a) or surface (b) views. These seven solutions are extremely similar in MFP-7210 orientation (dubbed the major orientation). c), d) The remaining three of the ten lowest energy solutions for the Brd3 ET:MFP-7210 complex. These solutions have MFP-7210 in a reversed orientation (dubbed the minor orientation) compared to the other seven.

Figure 5.14 shows that HADDOCK arrived at largely consistent solutions for the binding of MFP-7210 to Brd3 ET. MFP-7210 occupies the same binding cleft in all solutions, in one of two orientations: one adopted by seven of the top ten structures (the major orientation) and one adopted by three (the minor orientation) (Figure 5.15). Neither orientation satisfies all of the NOE restraints imposed (Figure 5.16). Some of the signals observed in the NOESY may therefore be bleed-through signals (i.e., intramolecular NOEs within the protein that were not fully subtracted out, rather than intermolecular NOEs), or may have been incorrectly assigned. None of the NOEs presented in Table 5.1 appeared ambiguous in the residue they were assigned to, but the possibility remains that E512 or E514, the only active site residues that could not be assigned, were responsible for them. In all cases, the NOEs identified between residues I598/I599 and MFP-7210 were not satisfied. Additionally, the minor orientation of MFP-7210 did not satisfy the NOEs recorded for T620, suggesting that it is less likely to represent the true binding mode.
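Deciding whether each NOE is satisfied in a docked model, as shown in Figure 5.16, comes down to comparing inter-proton distances with the 5 Å upper bound used as a restraint. The sketch below shows this check for three NOEs from Table 5.1 using made-up coordinates. In practice, the coordinates would be read from the HADDOCK output models. Methyl protons are also often handled with pseudo-atoms or r^-6 summation rather than the single-atom distances assumed here.

```python
import math

# (protein residue, protein atom, MFP-7210 atom): a subset of Table 5.1
NOES = [("I598", "HG12", "H1"), ("I614", "HG22", "H1"), ("T620", "HG21", "H11")]


def check_noes(protein_xyz, ligand_xyz, noes, upper=5.0):
    """Report each NOE's distance in a model and whether it is within `upper` angstroms.

    protein_xyz: (residue, atom) -> (x, y, z); ligand_xyz: atom -> (x, y, z).
    """
    report = []
    for residue, atom, lig_atom in noes:
        d = math.dist(protein_xyz[(residue, atom)], ligand_xyz[lig_atom])
        report.append((residue, atom, lig_atom, round(d, 2), d <= upper))
    return report


# Hypothetical coordinates (angstroms) standing in for one docked model.
protein_xyz = {("I598", "HG12"): (0.0, 0.0, 0.0),
               ("I614", "HG22"): (2.0, 2.0, 1.0),
               ("T620", "HG21"): (10.0, 0.0, 0.0)}
ligand_xyz = {"H1": (4.0, 4.0, 2.0), "H11": (3.0, 0.0, 0.0)}

for row in check_noes(protein_xyz, ligand_xyz, NOES):
    print(row)
```

Running the same check over all ten models would give a per-orientation tally of satisfied and violated NOEs.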


Figure 5.16: NOEs identified between Brd3 ET and MFP-7210 by NMR spectroscopy, shown on the lowest energy structure of the Brd3 ET:MFP-7210 complex as determined by molecular docking. NOEs between MFP-7210 and Brd3 residues a) I598, b) I599, c) I614, d) I616, and e) T620 are shown in yellow for NOEs satisfied by the molecular docking solution, or red for NOEs not satisfied. NOEs may appear longer than in reality (particularly in panel e) because of the arbitrary positioning of residues in disordered regions of the structure.

MFP-7210 occupies the same binding cleft as the native peptide ligand in both proposed orientations (Figures 5.15 and 5.17). Binding in the major orientation is also mainly driven by hydrophobic contacts between MFP-7210 and residues G592, I614 and I616, analogous to the hydrophobic interactions that drive binding of native peptide ligands [87] (Figure 5.17). These findings support the conclusion from the HSQC titration analysis (Section 5.4.1) that MFP-7210 may be able to act as a competitive inhibitor of Brd3 ET, particularly if further chemical synthesis can improve the affinity of the interaction.

Figure 5.17: Hydrophobicity of the Brd3 ET:CHD4 and Brd3 ET:MFP-7210 complexes. Amino acids are coloured according to their hydrophobicity [220], with a darker red indicating greater hydrophobicity. a) The lowest energy solution structure of Brd3 ET bound to its native peptide ligand, as determined by NMR spectroscopy and represented as cartoon or surface views. From PDB ID 6BGH [87]. b) The lowest energy solution for Brd3 ET bound to MFP-7210, as determined by HADDOCK molecular docking software [164] and represented as cartoon (disordered N- and C-termini not shown) and surface views.

5.5 Discussion

The work presented in this chapter represents progress towards a viable inhibitor of Brd3 ET for further examination as a potential anti-cancer agent. The HSQC titration experiments outlined in Section 5.4.1 confirmed that the small molecule inhibitors MFP-7137, MFP-7208 and MFP-7210 bind Brd3 ET with low micromolar affinity. Although nanomolar or tighter affinity is generally considered a requirement for drugs, these values are approaching affinities that can be tested in biological assays [221, 222]. The docking experiments presented in Section 5.4.2 provide evidence that MFP-7210 binds in the same cleft, and through similar hydrophobic contacts, as native Brd3 ET peptide ligands. A similar set of experiments performed with MFP-7208 by Dr Lorna Wilkinson-White [unpublished] confirmed that it also binds in this cleft. It is therefore possible that both fragments could act as competitive inhibitors of Brd3 ET. Both molecules will now be used as templates for further rounds of REFiL chemistry in order to produce molecules with tighter affinity for Brd3 ET.

As further small molecule inhibitors will have to be developed and characterised as this project continues, the ability to crystallise Brd3 ET complexes for rapid structure determination would still be of great benefit. Many strategies remain to be explored for crystallising the protein, for example:

• Development of gradient screens around the few commercial screen conditions that produced crystals, in order to identify optimal crystallisation conditions. Fine-tuning a particular screening condition by making small changes to buffer concentration, protein concentration, temperature, etc. can rapidly optimise crystal formation [223, 224].
• Further experimentation with the concentration of Brd3 ET (or its truncations) used for crystallisation screens, and/or with different commercial screens. Highly soluble proteins sometimes require concentrations of up to 200 mg/mL to reliably induce crystallisation [225, 226].
• Methylation of lysine residues in Brd3 ET to reduce its ability to hydrogen bond with solvent, and therefore its solubility [227]. However, this may result in unwanted conformational changes.
• Crystallisation under oil to slow the rate of solvent evaporation from droplets and allow ordered crystals more time to form [228, 229]. This also allows smaller volumes of protein solution to be used, making higher protein concentrations more practical.
• Addition of other kinds of heterogeneous nucleating agents. A large variety of additives to crystal trays, from horse hair to titanium oxide, have been shown to aid crystal formation in specific scenarios [216].
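As a hedged sketch of the gradient-screen approach in the first point above, the following generates a 4 × 6 optimisation grid (the layout of a 24-well tray) around a single hit condition, varying precipitant concentration across columns and pH across rows, and computes the stock volumes for each well from C1V1 = C2V2. The hit condition (20% PEG at pH 7.0), the stock concentrations, well volume, step sizes and all names are hypothetical placeholders, not conditions from this work. With an even number of rows and columns, the grid straddles rather than includes the original condition.

```python
from dataclasses import dataclass

@dataclass
class Well:
    name: str             # e.g. "A1"
    peg_percent: float    # final precipitant concentration, % (w/v)
    ph: float             # pH of the buffer stock used for this row
    peg_stock_ul: float
    buffer_stock_ul: float
    water_ul: float

def gradient_plate(peg_centre=20.0, peg_step=2.0, ph_centre=7.0, ph_step=0.2,
                   well_volume_ul=500.0, peg_stock_percent=50.0,
                   buffer_stock_m=1.0, buffer_final_m=0.1):
    """4 x 6 grid centred on a (hypothetical) hit condition: columns vary
    precipitant concentration, rows vary pH (each row uses a buffer stock
    pre-titrated to that pH). Pipetting volumes follow C1*V1 = C2*V2."""
    wells = []
    for r, row in enumerate("ABCD"):
        ph = ph_centre + (r - 1.5) * ph_step          # about 6.7, 6.9, 7.1, 7.3 by default
        for c in range(6):
            peg = peg_centre + (c - 2.5) * peg_step   # 15, 17, ..., 25 % by default
            peg_ul = peg * well_volume_ul / peg_stock_percent
            buf_ul = buffer_final_m * well_volume_ul / buffer_stock_m
            water_ul = well_volume_ul - peg_ul - buf_ul
            if water_ul < 0:
                raise ValueError(f"{row}{c + 1}: stocks too dilute for this target")
            wells.append(Well(f"{row}{c + 1}", peg, ph, peg_ul, buf_ul, water_ul))
    return wells

for well in gradient_plate()[:6]:
    print(f"{well.name}: {well.peg_percent:.0f}% PEG, pH {well.ph:.1f} | "
          f"PEG stock {well.peg_stock_ul:.0f} uL, buffer {well.buffer_stock_ul:.0f} uL, "
          f"water {well.water_ul:.0f} uL")
```

Keeping the grid generation separate from the pipetting step makes it easy to narrow the step sizes around whichever well gives the best crystals in a subsequent round.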

Ideally, the discovery of conditions that produce consistent Brd3 ET crystals would facilitate rapid structural studies of Brd3 ET in complex with new small molecule inhibitor candidates, speeding up the development of tighter-binding inhibitors. The ultimate goal of this stage of drug discovery is an inhibitor with nanomolar affinity for Brd3 ET, as this is often sufficient for in vivo applications [152-154, 156, 221, 222, 230]. The next aim would be to investigate the activity and effects of the drug candidate in human cells. As mentioned in Section 1.4, BET dysregulation is involved in numerous forms of cancer, and BET-dependent cancer cell lines present an attractive testing ground for drug candidates. For example, BET upregulation has been shown to be required for cell proliferation in the human multiple myeloma cell line OPM1 [118], the human neuroblastoma cell line SK-N-BE(2)-C [111, 231] and the human melanoma cell line A375 [232]. Ideally, multiple cell lines from as many different forms of cancer as feasible would be tested, to assess how broadly BET ET domain inhibition prevents cancer cell proliferation, and to determine whether the drug's effects are specific to cancer cells or also impair healthy cell function.

Chapter 4 detailed numerous challenges faced when attempting to perform cell-based treatment efficacy assays with human cell lines. Experiments with small molecule inhibitors avoid many of these issues, as the target cells are unmodified cancer cell lines rather than fragile transfected cells. Assays testing the efficacy of drug candidates from fragment-based drug design should therefore be much easier and quicker to perform and troubleshoot than the assays detailed in Chapter 4.

A suite of assays is available to determine the effectiveness of drug candidates on treated cells:

• Cell proliferation can be assayed by computational analysis of microscope images, to determine whether drug candidates are capable of suppressing tumour cell proliferation.
• EdU staining [233] is a straightforward technique for labelling cells currently in S phase, which can be quantified via flow cytometry, allowing determination of the effects of drug candidates on cell cycle progression in cancer models.
• Cell viability can be assayed via fluorometric flow cytometry, to assess whether treated cells remain viable.
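For the EdU assay, the proportion of S-phase cells is typically read out by gating on EdU fluorescence. The sketch below is a simplified illustration, assuming per-cell fluorescence intensities have already been exported from the cytometer and that the positive gate is set at the 99th percentile of a control processed without EdU (a hypothetical but common choice). The intensities are simulated purely to exercise the code and are not data from this work; real analyses would also gate on singlets, DNA content and viability first.

```python
import random

def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0-100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def edu_positive_fraction(sample, negative_control, q=99.0):
    """Return (fraction of sample events above the gate, gate value).
    The gate is the q-th percentile of a control stained without EdU."""
    gate = percentile(negative_control, q)
    positive = sum(1 for x in sample if x > gate)
    return positive / len(sample), gate

def simulate(n, frac_s_phase, rng):
    """Purely illustrative intensities: a dim (EdU-negative) and a bright
    (EdU-positive) lognormal population. Not real data."""
    return [rng.lognormvariate(5.0, 0.3) if rng.random() < frac_s_phase
            else rng.lognormvariate(2.0, 0.3) for _ in range(n)]

rng = random.Random(42)
control = simulate(10000, 0.0, rng)      # no-EdU control
untreated = simulate(10000, 0.40, rng)   # hypothetical vehicle-treated cells
treated = simulate(10000, 0.10, rng)     # hypothetical drug-treated cells

untreated_frac, gate = edu_positive_fraction(untreated, control)
treated_frac, _ = edu_positive_fraction(treated, control)
print(f"gate = {gate:.1f}; EdU+ untreated = {untreated_frac:.1%}, treated = {treated_frac:.1%}")
```

Setting the gate from a negative control, rather than a fixed intensity, keeps the threshold tied to each experiment's background staining.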

Simultaneously, the cell permeability of the drug candidate should be investigated to ascertain whether modifications to the compound or a delivery mechanism would be required for the drug to be pharmacologically active. The Caco-2 permeability assay [234, 235] uses a cultured monolayer of cells from the Caco-2 colon epithelial cell line to mimic the human intestinal wall. This monolayer, grown on a permeable artificial membrane, is placed as a barrier between two solutions, one of which contains the compound of interest. The rate of transfer of the compound across the monolayer can be measured by mass spectrometry, allowing estimation of the drug candidate's permeability across human tissue. Alternatively, the Parallel Artificial Membrane Permeability Assay (PAMPA) [234, 236] uses a similar procedure with an artificial lipid membrane that mimics the passive permeability of human cell membranes.
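Transfer rates from Caco-2 (or PAMPA) experiments are usually summarised as an apparent permeability coefficient, Papp = (dQ/dt) / (A × C0), where dQ/dt is the rate at which the compound appears in the receiver compartment, A is the monolayer area and C0 is the initial donor concentration. A minimal sketch, assuming sink conditions and a linear early phase of transfer; the insert area, donor concentration, time points and receiver amounts are hypothetical values chosen for illustration.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def apparent_permeability(times_s, receiver_nmol, area_cm2, donor_conc_um):
    """Apparent permeability coefficient Papp in cm/s:
        Papp = (dQ/dt) / (A * C0)
    dQ/dt : rate of appearance in the receiver (nmol/s), the slope of cumulative
            amount against time (assumes sink conditions, linear early phase)
    A     : area of the monolayer/filter (cm^2)
    C0    : initial donor concentration in uM (1 uM = 1 nmol/cm^3, so Papp is in cm/s)
    """
    dq_dt = ols_slope(times_s, receiver_nmol)
    return dq_dt / (area_cm2 * donor_conc_um)

# Hypothetical apical-to-basolateral run: 10 uM donor, 1.12 cm^2 insert,
# receiver amounts (nmol) as might be quantified by mass spectrometry.
times = [0, 1800, 3600, 5400, 7200]
receiver = [0.0, 0.02016, 0.04032, 0.06048, 0.08064]
papp = apparent_permeability(times, receiver, area_cm2=1.12, donor_conc_um=10.0)
print(f"Papp = {papp:.2e} cm/s")
```

Expressing the donor concentration in nmol/cm^3 lets the units cancel directly to cm/s, which makes results comparable with published reference compounds run in the same assay format.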

Cell permeability assays lead more broadly into absorption, distribution, metabolism and excretion (ADME) studies, a loose term for pharmacokinetic studies that determine the bioavailability, toxicity, biological half-life, etc. of drug candidates [237]. Besides cell permeability, important ADME studies to perform on a promising drug candidate would include:

• Rate of metabolism. This can be assayed in vitro by incubating cultured hepatocytes with a known amount of the drug and determining the rate of drug metabolism by quantitative mass spectrometry [237].
• Distribution of the drug candidate within the body, ideally assayed in a mouse model using a labelled drug and whole-body imaging, such as positron emission tomography (PET) for a positron-emitting radiolabel or magnetic resonance imaging (MRI) for an MRI-detectable label [237].
• Excretion/biological half-life of the molecule, again via radiolabelling and faecal/urinary analysis in a mouse model.
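For the hepatocyte assay, a common simplification is to treat depletion of the parent drug as first order (reasonable when the drug concentration is well below the enzymes' Km), fit the natural log of the amount remaining against time, and derive a half-life and an in vitro intrinsic clearance (CLint). A minimal sketch under that assumption; the time course, incubation volume and cell number are hypothetical, and scaling CLint to whole-body clearance would require further physiological factors not shown here.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def depletion_kinetics(times_min, remaining, incubation_volume_ul, million_cells):
    """First-order substrate depletion in a hepatocyte incubation.
    Returns (k in 1/min, half-life in min, CLint in uL/min per 10^6 cells).
    `remaining` may be concentrations or percent remaining; only ratios matter."""
    ln_remaining = [math.log(c) for c in remaining]
    k = -ols_slope(times_min, ln_remaining)       # first-order rate constant
    half_life = math.log(2) / k
    cl_int = k * incubation_volume_ul / million_cells
    return k, half_life, cl_int

# Hypothetical time course with a 30 min half-life; 0.5 x 10^6 cells in 500 uL
times = [0, 15, 30, 60, 90, 120]
remaining = [100.0 * 0.5 ** (t / 30.0) for t in times]   # percent of parent drug remaining
k, t_half, cl_int = depletion_kinetics(times, remaining,
                                       incubation_volume_ul=500.0, million_cells=0.5)
print(f"k = {k:.4f} /min, t1/2 = {t_half:.1f} min, CLint = {cl_int:.1f} uL/min/10^6 cells")
```

The same log-linear treatment is often applied to the terminal phase of plasma concentration-time data when estimating a biological half-life in vivo.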

Finally, the toxicity of the drug candidate must be investigated before human trials can begin. Toxicity can be partially assessed in vitro in cultured cells, and some idea of the therapeutic window (the range of dosages that have measurable effects but are not dangerously toxic to the patient) can be gleaned by comparing the viability of control, non-BET-dependent cell lines with that of target cell lines. Again, observations of toxicity in mouse models are required to predict toxicity to humans accurately [237].
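A hedged sketch of the comparison described above: estimate the concentration giving 50% viability (IC50) for a BET-dependent target line and a non-BET-dependent control line, then take their ratio as a simple in vitro selectivity index. Interpolating in log(dose) between the two bracketing doses is a simplification chosen to keep the example dependency-free; fitting a four-parameter logistic curve is more usual. The dilution series, viability values and function names are hypothetical.

```python
import math

def ic50_log_interpolated(doses_um, viability_pct):
    """Dose (uM) at which viability first falls to 50 %, by linear interpolation
    in log10(dose) between the two bracketing doses. Doses must be positive and
    ascending. Returns None if viability never reaches 50 %."""
    points = list(zip(doses_um, viability_pct))
    for (d1, v1), (d2, v2) in zip(points, points[1:]):
        if v1 == 50.0:
            return d1
        if v1 > 50.0 >= v2:
            frac = (v1 - 50.0) / (v1 - v2)
            log_ic50 = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_ic50
    return None

def selectivity_index(ic50_control, ic50_target):
    """Ratio > 1 means the target (BET-dependent) line is more sensitive."""
    return ic50_control / ic50_target

doses = [0.01, 0.1, 1.0, 10.0, 100.0]            # uM, hypothetical dilution series
target_line = [98.0, 85.0, 50.0, 20.0, 5.0]       # % viability, hypothetical
control_line = [100.0, 99.0, 95.0, 70.0, 30.0]    # % viability, hypothetical

ic50_target = ic50_log_interpolated(doses, target_line)
ic50_control = ic50_log_interpolated(doses, control_line)
si = selectivity_index(ic50_control, ic50_target)
print(f"IC50 target = {ic50_target:.2f} uM, control = {ic50_control:.1f} uM, SI = {si:.1f}")
```

A larger selectivity index suggests a wider in vitro window between effects on the target line and effects on the control line, although it is only a rough proxy for the in vivo therapeutic window.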


It is worth noting that the vast majority of drug candidates fail to meet minimum cell permeability/bioavailability/toxicity criteria to proceed to human trials [237]. However, a high-affinity inhibitor of the ET domain would nonetheless be useful to ongoing research into the function of the ET domain. Further studies are required to confirm its apparent role in protein-protein interactions [87], and to investigate the range of ligands the ET domain binds and their implications. An ET domain inhibitor would allow the function of the ET domain to be investigated in a much more granular manner than mutagenesis, gene knockdown, RNAi, or similar methods that alter or deplete whole BET proteins, as these methods are likely to cause nonspecific side effects that are difficult to control for. Therefore, even a failed drug candidate would prove useful to future research into, and drug development targeting, the ET domain.


Concluding Remarks

Although the field of epigenetics has progressed rapidly in the last 20 years and several drugs targeting epigenetic mechanisms are undergoing clinical trials, issues remain that hinder our ability to develop or select further drug targets within epigenetic regulatory mechanisms [238].

The epigenome is a system of incredible complexity, and the functions of many epigenetic marks and related enzymes are unknown. Only about a third of the ~130 known unique epigenetic marks and ~220 known epigenetic readers and writers have been functionally categorised [68, 97]. This limits the choice of enzymes and epigenetic marks that can be targeted in drug design efforts.

In addition, many of the best-studied epigenetic enzymes have a diverse array of non-epigenetic functions that are still being determined. The complexity of these systems means that any drug that causes indiscriminate inhibition of an epigenetic enzyme is likely to have off-target effects that cannot currently be anticipated, as observed in the clinical trials discussed in Section 1.4.

Even if a targeted enzyme is known or assumed to have only histone modification functions, drugging that enzyme is still likely to cause off-target effects, because inhibiting it will alter the transcription of a large number of genes. In most cases the desired target of the therapy is one gene or a small number of genes, but as discussed in Section 1.4, all characterised epigenetic enzymes perform their functions across the whole genome. An ideal solution would localise the effects of the drug to the specific gene locus of interest. Work in this thesis has made steps towards implementing such a targeted system, though complications in implementation will require further study to overcome.

The split enzyme reassembly system studied in Chapter 3 showed initial promise with the success of positive controls, but we were unable to demonstrate that the system was functional with real enzymes within the scope of the experiments performed. Ultimately, more investigation into the system is required to determine whether the negative results were due to methodological flaws, issues with the leucine zipper model used, or a genuine inability of SETD7 to reconstitute activity in a split system. Even if SETD7 proves unable to function in such a system, Section 3.2.2 suggested a number of other enzymes potentially suitable for future study along similar lines. More broadly, split enzyme systems have increasingly found roles as reporters for in vitro assays [239-242]. Interestingly, a recent study used an (admittedly very different) algorithmic enzyme-splitting system to selectively target cancer cells in vitro [243], an indication that split-enzyme systems hold promise for more than just assay reporting.

The cell-based in vitro assays presented in Chapter 4 were intended to mimic in vivo conditions more closely, in order to avoid spending time on experiments whose results might not be repeatable under more realistic conditions. However, the difficulties encountered with cell-based assays in this work indicate that more extensive use of recombinant bacterial systems, which allow quicker, cheaper and more controlled experiments, may be a more prudent path for future research. Once a recombinant split enzyme system has been established as functional, introducing that system into mammalian cells will allow experiments with less uncertainty and fewer variables, ideally resulting in quicker and surer progress.

By contrast, the investigation of drug candidates via fragment-based drug discovery and NMR spectroscopy presented in Chapter 5 proved more fruitful. Promising candidates for specific inhibition of the Brd3 ET domain were identified using NMR spectroscopy techniques. With further refinement, these compounds may be able to act as highly specific inhibitors in vivo, at which point their effects on cancer growth can be investigated. The failure of attempts to crystallise the Brd3 ET domain is likely due to the high solubility and resistance to crystal packing of the protein itself, particularly in light of the absence of any published work reporting successful crystallisation of the domain. Therefore, the NMR and molecular docking techniques that bore fruit in this research should be (and are being) continued, in order to refine the drug candidates shown in Chapter 5 into inhibitors capable of demonstrating significant effects in vivo.

Taken as a whole, the work presented in this thesis contributes to the body of research on the vastly complicated field of the histone code. While the aims of producing systems to combat cancer in vivo encountered obstacles and will require substantial further investigation if they are to succeed, these data represent progress towards further understanding and manipulation of this crucial aspect of eukaryotic cellular regulation.


References

1. Gillooly, J.F., A. Hein, and R. Damiani, Nuclear DNA content varies with cell size across human cell types. Cold Spring Harbor Perspectives in Biology, 2015. 7(7): p. a019091.
2. Mirsky, A. and H. Ris, Variable and constant components of chromosomes. Nature, 1949. 163(4148): p. 666.
3. Campbell, N., Biology: Concepts & Connections. 1997, San Francisco: Benjamin Cummings.
4. McGhee, J.D. and G. Felsenfeld, Nucleosome structure. Annual Review of Biochemistry, 1980. 49(1): p. 1115-1156.
5. McGinty, R.K. and S. Tan, Nucleosome structure and function. Chemical Reviews, 2015. 115(6): p. 2255-2273.
6. Eberharter, A. and P.B. Becker, ATP-dependent nucleosome remodelling: factors and functions. Journal of Cell Science, 2004. 117(17): p. 3707-3711.
7. Varga-Weisz, P.D. and P.B. Becker, Regulation of higher-order chromatin structures by nucleosome-remodelling factors. Current Opinion in Genetics & Development, 2006. 16(2): p. 151-156.
8. Biel, M., V. Wascholowski, and A. Giannis, Epigenetics—an epicenter of gene regulation: histones and histone-modifying enzymes. Angewandte Chemie International Edition, 2005. 44(21): p. 3186-3216.
9. Berger, S.L., et al., An operational definition of epigenetics. Genes & Development, 2009. 23(7): p. 781-783.
10. Razin, A. and A.D. Riggs, DNA methylation and gene function. Science, 1980. 210(4470): p. 604-610.
11. Davey, C.A., et al., Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. Journal of Molecular Biology, 2002. 319(5): p. 1097-1113.
12. Zhou, V.W., A. Goren, and B.E. Bernstein, Charting histone modifications and the functional organization of mammalian genomes. Nature Reviews Genetics, 2011. 12(1): p. 7.
13. Tan, M., et al., Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell, 2011. 146(6): p. 1016-1028.
14. Marmorstein, R. and M.-M. Zhou, Writers and readers of histone acetylation: structure, mechanism, and inhibition. Cold Spring Harbor Perspectives in Biology, 2014. 6(7): p. a018762.
15. Farooq, Z., et al., The many faces of histone H3K79 methylation. Mutation Research/Reviews in Mutation Research, 2016. 768: p. 46-52.
16. Li, E., Chromatin modification and epigenetic reprogramming in mammalian development. Nature Reviews Genetics, 2002. 3(9): p. 662.
17. Suganuma, T. and J.L. Workman, Crosstalk among histone modifications. Cell, 2008. 135(4): p. 604-607.
18. Bird, A.P. and E.M. Southern, Use of restriction enzymes to study eukaryotic DNA methylation: I. The methylation pattern in ribosomal DNA from Xenopus laevis. Journal of Molecular Biology, 1978. 118(1): p. 27-47.
19. Musselman, C.A., et al., Perceiving the epigenetic landscape through histone readers. Nature Structural & Molecular Biology, 2012. 19(12): p. 1218.
20. Stedman, E. and E. Stedman, Cell specificity of histones. Nature, 1950. 166(4227): p. 780-781.
21. Wallis, J.W., L. Hereford, and M. Grunstein, Histone H2B genes of yeast encode two different proteins. Cell, 1980. 22(3): p. 799-805.
22. Durrin, L.K., et al., Yeast histone H4 N-terminal sequence is required for promoter activation in vivo. Cell, 1991. 65(6): p. 1023-1031.
23. Brownell, J.E., et al., Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell, 1996. 84(6): p. 843-851.
24. Taunton, J., C.A. Hassig, and S.L. Schreiber, A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p. Science, 1996. 272(5260): p. 408-411.
25. Baxevanis, A.D. and D. Landsman, Histone Sequence Database: A Compilation of Highly-Conserved Nucleoprotein Sequences. Nucleic Acids Research, 1996. 24(1): p. 245-247.


26. Egger, G., et al., Epigenetics in human disease and prospects for epigenetic therapy. Nature, 2004. 429(6990): p. 457-463.
27. Yoo, C.B. and P.A. Jones, Epigenetic therapy of cancer: past, present and future. Nature Reviews Drug Discovery, 2006. 5(1): p. 37.
28. Yang, X., et al., Targeting DNA methylation for epigenetic therapy. Trends in Pharmacological Sciences, 2010. 31(11): p. 536-546.
29. Kelly, A.D. and J.-P.J. Issa, The promise of epigenetic therapy: reprogramming the cancer epigenome. Current Opinion in Genetics & Development, 2017. 42: p. 68-77.
30. Wouters, B.J. and R. Delwel, Epigenetics and approaches to targeted epigenetic therapy in acute myeloid leukemia. Blood, The Journal of the American Society of Hematology, 2016. 127(1): p. 42-52.
31. Ambler, R. and M. Rees, ε-N-Methyl-lysine in bacterial flagellar protein. Nature, 1959. 184(4679): p. 56-57.
32. Murray, K., The occurrence of ε-N-methyl lysine in histones. Biochemistry, 1964. 3(1): p. 10-15.
33. Stocker, B., M. McDonough, and R. Ambler, A gene determining presence or absence of ε-N-methyl-lysine in Salmonella flagellar protein. Nature, 1961. 189(4764): p. 556-558.
34. Allfrey, V., R. Faulkner, and A. Mirsky, Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proceedings of the National Academy of Sciences, 1964. 51(5): p. 786-794.
35. Kim, S. and W.K. Paik, Studies on the Origin of ε-N-Methyl-L-lysine in Protein. Journal of Biological Chemistry, 1965. 240(12): p. 4629-4634.
36. Shen, E.C., et al., Arginine methylation facilitates the nuclear export of hnRNP proteins. Genes & Development, 1998. 12(5): p. 679-691.
37. Chen, D., et al., Regulation of transcription by a protein methyltransferase. Science, 1999. 284(5423): p. 2174-2177.
38. Strahl, B.D., et al., Methylation of histone H3 at lysine 4 is highly conserved and correlates with transcriptionally active nuclei in Tetrahymena. Proceedings of the National Academy of Sciences, 1999. 96(26): p. 14967-14972.
39. Strahl, B.D. and C.D. Allis, The language of covalent histone modifications. Nature, 2000. 403(6765): p. 41-45.
40. Lachner, M., et al., Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins. Nature, 2001. 410(6824): p. 116-120.
41. Bannister, A.J., et al., Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature, 2001. 410(6824): p. 120-124.
42. Nielsen, P.R., et al., Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9. Nature, 2002. 416(6876): p. 103-107.
43. Jacobs, S.A. and S. Khorasanizadeh, Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail. Science, 2002. 295(5562): p. 2080-2083.
44. Hu, M., et al., Histone H3 lysine 36 methyltransferase Hypb/Setd2 is required for embryonic vascular remodeling. Proceedings of the National Academy of Sciences, 2010. 107(7): p. 2956-2961.
45. Shinkai, Y. and M. Tachibana, H3K9 methyltransferase G9a and the related molecule GLP. Genes & Development, 2011. 25(8): p. 781-788.
46. Bledau, A.S., et al., The H3K4 methyltransferase Setd1a is first required at the epiblast stage, whereas Setd1b becomes essential after gastrulation. Development, 2014. 141(5): p. 1022-1035.
47. Dodge, J.E., et al., Histone H3-K9 methyltransferase ESET is essential for early development. Molecular and Cellular Biology, 2004. 24(6): p. 2478-2486.
48. Yu, M.C., et al., Arginine methyltransferase affects interactions and recruitment of mRNA processing and export factors. Genes & Development, 2004. 18(16): p. 2024-2035.
49. Vagin, V.V., et al., Proteomic analysis of murine Piwi proteins reveals a role for arginine methylation in specifying interaction with Tudor family members. Genes & Development, 2009. 23(15): p. 1749-1762.


50. Shi, X., et al., Modulation of p53 function by SET8-mediated methylation at lysine 382. Molecular Cell, 2007. 27(4): p. 636-646.
51. Jansson, M., et al., Arginine methylation regulates the p53 response. Nature Cell Biology, 2008. 10(12): p. 1431-1439.
52. Biggar, K.K. and S.S.-C. Li, Non-histone protein methylation as a regulator of cellular signalling and function. Nature Reviews Molecular Cell Biology, 2015. 16(1): p. 5-17.
53. Rathert, P., et al., Protein lysine methyltransferase G9a acts on non-histone targets. Nature Chemical Biology, 2008. 4(6): p. 344-346.
54. Phillips, D., The presence of acetyl groups in histones. Biochemical Journal, 1963. 87(2): p. 258.
55. Hebbes, T.R., A.W. Thorne, and C. Crane-Robinson, A direct link between core histone acetylation and transcriptionally active chromatin. The EMBO Journal, 1988. 7(5): p. 1395-1402.
56. Johnson, L.M., et al., Genetic evidence for an interaction between SIR3 and histone H4 in the repression of the silent mating loci in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences, 1990. 87(16): p. 6286-6290.
57. Megee, P.C., et al., Genetic analysis of histone H4: essential role of lysines subject to reversible acetylation. Science, 1990. 247(4944): p. 841-845.
58. Kim, S.C., et al., Substrate and functional diversity of lysine acetylation revealed by a proteomics survey. Molecular Cell, 2006. 23(4): p. 607-618.
59. Choudhary, C., et al., Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science, 2009. 325(5942): p. 834-840.
60. Gu, W. and R.G. Roeder, Activation of p53 sequence-specific DNA binding by acetylation of the p53 C-terminal domain. Cell, 1997. 90(4): p. 595-606.
61. Chen, L.-f., et al., Duration of nuclear NF-κB action regulated by reversible acetylation. Science, 2001. 293(5535): p. 1653-1657.
62. Hilfiker, A., et al., mof, a putative acetyl transferase gene related to the Tip60 and MOZ human genes and to the SAS genes of yeast, is required for dosage compensation in Drosophila. The EMBO Journal, 1997. 16(8): p. 2054-2060.
63. Reifsnyder, C., et al., Yeast SAS silencing genes and human genes associated with AML and HIV-1 Tat interactions are homologous with acetyltransferases. Nature Genetics, 1996. 14(1): p. 42-49.
64. Xue, Y., et al., NURD, a novel complex with both ATP-dependent chromatin-remodeling and histone deacetylase activities. Molecular Cell, 1998. 2(6): p. 851-861.
65. Imai, S.-I., et al., Transcriptional silencing and longevity protein Sir2 is an NAD-dependent histone deacetylase. Nature, 2000. 403(6771): p. 795-800.
66. North, B.J., et al., The human Sir2 ortholog, SIRT2, is an NAD+-dependent tubulin deacetylase. Molecular Cell, 2003. 11(2): p. 437-444.
67. Zhang, T., S. Cooper, and N. Brockdorff, The interplay of histone modifications–writers that read. EMBO Reports, 2015. 16(11): p. 1467-1481.
68. Khare, S.P., et al., HIstome—a relational knowledgebase of human histone proteins and histone modifying enzymes. Nucleic Acids Research, 2011. 40(D1): p. D337-D342.
69. Dillon, S.C., et al., The SET-domain protein superfamily: protein lysine methyltransferases. Genome Biology, 2005. 6(8): p. 227.
70. Xiao, B., J.R. Wilson, and S.J. Gamblin, SET domains and histone methylation. Current Opinion in Structural Biology, 2003. 13(6): p. 699-705.
71. Jenuwein, T., et al., SET domain proteins modulate chromatin domains in eu- and heterochromatin. Cellular and Molecular Life Sciences CMLS, 1998. 54(1): p. 80-93.
72. Berman, H., K. Henrick, and H. Nakamura, Announcing the worldwide protein data bank. Nature Structural & Molecular Biology, 2003. 10(12): p. 980.
73. Xiao, B., et al., Structure and catalytic mechanism of the human histone methyltransferase SET7/9. Nature, 2003. 421(6923): p. 652-656.
74. Horowitz, S., et al., Conservation and functional importance of carbon-oxygen hydrogen bonding in AdoMet-dependent methyltransferases. Journal of the American Chemical Society, 2013. 135(41): p. 15536-15548.


75. Filippakopoulos, P., et al., Selective inhibition of BET bromodomains. Nature, 2010. 468(7327): p. 1067.
76. Sanchez, R., J. Meslamani, and M.-M. Zhou, The bromodomain: from epigenome reader to druggable target. Biochimica et Biophysica Acta-Gene Regulatory Mechanisms, 2014. 1839(8): p. 676-685.
77. Müller, P., et al., Identification of JAK/STAT signalling components by genome-wide RNA interference. Nature, 2005. 436(7052): p. 871.
78. Farhang-Fallah, J., et al., The pleckstrin homology (PH) domain-interacting protein couples the insulin receptor substrate 1 PH domain to insulin signaling pathways leading to mitogenesis and GLUT4 translocation. Molecular and Cellular Biology, 2002. 22(20): p. 7325-7336.
79. Wassarman, D.A. and F. Sauer, TAFII250: a transcription toolbox. Journal of Cell Science, 2001. 114(16): p. 2895-2902.
80. Tamkun, J.W., et al., brahma: A regulator of Drosophila homeotic genes structurally related to the yeast transcriptional activator SNF2/SWI2. Cell, 1992. 68(3): p. 561-572.
81. Filippakopoulos, P., et al., Histone recognition and large-scale structural analysis of the human bromodomain family. Cell, 2012. 149(1): p. 214-231.
82. Vollmuth, F., W. Blankenfeldt, and M. Geyer, Structures of the dual bromodomains of the P-TEFb-activating protein Brd4 at atomic resolution. Journal of Biological Chemistry, 2009. 284(52): p. 36547-36556.
83. Kaimori, J.-Y., et al., Histone H4 lysine 20 acetylation is associated with gene repression in human cells. Scientific Reports, 2016. 6(1): p. 24318.
84. Kalashnikova, A.A., et al., The role of the nucleosome acidic patch in modulating higher order chromatin structure. Journal of the Royal Society Interface, 2013. 10(82): p. 20121022.
85. Pérez-Salvia, M. and M. Esteller, Bromodomain inhibitors and cancer therapy: From structures to applications. Epigenetics, 2017. 12(5): p. 323-339.
86. Sharma, A., et al., BET proteins promote efficient murine leukemia virus integration at transcription start sites. Proceedings of the National Academy of Sciences, 2013. 110(29): p. 12036-12041.
87. Wai, D.C., et al., The BRD3 ET domain recognizes a short peptide motif through a mechanism that is conserved across chromatin remodelers and transcriptional regulators. Journal of Biological Chemistry, 2018. 293(19): p. 7160-7175.
88. Houzelstein, D., et al., Growth and early postimplantation defects in mice deficient for the bromodomain-containing protein Brd4. Molecular and Cellular Biology, 2002. 22(11): p. 3794-3802.
89. Shang, E., et al., Double bromodomain-containing gene Brd2 is essential for embryonic development in mouse. Developmental Dynamics, 2009. 238(4): p. 908-917.
90. Pivot-Pajot, C., et al., Acetylation-dependent chromatin reorganization by BRDT, a testis-specific bromodomain-containing protein. Molecular and Cellular Biology, 2003. 23(15): p. 5354-5365.
91. Stathis, A. and F. Bertoni, BET proteins as targets for anticancer treatment. Cancer Discovery, 2018. 8(1): p. 24-36.
92. Roe, J.-S., et al., BET bromodomain inhibition suppresses the function of hematopoietic transcription factors in acute myeloid leukemia. Molecular Cell, 2015. 58(6): p. 1028-1039.
93. Shi, J. and C.R. Vakoc, The mechanisms behind the therapeutic activity of BET bromodomain inhibition. Molecular Cell, 2014. 54(5): p. 728-736.
94. Devaiah, B.N., et al., BRD4 is an atypical kinase that phosphorylates serine2 of the RNA polymerase II carboxy-terminal domain. Proceedings of the National Academy of Sciences, 2012. 109(18): p. 6927-6932.
95. Unoki, M., et al., Lysyl 5-hydroxylation, a novel histone modification, by Jumonji domain containing 6 (JMJD6). Journal of Biological Chemistry, 2013. 288(9): p. 6053-6062.
96. Christophorou, M.A., et al., Citrullination regulates pluripotency and histone H1 binding to chromatin. Nature, 2014. 507(7490): p. 104-108.
97. Huang, Z., et al., HEMD: an integrated tool of human epigenetic enzymes and chemical modulators for therapeutics. PLoS One, 2012. 7(6): p. e39917.


98. Shu, S., et al., Response and resistance to BET bromodomain inhibitors in triple-negative breast cancer. Nature, 2016. 529(7586): p. 413.
99. Asangani, I.A., et al., Therapeutic targeting of BET bromodomain proteins in castration-resistant prostate cancer. Nature, 2014. 510(7504): p. 278.
100. Rodriguez, R., et al., Aberrant epigenetic regulation of bromodomain BRD4 in human colon cancer. Journal of Molecular Medicine, 2012. 90(5): p. 587-595.
101. Lu, Q., et al., Epigenetics, disease, and therapeutic interventions. Ageing Research Reviews, 2006. 5(4): p. 449-467.
102. Feinberg, A.P., Phenotypic plasticity and the epigenetics of human disease. Nature, 2007. 447(7143): p. 433.
103. Jiang, Y.-h., J. Bressler, and A.L. Beaudet, Epigenetics and human disease. Annual Review of Genomics and Human Genetics, 2004. 5: p. 479-510.
104. Esteller, M., Epigenetics in cancer. New England Journal of Medicine, 2008. 358(11): p. 1148-1159.
105. Sharma, S., T.K. Kelly, and P.A. Jones, Epigenetics in cancer. Carcinogenesis, 2010. 31(1): p. 27-36.
106. Simon, J.A. and C.A. Lange, Roles of the EZH2 histone methyltransferase in cancer epigenetics. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 2008. 647(1-2): p. 21-29.
107. Laird, P.W. and R. Jaenisch, The role of DNA methylation in cancer genetics and epigenetics. Annual Review of Genetics, 1996. 30(1): p. 441-464.
108. Kantor, B., et al., Establishing the epigenetic status of the Prader–Willi/Angelman imprinting center in the gametes and embryo. Human Molecular Genetics, 2004. 13(22): p. 2767-2779.
109. Batista, I.d.A.A. and L.A. Helguero, Biological processes and signal transduction pathways regulated by the protein methyltransferase SETD7 and their significance in cancer. Signal Transduction and Targeted Therapy, 2018. 3(1): p. 1-14.
110. Cho, H.-S., et al., Demethylation of RB regulator MYPT1 by histone demethylase LSD1 promotes cell cycle progression in cancer cells. Cancer Research, 2011. 71(3): p. 655-660.
111. Durinck, K., et al., BRD3 as a specific vulnerable therapeutic target in neuroblastoma. in AACR Annual Meeting 2017. 2017. Washington, DC: AACR.
112. Li, F., et al., p21-activated kinase 1 interacts with and phosphorylates histone H3 in breast cancer cells. EMBO Reports, 2002. 3(8): p. 767-773.
113. Henshall, D.C. and K. Kobow, Epigenetics and epilepsy. Cold Spring Harbor Perspectives in Medicine, 2015. 5(12): p. a022731.
114. Hahnen, E., et al., Histone deacetylase inhibitors: possible implications for neurodegenerative disorders. Expert Opinion on Investigational Drugs, 2008. 17(2): p. 169-184.
115. Machado-Vieira, R., L. Ibrahim, and C.A. Zarate, Jr., Histone deacetylases and mood disorders: epigenetic programming in gene-environment interactions. CNS Neuroscience & Therapeutics, 2011. 17(6): p. 699-704.
116. Zuber, J., et al., RNAi screen identifies Brd4 as a therapeutic target in acute myeloid leukaemia. Nature, 2011. 478(7370): p. 524.
117. Crawford, N.P., et al., Bromodomain 4 activation predicts breast cancer survival. Proceedings of the National Academy of Sciences, 2008. 105(17): p. 6380-6385.
118. Delmore, J.E., et al., BET bromodomain inhibition as a therapeutic strategy to target c-Myc. Cell, 2011. 146(6): p. 904-917.
119. Alqahtani, A., et al., Bromodomain and extra-terminal motif inhibitors: a review of preclinical and clinical advances in cancer therapy. Future Science OA, 2019. 5(3): p. FSO372.
120. Ganesan, A., Multitarget drugs: an epigenetic epiphany. ChemMedChem, 2016. 11(12): p. 1227-1241.
121. Subramanian, S., et al., Clinical toxicities of histone deacetylase inhibitors. Pharmaceuticals, 2010. 3(9): p. 2751-2767.
122. Alvarez, R., et al., Epigenetic multiple modulators. Current Topics in Medicinal Chemistry, 2011. 11(22): p. 2749-2787.


123. Zheng, Y.C., et al., A systematic review of histone lysine-specific demethylase 1 and its inhibitors. Medicinal Research Reviews, 2015. 35(5): p. 1032-1071.
124. Rosenfeld, L., et al., Protein engineering by combined computational and in vitro evolution approaches. Trends in Biochemical Sciences, 2016. 41(5): p. 421-433.
125. Hall, B.G., Experimental evolution of a new enzymatic function. II. Evolution of multiple functions for ebg enzyme in E. coli. Genetics, 1978. 89(3): p. 453-465.
126. Richardson, J.S. and D.C. Richardson, The de novo design of protein structures. Trends in Biochemical Sciences, 1989. 14(7): p. 304-309.
127. Chenna, R., et al., Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research, 2003. 31(13): p. 3497-3500.
128. Sievers, F. and D.G. Higgins, Clustal Omega, accurate alignment of very large numbers of sequences, in Multiple Sequence Alignment Methods. 2014, Springer. p. 105-116.
129. Kelley, L.A. and M.J. Sternberg, Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols, 2009. 4(3): p. 363.
130. Cirino, P.C., K.M. Mayer, and D. Umeno, Generating mutant libraries using error-prone PCR, in Directed Evolution Library Creation. 2003, Springer. p. 3-9.
131. Arnold, K., et al., The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 2006. 22(2): p. 195-201.
132. Huang, P.-S., S.E. Boyken, and D. Baker, The coming of age of de novo protein design. Nature, 2016. 537(7620): p. 320-327.
133. Sera, T., Zinc-finger-based artificial transcription factors and their applications. Advanced Drug Delivery Reviews, 2009. 61(7): p. 513-526.
134. Ansari, A.Z. and A.K. Mapp, Modular design of artificial transcription factors. Current Opinion in Chemical Biology, 2002. 6(6): p. 765-772.
135. Bae, K.-H., et al., Human zinc fingers as building blocks in the construction of artificial transcription factors. Nature Biotechnology, 2003. 21(3): p. 275-280.
136. Bogdanove, A.J. and D.F. Voytas, TAL effectors: customizable proteins for DNA targeting. Science, 2011. 333(6051): p. 1843-1846.
137. Bultmann, S., et al., Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers. Nucleic Acids Research, 2012. 40(12): p. 5368-5377.
138. Cromm, P.M. and C.M. Crews, Targeted protein degradation: from chemical biology to drug discovery. Cell Chemical Biology, 2017. 24(9): p. 1181-1190.
139. Rankin, C.H., A review of transgenerational epigenetics for RNAi, longevity, germline maintenance and olfactory imprinting in Caenorhabditis elegans. Journal of Experimental Biology, 2015. 218(1): p. 41-49.
140. Bolotin, A., et al., Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology, 2005. 151(8): p. 2551-2561.
141. Jinek, M., et al., A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 2012. 337(6096): p. 816-821.
142. Mali, P., et al., RNA-guided human genome engineering via Cas9. Science, 2013. 339(6121): p. 823-826.
143. Pelletier, S., CRISPR-Cas systems for the study of immune function. Encyclopedia of Life Sciences, 2001: p. 1-11.
144. Choudhury, S.R., et al., CRISPR-dCas9 mediated TET1 targeting for selective DNA demethylation at BRCA1 promoter. Oncotarget, 2016. 7(29): p. 46545-46556.
145. Liszczak, G.P., et al., Genomic targeting of epigenetic probes using a chemically tailored Cas9 system. Proceedings of the National Academy of Sciences, 2017. 114(4): p. 681-686.
146. Thakore, P.I., et al., Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nature Methods, 2015. 12(12): p. 1143-1149.
147. Zhang, X.-H., et al., Off-target effects in CRISPR/Cas9-mediated genome engineering. Molecular Therapy-Nucleic Acids, 2015. 4: p. e264.

148. Beerli, R.R., et al., Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proceedings of the National Academy of Sciences, 1998. 95(25): p. 14628-14633.
149. Blancafort, P. and A.S. Beltran, Rational design, selection and specificity of artificial transcription factors (ATFs): the influence of chromatin in target gene regulation. Combinatorial Chemistry & High Throughput Screening, 2008. 11(2): p. 146-158.
150. Meckler, J.F., et al., Quantitative analysis of TALE–DNA interactions suggests polarity effects. Nucleic Acids Research, 2013. 41(7): p. 4118-4128.
151. Maurer, H.H. and F.T. Peters, Toward high-throughput drug screening using mass spectrometry. Therapeutic Drug Monitoring, 2005. 27(6): p. 686-688.
152. Bleicher, K.H., et al., A guide to drug discovery: hit and lead generation: beyond high-throughput screening. Nature Reviews Drug Discovery, 2003. 2(5): p. 369.
153. Broach, J.R. and J. Thorner, High-throughput screening for drug discovery. Nature, 1996. 384(6604): p. 14-16.
154. Hajduk, P.J. and J. Greer, A decade of fragment-based drug design: strategic advances and lessons learned. Nature Reviews Drug Discovery, 2007. 6(3): p. 211.
155. Murray, C.W. and T.L. Blundell, Structural biology in fragment-based drug design. Current Opinion in Structural Biology, 2010. 20(4): p. 497-507.
156. Murray, C.W. and D.C. Rees, The rise of fragment-based drug discovery. Nature Chemistry, 2009. 1(3): p. 187.
157. Woodmansey, J.N., Towards the purification and structural optimisation of demibodies, in School of Molecular Bioscience. 2015, University of Sydney.
158. Cai, M., et al., An efficient and cost-effective isotope labeling protocol for proteins expressed in Escherichia coli. Journal of Biomolecular NMR, 1998. 11(1): p. 97-102.
159. Chung, C.T. and R.H. Miller, Preparation and storage of competent Escherichia coli cells, in Methods in Enzymology. 1993, Elsevier. p. 621-627.
160. Studier, F.W., Protein production by auto-induction in high-density shaking cultures. Protein Expression and Purification, 2005. 41(1): p. 207-234.
161. McPhillips, T.M., et al., Blu-Ice and the distributed control system: software for data acquisition and instrument control at macromolecular crystallography beamlines. Journal of Synchrotron Radiation, 2002. 9(6): p. 401-406.
162. Hu, Q., NMR approaches to determine protein structure, in Molecular Life Sciences: An Encyclopedic Reference, E. Bell, Editor. 2014, Springer New York: New York, NY. p. 1-9.
163. Bax, A. and M. Ikura, An efficient 3D NMR technique for correlating the proton and 15N backbone amide resonances with the alpha-carbon of the preceding residue in uniformly 15N/13C enriched proteins. Journal of Biomolecular NMR, 1991. 1(1): p. 99-104.
164. Van Zundert, G., et al., The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. Journal of Molecular Biology, 2016. 428(4): p. 720-725.
165. Mohanty, B., et al., Determination of ligand binding modes in weak protein–ligand complexes using sparse NMR data. Journal of Biomolecular NMR, 2016. 66(3): p. 195-208.
166. Schrödinger. The PyMOL molecular graphics system, version 2.0.
167. Advanced Centre for Treatment, Research and Education in Cancer. HIstome: The Histone Infobase. 2011 [cited 2016 04/04]; Available from: http://www.actrec.gov.in/histome/.
168. Galonska, C., et al., Genome-wide tracking of dCas9-methyltransferase footprints. Nature Communications, 2018. 9: p. 9.
169. Vojta, A., et al., Repurposing the CRISPR-Cas9 system for targeted DNA methylation. Nucleic Acids Research, 2016. 44(12): p. 5615-5628.
170. McDonald, J.I., et al., Reprogrammable CRISPR/Cas9-based system for inducing site-specific DNA methylation. Biology Open, 2016. 5(6): p. 866-874.
171. Stepper, P., et al., Efficient targeted DNA methylation with chimeric dCas9-Dnmt3a-Dnmt3L methyltransferase. Nucleic Acids Research, 2016. 45(4): p. 1703-1713.
172. The UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Research, 2017. 45(D1): p. D158-D169.

173. Wilson, J.R., et al., Crystal structure and functional analysis of the histone methyltransferase SET7/9. Cell, 2002. 111(1): p. 105-15.
174. Kwon, T., et al., Mechanism of histone lysine methyl transfer revealed by the structure of SET7/9-AdoMet. The EMBO Journal, 2003. 22(2): p. 292-303.
175. Keating, S. and A. El-Osta, Transcriptional regulation by the Set7 lysine methyltransferase. Epigenetics, 2013. 8(4): p. 361-372.
176. Zegerman, P., et al., Histone H3 lysine 4 methylation disrupts binding of nucleosome remodeling and deacetylase (NuRD) repressor complex. Journal of Biological Chemistry, 2002. 277(14): p. 11621-11624.
177. Zhang, Y., et al., Evolving catalytic properties of the MLL family SET domain. Structure, 2015.
178. Guo, C., et al., KMT2D maintains neoplastic cell proliferation and global histone H3 lysine 4 monomethylation. Oncotarget, 2013. 4(11): p. 2144-53.
179. Ortega-Molina, A., et al., The histone lysine methyltransferase KMT2D sustains a gene expression program that represses B cell lymphoma development. Nature Medicine, 2015. 21(10): p. 1199-1208.
180. Zhang, J., et al., Disruption of KMT2D perturbs germinal center B cell development and promotes lymphomagenesis. Nature Medicine, 2015. 21(10): p. 1190-1198.
181. Yuan, H., et al., MYST protein acetyltransferase activity requires active site lysine autoacetylation. The EMBO Journal, 2012. 31(1): p. 58-70.
182. Sharma, G.G., et al., MOF and histone H4 acetylation at lysine 16 are critical for DNA damage response and double-strand break repair. Molecular and Cellular Biology, 2010. 30(14): p. 3582-3595.
183. Shogren-Knaak, M., et al., Histone H4-K16 acetylation controls chromatin structure and protein interactions. Science, 2006. 311(5762): p. 844-7.
184. Wu, H., et al., MYST histone acetyltransferase 1. Protein Data Bank entry 2PQ8. 2007; Available from: http://www.rcsb.org/structure/2PQ8.
185. Keller, B.O., et al., Interferences and contaminants encountered in modern mass spectrometry. Analytica Chimica Acta, 2008. 627(1): p. 71-81.
186. Zhong, Y., et al., CHD4 slides nucleosomes by decoupling entry- and exit-side DNA translocation. Nature Communications, 2020. 11(1): p. 1519.
187. Luger, K., T.J. Rechsteiner, and T.J. Richmond, Expression and purification of recombinant histones and nucleosome reconstitution, in Chromatin Protocols. 1999, Springer. p. 1-16.
188. Fick, R.J., et al., Sulfur-oxygen chalcogen bonding mediates AdoMet recognition in the lysine methyltransferase SET7/9. ACS Chemical Biology, 2016. 11(3): p. 748-754.
189. Ashkenazy, H., et al., ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Research, 2016. 44(W1): p. W344-W350.
190. Altschul, S.F., et al., Basic local alignment search tool. Journal of Molecular Biology, 1990. 215(3): p. 403-10.
191. Subramanian, K., et al., Regulation of estrogen receptor alpha by the SET7 lysine methyltransferase. Molecular Cell, 2008. 30(3): p. 336-47.
192. Ghosh, I., A.D. Hamilton, and L. Regan, Antiparallel leucine zipper-directed protein reassembly: application to the green fluorescent protein. Journal of the American Chemical Society, 2000. 122(23): p. 5658-5659.
193. Porter, J.R., et al., Seeing genetic and epigenetic information without DNA denaturation using sequence-enabled reassembly (SEER), in Engineered Zinc Finger Proteins. 2010, Springer. p. 365-382.
194. Nomura, W. and C.F. Barbas, In vivo site-specific DNA methylation with a designed sequence-enabled DNA methylase. Journal of the American Chemical Society, 2007. 129(28): p. 8676-8677.
195. Ooi, A.T., et al., Sequence-enabled reassembly of β-lactamase (SEER-LAC): A sensitive method for the detection of double-stranded DNA. Biochemistry, 2006. 45(11): p. 3620-3625.

196. Nallamsetty, S. and D.S. Waugh, Solubility-enhancing proteins MBP and NusA play a passive role in the folding of their fusion partners. Protein Expression and Purification, 2006. 45(1): p. 175-182.
197. O'Geen, H., et al., A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic Acids Research, 2015. 43(6): p. 3389-3404.
198. Deveau, H., J.E. Garneau, and S. Moineau, CRISPR/Cas system and its role in phage-bacteria interactions. Annual Review of Microbiology, 2010. 64: p. 475-493.
199. Dominguez, A.A., W.A. Lim, and L.S. Qi, Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nature Reviews Molecular Cell Biology, 2016. 17(1): p. 5-15.
200. Mitri, Z., T. Constantine, and R. O'Regan, The HER2 receptor in breast cancer: pathophysiology, clinical use, and new advances in therapy. Chemotherapy Research and Practice, 2012. 2012: p. 743193.
201. Huang, Y.-H., et al., DNA epigenome editing using CRISPR-Cas SunTag-directed DNMT3A. Blood, 2016. 128(22): p. 2707.
202. Wacker, M.J. and M.P. Godard, Analysis of one-step and two-step real-time RT-PCR using SuperScript III. Journal of Biomolecular Techniques, 2005. 16(3): p. 266-271.
203. Lejon, S., et al., Insights into association of the NuRD complex with FOG-1 from the crystal structure of an RbAp48·FOG-1 complex. Journal of Biological Chemistry, 2011. 286(2): p. 1196-203.
204. O'Geen, H., et al., dCas9-based epigenome editing suggests acquisition of histone methylation is not sufficient for target gene repression. Nucleic Acids Research, 2017. 45(17): p. 9901-9916.
205. Mori, R., et al., Both β-actin and GAPDH are useful reference genes for normalization of quantitative RT-PCR in human FFPE tissue samples of prostate cancer. The Prostate, 2008. 68(14): p. 1555-1560.
206. Martin-Ruiz, C.M., et al., Reproducibility of telomere length assessment: an international collaborative study. International Journal of Epidemiology, 2014. 44(5): p. 1673-1683.
207. Niepel, M., et al., A multi-center study on the reproducibility of drug-response assays in mammalian cell lines. Cell Systems, 2019. 9(1): p. 35-48.e5.
208. Buehring, G.C., E.A. Eby, and M.J. Eby, Cell line cross-contamination: how aware are Mammalian cell culturists of the problem and how to monitor it? In Vitro Cellular & Developmental Biology-Animal, 2004. 40(7): p. 211-215.
209. Geraghty, R.J., et al., Guidelines for the use of cell lines in biomedical research. British Journal of Cancer, 2014. 111(6): p. 1021-1046.
210. French, C.A., Pathogenesis of NUT midline carcinoma. Annual Review of Pathology: Mechanisms of Disease, 2012. 7: p. 247-265.
211. Dawson, M.A., et al., Inhibition of BET recruitment to chromatin as an effective treatment for MLL-fusion leukaemia. Nature, 2011. 478(7370): p. 529.
212. Adams, L.A., et al., REFiL: Rapid Elaboration of Fragments into Leads applied to BRD3-extra terminal domain. In preparation.
213. Kwan, A.H., et al., Macromolecular NMR spectroscopy for the non-spectroscopist. The FEBS Journal, 2011. 278(5): p. 687-703.
214. Rong, L., et al., Protein crystallization by using porous glass substrate. Journal of Synchrotron Radiation, 2003. 11(1): p. 27-29.
215. D'Arcy, A., A. Mac Sweeney, and A. Haber, Using natural seeding material to generate nucleation in protein crystallization experiments. Acta Crystallographica Section D: Biological Crystallography, 2003. 59(7): p. 1343-1346.
216. Thakur, A.S., et al., Improved success of sparse matrix protein crystallization screening with heterogeneous nucleating agents. PLoS One, 2007. 2(10): p. e1091.
217. Williamson, M.P., Using chemical shift perturbation to characterise ligand binding. Progress in Nuclear Magnetic Resonance Spectroscopy, 2013. 73: p. 1-16.
218. GraphPad. GraphPad Prism version 8.0.0 for Windows.
219. Lee, W., M. Tonelli, and J.L. Markley, NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics, 2014. 31(8): p. 1325-1327.

220. Eisenberg, D., et al., Amino acid scale: normalized consensus hydrophobicity scale. Journal of Molecular Biology, 1984. 179: p. 125-142.
221. Moore, J.M., NMR screening in drug discovery. Current Opinion in Biotechnology, 1999. 10(1): p. 54-58.
222. Rees, D.C., et al., Fragment-based lead discovery. Nature Reviews Drug Discovery, 2004. 3(8): p. 660.
223. Chayen, N.E., Turning protein crystallisation from an art into a science. Current Opinion in Structural Biology, 2004. 14(5): p. 577-583.
224. McPherson, A., Two approaches to the rapid screening of crystallization conditions. Journal of Crystal Growth, 1992. 122(1-4): p. 161-167.
225. Byrne, N. and C.A. Angell, The solubility of hen lysozyme in ethylammonium nitrate/H2O mixtures and a novel approach to protein crystallization. Molecules, 2010. 15(2): p. 793-803.
226. Kobe, B., Personal communication. 2019.
227. Kim, Y., et al., Large-scale evaluation of protein reductive methylation for improving protein crystallization. Nature Methods, 2008. 5(10): p. 853-854.
228. Chayen, N.E., P.D.S. Stewart, and D.M. Blow, Microbatch crystallization under oil – a new technique allowing many small-volume crystallization trials. Journal of Crystal Growth, 1992. 122(1-4): p. 176-180.
229. Brumshtein, B., et al., Control of the rate of evaporation in protein crystallization by the 'microbatch under oil' method. Journal of Applied Crystallography, 2008. 41(5): p. 969-971.
230. Erlanson, D.A., R.S. McDowell, and T. O'Brien, Fragment-based drug discovery. Journal of Medicinal Chemistry, 2004. 47(14): p. 3463-3482.
231. Puissant, A., et al., Targeting MYCN in neuroblastoma by BET bromodomain inhibition. Cancer Discovery, 2013. 3(3): p. 308-323.
232. Tiago, M., et al., Targeting BRD/BET proteins inhibits adaptive kinome upregulation and enhances the effects of BRAF/MEK inhibitors in melanoma. British Journal of Cancer, 2020. 122(6): p. 789-800.
233. Salic, A. and T.J. Mitchison, A chemical method for fast and sensitive detection of DNA synthesis in vivo. Proceedings of the National Academy of Sciences, 2008. 105(7): p. 2415-2420.
234. Reis, J.M., B. Sinko, and C.H.R. Serra, Parallel artificial membrane permeability assay (PAMPA) - is it better than Caco-2 for human passive permeability prediction? Mini Reviews in Medicinal Chemistry, 2010. 10(11): p. 1071-1076.
235. van Breemen, R.B. and Y. Li, Caco-2 cell permeability assays to measure drug absorption. Expert Opinion on Drug Metabolism & Toxicology, 2005. 1(2): p. 175-185.
236. Avdeef, A., The rise of PAMPA. Expert Opinion on Drug Metabolism & Toxicology, 2005. 1(2): p. 325-342.
237. Vrbanac, J. and R. Slauter, Chapter 3 - ADME in drug discovery, in A Comprehensive Guide to Toxicology in Nonclinical Drug Development (Second Edition), A.S. Faqi, Editor. 2017, Academic Press: Boston. p. 39-67.
238. Nebbioso, A., et al., Cancer epigenetics: moving forward. PLoS Genetics, 2018. 14(6): p. e1007362.
239. Zhou, D., et al., A label-free and enzyme-free aptasensor for visual Cd2+ detection based on split DNAzyme fragments. Analytical Methods, 2019. 11(28): p. 3546-3551.
240. Zhou, L., et al., Tandem reassembly of split luciferase-DNA chimeras for bioluminescent detection of attomolar circulating microRNAs using a smartphone. Biosensors and Bioelectronics, 2021. 173: p. 112824.
241. Bezerra, A.B., A.S.N. Kurian, and C.J. Easley, Nucleic-Acid Driven Cooperative Bioassays Using Probe Proximity or Split-Probe Techniques. Analytical Chemistry, 2021. 93(1): p. 198-214.
242. Levray, Y.S., A.D. Berhe, and A.R. Osborne, Use of split-dihydrofolate reductase for the detection of protein-protein interactions and simultaneous selection of multiple plasmids in Plasmodium falciparum. Molecular and Biochemical Parasitology, 2020. 238: p. 111292.

243. Purde, V., et al., Intein-mediated cytoplasmic reconstitution of a split toxin enables selective cell ablation in mixed populations and tumor xenografts. Proceedings of the National Academy of Sciences, 2020. 117(36): p. 22090.

Appendices

Appendix 1: Gene Sequences Used for Protein Expression

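DNA sequences are presented, where applicable, with the restriction sites used to insert each gene into the expression vectors described in Section 2.2.2. Each DNA sequence is followed by the amino acid sequence of the corresponding expressed protein. The protein sequence may include vector- or tag-derived residues that are not encoded by the DNA shown.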
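Each DNA sequence can be checked against its protein sequence programmatically. The sketch below is a minimal illustration, not the workflow used in this thesis. It works in four steps:

1. It strips the whitespace left by line wrapping.
2. It locates the GGATCC and GAATTC motifs, which are the BamHI and EcoRI recognition sequences. These motifs flank several of the constructs below, but treating them as the sites actually used is an assumption; Section 2.2.2 gives the real cloning strategy.
3. It takes the insert between the two sites.
4. It translates the insert with the standard genetic code until the first stop codon.

The function names and the toy construct are hypothetical.

```python
import re

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, each position in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumed cloning sites, inferred from the flanking sequence only.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def clean(seq: str) -> str:
    """Drop whitespace left by line wrapping and upper-case the sequence."""
    return re.sub(r"\s+", "", seq).upper()


def find_sites(seq: str) -> dict:
    """Return 0-based start positions of each recognition sequence."""
    seq = clean(seq)
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for name, site in SITES.items()
    }


def extract_insert(seq: str, five_prime: str = "BamHI", three_prime: str = "EcoRI") -> str:
    """Return the bases between the first 5' site and the last 3' site.

    Raises ValueError if either site is absent. An internal copy of either
    site inside the gene would need more careful handling.
    """
    seq = clean(seq)
    start = seq.index(SITES[five_prime]) + len(SITES[five_prime])
    end = seq.rindex(SITES[three_prime])
    return seq[start:end]


def translate(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon.

    A trailing partial codon is ignored; non-ACGT characters raise KeyError.
    """
    cds = clean(cds)
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy construct:
    # GC clamp, BamHI, ATG TAC AAG, three stop codons, EcoRI, GC clamp.
    toy = "GCGCGCGGATCCATGTACAAGTAATGATAGGAATTCGCGCGC"
    print(find_sites(toy))
    print(translate(extract_insert(toy)))
```

For the toy construct this prints `{'BamHI': [6], 'EcoRI': [30]}` and then `MYK`. Applied to the SETD7 DNA below, `translate(extract_insert(...))` should reproduce the SETD7 protein sequence, provided the sequence has been transcribed faithfully.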

SETD7

GCGCGCGGATCCATGTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGA CGGTGGCAGCCTGGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGT ACGTGTATCCGGACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAG GGTAAACTGGCGACCCTGATGAGCACCGAGGAAGGCCGTCCGCACTTCGAACTGATGCCGGG TAACAGCGTTTATCACTTTGACAAGAGCACCAGCAGCTGCATCAGCACCAACGCGCTGCTGC CGGACCCGTACGAGAGCGAACGTGTGTATGTTGCGGAAAGCCTGATTAGCAGCGCGGGCGAG GGCCTGTTCAGCAAAGTGGCGGTTGGCCCGAACACCGTGATGAGCTTTTACAACGGTGTTCG TATCACCCACCAGGAAGTGGACAGCCGTGATTGGGCGCTGAACGGTAACACCCTGAGCCTGG ACGAGGAAACCGTGATTGATGTTCCGGAGCCGTACAACCACGTTAGCAAGTATTGCGCGAGC CTGGGCCACAAAGCGAACCACAGCTTCACCCCGAACTGCATTTACGATATGTTCGTGCACCC GCGTTTTGGTCCGATCAAGTGCATTCGTACCCTGCGTGCGGTTGAAGCGGACGAGGAACTGA CCGTGGCGTACGGCTATGATCACAGCCCGCCGGGTAAAAGCGGTCCGGAAGCGCCGGAGTGG TATCAGGTGGAGCTGAAGGCGTTTCAAGCGACCCAGCAAAAATAATGATAGGAATTCGCGCG C

MYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLA TLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFS KVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHK ANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVE LKAFQATQQK

Split SETD7 – Pair 1

N terminal

ATGGGTTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGACGGTGGCAG CCTGGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGTACGTGTATC CGGACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAGGGTAAACTG GCGACCCTGATGAGCACCGAGGAA

MGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKL ATLMSTEEGSGGSGSGALKKELQANKKELAQLKWELQALKKELAQSGHHHHHHHH

C terminal

ATGGGCCGTCCGCACTTCGAACTGATGCCGGGTAACAGCGTTTATCACTTTGACAAGAGCAC CAGCAGCTGCATCAGCACCAACGCGCTGCTGCCGGACCCGTACGAGAGCGAACGTGTGTATG TTGCGGAAAGCCTGATTAGCAGCGCGGGCGAGGGCCTGTTCAGCAAAGTGGCGGTTGGCCCG AACACCGTGATGAGCTTTTACAACGGTGTTCGTATCACCCACCAGGAAGTGGACAGCCGTGA TTGGGCGCTGAACGGTAACACCCTGAGCCTGGACGAGGAAACCGTGATTGATGTTCCGGAGC CGTACAACCACGTTAGCAAGTATTGCGCGAGCCTGGGCCACAAAGCGAACCACAGCTTCACC CCGAACTGCATTTACGATATGTTCGTGCACCCGCGTTTTGGTCCGATCAAGTGCATTCGTAC CCTGCGTGCGGTTGAAGCGGACGAGGAACTGACCGTGGCGTACGGCTATGATCACAGCCCGC CGGGTAAAAGCGGTCCGGAAGCGCCGGAGTGGTATCAGGTGGAGCTGAAGGCGTTTCAAGCG ACCCAGCAAAAA

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIMGRPHFELMPGNS VYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRIT HQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRF GPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

Split SETD7 – Pair 2

N terminal

ATGGGTTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGACGGTGGCAG CCTGGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGTACGTGTATC CGGACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAGGGTAAACTG GCGACCCTGATGAGCACCGAGGAAGGCCGTCCGCACTTCGAACTGATGCCGGGTAACAGCGT TTATCAC

MGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKL ATLMSTEEGRPHFELMPGNSVYHGSGGSGSGALKKELQANKKELAQLKWELQALKKELAQSG HHHHHHHH

C terminal

ATGTTTGACAAGAGCACCAGCAGCTGCATCAGCACCAACGCGCTGCTGCCGGACCCGTACGA GAGCGAACGTGTGTATGTTGCGGAAAGCCTGATTAGCAGCGCGGGCGAGGGCCTGTTCAGCA AAGTGGCGGTTGGCCCGAACACCGTGATGAGCTTTTACAACGGTGTTCGTATCACCCACCAG GAAGTGGACAGCCGTGATTGGGCGCTGAACGGTAACACCCTGAGCCTGGACGAGGAAACCGT GATTGATGTTCCGGAGCCGTACAACCACGTTAGCAAGTATTGCGCGAGCCTGGGCCACAAAG CGAACCACAGCTTCACCCCGAACTGCATTTACGATATGTTCGTGCACCCGCGTTTTGGTCCG ATCAAGTGCATTCGTACCCTGCGTGCGGTTGAAGCGGACGAGGAACTGACCGTGGCGTACGG CTATGATCACAGCCCGCCGGGTAAAAGCGGTCCGGAAGCGCCGGAGTGGTATCAGGTGGAGC TGAAGGCGTTTCAAGCGACCCAGCAAAAA

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIMFDKSTSSCISTN ALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNT LSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEAD EELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

Split SETD7 – Pair 3

N terminal

ATGGGTTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGACGGTGGCAG CCTGGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGTACGTGTATC CGGACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAGGGTAAACTG GCGACCCTGATGAGCACCGAGGAAGGCCGTCCGCACTTCGAACTGATGCCGGGTAACAGCGT TTATCACTTTGACAAGAGCACCAGCAGCTGCATCAGCACCAACGCGCTGCTGCCGGACCCGT ACGAGAGCGAACGTGTGTATGTTGCGGAAAGCCTGATTAGCAGCGCGGGCGAGGGCCTGTTC AGCAAAGTGGCGGTTGGC

MGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKL ATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLF SKVAVGGSGGSGSGALKKELQANKKELAQLKWELQALKKELAQSGHHHHHHHH

C terminal

ATGCCGAACACCGTGATGAGCTTTTACAACGGTGTTCGTATCACCCACCAGGAAGTGGACAG CCGTGATTGGGCGCTGAACGGTAACACCCTGAGCCTGGACGAGGAAACCGTGATTGATGTTC CGGAGCCGTACAACCACGTTAGCAAGTATTGCGCGAGCCTGGGCCACAAAGCGAACCACAGC TTCACCCCGAACTGCATTTACGATATGTTCGTGCACCCGCGTTTTGGTCCGATCAAGTGCAT TCGTACCCTGCGTGCGGTTGAAGCGGACGAGGAACTGACCGTGGCGTACGGCTATGATCACA GCCCGCCGGGTAAAAGCGGTCCGGAAGCGCCGGAGTGGTATCAGGTGGAGCTGAAGGCGTTT CAAGCGACCCAGCAAAAA

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIMPNTVMSFYNGVR ITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHP RFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

Split SETD7 – Pair 4

N terminal

ATGTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGACGGTGGCAGCCT GGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGTACGTGTATCCGG ACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAGGGTAAACTGGCG ACCCTGATGAGCACCGAGGAAGGCCGTCCGCACTTCGAACTGATGCCGGGTAACAGCGTTTA TCACTTTGACAAGAGCACCAGCAGCTGCATCAGCACCAACGCGCTGCTGCCGGACCCGTACG AGAGCGAACGTGTGTATGTTGCGGAAAGCCTGATTAGCAGCGCGGGCGAGGGCCTGTTCAGC AAAGTGGCGGTTGGCCCGAACACCGTGATGAGCTTTTACAACGGTGTTCGTATCACCCACCA GGAAGTGGACAGCCGTGATTGGGCGCTGAACGGTAACACCCTGAGCCTGGACGAGGAAACCG TGATTGATGTTCCGGAGCCGTACAACCACG

MYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGK LATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGE GLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHGSGGSGS GALKKELQANKKELAQLKWELQALKKELAQSGHHHHHHHH

C terminal

ATGGTTAGCAAGTATTGCGCGAGCCTGGGCCACAAAGCGAACCACAGCTTCACCCCGAACTG CATTTACGATATGTTCGTGCACCCGCGTTTTGGTCCGATCAAGTGCATTCGTACCCTGCGTG CGGTTGAAGCGGACGAGGAACTGACCGTGGCGTACGGCTATGATCACAGCCCGCCGGGTAAA AGCGGTCCGGAAGCGCCGGAGTGGTATCAGGTGGAGCTGAAGGCGTTTCAAGCGACCCAGCA AAAA

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIMVSKYCASLGHKA NHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVEL KAFQATQQK

Split SETD7 – Pair 5

N terminal

ATGGGTTACAAGGACAACATCCGTCACGGCGTTTGCTGGATTTACTATCCGGACGGTGGCAG CCTGGTGGGTGAGGTTAACGAAGATGGCGAGATGACCGGCGAAAAAATCGCGTACGTGTATC CGGACGAGCGTACCGCGCTGTACGGCAAGTTCATCGATGGCGAAATGATTGAGGGTAAACTG GCGACCCTGATGAGCACCGAGGAAGGCCGTCCGCACTTCGAACTGATGCCGGGTAACAGCGT TTATCACTTTGACAAGAGCACCAGCAGCTGCATCAGCACCAACGCGCTGCTGCCGGACCCGT ACGAGAGCGAACGTGTGTATGTTGCGGAAAGCCTGATTAGCAGCGCGGGCGAGGGCCTGTTC AGCAAAGTGGCGGTTGGCCCGAACACCGTGATGAGCTTTTACAACGGTGTTCGTATCACCCA CCAGGAAGTGGACAGCCGTGATTGGGCGCTGAACGGTAACACCCTGAGCCTGGACGAGGAAA CCGTGATTGATGTTCCGGAGCCGTACAACCACGTTAGCAAGTATTGCGCGAGCCTGGGCCAC AAAGCGAACCACAGCTTCACCCCGAACTGCATTTACGATATGTTCGTGCACCCGCGTTTTGG TCCGATCAAGTGCATTCGTACCCTGCGTGCGGTTGAA

MGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKL ATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLF SKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGH KANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEGSGGSGSGALKKELQANKKELAQLKWELQ ALKKELAQSGHHHHHHHH

C terminal

ATGGCGGACGAGG AACTGACCGTGGCGTACGGCTATGATCACAGCCCGCCGGGTAAAAGCGGTCCGGAAGCGCCG GAGTGGTATCAGGTGGAGCTGAAGGCGTTTCAAGCGACCCAGCAAAAA

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIMADEELTVAYGYD HSPPGKSGPEAPEWYQVELKAFQATQQK
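The split site of each SETD7 pair can be recovered in three steps. First, remove the shared tag and linker sequences. Second, locate the remaining SETD7 segment in the SETD7 construct sequence at the top of this appendix. Third, confirm that the two halves abut with no gap or overlap.

The sketch below is a hypothetical helper with the following assumptions:

- Every N-terminal fragment ends with the same linker, zipper and His8 tag.
- Every C-terminal fragment begins with the same Strep-tag II (WSHPQFEK), zipper and linker.
- These shared sequences are exactly as written for Pairs 1 to 4.

Residue numbers refer to the construct sequence shown here, not to UniProt numbering of full-length SETD7.

```python
# SETD7 construct protein sequence from the top of Appendix 1.
PARENT = (
    "MYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLA"
    "TLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFS"
    "KVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHK"
    "ANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVE"
    "LKAFQATQQK"
)

# Shared tag/linker sequences as they appear in the split constructs.
N_SUFFIX = "GSGGSGSGALKKELQANKKELAQLKWELQALKKELAQSGHHHHHHHH"
C_PREFIX = "MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGDIM"


def split_point(n_fragment: str, c_fragment: str, parent: str = PARENT) -> int:
    """Return the 1-based parent residue at which the C-terminal half begins.

    Raises ValueError if a fragment lacks the expected tag sequence, if the
    C-terminal core is not a suffix of the parent, or if the N-terminal core
    does not end immediately before the C-terminal core.
    """
    if not n_fragment.endswith(N_SUFFIX) or not c_fragment.startswith(C_PREFIX):
        raise ValueError("fragment lacks the expected tag/linker sequence")
    n_core = n_fragment[: -len(N_SUFFIX)]
    c_core = c_fragment[len(C_PREFIX):]
    if not parent.endswith(c_core):
        raise ValueError("C-terminal core is not a suffix of the parent")
    start = len(parent) - len(c_core)  # 0-based index of first C-terminal residue
    # Compare only the last 10 N-terminal residues, so the extra Gly after the
    # initiator Met in some N-terminal constructs does not affect the check.
    if not parent[:start].endswith(n_core[-10:]):
        raise ValueError("N- and C-terminal halves do not abut")
    return start + 1


# Split SETD7 Pair 4, written with the shared tag constants above.
PAIR4_N = (
    "MYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGK"
    "LATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGE"
    "GLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNH" + N_SUFFIX
)
PAIR4_C = C_PREFIX + (
    "VSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVEL"
    "KAFQATQQK"
)

if __name__ == "__main__":
    print(split_point(PAIR4_N, PAIR4_C))
```

For Pair 4 this prints `176`. The C-terminal half therefore starts at Val176 of the construct, and the N-terminal half covers residues 1 to 175. A fragment without the expected tag raises `ValueError` instead of returning a misleading position.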

KAT8

GCGCGCGGATCCATGACCAAGGTGAAATATGTTGACAAAATCCACATTGGTAACTACGAGAT CGATGCGTGGTATTTCAGCCCGTTTCCGGAGGACTACGGCAAGCAGCCGAAACTGTGGCTGT GCGAGTATTGCCTGAAGTACATGAAATATGAAAAGAGCTACCGTTTCCACCTGGGCCAGTGC CAATGGCGTCAGCCGCCGGGCAAAGAAATCTACCGTAAGAGCAACATTAGCGTGTATGAAGT TGACGGTAAAGATCACAAGATTTACTGCCAAAACCTGTGCCTGCTGGCGAAACTGTTCCTGG ACCACAAGACCCTGTACTTTGATGTGGAGCCGTTCGTGTTCTACATCCTGACCGAAGTGGAT CGTCAGGGTGCGCACATTGTTGGCTATTTCAGCAAGGAGAAAGAAAGCCCGGACGGCAACAA CGTGGCGTGCATCCTGACCCTGCCGCCGTACCAACGTCGTGGTTATGGCAAATTCCTGATTG CGTTTAGCTACGAGCTGAGCAAGCTGGAAAGCACCGTGGGTAGCCCGGAGAAACCGCTGAGC GATCTGGGCAAGCTGAGCTACCGTAGCTATTGGAGCTGGGTTCTGCTGGAAATCCTGCGTGA CTTTCGTGGCACCCTGAGCATCAAGGATCTGAGCCAGATGACCAGCATTACCCAAAACGACA TCATTAGCACCCTGCAGAGCCTGAACATGGTGAAATATTGGAAGGGCCAACACGTGATCTGC GTTACCCCGAAACTGGTTGAGGAACACCTGAAGAGCGCGCAGTACAAGAAACCGCCGATTAC CGTGGATAGCGTTTGCCTGAAATGGGCTCCGCCGAAGCACAAACAAGTTAAGCTGAGCAAGA AATAATGATAGGAATTCGCGCGC

MTKVKYVDKIHIGNYEIDAWYFSPFPEDYGKQPKLWLCEYCLKYMKYEKSYRFHLGQCQWRQ PPGKEIYRKSNISVYEVDGKDHKIYCQNLCLLAKLFLDHKTLYFDVEPFVFYILTEVDRQGA HIVGYFSKEKESPDGNNVACILTLPPYQRRGYGKFLIAFSYELSKLESTVGSPEKPLSDLGK LSYRSYWSWVLLEILRDFRGTLSIKDLSQMTSITQNDIISTLQSLNMVKYWKGQHVICVTPK LVEEHLKSAQYKKPPITVDSVCLKWAPPKHKQVKLSKK

KMT2D

GCGCGCGGATCCATGCTGCCGGGCGTGGAATCGTGCCAGAACTACCTGTTTCGCTACGGTCG TCACCCGCTGATGGAACTGCCGCTGATGATTAACCCGACCGGCTGCGCACGTTCCGAACCGA AAATTCTGACGCATTATAAACGCCCGCACACCCTGAACTCAACGTCGATGAGCAAAGCGTAT CAGAGCACCTTTACGGGTGAAACCAATACGCCGTACTCTAAACAGTTCGTTCATAGTAAAAG CTCTCAATATCGTCGCCTGCGTACCGAATGGAAAAACAATGTCTACCTGGCACGTTCTCGCA TTCAGGGCCTGGGTCTGTATGCGGCCAAAGACCTGGAAAAACACACGATGGTGATTGAATAC ATCGGCACGATTATCCGTAACGAAGTGGCCAATCGTCGCGAAAAAATCTATGAAGAACAAAA CCGTGGTATTTACATGTTTCGCATCAACAATGAACATGTGATTGATGCCACCCTGACGGGCG GTCCGGCACGTTATATCAACCACAGCTGCGCACCGAATTGTGTTGCTGAAGTGGTTACCTTC GATAAAGAAGACAAAATCATCATCATCAGTTCCCGTCGCATCCCGAAAGGCGAAGAACTGAC CTATGATTACCAATTTGACTTCGAAGACGACCAACATAAAATCCCGTGTCACTGCGGTGCCT GGAACTGCCGTAAATGGATGAACTAATGATAGGAATTCGCGCGC

MLPGVESCQNYLFRYGRHPLMELPLMINPTGCARSEPKILTHYKRPHTLNSTSMSKAYQSTF TGETNTPYSKQFVHSKSSQYRRLRTEWKNNVYLARSRIQGLGLYAAKDLEKHTMVIEYIGTI IRNEVANRREKIYEEQNRGIYMFRINNEHVIDATLTGGPARYINHSCAPNCVAEVVTFDKED KIIIISSRRIPKGEELTYDYQFDFEDDQHKIPCHCGAWNCRKWMN

Split GFP

N terminal

CCATGGGTCACATGAGCAAAGGCGAGGAGCTGTTTACCGGTGTTGTTCCGATTCTGGTTGAA CTGGATGGCGACGTTAATGGTCACAAATTCAGCGTGAGCGGCGAGGGCGAAGGTGACGCGAC CTACGGCAAGCTGACCCTGAAATTTATCTGCACCACCGGTAAACTGCCGGTGCCGTGGCCGA CCCTGGTTACCACCCTGACCTACGGCGTGCAGTGCTTCGCGCGTTATCCGGACCACATGAAG CAACACGATTTCTTTAAAAGCGCGATGCCGGAGGGCTACGTTCAGGAACGTACCATTAGCTT CAAGGACGATGGTAACTATAAAACCCGTGCGGAAGTGAAGTTTGAAGGCGACACCCTGGTTA ACCGTATCGAACTGAAGGGTATTGACTTTAAAGAGGATGGCAACATCCTGGGTCACAAACTG GAATACAACTATAACAGCCACAACGTTTATATTACCGCGGATAAGCAGAAAGGATCCGGTGG CAGCGGTAGCGGTGCGCTGAAGAAAGAGCTGCAAGCGAACAAGAAAGAACTGGCGCAGCTGA AGTGGGAGCTGCAAGCGCTGAAGAAGGAACTGGCGCAGAGCGGTCATCACCACCACCATCAC CATCACTAATAGTGAGAATTC

MGHMSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPT LVTTLTYGVQCFARYPDHMKQHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVN RIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKGSGGSGSGALKKELQANKKELAQLK WELQALKKELAQSGHHHHHHHH

C terminal

GCGCGCCATATGTGGAGCCATCCGCAATTTGAGAAAGGTAGCGAACAACTGGAGAAGAAACT GCAAGCGCTGGAGAAGAAACTGGCGCAACTGGAGTGGAAGAACCAGGCGCTGGAGAAGAAAC TGGCGCAGGGTAGCGGCAGCGGTGGCACTAGTAACGGTATTAAGGCGAACTTCAAAATCCGT CACAACATTGAAGACGGTGGCGTGCAGCTGGCGGATCACTACCAGCAAAACACCCCGATTGG TGATGGTCCGGTTCTGCTGCCGGATAACCACTATCTGAGCACCCAAAGCAAGCTGAGCAAAG ACCCGAACGAGAAACGTGATCACATGGTGCTGCTGGAGTTTGTGACCGCGGCGGGCATTACC CACGGCATGGATGAGCTGTATAAATAATAGTGAGGTACCGCGCGC

MWSHPQFEKGSEQLEKKLQALEKKLAQLEWKNQALEKKLAQGSGSGGTSNGIKANFKIRHNI EDGGVQLADHYQQNTPIGDGPVLLPDNHYLSTQSKLSKDPNEKRDHMVLLEFVTAAGITHGM DELYK

HER2 gRNAs

gRNA1 GAATTTATCCCGGACTCCGGGG

gRNA2 GTTGGAATGCAGTTGGAGGGGG

gRNA3 ATTCCAGAAGATATGCCCCGGG

FOG (1-45)

ATGTCCAGGCGGAAACAGAGCAACCCCCGGCAGATCAAGCGTTCCCTCGGAGACATGGAGGC CAGAGAGGAGGTGCAGTTGGTGGGTGCCAGCCACATGGAGCAAAAGGCCACGGCACCTGAAG CCCCGAGCCCT

MSRRKQSNPRQIKRSLGDMEAREEVQLVGASHMEQKATAPEAPSP

KRAB

GTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGACACTGC TCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGAACCTGGTTTCCTTGGGTT ATCAGCTTACTAAGCCAGATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTG GAGAGAGAAATTCACCAAGAGACCCATCCT

VTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLV EREIHQETHP

Histone H3.3

ATGGCGCGCACCAAACAGACCGCGCGCAAAAGCACCGGCGGCAAAGCGCCGCGCAAACAGCT GGCGACCAAAGCGGCGCGCAAAAGCGCGCCGAGCACCGGCGGCGTGAAAAAACCGCATCGCT ATCGCCCGGGCACCGTGGCGCTGCGCGAAATTCGCCGCTATCAGAAAAGCACCGAACTGCTG ATTCGCAAACTGCCGTTTCAGCGCCTGGTGCGCGAAATTGCGCAGGATTTTAAAACCGATCT GCGCTTTCAGAGCGCGGCGATTGGCGCGCTGCAGGAAGCGAGCGAAGCGTATCTGGTGGGCC TGTTTGAAGATACCAACCTGTGCGCGATTCATGCGAAACGCGTGACCATTATGCCGAAAGAT ATTCAGCTGGCGCGCCGCATTCGCGGCGAACGCGCG

MARTKQTARKSTGGKAPRKQLATKAARKSAPSTGGVKKPHRYRPGTVALREIRRYQKSTELL IRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKD IQLARRIRGERA

Brd3 ET (full length)

GCATCTGCGTCCTATGACTCAGAGGAAGAGGAGGAGGGCCTGCCCATGAGCTATGATGAAAA GCGACAACTTAGCCTTGACATCAACCGGCTGCCCGGCGAGAAGCTAGGGCGTGTGGTGCACA TCATTCAGTCTCGGGAGCCCTCGCTTCGGGACTCAAACCCAGACGAGATTGAGATTGACTTT GAGACCCTGAAGCCAACCACGCTGCGGGAACTGGAGAGATATGTCAAGTCTTGTTTACAAAA AAAGCAGAGGAAACCATGA

GPLGSASASYDSEEEEEGLPMSYDEKRQLSLDINRLPGEKLGRVVHIIQSREPSLRDSNPDE IEIDFETLKPTTLRELERYVKSCLQKKQRKP

Brd3 ET (short)

GAGGAGGAGGGCCTGCCCATGAGCTATGATGAAAAGCGACAACTTAGCCTTGACATCAACCG GCTGCCCGGCGAGAAGCTAGGGCGTGTGGTGCACATCATTCAGTCTCGGGAGCCCTCGCTTC GGGACTCAAACCCAGACGAGATTGAGATTGACTTTGAGACCCTGAAGCCAACCACGCTGCGG GAACTGGAGAGATATGTCAAGTCTTGTTTACAAAAA

GPLGSEEEGLPMSYDEKRQLSLDINRLPGEKLGRVVHIIQSREPSLRDSNPDEIEIDFETLK PTTLRELERYVKSCLQK

Brd3 ET (shorter)

CCATGAGCTATGATGAAAAGCGACAACTTAGCCTTGACATCAACCGGCTGCCCGGCGAGAAG CTAGGGCGTGTGGTGCACATCATTCAGTCTCGGGAGCCCTCGCTTCGGGACTCAAACCCAGA CGAGATTGAGATTGACTTTGAGACCCTGAAGCCAACCACGCTGCGGGAACTGGAGAGATATG TCAAGTCTTGTTTACAAAAA

GPLGSMSYDEKRQLSLDINRLPGEKLGRVVHIIQSREPSLRDSNPDEIEIDFETLKPTTLRE LERYVKSCLQK
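Concentrations of purified constructs are often estimated from A280 using a molar extinction coefficient calculated from the sequence. The sketch below computes that coefficient, the average mass and the concentration for an expressed protein sequence. It assumes the widely used Pace et al. coefficients as implemented in ExPASy ProtParam: 5500 M^-1 cm^-1 per Trp, 1490 per Tyr and 125 per cystine. This appendix does not say whether this approach was used for these constructs; the function names are hypothetical.

```python
# Average residue masses in Da (free amino acid minus one water), as commonly
# tabulated, e.g. by ExPASy.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average molecular mass (Da) of a linear polypeptide; unknown letters raise KeyError."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def extinction_280(seq: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    `cystines` is the number of disulfide bonds (0 for a fully reduced protein).
    """
    seq = seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def concentration_uM(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l), returned in micromolar."""
    return a280 / (epsilon * path_cm) * 1e6


BRD3_ET_SHORT = (
    "GPLGSEEEGLPMSYDEKRQLSLDINRLPGEKLGRVVHIIQSREPSLRDSNPDEIEIDFETLK"
    "PTTLRELERYVKSCLQK"
)

if __name__ == "__main__":
    eps = extinction_280(BRD3_ET_SHORT)
    print(eps, round(average_mass(BRD3_ET_SHORT), 1))
```

For the short ET construct the coefficient is 2980 M^-1 cm^-1, contributed by two Tyr and no Trp. None of the three Brd3 ET constructs above contains Trp, so their A280 values are small and A280-based quantification is correspondingly less sensitive.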

Appendix 2: RT-qPCR Primers

HER2 Forward GGGAAACCTGGAACTCACCT

HER2 Reverse GACCTGCCTCACTTGGTTGT

GAPDH Forward AATCCCATCACCATCTTCCA

GAPDH Reverse CTCCATGGTGGTGAAGACG
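As a quick check on these primer pairs, GC content and an approximate melting temperature can be computed from the sequences. The sketch below uses the basic GC-content formula Tm = 64.9 + 41(nG + nC - 16.4)/N, which is commonly quoted for oligonucleotides longer than 13 nt. It ignores salt and primer concentration, so it is a rough illustration only. It is not necessarily how these primers were designed or validated.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer: str) -> float:
    """Approximate Tm in deg C: 64.9 + 41 * (nG + nC - 16.4) / N.

    Intended for oligos longer than 13 nt; ignores salt, Mg2+ and primer
    concentration, so treat the result as a rough guide only.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(primer)


# Primer sequences from Appendix 2.
PRIMERS = {
    "HER2 Forward": "GGGAAACCTGGAACTCACCT",
    "HER2 Reverse": "GACCTGCCTCACTTGGTTGT",
    "GAPDH Forward": "AATCCCATCACCATCTTCCA",
    "GAPDH Reverse": "CTCCATGGTGGTGAAGACG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:<14}{len(seq):>3} nt  GC {100 * gc_fraction(seq):5.1f}%  "
              f"Tm ~{basic_tm(seq):.1f} C")
```

With this formula both HER2 primers come out at about 53.8 °C. The GAPDH forward and reverse primers come out at about 49.7 °C and 53.2 °C respectively. Nearest-neighbour methods will give different absolute values.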

Appendix 3: Human Epigenetic Enzymes

Name | Epigenetic function | MW (kDa) | Structure available | Other functions | Quaternary structure | Cofactors
BRCA2 | Acetylates primarily H3 and H4 of free histones. | 384 | Partial (wrong domain) | Involved in double-strand break repair and/or homologous recombination. Binds RAD51 and potentiates recombinational DNA repair by promoting assembly of RAD51 onto single-stranded DNA (ssDNA). Acts by targeting RAD51 to ssDNA over double-stranded DNA, enabling RAD51 to displace replication protein-A (RPA) from ssDNA and stabilizing RAD51-ssDNA filaments by blocking ATP hydrolysis. | Occasional homodimer | Acetyl-CoA
CBP | Acetylates all four core histones in nucleosomes. | 265 | Partial (wrong domain) | Transcription factor involved in a large transcriptional complex; acetylase activity tied to this. Promiscuous protein acetylase. | Part of numerous transcriptional complexes | Acetyl-CoA
CDY1 | Has histone acetyltransferase activity, with a preference for histone H4. | 60 | Partial (unclear if correct domain) | - | - | Acetyl-CoA
CLOCK | Acetylates primarily histones H3 and H4. | 95 | No | Transcription factor involved in maintaining circadian rhythms. | - | Acetyl-CoA
ELP3 | Acetylates histones H3 and probably H4. May also have a methyltransferase activity. | 62 | No | Acetylates alpha-tubulin. | Catalytic histone acetyltransferase subunit of the RNA polymerase II elongator complex, which is a component of the RNA polymerase II (Pol II) holoenzyme. | Acetyl-CoA
GTF3C4 | Specifically acetylates free and nucleosomal H3. | 92 | No | Essential for RNA polymerase III to make a number of small nuclear and cytoplasmic RNAs. | Part of the TFIIIC subcomplex TFIIIC2 | Acetyl-CoA
HAT1 | Acetylates soluble but not nucleosomal H4 at 'Lys-5' and 'Lys-12' and acetylates histone H2A at 'Lys-5'. Has intrinsic substrate specificity that modifies lysine in recognition sequence GXGKXG. | 50 | Yes | - | Catalytic subunit of the type B histone acetyltransferase (HAT) complex, composed of RBBP7 and HAT1. | Acetyl-CoA
HDAC1 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 55 | Yes | - | Part of NuRD, Sin3, numerous other complexes | NAD+
HDAC10 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 71 | No | - | - | NAD+
HDAC11 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 39 | No | - | - | NAD+
HDAC2 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 55 | Yes | - | Part of NuRD, Sin3, numerous other complexes | NAD+
HDAC3 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 49 | Yes | Probably participates in the regulation of transcription through its binding to the zinc-finger transcription factor YY1. Acts as a molecular chaperone for shuttling phosphorylated NR2C1 to PML bodies for sumoylation. | Forms a heterologous complex at least with YY1. Component of the Notch corepressor complex. | NAD+
HDAC4 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 119 | Partial | - | Homodimer (unclear whether obligate) | NAD+
HDAC5 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 119 | No | - | - | NAD+
HDAC6 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 131 | Partial (wrong domain) | Plays a key role in the degradation of misfolded proteins: when misfolded proteins are too abundant to be degraded by the chaperone refolding system and the ubiquitin-proteasome, mediates the transport of misfolded proteins to a cytoplasmic juxtanuclear structure called aggresome. | - | NAD+
HDAC7 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 103 | Partial | - | - | NAD+
HDAC8 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 42 | Yes | - | - | NAD+
HDAC9 | Responsible for the deacetylation of lysine residues on the N-terminal part of the core histones (H2A, H2B, H3 and H4). | 111 | No | - | Homodimer | NAD+
KAT2A (GCN5) | Acetylates H3-K9/14/18 (only if H3-S10 is phosphorylated?) | 94 | Partial | Promiscuous protein acetylase | Part of various complexes; can only acetylate H3-K14 in free histones by itself | Acetyl-CoA
KAT2B | Has significant histone acetyltransferase activity with core histones (H3 and H4), and also with nucleosome core particles. | 93 | Partial | Promiscuous protein acetylase | Component of a large chromatin remodeling complex, at least composed of MYSM1, KAT2B/PCAF, RBM10 and KIF11/TRIP5. | Acetyl-CoA
KAT5 | Acetylates amino-terminal tail peptides of histones H2A, H3 and H4, but not H2B, consistent with substrate preference on intact histones. Preferred acetylation sites for Tip60 are the Lys-5 of histone H2A, the Lys-14 of histone H3, and the Lys-5, -8, -12, -16 of histone H4. | 53 | Partial | Directly acetylates and activates ATM. | Catalytic subunit of the NuA4 histone acetyltransferase complex. Component of a SWR1-like complex that specifically mediates the removal of histone H2A.Z/H2AFZ from the nucleosome. | Acetyl-CoA
MYST1 | Histone H4-K16 acetylation. | 52 | Yes | Acetylates p53 at K120 | Part of MSL and NSL complexes | Acetyl-CoA
MYST2 | Acetylates histone H4 at Lys5, Lys8 and Lys12, and also has a reduced activity toward histone H3. Responsible for the bulk of histone H4 acetylation in vivo. | 71 | No | - | Component of the HBO1 complex | Acetyl-CoA
MYST3 | Acetylates lysine residues in histone H3 and histone H4 (in vitro). | 225 | Partial | Acetylates p53/TP53 at 'Lys-120' and 'Lys-382' | Component of the MOZ/MORF complex | Acetyl-CoA
MYST4 | Acetylates histones H3 and H4. | 229 | No | - | Component of the MOZ/MORF complex | Acetyl-CoA
NAT10 | Unclear ("acetylates histones"). | 116 | No | RNA cytidine acetyltransferase with specificity toward both 18S rRNA and tRNAs. Acetylates microtubules. | - | Acetyl-CoA
NCOA1 | Displays histone acetyltransferase activity toward H3 and H4. | 157 | Partial (wrong domain) | Transcription factor | Apparently complex-exclusive | Acetyl-CoA
NCOA2 | Intrinsic histone acetyltransferase activity. | 159 | Partial (wrong domain) | Transcription factor | Complex-exclusive | Acetyl-CoA
NCOA3 | Acetylates primarily histones H3 and H4. | 155 | Partial (wrong domain) | Transcription factor | Apparently complex-exclusive | Acetyl-CoA
p300 | Acetylates all four core histones in nucleosomes. | 264 | Partial | Transcription factor involved in a large transcriptional complex; acetylase activity tied to this. Promiscuous protein acetylase. | Part of numerous transcriptional complexes | Acetyl-CoA
SIRT1 | Deacetylates 'Lys-26' of HIST1H1E, histone H2A, 'Lys-9' of H3 and 'Lys-16' of H4. | 82 | Partial | Nightmarishly long list of functions | Found in a complex with PCAF and MYOD1. Component of the eNoSC complex. | NAD+
SIRT2 | Primarily deacetylates Lys16 of histone H4. | 43 | Yes | Promiscuous protein deacetylase | - | NAD+
SIRT3 | Deacetylates acetyl-Lys 9 and acetyl-Lys 16 of histones H3 and H4, respectively. | 44 | Yes | Activates or deactivates mitochondrial target proteins by deacetylating key lysine residues. | - | NAD+
SIRT6 | Has deacetylase activity towards 'Lys-9' and 'Lys-56' of histone H3. | 39 | Yes | Promotes DNA end resection via deacetylation of RBBP8. | - | NAD+
TAF1 | Acetylates histones H3 and H4 in vitro. | 213 | Partial | Transcription factor. Contains novel N- and C-terminal Ser/Thr kinase domains which can autophosphorylate or transphosphorylate other transcription factors. | Largest component and core scaffold of the TFIID basal transcription factor complex. Component of MLL complex. | Acetyl-CoA
DNMT1 | Methylates CpG residues. Preferentially methylates hemimethylated DNA. | 183 | Yes | - | Homodimer. | S-adenosyl-L-methionine
EED | Methylates 'Lys-27' of histone H3. Recognizes 'Lys-26' trimethylated histone H1 with the effect of inhibiting PRC2 complex methyltransferase activity. | 50 | Yes | - | Component of the PRC2/EED-EZH2 complex | S-adenosyl-L-methionine
JMJD6 | Arginine demethylase which demethylates histone H3 at 'Arg-2' (H3R2me) and histone H4 at 'Arg-3' (H4R3me). | 46 | Yes | Acts as a lysyl-hydroxylase that catalyzes 5-hydroxylation on specific lysine residues of target proteins. May also act as an RNA hydroxylase. | - | Fe2+
PRMT3 | Methylates histone H4 on Arg-3. | 60 | Yes | - | Occasional homodimer. | S-adenosyl-L-methionine
PRMT5 | Methylates histone H2A and H4 'Arg-3' during germ cell development. Methylates histone H3 'Arg-8', which may repress transcription. | 73 | Yes | Promiscuous arginine methyltransferase. | Forms, at least, homodimers and homotetramers. Component of the methylosome, along with numerous other complexes. | S-adenosyl-L-methionine
PRMT6 | Specifically mediates the asymmetric dimethylation of histone H3 'Arg-2' to form H3R2me2a. H3R2me2a represents a specific tag for epigenetic transcriptional repression and is mutually exclusive with methylation on histone H3 'Lys-4' (H3K4me2 and H3K4me3). Also methylates histone H2A and H4 'Arg-3'. | 42 | Yes | Promiscuous arginine methyltransferase. | - | S-adenosyl-L-methionine
SETD7 | Specifically monomethylates 'Lys-4' of histone H3. | 41 | Yes | Has methyltransferase activity toward non-histone proteins such as p53/TP53, TAF10, and possibly TAF7. | - | S-adenosyl-L-methionine
SMYD2 | Specifically methylates histone H3 'Lys-4' (H3K4me) and dimethylates histone H3 'Lys-36' (H3K36me2). | 50 | Yes | Monomethylates 'Lys-370' of p53/TP53. Monomethylates RB1 at 'Lys-860'. | - | S-adenosyl-L-methionine
SMYD3 | Specifically di- and tri-methylates K4 and K5 of H3. | 49 | Yes | - | - | S-adenosyl-L-methionine
DNMT3B | Required for genome-wide de novo methylation and is essential for the establishment of DNA methylation patterns during development. May preferentially methylate nucleosomal DNA within the nucleosome core region. | 96 | Partial (wrong domain) | - | Activated by binding DNMT3L | S-adenosyl-L-methionine
KDM2B | Preferentially demethylates trimethylated H3 'Lys-4' and dimethylated H3 'Lys-36' residue while it has weak or no activity for mono- and tri-methylated H3 'Lys-36'. | 153 | Partial (wrong domain) | - | - | Fe2+
KDM5C | Demethylates trimethylated and dimethylated but not monomethylated H3 'Lys-4'. Does not demethylate histone H3 'Lys-9', H3 'Lys-27', H3 'Lys-36', H3 'Lys-79' or H4 'Lys-20'. | 176 | Partial (wrong domain) | - | - | Fe2+
KDM5D | Demethylates trimethylated and dimethylated but not monomethylated H3 'Lys-4'. Does not demethylate histone H3 'Lys-9', H3 'Lys-27', H3 'Lys-36', H3 'Lys-79' or H4 'Lys-20'. | 174 | Partial (wrong domain) | - | - | Fe2+
MBD3 | Demethylates DNA at CpG dinucleotides. Recruits histone deacetylases and DNA methyltransferases. | 33 | Partial (wrong domain) | - | Heterodimer with MBD2. Part of the NuRD and the MeCP1 complex. | Fe2+
NSD2 | Methylates H3K27, H3K36 and H4K20. | 52 | Partial (wrong domain) | - | - | S-adenosyl-L-methionine
PRMT2 | Formation of monomethyl- and asymmetric dimethyl-arginine residues on histone H4. | 49 | Partial (wrong domain) | Methylates the guanidino nitrogens of arginyl residues in proteins such as STAT3 and FBL | - | S-adenosyl-L-methionine
SETDB1 | Specifically trimethylates 'Lys-9' of histone H3. | 143 | Partial (wrong domain) | - | Part of various complexes. Binding with MBD1 probably required for activity. | S-adenosyl-L-methionine
SUV39H1 | Specifically trimethylates 'Lys-9' of histone H3 using monomethylated H3 'Lys-9' as substrate. Inhibited by S10 phosphorylation. Also weakly methylates histone H1 (in vitro). | 48 | Partial (wrong domain) | - | Component of the eNoSC complex | S-adenosyl-L-methionine
NSD1 | Preferentially methylates 'Lys-36' of histone H3 and 'Lys-20' of histone H4 (in vitro). | 297 | Partial (very partial, but correct domain) | - | - | S-adenosyl-L-methionine
MLL3 | Methylates 'Lys-4' of histone H3. | 541 | Partial (very partial, and wrong domain) | - | Component of the MLL2/3 complex | S-adenosyl-L-methionine
EZH2 | Able to mono-, di- and trimethylate 'Lys-27' of histone H3. Also methylates Lys9 of H3. | 85 | Partial (not in E. coli) | Methylates non-histone proteins such as the transcription factor GATA4 and the nuclear receptor RORA. | Catalytic subunit of the PRC2/EED-EZH2 complex; requires SUZ12 and EED for activity | S-adenosyl-L-methionine
KDM5A | Demethylates trimethylated and dimethylated but not monomethylated H3 'Lys-4'. Does not demethylate histone H3 'Lys-9', H3 'Lys-27', H3 'Lys-36', H3 'Lys-79' or H4 'Lys-20'. | 192 | Partial (does not fully cover HMT domain) | - | - | Fe2+
ASH1L | Methylates 'Lys-36' of histone H3. | 333 | Partial | - | - | S-adenosyl-L-methionine
CARM1 | Methylates histone H3 at 'Arg-17' (H3R17me), forming mainly asymmetric dimethylarginine (H3R17me2a). | 66 | Partial | Methylates EP300/P300, both at 'Arg-2142', which may loosen its interaction with NCOA2/GRIP1, and at 'Arg-580' and 'Arg-604' in the KIX domain, which impairs its interaction with CREB and inhibits CREB-dependent transcriptional activation. Also methylates arginine residues in RNA-binding proteins PABPC1, ELAVL1 and ELAV4, which may affect their mRNA-stabilizing properties and the half-life of their target mRNAs. | Probable homodimer. Part of a complex consisting of CARM1, EP300/P300 and NCOA2/GRIP1. | S-adenosyl-L-methionine
DNMT3A | Required for genome-wide de novo methylation and is essential for the establishment of DNA methylation patterns during development. It modifies DNA in a non-processive manner and also methylates non-CpG sites. May preferentially methylate DNA linker between 2 nucleosomal cores and is inhibited by histone H1. | 102 | Partial | - | Heterotetramer composed of 1 DNMT3A homodimer and 2 DNMT3L subunits | S-adenosyl-L-methionine
DOT1L | Methylates 'Lys-79' of histone H3. | 185 | Partial | - | - | S-adenosyl-L-methionine
EHMT1 | Specifically mono- and dimethylates 'Lys-9' of histone H3 (H3K9me1 and H3K9me2, respectively) in euchromatin. Also weakly methylates 'Lys-27' of histone H3 (H3K27me). | 141 | Partial | - | Heterodimer; heterodimerizes with EHMT2. Part of multiple complexes. | S-adenosyl-L-methionine
EHMT2 (G9a) | Specifically mono- and dimethylates 'Lys-9' of histone H3 (H3K9me1 and H3K9me2, respectively) in euchromatin and weakly methylates 'Lys-27' of histone H3 (H3K27me). May also methylate histone H1. | 132 | Partial | - | Heterodimerizes with EHMT1/GLP. Part of numerous complexes. | S-adenosyl-L-methionine
JMJD1C | Demethylates both mono- and di-methylated 'Lys-4' of histone H3. Has no effect on tri-methylated 'Lys-4', mono-, di- or tri-methylated 'Lys-9', mono-, di- or tri-methylated 'Lys-27', mono-, di- or tri-methylated 'Lys-36' of histone H3, or on mono-, di- or tri-methylated 'Lys-20' of histone H4. | 285 | Partial | - | - | Fe2+
KDM2A (JMJD2A) | Preferentially demethylates dimethylated H3 'Lys-36' residue while it has weak or no activity for mono- and tri-methylated H3 'Lys-36'. | 133 | Partial | May recognize and bind to some phosphorylated proteins and promote their ubiquitination and degradation. | Part of a SCF (SKP1-cullin-F-box) protein ligase complex | Fe2+
KDM3B | Specifically demethylates 'Lys-9' of histone H3. | 192 | Partial | - | - | Fe2+
KDM4A | Demethylates trimethylated H3 'Lys-9' and H3 'Lys-36' residue, while it has no activity on mono- and dimethylated residues. Does not demethylate histone H3 'Lys-4', H3 'Lys-27' nor H4 'Lys-20'. | 121 | Partial | - | - | Fe2+
KDM4B | Only able to demethylate trimethylated H3 'Lys-9'. Does not demethylate histone H3 'Lys-4', H3 'Lys-27', H3 'Lys-36' nor H4 'Lys-20'. | 122 | Partial | - | - | Fe2+
KDM4C | Demethylates trimethylated H3 'Lys-9' and H3 'Lys-36' residue, while it has no activity on mono- and dimethylated residues. Does not demethylate histone H3 'Lys-4', H3 'Lys-27' nor H4 'Lys-20'. | 120 | Partial | - | - | Fe2+
KDM4D | Demethylates both di- and trimethylated H3 'Lys-9' residue, while it has no activity on monomethylated residues. Does not demethylate histone H3 'Lys-4', H3 'Lys-27', H3 'Lys-36' nor H4 'Lys-20'. | 59 | Partial | - | - | Fe2+
KDM5B | Demethylates trimethylated, dimethylated and monomethylated H3 'Lys-4'. Does not demethylate histone H3 'Lys-9' or H3 'Lys-27'. | 176 | Partial | - | - | Fe2+
KDM6A | Demethylates trimethylated and dimethylated but not monomethylated H3 'Lys-27'. | 154 | Partial | - | Component of the MLL2/3 complex | Fe2+
KDM6B | Demethylates trimethylated and dimethylated H3 'Lys-27'. | 177 | Partial | - | Component of the MLL4 complex | Fe2+
KDM7A | Specifically demethylates dimethylated 'Lys-9' and 'Lys-27' (H3K9me2 and H3K27me2, respectively) of histone H3 and monomethylated histone H4 'Lys-20' residue (H4K20Me1). Specifically binds trimethylated 'Lys-4' of histone H3 (H3K4me3), affecting histone demethylase specificity: in presence of H3K4me3, it has no demethylase activity toward H3K9me2, while it has high activity toward H3K27me2. Demethylates H3K9me2 in absence of H3K4me3. Has activity toward H4K20Me1 only when nucleosome is used as a substrate and not when histone octamer is used as substrate. | 107 | Partial | - | - | Fe2+
KDM8 | Specifically demethylates dimethylated 'Lys-36' (H3K36me2) of histone H3. | 47 | Partial | - | - | Fe2+
KMT2A | Mediates methylation of 'Lys-4' of histone H3 (H3K4me). Has no activity toward histone H3 phosphorylated on 'Thr-3', less activity toward H3 dimethylated on 'Arg-8' or 'Lys-9', while it has higher activity toward H3 acetylated on 'Lys-9'. | 432 | Partial | - | Catalytic subunit of the MLL1/MLL complex. Has weak methyltransferase activity by itself, and requires other components of the MLL1/MLL complex to obtain full methyltransferase activity. | S-adenosyl-L-methionine
NO66 | Specifically demethylates 'Lys-4' (H3K4me) and 'Lys-36' (H3K36me) of histone H3. Preferentially demethylates trimethylated H3 'Lys-4' (H3K4me3) and monomethylated H3 'Lys-4' (H3K4me1) residues, while it has weaker activity for dimethylated H3 'Lys-36' (H3K36me2). | 71 | Partial | Catalyzes the hydroxylation of 60S ribosomal protein L8 on 'His-216' | - | Fe2+
NSD3 | Preferentially methylates 'Lys-4' and 'Lys-27' of histone H3. | 162 | Partial | - | - | S-adenosyl-L-methionine
PHF8 | Demethylates mono- and dimethylated histone H3 'Lys-9' residue (H3K9Me1 and H3K9Me2), dimethylated H3 'Lys-27' (H3K27Me2) and monomethylated histone H4 'Lys-20' residue (H4K20Me1). Only has activity toward H4K20Me1 when nucleosome is used as a substrate and not when histone octamer is used as substrate. May also have weak activity toward dimethylated H3 'Lys-36' (H3K36Me2); however, the relevance of this result in vivo remains unsure. Specifically binds trimethylated 'Lys-4' of histone H3 (H3K4me3), affecting histone demethylase specificity: has weak activity toward H3K9Me2 in absence of H3K4me3, while it has high activity toward H3K9me2 when binding H3K4me3. | 114 | Partial | - | - | Fe2+
PRDM1 | Unclear (contains a SET domain; that appears to be all we know) | 92 | Partial | Transcriptional repressor that binds specifically to the PRDI element in the promoter of the beta-interferon gene | - | S-adenosyl-L-methionine
PRDM10 | Unclear (contains a SET domain; that appears to be all we know) | 130 | Partial | - | - | S-adenosyl-L-methionine
PRDM12 | Involved in the positive regulation of histone H3-K9 dimethylation. | 40 | Partial | - | - | S-adenosyl-L-methionine
PRDM2 | Specifically methylates 'Lys-9' of histone H3. | 189 | Partial | May function as a DNA-binding transcription factor | - | S-adenosyl-L-methionine
PRDM4 | Unclear (contains a SET domain; that appears to be all we know) | 88 | Partial | - | - | S-adenosyl-L-methionine
SETD2 | Trimethylates dimethylated 'Lys-36' of histone H3. | 288 | Partial | - | Component of a complex with HNRNPL. | S-adenosyl-L-methionine
SETD8 | Specifically monomethylates 'Lys-20' of histone H4. | 39 | Partial | Mediates monomethylation of p53/TP53 at 'Lys-382' | - | S-adenosyl-L-methionine
SETMAR | Methylates 'Lys-4' and 'Lys-36' of histone H3. Specifically mediates dimethylation of H3 'Lys-36'. | 78 | Partial | Regulates replication fork processing, promoting replication fork restart and regulating DNA decatenation through stimulation of the topoisomerase activity of TOP2A | Homodimer. | S-adenosyl-L-methionine
SUV39H2 | Specifically trimethylates 'Lys-9' of histone H3 using monomethylated H3 'Lys-9' as substrate. Inhibited by S10 phosphorylation. | 47 | Partial | - | Member of the large PER complex | S-adenosyl-L-methionine
SUV420H1 | Specifically trimethylates 'Lys-20' of histone H4. | 99 | Partial | - | - | S-adenosyl-L-methionine
SUV420H2 | Specifically trimethylates 'Lys-20' of histone H4. | 52 | Partial | - | - | S-adenosyl-L-methionine
TET2 | Dioxygenase that catalyzes the conversion of the modified genomic base 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC). Might initiate a process leading to cytosine demethylation through deamination into 5-hydroxymethyluracil (5hmU) and subsequent replacement by unmethylated cytosine by the base excision repair system. | 224 | Partial | - | - | 2-oxoglutarate, Fe2+, Zn2+
UTY | Catalyzes trimethylated 'Lys-27' (H3K27me3) demethylation in histone H3. Has relatively low lysine demethylase activity. | 150 | Partial | - | - | L-ascorbate, Fe2+
AID (AICDA) | AID could initiate demethylation by a damage and repair mechanism similar to that used in somatic hypermutation. | 24 | No | Single-stranded DNA-specific cytidine deaminase. | - | Zn2+
EZH1 | Able to mono-, di- and trimethylate 'Lys-27' of histone H3. | 85 | No | - | Catalytic subunit of the PRC2/EED-EZH2 complex; requires SUZ12 and EED for activity | S-adenosyl-L-methionine
GADD45B | Demethylates DNA by catalysing the removal of the methyl group on the 5 position of cytosines residing in the dinucleotide sequence mdCpdG. | 18 | No | - | - | Fe2+
KDM1B | Demethylates both 'Lys-4' (H3K4me) and 'Lys-9' (H3K9me) of histone H3. Demethylates both mono- (H3K4me1) and di-methylated (H3K4me2) H3K4me. | 92 | No | - | - | Fe2+
KDM3A | Preferentially demethylates mono- and dimethylated H3 'Lys-9' residue, with a preference for dimethylated residue, while it has weak or no activity on trimethylated H3 'Lys-9'. | 147 | No | - | - | Fe2+
MBD2 | Demethylates DNA at CpG dinucleotides. Recruits histone deacetylases and DNA methyltransferases. | 43 | No | - | Heterodimer with MBD3. Component of the MeCP1 complex | Fe2+
MLL4 | Methylates 'Lys-4' of histone H3. | 294 | No | - | Component of the menin-associated histone methyltransferase complex | S-adenosyl-L-methionine
MLL5 | Specifically mono- and dimethylates 'Lys-4' of histone H3. | 205 | No | - | Component of the MLL5-L complex | S-adenosyl-L-methionine
PRDM6 | Specifically methylates 'Lys-20' of histone H4 in vitro. N.B. It is controversial whether this activity exists in vivo. | 64 | No | - | - | S-adenosyl-L-methionine
PRDM7 | Probable histone methyltransferase. | 56 | No | - | - | S-adenosyl-L-methionine
PRDM8 | Specifically methylates H3K9. | 72 | No | - | - | S-adenosyl-L-methionine
PRMT1 | Methylates histone H2, H3 and H4. Constitutes the main enzyme that mediates monomethylation and asymmetric dimethylation of histone H4 'Arg-4'. | 42 | No | Promiscuous arginine methyltransferase. | Homodimer and heterodimer with PRMT8. Individual homodimers can associate to form a homohexamer. Component of the methylosome. | S-adenosyl-L-methionine
PRMT7 | Specifically mediates the symmetric dimethylation of histone H4 'Arg-3' to form H4R3me2s, and is also able to mediate the arginine methylation of histone H2A in vitro. | 78 | No | Promiscuous arginine methyltransferase. | Homodimer and heterodimer | S-adenosyl-L-methionine
SETD1A | Specifically methylates 'Lys-4' of histone H3, but not if the neighboring 'Lys-9' residue is already methylated. | 186 | No | - | Component of the SET1 complex | S-adenosyl-L-methionine
SETD1B | Specifically methylates 'Lys-4' of histone H3, but not if the neighboring 'Lys-9' residue is already methylated. Specifically tri-methylates 'Lys-4' of histone H3 in vitro. | 209 | No | - | Component of the SET1 complex | S-adenosyl-L-methionine
SETD3 | Methylates 'Lys-4' and 'Lys-36' of histone H3 (H3K4me and H3K36me). | 67 | No | - | - | S-adenosyl-L-methionine
SETDB2 | Specifically trimethylates 'Lys-9' of histone H3. | 82 | No | - | - | S-adenosyl-L-methionine
SUZ12 | Methylates 'Lys-9' (H3K9me) and 'Lys-27' (H3K27me) of histone H3. | 83 | No | - | Component of the PRC2/EED-EZH1/2 complex | S-adenosyl-L-methionine
TET1 | Dioxygenase that catalyzes the conversion of the modified genomic base 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC). Might initiate a process leading to cytosine demethylation through deamination into 5-hydroxymethyluracil (5hmU) and subsequent replacement by unmethylated cytosine by the base excision repair system. | 235 | No | - | - | 2-oxoglutarate, Fe2+, Zn2+
ATM | Phosphorylates 'Ser-139' of histone variant H2AX. | 351 | No | Promiscuous protein kinase involved primarily in apoptotic pathways. | Dimers or tetramers in inactive state. On DNA damage, autophosphorylation dissociates ATM into monomers rendering them catalytically active. Part of the BRCA1-associated genome surveillance complex (BASC). | -
AURKC | Phosphorylates serine residues 10 and 28 on histone H3. | 36 | No | Promiscuous protein kinase. | Component of the chromosomal passenger complex (CPC) | Mg2+
NEK6 | Phosphorylates histones H1 and H3. | 36 | No | Promiscuous protein kinase mostly involved in mitosis. | - | Mg2+
NEK9 | Phosphorylates histone H3 on serine and threonine residues. | 107 | No | Phosphorylates myelin basic protein, beta-casein, and BICD2 during mitosis. | Homodimer | Mg2+
TLK1 | Phosphorylates histone H3 at 'Ser-10'. | 87 | No | Phosphorylates SNAP23. | Heterodimerises with TLK2 | Mg2+
DUSP1 | Dephosphorylates histone H3 on Ser-10. | 39 | No | Dephosphorylates MAP kinase MAPK1/ERK2 on both 'Thr-183' and 'Tyr-185' | - | -
EYA1 | Specifically dephosphorylates 'Tyr-142' of histone H2AX. | 65 | No | Promiscuous protein phosphatase. Transcriptional coactivator. | - | Mg2+
EYA3 | Specifically dephosphorylates 'Tyr-142' of histone H2AX. | 63 | No | Promiscuous protein phosphatase. Transcriptional coactivator. | - | Mg2+
EYA4 | Specifically dephosphorylates 'Tyr-142' of histone H2AX. | 70 | No | - | - | Mg2+
UBP7 | Deubiquitinates histone H2B. | 128 | Yes (piecemeal) | Promiscuous deubiquitinase | Occasional homodimer. Component of numerous complexes. | -
UBP16 | Specifically deubiquitinates histone H2A. | 94 | No | - | Homotetramer | -
BRE1A | Mediates monoubiquitination of 'Lys-120' of histone H2B. | 114 | No | May ubiquitinate other proteins | Component of the RNF20/40 complex (also known as BRE1 complex), probably composed of 2 copies of RNF20/BRE1A and 2 copies of RNF40/BRE1B. | -
BRE1B | Mediates monoubiquitination of 'Lys-120' of histone H2B. | 114 | No | May ubiquitinate other proteins | Component of the RNF20/40 complex (also known as BRE1 complex), probably composed of 2 copies of RNF20/BRE1A and 2 copies of RNF40/BRE1B. | -
DZIP3 | Catalyzes monoubiquitination of H2A at lysine 119. | 139 | No | Promiscuous ubiquitin-protein ligase | - | -
RAG1 | Mediates monoubiquitination of histone H3. | 119 | No | - | - | -
E2 A | In association with the E3 enzyme BRE1 (RNF20 and/or RNF40), it plays a role in transcription regulation by catalyzing the monoubiquitination of histone H2B at 'Lys-120' to form H2BK120ub1. | 17 | No | Promiscuous ubiquitin-protein ligase | - | -
BRCC3 | In the BRCA1-A complex, it specifically removes 'Lys-63'-linked ubiquitin on histones H2A and H2AX, antagonizing the RNF8-dependent ubiquitination at double-strand breaks (DSBs). | 36 | No | Promiscuous deubiquitinase | Component of numerous complexes, and appears to function exclusively within these complexes. | Zn2+
UBP3 | Deubiquitinates monoubiquitinated histone H2A and H2B. | 59 | No | - | - | -
UBP22 | Catalyzes the deubiquitination of both histones H2A and H2B, thereby acting as a coactivator. | 60 | No | - | Component of some SAGA transcription coactivator-HAT complexes | -
UBP36 | Deubiquitinates histone H2B. | 123 | No | - | - | -
BAP1 | Specifically mediates deubiquitination of histone H2A monoubiquitinated at 'Lys-119'. | 80 | No | Promiscuous deubiquitinase | Component of the PR-DUB complex | -
PARP1 | Mediates poly(ADP-ribosyl)ation of histone H1, Lys13 of H2A, Lys30 of H2B, Lys27 and Lys37 of H3, as well as Lys16 of H4. | 113 | Yes (piecemeal) | Widespread involvement in DNA damage repair | Component of a base excision repair (BER) complex | NAD+
PARP3 | Can mediate poly(ADP-ribosyl)ation of histone H1 in vitro. | 60 | Yes (piecemeal) | Involved in DNA damage repair | - | NAD+
E1-E3 | SUMOylate histones H1-H4. | 34-74 | Yes | Promiscuous SUMOylators | Operate as a cascade (E1 recruits E2 recruits E3, although E2 is apparently sufficient in vitro). | -
PADI4 | PADI4 specifically deiminates (citrullinates) arginine residues R2, R8, R17, and R26 in the H3 tail. Can also target H1 Arg54 and H4 Arg3. | 74 | Yes | May deiminate other proteins | - | Ca2+
PARP2 | Can mediate poly(ADP-ribosyl)ation of histone H1 in vitro. | 66 | Partial | Involved in DNA damage repair | Component of a base excision repair (BER) complex | NAD+
PARG | Mediates de-poly(ADP-ribosyl)ation of histones. | 111 | Partial | Involved in DNA damage repair | - | -
SIRT4 | Functions as an efficient ADP-ribosyltransferase on histones (specifics unclear). | 35 | No | NAD-dependent protein lipoamidase, ADP-ribosyl transferase and deacetylase | - | NAD+, Zn2+
birA | Mediates the covalent binding of biotin to lysine (K) residues in histones. K4, K9 and K18 in histone H3 are targets for biotinylation. K9 and K13 in the N-terminus of human histones H2A and H2AX are targets for biotinylation, as are K125, K127 and K129 in the C-terminus of histone H2A. K8, K12 and K16 in histone H4 are targets for biotinylation. | 81 | No | - | - | Acetyl-CoA
BTD | Mediates the covalent binding of biotin to lysine (K) residues in histones. K4, K9 and K18 in histone H3 are targets for biotinylation. K9 and K13 in the N-terminus of human histones H2A and H2AX are targets for biotinylation, as are K125, K127 and K129 in the C-terminus of histone H2A. K8, K12 and K16 in histone H4 are targets for biotinylation. In addition, biotinidase may catalyze debiotinylation of histones. | 61 | No | - | - | Acetyl-CoA

Appendix 4: SETD7 Amino Acid Conservation Input Sequences

From the ConSurf conservation calculation server [189].

> 3cbm | Input_pdb_SEQRES_A

KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDP

YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH

PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_A0A0F8BWR4_111_306 | Histone-lysine N-methyltransferase SETD7 n=2 Tax=Larimichthys crocea RepID=A0A0F8BWR4_LARCR

KDNNRCGECWVYYPDGGCVFGEVNEDGEMTGKSVAYIYPDRRTALCGSFVDGELIEARLGTLISNEGGRPRFEITPNSPVYSYDKSTSTCIATHTLLPDP

YENQKVFVADSMIKGAGQGLFAKMDAETDTVMAFYNGVRITHSEVDSRDWALNGNTISLDED TVIDVPQPFDQMERYCASLGHKANHSFTPNCKYD

> UniRef90_A0A0N8K018_101_352 | Histone-lysine N-methyltransferase SETD7-like n=1 Tax=Scleropages formosus RepID=A0A0N8K018_9TELE
KDNNRCGVCWIFYPDGGSVVGEVNEEGEMTGNKIAYVYPDHRTALYGSFVEGELIEARLAVLSSEENGRPHFDVVPDRPVYAYDRSTSTCIATHTLLPDP
YESQRVYVAESLISGAGEGLFAKIDAEPNTVMAFYNGVRITHTEVDSRDWSLNGNTISLDEDTVIDVPEPFNDIKKYCASLGHKANHSFTPNCIYDPFVH
PRFGAIKCVRTTRAVQQDEELTVAYGYDHTPSGKSGPEAPDWYKQELRAFQA

> UniRef90_Q8WTS6_111_366 | Histone-lysine N-methyltransferase SETD7 n=60 Tax=Eutheria RepID=SETD7_HUMAN
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI00072E693E_221_476 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=2 Tax=Laurasiatheria RepID=UPI00072E693E
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_A0A0Q3M6U7_122_377 | Histone-lysine N-methyltransferase SETD7 n=13 Tax=Archelosauria RepID=A0A0Q3M6U7_AMAAE
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATVTSLEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEANTVMSFYNGVRITHQEVDSRDWSLNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQKK

> UniRef90_F6QW03_112_367 | Histone-lysine N-methyltransferase SETD7 n=1 Tax=Ornithorhynchus anatinus RepID=F6QW03_ORNAN
RDNIRHGICWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDGRTTVYGKFIDGELIEGKLATLTSTEEGRPHFELIPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISTAGEGLFSKVAAGPSTVMSFYNGVRITHQEVDGRDWALNGNTLSLDDETVIDVPEPYNHASKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVERDEELTVAYGYDHSPPGKGGPEAPEWYQVELKSFQATQQK

> UniRef90_UPI000750039D_156_411 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Gecko japonicus RepID=UPI000750039D
KDNIRHGICWIYYPDGGYLVGEVNEEGEMTGEKIAYVYPDGKTAYYGHFIDGEMLEAKLANLLSIEEGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDTSLISSAGEGLFSKIAAEAGITMSFYNGIRITHQEVDNRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCHYESYVH
PRFGLIKCIRTIRAIEKDEELTVAYGYDHNPSGQNGPEAPEWYRMELKAFQAAQEK

> UniRef90_Q7Z0G7_115_369 | Histone-lysine N-methyltransferase SETD7 n=1 Tax=Halocynthia roretzi RepID=SETD7_HALRO
EGVRCGLCFYYFPDGGSLIGNVNASGDLSADNIAYIYPDRTTALIGSFEEGDMITAKEANVTITGEKGEEISFPTVNSISPDPVYRLDVSTPHVISTRPL
VPDPYESELVYAAPSKIPNAGEGLYAKCDVDQDTVMAFYNGVRLKQDEVENRDWSQNSNTISLTDDIAIDVPEEYVSTDNYCASLGHKVNHSFDPNCRYD
IYQHPRFGFIKCVRTIRGVSEGDELTVHYTYEHNDGNKTREAEAPEWYKSQLKVF

> UniRef90_UPI000742B3F1_112_366 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Cyprinodon variegatus RepID=UPI000742B3F1
KDNIRCGECWIYYPDGGCVFGQVNEEGEMTGNAIAYIYPDGRTALYGSFVDGEIIEARLASLASHQNQRTRFQITASSPVYSYDKSTSTCIATHALLPDP
YESQWVFVAESLIKGAGEGLFAKADAEPGTVMAFYNGVRITHSEVDSRDWTLNGNTISLDDDTVIDVPRPFDQTDRYCASLGHKANHSFTPNCKYEPFVH
PRFGAIKCIRTLRAVQKDEELTVAYGYDHEPTGKNGPEAPDWYKEELELFQQRQQ

> UniRef90_M7BBN4_96_351 | Histone-lysine N-methyltransferase SETD7 (Fragment) n=4 Tax=Archelosauria RepID=M7BBN4_CHEMY
KDDNRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDEKTAYYGRFIDGEMIEAKLATLTSVEEGKPHFELIPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDASLISSAGEGLFSKIAAGSRTVMSFYNGVRITHQEVDSRDWALNGNTIALDDETVIDVPDPYNHATKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPIGQNGPEAPEWYQVELKAFEAAQQK

> UniRef90_D6RJA0_111_308 | Histone-lysine N-methyltransferase SETD7 n=3 Tax=Hominoidea RepID=D6RJA0_HUMAN
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDIG

> UniRef90_UPI0004442D23_139_394 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Erinaceus europaeus RepID=UPI0004442D23
KDNIRHGVCWIYYPDGGSLVGDVNEDGEMTGEKIAYVYPDGRTALLGRFIDGEMMEGRLATLVSTEEGRPHFELVPGGSVYHFDKSTSSCISSNALLPDL
YESERVYVAESLISGAGEGLFSKITVGANTVMSFYNGVRITHQEVDGRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDLFMH
PRFGPIKCIRTLRAVEADEELTVAYGYDHNPPGKSGPEAPEWYQLELRAFQTAQQK

> UniRef90_UPI00038F07D4_109_364 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Chinchilla lanigera RepID=UPI00038F07D4
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGTTALYGKFMDGEMIEGKLATLLSIEEGRPHFELVPGSSVYHFDKSTSSCISTTALLPDP
YESERVYVAESLISSAGEGLFSKVAMEPNTVTSFYNGVRITHQEVDNRDWALNGNTLSLDEDTVIDVPEPYNRVSKYCASLGHKANHSFTPNCIYDTFVH
PRFGPIKCIRTLRAVEAHEELTVAYGYDHSPRGKNGLEAPEWYQLELKAFQASQQK

> UniRef90_UPI00035975CB_147_402 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Ficedula albicollis RepID=UPI00035975CB
RDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCTYEPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_UPI00048F2F8C_2_180 | PREDICTED: histone-lysine N-methyltransferase SETD7-like, partial n=1 Tax=Oryctolagus cuniculus RepID=UPI00048F2F8C
GSVYHFDKSTSSCISSNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTAMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKF
CASLGHKANHSFTPNCVYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI0004979369_111_365 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Cynoglossus semilaevis RepID=UPI0004979369
KDNSRCGECWVYHPDGGCLFGEVNDDGEMTGDSVAYVYPDGRTSLFGTFVDGELIQARLAVLTSSHNGRPRFEVSPDSPLYSYDKSTSTCVATHVLLPDP
YESDRVFVADSTIKGAGQGLFAKIDADKETVMAFYNGVRITHTEVDGRDWSLNGNTISLDEDTVIDVPAPYCQTDRYCASLAHKANHSFSPNCRYELFVH
PRFGSIKCMRTLRAVQKDEELTVAYGYDHVVMGTNCPDAPDWYRRELEAFQQQQR

> UniRef90_UPI00076271AC_145_400 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=4 Tax=Euarchontoglires RepID=UPI00076271AC
KDNIRHGVCWIYYPDGGSLVGDVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMATEEGKPHFELMPGSSVYHFDKSTSSCISTNALLPDP
YESDRVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEDTVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_L5LPZ5_184_439 | Histone-lysine N-methyltransferase SETD7 n=1 Tax=Myotis davidii RepID=L5LPZ5_MYODS
KDNIRHGLCWIYFPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMLEGKLAALVSTEEGRPHFDLMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_L9L1V3_99_354 | Histone-lysine N-methyltransferase SETD7 n=3 Tax=Boreoeutheria RepID=L9L1V3_TUPCH
KDNLRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLTSTEEGRPHFELMPGSSVYHFDKSTSSCISADALLPDP
YESERVYVAESLISTAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDNRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCAYDMFVH
PRFGSIKCIRTLQAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI0002C3400C_111_255 | PREDICTED: histone-lysine N-methyltransferase SETD7-like, partial n=1 Tax=Tursiops truncatus RepID=UPI0002C3400C
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLLSTEEGRPHFELMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEV

> UniRef90_UPI000577FDD2_110_364 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=2 Tax=Esox lucius RepID=UPI000577FDD2
KDNSRCGECWIFYPDRGSVFGQVNEEGEMTGNSLAYVYPDGQTAFYGSFVEGELIEARLATLVATENGRPHFDVSPDSPVYTYDKSTSTCIATHTLLPDP
YESKRVFVADSLIPGAGEGLFAKMDVEANSVMAFYNGVRITHSEVDSRDWSLNGNTISLDEDTVIDVPPPFHQTDRYCASLGHKANHAFTPNCKYDPFVH
PRFGPIKCIRTIQAVQRNEELTVAYGYDHEPSGRTGPEAPDWYKRELQVFQERQA

> UniRef90_F4QDD1_176_404 | Histone-lysine N-methyltransferase n=1 Tax=Dictyostelium fasciculatum (strain SH3) RepID=F4QDD1_DICFS
KNGERNGPGIIYLVDGGKIHGIWKSSQMDGQCTYYYPDHRFSIKGKWKDGDFVSGRAIAPKGVPPIDDDVVEIKRDEATDTLISSDPLQADPYEQHYVFV
APSTIPNSNEGLFAKRDIPANRLVSFYNGLRIDPKIADERDWSFNSNTMFLDKETYIDIPPSHSLTTQYCASLGHKANHSFQHNTVYKTCYHPRFGDIKC
VTSIKPIAKGEEILVHYDYEEKDKPAWYK

> UniRef90_UPI0004ED3B1E_222_477 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Nannospalax galili RepID=UPI0004ED3B1E
KDNIRHGVCWIHFPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATVMATEEGRPHFELTSGSSVYHFDKSTSSCISTDALLPDP
YESERVYVADSLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEAEEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_Q4QQN5_111_366 | Histone-lysine N-methyltransferase SETD7 n=2 Tax=Xenopus tropicalis RepID=SETD7_XENTR
KDNVRHGVCWIYYPDGGSLVGEVNEDGDMTGDKVAYVYPDGRMALYGKFIDAEMLEGKLAILTSVDEGKPHFELVPNGPVYNFDKSTPSCISVNPLFPDP
YESERVYVNDSLIHNAGEGLFAKVASAAQTVMSFYNGVRITHQEVDSREWALNGNTISLDDETVLDVPAPYNSYYKYCASLGHKANHSFSPNCMYDTFVH
PRFGPIKCIRTMKAVEKDEELTVAYGYDHSVTGKNGPEAPEWYQQQLTAFQATQQK

> UniRef90_UPI0004C292A7_232_487 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Calypte anna RepID=UPI0004C292A7
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDGKTAYLGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYTFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPAGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_UPI000520F2A7_147_402 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Cariama cristata RepID=UPI000520F2A7
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEANTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_D3BL39_142_374 | Histone-lysine N-methyltransferase n=1 Tax=Polysphondylium pallidum RepID=D3BL39_POLPA
RHGPGILYNIDGSRIEGTWKNSILCGECCFYYPDDRFFIKGTWKKGEFVKGTVSVSDSKALDLKSLVSTTQLKRDESDSKKISSNPMLPDAFEQYYCYVK
KSEIPNSGEGLFAAVDIPADRLVSYYNGLRIDPKLADDRDWSFNSNTMYLDKETYIDIPPEWSSTSKYCASLGHKANHSKNNNTVYRPCYHPRFGDIKCI
RSIKPIAKDQEILVDYDYEEKDKPQWYIDLEKQ

> UniRef90_UPI0004F1E0B7_111_313 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X2 n=2 Tax=Cercopithecinae RepID=UPI0004F1E0B7
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELLPGNSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMVEN
RVF

> UniRef90_UPI0003F19B55_111_307 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X2 n=1 Tax=Felis catus RepID=UPI0003F19B55
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDM

> UniRef90_Q6DHG0_111_364 | Histone-lysine N-methyltransferase SETD7 n=3 Tax=Danio rerio RepID=SETD7_DANRE
KDNIRYGMCWVYYPDGACVVGEVNEEGEMTGKAVAYVYPDGRTALYGSFVDGELIEARLATLTTQENGRPHFTVDPDSPIYCYDKSTSSCIAGHKLLPDP
YESQRVYVGQSLISGAGEGLFAQTEAEANTVMAFYNGVRITHTEVDSRDWSMNGNTISLDEDTVIDVPAPFNMTENYCGSLGHKANHSFSPNCKYDQYVH
PRFGQIKCIRTIRAVEKDEELTVAYGYDHEPSGKSGPEAPEWYTKQFLEFQQRE

> UniRef90_UPI0003C141BF_207_451 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Latimeria chalumnae RepID=UPI0003C141BF
WPYLFADRGSLVGEVNEEGELTGEKAAYLYPDGKTALYGKFIDAEMIETRLATLTSVSEGRPHFELAPSSHVYSFDKSTPSCISTTALVPDPYESERVCV
AESLILSAGEGLFAKVAAGPNTVMSFYNGVRITHQEVDNRDWALNGNTISLDEETVLDVPAPYNDTKKYCASLGHKANHSFTPNCIYDTFVHPRFGSIKC
IRTIGAVEKDEELTVAYGYDHSPPGKSGPEAPEWYQQDLKNHQAN

> UniRef90_UPI0003833690_156_411 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Melopsittacus undulatus RepID=UPI0003833690
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSLEDGKPQFEVVPGSPIYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWSLNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQKK

> UniRef90_L5L349_125_358 | Histone-lysine N-methyltransferase SETD7 n=1 Tax=Pteropus alecto RepID=L5L349_PTEAL
KDNIRHGVCWIYYPDGGRLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLASLMSTEEGRPHFELMPGSSVYHFDKSTSSCISTHALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPSTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEANEELTVAYGYDHSPPGK

> UniRef90_C1BUR3_139_382 | Histone-lysine N-methyltransferase SETD7 n=3 Tax=Lepeophtheirus salmonis RepID=C1BUR3_LEPSM
HGQSLESLVGGGFLLNLDHSNSEFSGDNIVYLYPDCQTAFVGTFDQSSMVSAQANEVINTRNNCCGLNYPIFELLNDVYYSQDISSLDKISKEPMLADPY
ESFFVFVKESKISGAGQGLFAKISVASGTVLAFYNGIRKPSVSLSESSFEDTPYKISLSKQTDMDIPQGCISLDKYTSSLAHKVCHSFTPNCDFDKYDHP
RFGVILSVVSLREILEGEELTVDYKYDLNVAPSWYKKAWAKHQK

> UniRef90_H0W0J3_111_366 | Histone-lysine N-methyltransferase SETD7 n=18 Tax=Mammalia RepID=H0W0J3_CAVPO
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDGRMALYGKFIDGEMIEAKLATLLSIEEGRPQFELVPGSPAYHFDKSTSSCISTSALLPDP
YESERVYVAESLISSAGEGLFSKAATEPNVVMSFYNGVRITHQEVDNRDWALNGNTLSLDEDTVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVDADEELTVAYGYDHSPRGKSGPEAPEWYQLELKAFQASQQK

> UniRef90_UPI000522A3A3_178_433 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=2 Tax=Neognathae RepID=UPI000522A3A3
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_UPI0006B0BECE_159_414 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Apteryx australis mantelli RepID=UPI0006B0BECE
KDNIRHGVCWIYYLDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKTAAEARTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPMGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_UPI000387126A_177_432 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Falco peregrinus RepID=UPI000387126A
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_W5PU61_111_366 | Histone-lysine N-methyltransferase SETD7 n=9 Tax=Boreoeutheria RepID=W5PU61_SHEEP
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLLSTEEGRPHFELMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAAGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI0006447987_111_363 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Clupea harengus RepID=UPI0006447987
KDNIRFGLCWFYYSDRGCVVGEVNDEGEMTGNHVAYVYPDGQTALLGSFVDGELIEARLASLMLQENGRPHLELVDDRTVYSFDKSTSTCIANFRLLPDP
YESQHVFVGQSLISGAGEGLFAKTDADPDSVMAFYNGVRITHSEVDSRDWSANGNTISLDEETVIDVPKPFNQTETYCASLGHKANHSFSPNCKYDPFVH
PRFGLIKSIRTIRAVQKDEELTVAYGYDHKPSGKSELPAPDWYTKELREFQER

> UniRef90_UPI0007042390_221_476 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Myotis brandtii RepID=UPI0007042390
KDNIRHGLCWIYFPDGGSLVGEVNEDGEMTGEKIAYVYPDKRTALYGKFIDGEMLEGKLAALVSTEEGRPHFDLMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCTYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI00064F9BE8_71_312 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Echinops telfairi RepID=UPI00064F9BE8
DGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGRFIDGEMIEGKLATLVSTEEGRPHFEVVPGGSLYHFDKSTSSCISTNALLPDAYESERVYVAESLIS
SAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCTYDMFVHPRFGPIKCIRTLHA
VEAEEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATKQQ

> UniRef90_A0A091DZP9_120_375 | Histone-lysine N-methyltransferase SETD7 n=4 Tax=Bathyergidae RepID=A0A091DZP9_FUKDA
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYMYPDGKTALYGKFIDGEMTEGKLATLMATEEGRPQFELVPGSPVYHFDKSTSSCISTSALLPDP
YEAERVYVSESLIPNAGEGLFSKVAMEPNTVVSFYNGVRITHQEVDNRDWALNGNTLSLDEDTVIDVPEPYNHVSKYCASLGHKANHCFTPNCIYDMFVH
PRFGPIKSIRTLRAVEADEELTVAYGYDHSPPGKNAPEAPEWYLLELKAFQAAQQK

> UniRef90_UPI00051BA3B1_150_405 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Fulmarus glacialis RepID=UPI00051BA3B1
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPIYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_V8NZA7_114_369 | Histone-lysine N-methyltransferase SETD7 (Fragment) n=5 Tax=Serpentes RepID=V8NZA7_OPHHA
KDNIRHGFCWIYYSDGGYLVGQVNEEGEMTGEKLAYVYPDGKMAYYGHFIDGEMIEARLANVILAEEEKLQFDVIPGSPVYSFDKSTSSCISTSPLLPDP
YESERVYVDASLISNAGEGLFTKIAAEAGTVLSFYNGIRITHQEVDSRDWSLNGNTISLDDETVIDVPEPYSHVTKYCASLGHKANHSFTPNAVYDSFVH
PRFGLIKCIRTIQAVEKDEELTVAYGYDHSSTGQNGPEAPEWYLAELRAFQASLKK

> UniRef90_UPI000331684C_122_377 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Sorex araneus RepID=UPI000331684C
KDNIRHGVCWVYYPDGGSLVGEVNEDGEMTGEKIAYIYPDGRTALYGKFIDGEMMEGQLATVLATEEGRPLFELVPGSAVYHFDKSTSSCISANALLPDP
YESERVYVADSLISSAGEGLFAKVAVGPSTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPFNHVTEYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQAAQQK

> UniRef90_UPI00064301DF_356_611 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Condylura cristata RepID=UPI00064301DF
KDNIRHGVCWTYFPDGGSLVGDVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLTSSEEGRPRFELVPAGPVFHFDKSTSSCIATNALLPDP
YEAERVYVAESLISSAGEGLFSKVAAGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYRHLSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEAEEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQAAQQK

> UniRef90_UPI000515166C_160_415 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=4 Tax=Neognathae RepID=UPI000515166C
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGTFIDGEMIEAKLAMLTSVEDGKPXXXXXXXGPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_UPI0004540F68_222_477 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=2 Tax=Boreoeutheria RepID=UPI0004540F68
KDNIRHGVCWIHYPDGGSLVGEVNEDGEMTGEKIAYVYPDQRTALYGKFIDGEMIEGKLATLMATEEGRPHFELTSGSSVYHFDKSTSSCISSDALLPDP
YESERVYVADSLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLQAVEAEEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI0004F00E31_1_178 | PREDICTED: histone-lysine N-methyltransferase SETD7-like, partial n=1 Tax=Merops nubicus RepID=UPI0004F00E31
PIYSFDKSTSSCISTNALLPDPYESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDGRDWALNGNTISLDDETVIDVPEPYNHAAKYC
ASLGHKANHSFTPNCTYDPFVHPRFGPIKCIRTLRAVEKDEELTVAYGYDHSPAGQSGPEAPEWYQLELKAFQAAQQK

> UniRef90_V9KF86_111_364 | Histone-lysine N-methyltransferase SETD7 n=2 Tax=Callorhinchus milii RepID=V9KF86_CALMI
RDNIRCGVCWIYFLDRGCLVGQVNEDGDMTGDKIAYVYPDSSTAMLGKFVDGEMIEGRITTLKNCVDGRPQFEQVAAGPVYSLDKSTSYCISTSSLRPDS
YEEERVYVADSFIPNAGEGLFAKVAAECDTVMAFYNGVRITHEEVDSRAWSLNGNTLSLDDDTVLDVPVPYNSTKHYCASLGHKANHSFTPNCMYAPFIH
PRFGAIKCIQTIRAVERDEELTVAYGYDHCSGGKSGPEAPDWYKRALEAFQTIQ

> UniRef90_UPI00076769A9_94_349 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Myotis davidii RepID=UPI00076769A9
KDNIRHGLCWIYFPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMLEGKLAALVSTEEGRPHFDLMPGSSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVH
PRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKSGPEAPEWYQVELKAFQATQQK

> UniRef90_UPI00072DE0C5_113_365 | PREDICTED: histone-lysine N-methyltransferase SETD7-like n=1 Tax=Poecilia mexicana RepID=UPI00072DE0C5
KDNIRCGECWIYYPDGGCVFGQVNEDGEMTGSSIAYIYPDGCTALYGSFVDGEIIEAHLATLTSNRTGRPRFQIAPNSPVYSYDKSTSTCIATHTLLPDP
YESQWVFVAESLIKGAGQGLFAKAVAGPGTVMAFYNGVRITHSEVDSRDWALNGNTISLDDDTVIDVPQPFDQTERYCASLGHKANHSFTPNCQYEPFLH
PRFGPIKCIRTLRAVQKDEELTVAYGYDHESLGKNGLEAPDWYKHELEVFQQR

> UniRef90_F1P558_111_366 | Histone-lysine N-methyltransferase SETD7 n=68 Tax=Archelosauria RepID=F1P558_CHICK
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGRTAYSGKFIDGEMIEAKLATLTSLEDGKPQFEVVPGSPVYTFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKVAAEARTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK

> UniRef90_V9L5K1_14_163 | Histone-lysine N-methyltransferase SETD7 n=1 Tax=Callorhinchus milii RepID=V9L5K1_CALMI
RVYVADSFIPNAGEGLFAKVAAECDTVMAFYNGVRITHEEVDSRAWSLNGNTLSLDDDTVLDVPVPYNSTKHYCASLGHKANHSFTPNCMYAPFIHPRFG
AIKCIQTIRAVERDEELTVAYGYDHCSGGKSGPEAPDWYKRALEAFQTIQ

> UniRef90_UPI0006B3BA83_123_372 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=2 Tax=Austrofundulus limnaeus RepID=UPI0006B3BA83
KDDVRYGECWVYHPDGGCVFGEVNKDGEMTGDSIAYVYPDGCTALFGTFVDGNLIRARLATLSFNQTGKPFFEISPNSPVYSYDKSTSTCITSQALLPDP
YESQRVFVAESLIRGAGQGLFAKTDVGPETVMAFYNGVRITHSEVDGRDWALNGNTISLDEETVIDVPQPFDQMEKYCASLGHKANHSFSPNCRYEPFVH
PRFGPIKCIRTSRAVQRDEELTVLYGYDHESTGGSGPEAPDWYKKELEQL

> UniRef90_UPI00045DA83B_111_308 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=7 Tax=Eutheria RepID=UPI00045DA83B
KDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELMPGNSVYHFDKSTSSCISTNALLPDP
YESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDIL

> UniRef90_A0A093IG36_101_296 | Histone-lysine N-methyltransferase SETD7 (Fragment) n=3 Tax=Eurypyga helias RepID=A0A093IG36_EURHL
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPAYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEARTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYD

> UniRef90_A0A0S7GWC6_34_254 | SETD7 (Fragment) n=2 Tax=Poeciliopsis prolifica RepID=A0A0S7GWC6_9TELE
KDNIRCGECWIYYPDGGSVFGKVNEDGEMTGSSIAYIYPDGCTALYGSFVDGEIIEARLATLTSNQTGRPRFQIAPNSPVFSYDKSTSTCITTHALLPDP
YESQRVLVAESLIKGAGQGLFAKAAAEPGTVMAFYNGVRITHSEVDSRDWTLNGNTISLDDDTVIDVPQPFDKTERYCASLGHKANHSFTPNCQYEPFVH
PRFGSIKCIRTLRAVQKDEEL

> UniRef90_UPI0006618503_79_229 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Mesocricetus auratus RepID=UPI0006618503
VYVADSLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGNTLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGP
IKCIRTLQAVEAEEELTVAYGYDHSPPGKSGPEAPEWYQLELKAFQATQQK

> UniRef90_T2M8V6_23_275 | Histone-lysine N-methyltransferase SETD7 n=2 Tax=Hydra vulgaris RepID=T2M8V6_HYDVU
NNMRIGVCQFFLEFGGTLFGRVNKFGKLSGNNIVYAYPDNKTFLKGQFEDGELVCAYPAVYSGVGDEFNPFSYKTIKGKKITKDVSTSFVISKSPLDEDI
FESSRVEVKQSSNKNAGEGLFSCVHAQPGEVMSFYDGVRLTHKEVDERDWLYNDNTISLDQEVVIDVPKPWNDLKFYKASLGHKANHSFKPNCMYDRFNH
PRFGKIKCIRTISFVQPGDELFVEYGYDHLKLDTDAPLWYQKALVEHLNNKDK

> UniRef90_UPI0006B7A276_135_389 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=1 Tax=Salmo salar RepID=UPI0006B7A276
KDNSRCGECWIYYPDRGSVFGQVNEDGEMTGKAVAYVYPDGQTALYGSFVEGELIEARLAALVSMETARPHFDVSSDSPVYSYDKSTSTCIATHTLLPDP
YESKRVFVADSLISGAGEGLFAKMDVEANVVMAFYNGVRITHSEVDTRDWSLNGNTISLDEDTVIDVPQPFHQMDRYCASLGHKANHSFTPNCKYDPFVH
PRFGPIKCVRTLRAVQRDEELTVAYGYDHEPSGKSGPEAPDWYKQELQAFQQRQA

> UniRef90_UPI000644A2A9_145_398 | PREDICTED: histone-lysine N-methyltransferase SETD7 n=1 Tax=Fundulus heteroclitus RepID=UPI000644A2A9
KDNIRSGQCWTYYPDGGCVFGQVNEDGEMTGSAIAYIYPDGCTALYGSFVDGEIIEARLATLISNLTERPRFQIAPNSPVYSYDKSTSTCIATHALLPDP
YESQRVFVADSLIKGAGQGLFAKADAEPGTVMAFYNGVRITHSEVDSRDWALNGNTISLDDDIVIDVPQPFDQMERYCATLGHKANHSFSPNCQYEPYLH
PRFGPIKCIRTLRAVQKDEELTVAYGYDHEPTGKNGPEAPDWYKQELKVFQQKQ

> UniRef90_UPI0006709E8A_159_414 | PREDICTED: histone-lysine N-methyltransferase SETD7 isoform X1 n=2 Tax=Neognathae RepID=UPI0006709E8A
KDNIRHGVCWIYYPDGGSLVGEVNEEGEMTGEKIAYVYPDGKTAYSGRFIDGEMIEAKLATLTSVEDGKPQFEVVPGSPVYSFDKSTSSCISTNALLPDP
YESERVYVDVSLISSAGEGLFSKIAAEASTVMSFYNGVRITHQEVDSRDWALNGNTISLDDETVIDVPEPYNHAAKYCASLGHKANHSFTPNCIYDPFVH
PRFGPIKCIRTIRAVEKDEELTVAYGYDHNPVGQNGPEAPEWYQLELKAFQAAQQK
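The entries above follow the FASTA convention, in which a ">" header line is followed by sequence lines. The numeric suffix of each UniRef90 identifier appears to give the first and last residue of the region matched to SETD7. For example, UniRef90_Q8WTS6_111_366 should contain 366 - 111 + 1 = 256 residues. The sketch below reads records in this layout and flags any entry whose length disagrees with its suffix. It is a hypothetical helper, not part of the ConSurf workflow, and the suffix interpretation is an assumption.

```python
import re

# Assumed identifier layout: <ID>_<start>_<end>, followed by whitespace or end of header.
RANGE_SUFFIX = re.compile(r"^(\S+)_(\d+)_(\d+)(?:\s|$)")


def parse_fasta(text):
    """Return {header: sequence}, joining wrapped lines and dropping stray whitespace."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence information
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Remove whitespace left inside a sequence line (e.g. from PDF extraction).
            chunks.append("".join(line.split()))
    if header is not None:
        records[header] = "".join(chunks)
    return records


def range_mismatches(records):
    """Map header -> (expected, observed) length where the ID suffix disagrees with the sequence."""
    mismatches = {}
    for header, seq in records.items():
        match = RANGE_SUFFIX.match(header)
        if match is None:
            continue  # e.g. the PDB query entry carries no residue range
        start, end = int(match.group(2)), int(match.group(3))
        expected = end - start + 1
        if expected != len(seq):
            mismatches[header] = (expected, len(seq))
    return mismatches


demo = "> UniRef90_X_3_7 | demo\nAB CD\n\nE\n> UniRef90_Y_1_4 | demo\nABC\n> 3cbm | query\nKDN\n"
print(range_mismatches(parse_fasta(demo)))
```

Headers without a range suffix, such as the 3cbm query, are skipped rather than treated as errors.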

Amino Acid Conservation in SETD7 residues 111–366

POSITION  CONSERVATION  RESIDUE
111       94.545        K
112       94.737        D
113       92.982        N
114       78.947        I
115       100           R
116       72.881        H
117       100           G
118       64.407        V
119       94.915        C
120       91.667        W
121       78.333        I
122       90            Y
123       76.667        Y
124       85            P
125       96.721        D
126       91.803        G
127       96.721        G
128       73.77         S
129       78.689        L
130       77.049        V
131       98.361        G
132       76.271        E
133       95.082        V
134       95.082        N
135       86.885        E
136       55.738        D
137       95.082        G
138       90.164        E
139       91.803        M
140       91.803        T
141       98.361        G
142       70.492        E
143       76.271        K
144       81.967        I
145       93.443        A
146       98.361        Y
147       78.689        V
148       100           Y
149       100           P
150       100           D
151       54.098        E
152       52.459        R
153       91.803        T
154       91.803        A
155       63.934        L
156       60.656        Y
157       100           G
158       45.902        K
159       96.721        F
160       68.852        I
161       90.164        D
162       95.082        G
163       93.443        E
164       73.77         M
165       80.328        I
166       88.525        E
167       52.459        G
168       66.667        K
169       93.22         L
170       94.915        A
171       72.131        T
172       85.246        L
173       39.344        M
174       73.77         S
175       40.984        T
176       77.049        E
177       50.82         E
178       85.246        G
179       60.656        R
180       93.443        P
181       47.541        H
182       91.667        F
183       73.333        E
184       48.333        L
185       40            M
186       81.667        P
187       63.934        G
188       69.355        N
189       53.968        S
190       80.952        V
191       92.063        Y
192       42.857        H
193       73.016        F
194       100           D
195       90.476        K
196       98.413        S
197       96.825        T
198       92.063        S
199       73.016        S
200       92.063        C
201       98.413        I
202       79.365        S
203       79.365        T
204       58.73         N
205       74.603        A
206       96.825        L
207       90.476        L
208       95.238        P
209       100           D
210       92.063        P
211       96.825        Y
212       100           E
213       90.476        S
214       73.016        E
215       87.5          R
216       98.462        V
217       78.462        Y
218       98.462        V
219       63.077        A
220       46.154        E
221       100           S
222       86.154        L
223       98.462        I
224       73.846        S
225       61.538        S
226       96.923        A
227       98.462        G
228       89.231        E
229       100           G
230       100           L
231       98.462        F
232       66.154        S
233       95.385        K
234       49.231        V
235       73.846        A
236       50.769        V
237       44.615        G
238       53.846        P
239       49.231        N
240       87.692        T
241       93.846        V
242       90.769        M
243       75.385        S
244       98.462        F
245       100           Y
246       98.462        N
247       100           G
248       92.308        V
249       100           R
250       95.385        I
251       93.846        T
252       93.846        H
253       70.769        Q
254       95.385        E
255       95.385        V
256       96.875        D
257       73.438        S
258       98.438        R
259       93.75         D
260       98.438        W
261       71.875        A
262       89.062        L
263       98.438        N
264       92.188        G
265       98.438        N
266       98.438        T
267       53.125        L
268       95.312        S
269       100           L
270       96.875        D
271       53.125        E
272       71.875        E
273       95.312        T
274       93.75         V
275       92.188        I
276       100           D
277       95.312        V
278       100           P
279       68.75         E
280       93.75         P
281       75            Y
282       76.562        N
283       65.625        H
284       39.062        V
285       40.625        S
286       75            K
287       98.438        Y
288       96.875        C
289       96.875        A
290       98.438        S
291       100           L
292       96.875        G
293       100           H
294       100           K
295       96.875        A
296       98.438        N
297       100           H
298       96.875        S
299       98.438        F
300       84.375        T
301       96.875        P
302       100           N
303       95.312        C
304       57.812        I
305       98.438        Y
306       81.25         D
307       43.548        M
308       83.607        F
309       81.356        V
310       98.305        H
311       98.305        P
312       98.305        R
313       100           F
314       100           G
315       70.69         P
316       100           I
317       98.276        K
318       94.828        C
319       89.655        I
320       93.103        R
321       94.828        T
322       50            L
323       81.034        R
324       91.379        A
325       93.103        V
326       72.414        E
327       46.552        A
328       79.31         D
329       94.828        E
330       100           E
331       96.552        L
332       94.737        T
333       100           V
334       89.474        A
335       100           Y
336       92.982        G
337       100           Y
338       94.737        D
339       94.737        H
340       43.86         S
341       84.906        P
342       39.623        P
343       98.113        G
344       64.815        K
345       50.943        S
346       90.566        G
347       90.566        P
348       89.286        E
349       96.429        A
350       100           P
351       73.214        E
352       100           W
353       100           Y
354       66.071        Q
355       32.727        V
356       83.636        E
357       94.545        L
358       70.909        K
359       76.364        A
360       92.593        F
361       96.154        Q
362       75            A
363       38            T
364       91.489        Q
365       81.395        Q
366       97.436        K
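The appendix does not define the CONSERVATION column. Its values fit one reading, which is offered here as an interpretation rather than the documented method. On this reading, each value is the percentage of aligned, non-gap sequences that share the query residue at that position. The denominator would then change from column to column as gapped sequences drop out: 94.545 corresponds to 52 of 55 and 72.881 to 43 of 59. ConSurf's own conservation grades are instead derived from evolutionary rates estimated with Rate4Site. The sketch below computes the percentage-identity reading from a toy, already-aligned input. The function name and dictionary input format are assumptions.

```python
def percent_identity_profile(alignment, query_id, first_residue=1):
    """Per query position: (residue number, query residue, % of non-gap sequences matching it).

    `alignment` maps sequence IDs to equal-length aligned strings, with '-' as the gap.
    Columns where the query itself has a gap are skipped, so numbering follows the
    query sequence starting at `first_residue`.
    """
    query = alignment[query_id]
    if any(len(seq) != len(query) for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    number = first_residue
    for col, q_res in enumerate(query):
        if q_res == "-":
            continue  # no query residue, so no residue number for this column
        column = [seq[col] for seq in alignment.values() if seq[col] != "-"]
        matches = sum(1 for res in column if res == q_res)
        profile.append((number, q_res, 100.0 * matches / len(column)))
        number += 1
    return profile


toy = {"query": "KDN-R", "a": "KEN-R", "b": "K-NAR", "c": "RDNAR"}
for position, residue, pct in percent_identity_profile(toy, "query", first_residue=111):
    print(position, residue, round(pct, 3))
```

Because the query is counted in its own column, every percentage has a non-zero denominator.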

Appendix 5: RT-qPCR Raw Data

Well  Sample Name  Target Name  [cDNA] (ng/well)  Ct
A1    gRNA+        GAPDH        10                17.15463
A2    gRNA+        GAPDH        10                17.12275
A3    gRNA+        GAPDH        1                 20.53868
A4    gRNA+        GAPDH        1                 20.42053
A5    gRNA+        HER2         10                25.86711
A6    gRNA+        HER2         10                25.56148
A7    gRNA+        HER2         1                 28.8951
A8    gRNA+        HER2         1                 28.88535
B1    gRNA-        GAPDH        10                16.73316
B2    gRNA-        GAPDH        10                16.86747
B3    gRNA-        GAPDH        1                 20.21679
B4    gRNA-        GAPDH        1                 20.49099
B5    gRNA-        HER2         10                24.79367
B6    gRNA-        HER2         10                24.88177
B7    gRNA-        HER2         1                 28.72453
B8    gRNA-        HER2         1                 28.2741
H1                 GAPDH                          30.87064
H2                 GAPDH                          32.91101
H3                 HER2                           Undetermined
H4                 HER2                           Undetermined
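The thesis does not state how these Ct values were analysed. As an arithmetic illustration only, the comparative 2^-ΔΔCt method first normalizes HER2 to GAPDH within each sample and then compares gRNA+ with gRNA-. The sketch assumes roughly 100% amplification efficiency for both amplicons and treats "Undetermined" wells as missing. The function names are hypothetical.

```python
from statistics import mean


def mean_ct(values):
    """Mean of replicate Ct values, skipping wells reported as 'Undetermined'."""
    numeric = [float(v) for v in values if v != "Undetermined"]
    if not numeric:
        raise ValueError("no determined Ct values among the replicates")
    return mean(numeric)


def fold_change_ddct(target_test, ref_test, target_ctrl, ref_ctrl):
    """Relative expression by the comparative 2^-ddCt method.

    Assumes both amplicons amplify with ~100 % efficiency (doubling per cycle).
    """
    dct_test = mean_ct(target_test) - mean_ct(ref_test)
    dct_ctrl = mean_ct(target_ctrl) - mean_ct(ref_ctrl)
    return 2.0 ** -(dct_test - dct_ctrl)


# Replicates at 10 ng cDNA/well from the plate above:
# gRNA+ HER2 (A5, A6), gRNA+ GAPDH (A1, A2), gRNA- HER2 (B5, B6), gRNA- GAPDH (B1, B2).
fold = fold_change_ddct(
    target_test=["25.86711", "25.56148"],
    ref_test=["17.15463", "17.12275"],
    target_ctrl=["24.79367", "24.88177"],
    ref_ctrl=["16.73316", "16.86747"],
)
print(f"HER2 (gRNA+ relative to gRNA-): {fold:.3f}")
```

A value below 1 would indicate lower normalized HER2 signal in the gRNA+ sample. Whether such a difference is meaningful depends on replicate variance, which this sketch does not assess.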

Well  Sample Name   Target Name  Ct
A9    HCT- test 1   HER2 ChIP    6.5412
A10   HCT- test 1   HER2 ChIP    20.97166
A11   HCT- test 10  HER2 ChIP    5.82635
A12   HCT- test 10  HER2 ChIP    28.05731
B1    HEK- cDNA 10  GAPDH        17.32409
B2    HEK- cDNA 10  GAPDH        17.29725
B3    HEK- cDNA 1   GAPDH        20.74824
B4    HEK- cDNA 1   GAPDH        20.73316
B5    HEK- cDNA 10  HER2         25.86726
B6    HEK- cDNA 10  HER2         25.8134
B7    HEK- cDNA 1   HER2         29.27646
B8    HEK- cDNA 1   HER2         29.07692
B9    HCT- neg 1    HER2 ChIP    Undetermined
B10   HCT- neg 1    HER2 ChIP    32.07861
B11   HCT- neg 10   HER2 ChIP    Undetermined
B12   HCT- neg 10   HER2 ChIP    Undetermined
C1    HCT+ cDNA 10  GAPDH        15.89098
C2    HCT+ cDNA 10  GAPDH        15.81816
C3    HCT+ cDNA 1   GAPDH        19.23214
C4    HCT+ cDNA 1   GAPDH        19.26681
C5    HCT+ cDNA 10  HER2         25.69774
C6    HCT+ cDNA 10  HER2         25.55099
C7    HCT+ cDNA 1   HER2         29.04362
C8    HCT+ cDNA 1   HER2         29.11283
C9    HCT- inp 50   HER2 ChIP    Undetermined
C10   HCT- inp 50   HER2 ChIP    Undetermined
C11   HCT- inp 200  HER2 ChIP    Undetermined
C12   HCT- inp 200  HER2 ChIP    17.15126
D1    HCT- cDNA 10  GAPDH        17.22356
D2    HCT- cDNA 10  GAPDH        17.05024
D3    HCT- cDNA 1   GAPDH        20.32386
D4    HCT- cDNA 1   GAPDH        20.44607
D5    HCT- cDNA 10  HER2         26.93594
D6    HCT- cDNA 10  HER2         26.76702
D7    HCT- cDNA 1   HER2         29.81319
D8    HCT- cDNA 1   HER2         30.83492
D9    HEK+ cDNA 10  GAPDH        16.10332
D10   HEK+ cDNA 10  GAPDH        16.10012
D11   HEK+ cDNA 1   GAPDH        19.50046
D12   HEK+ cDNA 1   GAPDH        19.59052
E1    HEK+ test 1   HER2 ChIP    31.15554
E2    HEK+ test 1   HER2 ChIP    Undetermined
E3    HEK+ test 10  HER2 ChIP    Undetermined
E4    HEK+ test 10  HER2 ChIP    Undetermined
E5    HEK+ neg 1    HER2 ChIP    Undetermined
E6    HEK+ neg 1    HER2 ChIP    Undetermined
E7    HEK+ neg 10   HER2 ChIP    Undetermined
E8    HEK+ neg 10   HER2 ChIP    25.74249
E9    HEK+ inp 50   HER2 ChIP    Undetermined
E10   HEK+ inp 50   HER2 ChIP    Undetermined
E11   HEK+ inp 200  HER2 ChIP    Undetermined
E12   HEK+ inp 200  HER2 ChIP    Undetermined
F1    HEK- test 1   HER2 ChIP    28.56589
F2    HEK- test 1   HER2 ChIP    37.18177
F3    HEK- test 10  HER2 ChIP    Undetermined
F4    HEK- test 10  HER2 ChIP    Undetermined
F5    HEK- neg 1    HER2 ChIP    31.04003
F6    HEK- neg 1    HER2 ChIP    Undetermined
F7    HEK- neg 10   HER2 ChIP    Undetermined
F8    HEK- neg 10   HER2 ChIP    Undetermined
F9    HEK- inp 50   HER2 ChIP    Undetermined
F10   HEK- inp 50   HER2 ChIP    2.364403
F11   HEK- inp 200  HER2 ChIP    Undetermined
F12   HEK- inp 200  HER2 ChIP    Undetermined
G1    HCT+ test 1   HER2 ChIP    37.46823
G2    HCT+ test 1   HER2 ChIP    Undetermined
G3    HCT+ test 10  HER2 ChIP    Undetermined
G4    HCT+ test 10  HER2 ChIP    Undetermined
G5    HCT+ neg 1    HER2 ChIP    Undetermined
G6    HCT+ neg 1    HER2 ChIP    Undetermined
G7    HCT+ neg 10   HER2 ChIP    Undetermined
G8    HCT+ neg 10   HER2 ChIP    Undetermined
G9    HCT+ inp 50   HER2 ChIP    6.189681
G10   HCT+ inp 50   HER2 ChIP    Undetermined
G11   HCT+ inp 200  HER2 ChIP    Undetermined
G12   HCT+ inp 200  HER2 ChIP    Undetermined
H1                  GAPDH        32.92854
H2                  GAPDH        32.79939
H3                  HER2         Undetermined
H4                  HER2         Undetermined
H5                  HER2 ChIP    Undetermined
H6                  HER2 ChIP    Undetermined
H9    HEK+ cDNA 10  HER2         24.63138
H10   HEK+ cDNA 10  HER2         24.89892
H11   HEK+ cDNA 1   HER2         27.97154
H12   HEK+ cDNA 1   HER2         27.93824

Well  Sample Name  Target Name  Ct
A1    HEK+ 10      GAPDH AB     19.96872
A2    HEK+ 10      GAPDH AB     19.91849
A3    HEK+ 1       GAPDH AB     22.20814
A4    HEK+ 1       GAPDH AB     21.85974
A5    HEK+ 10      HER2 AB      28.63656
A6    HEK+ 10      HER2 AB      28.96462
A7    HEK+ 1       HER2 AB      31.39765
A8    HEK+ 1       HER2 AB      30.98463
B1    HEK- 10      GAPDH AB     18.72945
B2    HEK- 10      GAPDH AB     18.70156
B3    HEK- 1       GAPDH AB     20.75333
B4    HEK- 1       GAPDH AB     20.73571
B5    HEK- 10      HER2 AB      26.94201
B6    HEK- 10      HER2 AB      26.90973
B7    HEK- 1       HER2 AB      29.36572
B8    HEK- 1       HER2 AB      29.41043
C1    HEK+ 10      GAPDH KAPA   Undetermined
C2    HEK+ 10      GAPDH KAPA   Undetermined
C3    HEK+ 1       GAPDH KAPA   Undetermined
C4    HEK+ 1       GAPDH KAPA   Undetermined
C5    HEK+ 10      HER2 KAPA    25.49422
C6    HEK+ 10      HER2 KAPA    34.42944
C7    HEK+ 1       HER2 KAPA    35.19018
C8    HEK+ 1       HER2 KAPA    35.84998
D1    HEK- 10      GAPDH KAPA   Undetermined
D2    HEK- 10      GAPDH KAPA   26.07579
D3    HEK- 1       GAPDH KAPA   19.56365
D4    HEK- 1       GAPDH KAPA   Undetermined
D5    HEK- 10      HER2 KAPA    Undetermined
D6    HEK- 10      HER2 KAPA    33.5104
D7    HEK- 1       HER2 KAPA    28.04697
D8    HEK- 1       HER2 KAPA    26.96381
H1                 GAPDH AB     36.32415
H2                 GAPDH AB     36.92701
H3                 HER2 AB      Undetermined
H4                 HER2 AB      Undetermined
H5                 GAPDH KAPA   3.282403
H6                 GAPDH KAPA   25.48301
H7                 HER2 KAPA    Undetermined
H8                 HER2 KAPA    35.70887

Well  Sample Name     Target Name  CT
A1    HCT+ 10         GAPDH        18.47278
A2    HCT+ 10         GAPDH        18.34026
A3    HCT+ 10         GAPDH        18.35153
A4    HCT+ 1          GAPDH        21.65191
A5    HCT+ 1          GAPDH        21.66627
A6    HCT+ 1          GAPDH        21.62945
A7    HCT+ 10         HER2         28.33216
A8    HCT+ 10         HER2         27.99955
A9    HCT+ 10         HER2         28.08228
A10   HCT+ 1          HER2         31.75819
A11   HCT+ 1          HER2         31.29036
A12   HCT+ 1          HER2         31.74468
B1    HCT- 10         GAPDH        17.68668
B2    HCT- 10         GAPDH        17.48548
B3    HCT- 10         GAPDH        17.53284
B4    HCT- 1          GAPDH        20.97848
B5    HCT- 1          GAPDH        20.89527
B6    HCT- 1          GAPDH        20.75316
B7    HCT- 10         HER2         28.95399
B8    HCT- 10         HER2         28.81199
B9    HCT- 10         HER2         29.08758
B10   HCT- 1          HER2         31.83997
B11   HCT- 1          HER2         31.81804
B12   HCT- 1          HER2         32.36398
H1                    GAPDH        Undetermined
H2                    GAPDH        36.93541
H3                    GAPDH        36.07714
H4                    HER2         Undetermined
H5                    HER2         Undetermined
H6                    HER2         Undetermined
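Each sample on this plate was run at two levels labelled "10" and "1". If those labels denote a tenfold difference in template input (an assumption; the appendix does not say), the CT shift between them gives a rough two-point check of amplification efficiency. At 100% efficiency the product doubles every cycle, so a tenfold dilution should delay CT by log2(10), about 3.32 cycles. The sketch below uses the HCT+ triplicates; a proper standard curve would use three or more dilution points, and the function name is hypothetical.

```python
import math
from statistics import mean

# HCT+ triplicates from the plate above, keyed by the "10" and "1" input levels.
CT = {
    "GAPDH": {10: [18.47278, 18.34026, 18.35153], 1: [21.65191, 21.66627, 21.62945]},
    "HER2": {10: [28.33216, 27.99955, 28.08228], 1: [31.75819, 31.29036, 31.74468]},
}

# CT delay expected for a tenfold dilution at 100% efficiency (~3.32 cycles).
IDEAL_DELTA_CT = math.log2(10)


def two_point_efficiency(high, low, fold_dilution=10.0):
    """Amplification efficiency E from replicate CTs at two input levels.

    E satisfies (1 + E) ** delta_ct == fold_dilution, where delta_ct is the
    mean CT of the diluted sample minus the mean CT of the concentrated one.
    """
    delta_ct = mean(low) - mean(high)
    return fold_dilution ** (1.0 / delta_ct) - 1.0


for target, levels in CT.items():
    delta = mean(levels[1]) - mean(levels[10])
    eff = two_point_efficiency(levels[10], levels[1])
    print(f"{target}: delta CT = {delta:.2f} (ideal {IDEAL_DELTA_CT:.2f}), E = {eff:.1%}")
```

Efficiencies of roughly 90 to 110% are commonly treated as acceptable before applying comparative-CT calculations that assume near-perfect doubling.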

Well  Sample Name     Target Name  CT
A1    FOG 10          GAPDH        19.93255
A2    FOG 10          GAPDH        19.90695
A3    FOG 10          GAPDH        19.91635
A4    FOG 1           GAPDH        22.98154
A5    FOG 1           GAPDH        22.7289
A6    FOG 1           GAPDH        22.90081
A7    FOG 10          HER2         28.94305
A8    FOG 10          HER2         28.94852
A9    FOG 10          HER2         28.86633
A10   FOG 1           HER2         32.35649
A11   FOG 1           HER2         31.96643
A12   FOG 1           HER2         31.95098
B1    dCas9 10        GAPDH        20.70872
B2    dCas9 10        GAPDH        20.59914
B3    dCas9 10        GAPDH        20.62915
B4    dCas9 1         GAPDH        23.44968
B5    dCas9 1         GAPDH        23.52812
B6    dCas9 1         GAPDH        23.48722
B7    dCas9 10        HER2         29.61219
B8    dCas9 10        HER2         29.72698
B9    dCas9 10        HER2         29.73588
B10   dCas9 1         HER2         32.57811
B11   dCas9 1         HER2         33.00906
B12   dCas9 1         HER2         32.59638
H1                    GAPDH        Undetermined
H2                    GAPDH        Undetermined
H3                    GAPDH        Undetermined
H4                    HER2         Undetermined
H5                    HER2         Undetermined
H6                    HER2         Undetermined

Well  Sample Name     Target Name  CT
A1    HEK +dCas9 10   GAPDH        19.65804
A2    HEK +dCas9 10   GAPDH        19.89943
A3    HEK +dCas9 10   GAPDH        19.86567
A4    HEK +dCas9 1    GAPDH        22.80045
A5    HEK +dCas9 1    GAPDH        22.65173
A6    HEK +dCas9 1    GAPDH        22.95618
A7    HEK +dCas9 10   HER2         29.93776
A8    HEK +dCas9 10   HER2         29.83393
A9    HEK +dCas9 10   HER2         29.58704
A10   HEK +dCas9 1    HER2         32.44281
A11   HEK +dCas9 1    HER2         32.41531
A12   HEK +dCas9 1    HER2         32.0005
B1    HEK +Krab 10    GAPDH        20.23612
B2    HEK +Krab 10    GAPDH        19.83853
B3    HEK +Krab 10    GAPDH        19.92649
B4    HEK +Krab 1     GAPDH        23.14502
B5    HEK +Krab 1     GAPDH        23.09205
B6    HEK +Krab 1     GAPDH        23.21341
B7    HEK +Krab 10    HER2         30.30386
B8    HEK +Krab 10    HER2         29.62184
B9    HEK +Krab 10    HER2         29.65485
B10   HEK +Krab 1     HER2         32.26014
B11   HEK +Krab 1     HER2         32.22478
B12   HEK +Krab 1     HER2         32.80209
C1    HEK +SetD7 10   GAPDH        19.36534
C2    HEK +SetD7 10   GAPDH        19.43728
C3    HEK +SetD7 10   GAPDH        19.10461
C4    HEK +SetD7 1    GAPDH        22.4011
C5    HEK +SetD7 1    GAPDH        22.74661
C6    HEK +SetD7 1    GAPDH        22.32604
C7    HEK +SetD7 10   HER2         28.84906
C8    HEK +SetD7 10   HER2         28.6375
C9    HEK +SetD7 10   HER2         28.76201
C10   HEK +SetD7 1    HER2         31.42645
C11   HEK +SetD7 1    HER2         31.00355
C12   HEK +SetD7 1    HER2         31.89551
H1                    GAPDH        Undetermined
H2                    GAPDH        Undetermined
H3                    GAPDH        Undetermined
H4                    HER2         Undetermined
H5                    HER2         Undetermined
H6                    HER2         Undetermined
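Relative HER2 expression on the HEK +dCas9 / +Krab / +SetD7 plate could be estimated with the widely used comparative CT (Livak 2^-ΔΔCT) method. The sketch below is an illustration of that calculation under stated assumptions, not the analysis reported in the thesis: GAPDH is taken as the reference gene, HEK +dCas9 as the calibrator, only the "10" wells are used, and both primer pairs are assumed to amplify with close to 100% efficiency. The function names are hypothetical.

```python
from statistics import mean

# "10" triplicates copied from the plate above: (sample, target) -> CT values.
CT = {
    ("HEK +dCas9", "GAPDH"): [19.65804, 19.89943, 19.86567],
    ("HEK +dCas9", "HER2"): [29.93776, 29.83393, 29.58704],
    ("HEK +Krab", "GAPDH"): [20.23612, 19.83853, 19.92649],
    ("HEK +Krab", "HER2"): [30.30386, 29.62184, 29.65485],
    ("HEK +SetD7", "GAPDH"): [19.36534, 19.43728, 19.10461],
    ("HEK +SetD7", "HER2"): [28.84906, 28.6375, 28.76201],
}


def delta_ct(sample, target="HER2", reference="GAPDH"):
    """Mean target CT minus mean reference CT (normalises for template input)."""
    return mean(CT[(sample, target)]) - mean(CT[(sample, reference)])


def fold_change(sample, calibrator="HEK +dCas9"):
    """Livak 2^-ddCT expression relative to the calibrator sample.

    Assumes both primer pairs amplify with ~100% efficiency.
    """
    ddct = delta_ct(sample) - delta_ct(calibrator)
    return 2 ** -ddct


for sample in ("HEK +dCas9", "HEK +Krab", "HEK +SetD7"):
    print(f"{sample}: HER2 relative to HEK +dCas9 = {fold_change(sample):.2f}")
```

A fold change above 1 means lower normalised HER2 CT, that is, more HER2 transcript than in the dCas9-only control. With only technical replicates from a single plate, such values carry no estimate of biological variability.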

Well  Sample Name     Target Name  CT
A1    C91 10          GAPDH        18.46493
A2    C91 10          GAPDH        18.89107
A3    C91 1           GAPDH        22.10714
A4    C91 1           GAPDH        22.04339
A5    C91 10          HER2         27.39512
A6    C91 10          HER2         27.80569
A7    C91 1           HER2         30.80193
A8    C91 1           HER2         31.25275
A9    KM2 10          GAPDH        20.35018
A10   KM2 10          GAPDH        20.27353
A11   KM2 1           GAPDH        23.80341
A12   KM2 1           GAPDH        23.93736
B1    C92 10          GAPDH        19.34746
B2    C92 10          GAPDH        19.49875
B3    C92 1           GAPDH        22.63283
B4    C92 1           GAPDH        22.79169
B5    C92 10          HER2         27.93878
B6    C92 10          HER2         27.9507
B7    C92 1           HER2         31.38062
B8    C92 1           HER2         31.20826
B9    KM2 10          HER2         29.09884
B10   KM2 10          HER2         29.05691
B11   KM2 1           HER2         31.95424
B12   KM2 1           HER2         32.31493
C1    S71 10          GAPDH        18.94429
C2    S71 10          GAPDH        18.94733
C3    S71 1           GAPDH        22.20325
C4    S71 1           GAPDH        22.36698
C5    S71 10          HER2         27.08088
C6    S71 10          HER2         27.26914
C7    S71 1           HER2         30.52855
C8    S71 1           HER2         30.32523
C9    KR1 10          GAPDH        18.16032
C10   KR1 10          GAPDH        17.98482
C11   KR1 1           GAPDH        21.58878
C12   KR1 1           GAPDH        21.65695
D1    S72 10          GAPDH        20.01425
D2    S72 10          GAPDH        19.78396
D3    S72 1           GAPDH        23.19998
D4    S72 1           GAPDH        23.30109
D5    S72 10          HER2         27.92927
D6    S72 10          HER2         27.97462
D7    S72 1           HER2         31.03514
D8    S72 1           HER2         31.51834
D9    KR1 10          HER2         26.89144
D10   KR1 10          HER2         26.98761
D11   KR1 1           HER2         30.05789
D12   KR1 1           HER2         30.06483
E1    K81 10          GAPDH        21.53384
E2    K81 10          GAPDH        21.61404
E3    K81 1           GAPDH        25.06396
E4    K81 1           GAPDH        25.07679
E5    K81 10          HER2         30.14295
E6    K81 10          HER2         29.98026
E7    K81 1           HER2         32.77169
E8    K81 1           HER2         32.95222
E9    KR2 10          GAPDH        18.45557
E10   KR2 10          GAPDH        18.94177
E11   KR2 1           GAPDH        21.92922
E12   KR2 1           GAPDH        22.01899
F1    K82 10          GAPDH        19.49299
F2    K82 10          GAPDH        19.2081
F3    K82 1           GAPDH        22.59275
F4    K82 1           GAPDH        22.82567
F5    K82 10          HER2         27.37317
F6    K82 10          HER2         27.27465
F7    K82 1           HER2         30.23916
F8    K82 1           HER2         30.90642
F9    KR2 10          HER2         26.48456
F10   KR2 10          HER2         26.39687
F11   KR2 1           HER2         29.88406
F12   KR2 1           HER2         30.17667
G1    KM1 10          GAPDH        19.12286
G2    KM1 10          GAPDH        19.18801
G3    KM1 1           GAPDH        22.62043
G4    KM1 1           GAPDH        22.75558
G5    KM1 10          HER2         27.44
G6    KM1 10          HER2         27.43405
G7    KM1 1           HER2         30.52262
G8    KM1 1           HER2         30.68612
H1                    GAPDH        Undetermined
H2                    GAPDH        36.99996
H3                    GAPDH        33.40628
H4                    HER2         Undetermined
H5                    HER2         Undetermined
H6                    HER2         Undetermined

Well  Sample Name     Target Name  CT
A1    0h 10           GAPDH        19.09132
A2    0h 10           GAPDH        19.09974
A3    0h 1            GAPDH        21.87118
A4    0h 1            GAPDH        21.57773
A5    0h 10           HER2         24.0805
A6    0h 10           HER2         24.59439
A7    0h 1            HER2         27.30027
A8    0h 1            HER2         26.98023
B1    24h 10          GAPDH        17.92301
B2    24h 10          GAPDH        17.94898
B3    24h 1           GAPDH        20.34513
B4    24h 1           GAPDH        20.11532
B5    24h 10          HER2         22.73875
B6    24h 10          HER2         22.76588
B7    24h 1           HER2         25.32012
B8    24h 1           HER2         24.79204
C1    48h 10          GAPDH        18.21392
C2    48h 10          GAPDH        18.16208
C3    48h 1           GAPDH        20.56312
C4    48h 1           GAPDH        20.56874
C5    48h 10          HER2         23.88656
C6    48h 10          HER2         24.00213
C7    48h 1           HER2         26.84398
C8    48h 1           HER2         26.47644
D1    96h 10          GAPDH        30.89695
D2    96h 10          GAPDH        30.32147
D3    96h 1           GAPDH        32.89086
D4    96h 1           GAPDH        33.86087
D5    96h 10          HER2         Undetermined
D6    96h 10          HER2         Undetermined
D7    96h 1           HER2         7.092692
D8    96h 1           HER2         Undetermined
H1                    GAPDH        Undetermined
H2                    GAPDH        Undetermined
H3                    HER2         Undetermined
H4                    HER2         Undetermined

Well  Sample Name     Target Name  CT
A1    24h+ 10         GAPDH        15.56789
A2    24h+ 10         GAPDH        15.85778
A3    24h+ 1          GAPDH        17.49605
A4    24h+ 1          GAPDH        17.45278
A5    24h+ 10         HER2         24.10344
A6    24h+ 10         HER2         23.24878
A7    24h+ 1          HER2         27.11721
A8    24h+ 1          HER2         26.82557
B1    24h- 10         GAPDH        15.27911
B2    24h- 10         GAPDH        14.86583
B3    24h- 1          GAPDH        18.64978
B4    24h- 1          GAPDH        4.050127
B5    24h- 10         HER2         22.58533
B6    24h- 10         HER2         22.42939
B7    24h- 1          HER2         26.38303
B8    24h- 1          HER2         26.34651
C1    48h+ 10         GAPDH        16.44925
C2    48h+ 10         GAPDH        16.77301
C3    48h+ 1          GAPDH        19.57243
C4    48h+ 1          GAPDH        19.59246
C5    48h+ 10         HER2         24.69424
C6    48h+ 10         HER2         23.90881
C7    48h+ 1          HER2         28.25904
C8    48h+ 1          HER2         27.9601
D1    48h- 10         GAPDH        36.85152
D2    48h- 10         GAPDH        33.80841
D3    48h- 1          GAPDH        25.22044
D4    48h- 1          GAPDH        25.24286
D5    48h- 10         HER2         33.46581
D6    48h- 10         HER2         33.2667
D7    48h- 1          HER2         35.53619
D8    48h- 1          HER2         34.90565
E1    72h+ 10         GAPDH        16.91566
E2    72h+ 10         GAPDH        16.61006
E3    72h+ 1          GAPDH        20.35581
E4    72h+ 1          GAPDH        20.73332
E5    72h+ 10         HER2         25.39036
E6    72h+ 10         HER2         25.13108
E7    72h+ 1          HER2         29.0176
E8    72h+ 1          HER2         28.91655
F1    72h- 10         GAPDH        16.41868
F2    72h- 10         GAPDH        16.84601
F3    72h- 1          GAPDH        20.15549
F4    72h- 1          GAPDH        19.94229
F5    72h- 10         HER2         25.0363
F6    72h- 10         HER2         25.08369
F7    72h- 1          HER2         28.8151
F8    72h- 1          HER2         28.89662
G1    96h+ 10         GAPDH        25.79078
G2    96h+ 10         GAPDH        25.72327
G3    96h+ 1          GAPDH        29.98712
G4    96h+ 1          GAPDH        29.74488
G5    96h+ 10         HER2         Undetermined
G6    96h+ 10         HER2         34.56622
G7    96h+ 1          HER2         35.20038
G8    96h+ 1          HER2         4.700068
H1    96h- 10         GAPDH        27.03381
H2    96h- 10         GAPDH        26.59876
H3    96h- 1          GAPDH        30.78033
H4    96h- 1          GAPDH        31.05445
H5    96h- 10         HER2         Undetermined
H6    96h- 10         HER2         Undetermined
H7    96h- 1          HER2         4.623785
H8    96h- 1          HER2         4.172688
H9                    GAPDH        Undetermined
H10                   GAPDH        39.14236
H11                   HER2         Undetermined
H12                   HER2         Undetermined

Appendix 6: FOG dCas9 Verification Sequencing

>F5R_G04

TTCTTGTC-TTGGCCGGCCGCTGCCGCCTGAGCCACCAGAACCGCCGCTTCCACCACTCC

CTCCGGTACTAGGGCTCGGGGCTTCAGGTGCCGTGGCCTTTTGCTCCATGTGGCTGGCAC

CCACCAACTGCACCTCCTCTCTGGCCTCCATGTCTCCGAGGGAACGCTTGATCTGCCGGG

GGTTGCTCTGTTTCCGCCTGGACATGGTACCGGATCCACCGCCGGACCCGCCACCCTTAT

CGTCATCGTCTTTGTAATCAATATCATGATCCTTGTAGTCTCCGTCGTGGTCCTTATAGT

CGCTTCCTCCGGAGCCTCCCACCTTCCTCTTCTTCTTGGGCATGGTGGCAAGGGTTCGAT

CCTCTAGAGTCCGGAGGCTGGATCGGTCCCGGTGTCTTCTATGGAGGTCAAAACAGCGTG

GATGGCGTCTCCAGGCGATCTGACGGTTCACTAAACGAGCTCTGCTTATATAGACCTCCC

ACCGTACACGCCTACCGCCCATTTGCGTCAATGGGGCGGAGTTGTTACGACATTTTGGAA

AGTCCCGTTGATTTTGGTGCCAAAACAAACTCCCATTGACGTCAATGGGGTGGAGACTTG

GAAATCCCCGTGAGTCAAACCGCTATCCACGCCCATTGATGTACTGCCAAAACCGCATCA

CCATGGTAATAGCGATGACTAATACGTAGATGTACTGCCAAGTAGGAAAGTCCCATAAGG

TCATGTACTGGGCATAATGCCAGGCGGGCCATTTACCGTCATTGACGTCAATAGGGGGCG

TACTTGGCATATGATACACTTGATGTACTGCCAAGTGGGCAGTTTACCGTAAATACTCCA

CCCATTGACGTCAATGGAAAGTCCCTATTGGCGTTACTATGGGAACATACGTCATTATTG

ACGTCAATGGGCGGGGGTCGTTGGGCGGTCAGCCAGGCGGGCCATTTACCGTAAGTTATG

TAACGCGGAACTCCATATATGGGCTATGAACTAATGACCCCGTAATTGATTACTATTAAT

AACTAGTCAATAATCAATGTCAACGCGTATATCTGGCCCGTACATCGCGAAGCAGCGCAA

AACGCCTAACCCTAAGCAGATTCTTCATGCAATTGTCGGTCAAGCCTTGCCTTGTTGTAA

CTTAAATTTTG
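Verification reads such as F5R_G04 are normally compared against the expected construct sequence, and a read primed from the reverse direction (as the "R" in the name may indicate, though this is an assumption) matches a forward plasmid map only after reverse complementing. The minimal Python sketch below loads the read, drops the blank lines and the "-" gap character present in this export, and searches for an expected sequence on both strands. The function names are hypothetical, and a dedicated aligner would still be needed to inspect mismatches and indels.

```python
# Complement table for A/C/G/T/N (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Parse FASTA text into {record name: sequence}.

    Blank lines are skipped and '-' gap characters (such as the one near the
    start of F5R_G04) are removed so the bases form one continuous string.
    """
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.replace("-", ""))
    if name is not None:
        records[name] = "".join(chunks)
    return records


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(read, query):
    """Return (strand, 0-based start in read) for every occurrence of query.

    '+' means the query appears as written; '-' means its reverse complement does.
    """
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = read.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = read.find(probe, start + 1)
    return hits
```

For example, `read = read_fasta(text)["F5R_G04"]` followed by `find_on_both_strands(read, expected)`, where `expected` is a hypothetical stretch taken from the construct map, reports whether and on which strand that stretch occurs in the read.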
