Molecular Functions of Multi-SUMO-Binding Protein Complexes

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Biology, Medicine and Health

2019

Rotem Salmi-Leshem

Division of Molecular and Cellular Function

Table of contents

Table of contents
List of figures
List of tables
List of supplementary data
Abbreviations
Abstract
Declaration
Copyright statement
Acknowledgments
1 Introduction
  1.1 Chromatin and gene regulation
  1.2 Transcription factors and coregulators
  1.3 SUMO - Small ubiquitin-like modifier
    1.3.1 Introduction to SUMO
    1.3.2 Genome wide SUMOylation
    1.3.3 SUMO Structure and functionality
    1.3.4 Reversible SUMOylation
    1.3.5 SUMO consensus binding sites
    1.3.6 SUMO interacting motifs (SIMs)
    1.3.7 Poly-SUMO chains
    1.3.8 Multi-SUMOylation
    1.3.9 Consequences of SUMOylation
  1.4 The Five Friends of Methylated CHTOP
    1.4.1 The 5FMC complex is recruited to multi-SUMOylated targets
    1.4.2 The 5FMC complex members
    1.4.3 5FMC complex members involvement in other complexes
    1.4.4 5FMC complex and transcription regulation
  1.5 Aims
    1.5.1 Aim 1: Identify and characterise proteins recruited to different multi-SUMOylated targets
    1.5.2 Aim 2: How does the 5FMC complex affect transcription?
    1.5.3 Aim 3: What is the interplay between the 5FMC complex and SUMO on chromatin?
2 Materials and methods
  2.1 Bacterial cloning methods
    2.1.1 Bacterial transformation
    2.1.2 Glycerol stock preparation
    2.1.3 Plasmid DNA purification
    2.1.4 Restriction digests
    2.1.5 Ligation reactions
    2.1.6 Plasmid DNA mutagenesis
    2.1.7 Plasmid sequencing
  2.2 Protein expression methods
    2.2.1 Bacterial protein expression
    2.2.2 Bacterial protein purification
    2.2.3 In-Vitro protein production
  2.3 Mammalian cell extraction methods
    2.3.1 Total cell protein lysates
    2.3.2 Nuclear extracts
    2.3.3 DNA extraction
    2.3.4 RNA extraction
  2.4 GST pull down assays
  2.5 Mass-spectrometry
  2.6 Immunoblotting
  2.7 Polymerase chain reaction (PCR)
    2.7.1 Amplification of DNA fragments for cloning
    2.7.2 Whole plasmid PCR for site directed mutagenesis
    2.7.3 Colony PCR
    2.7.4 qPCR
    2.7.5 RT-qPCR
  2.8 Immunoprecipitation
  2.9 Chromatin immunoprecipitation
    2.9.1 Sonication efficiency test
    2.9.2 SUMO2/3 ChIP with protein A beads
  2.10 RNA-seq
  2.11 Mammalian cell culture methods
    2.11.1 Sub-culturing cell lines
    2.11.2 Freezing and thawing cells
    2.11.3 Cell transfection
    2.11.4 EGF stimulation of cells
  2.12 Clustered, regularly interspaced, short palindromic repeats (CRISPR)
    2.12.1 Construct and homology arm design
    2.12.2 Cell culture and nucleofection
    2.12.3 Homologous recombination validation
  2.13 Data analysis
    2.13.1 SUMOylation and SUMO binding site prediction
    2.13.2 Protein networks analysis
    2.13.3 Immunoblot band quantification
    2.13.4 qPCR analysis
    2.13.5 Statistical analysis
    2.13.6 Sequencing data analysis
  2.14 Supplementary information
3 Identification and verification of 5FMC complex as a multi-SUMO binding complex
  3.1 Introduction
  3.2 Results
    3.2.1 Design of SUMO and multi-SUMO traps
    3.2.2 SUMO paralog-specific binding
    3.2.3 SUMO trap system stability
    3.2.4 Identifying multi-SUMO bound proteins
    3.2.5 5FMC complex binds to multi-SUMOylated targets
    3.2.6 5FMC behaves as a complex
    3.2.7 All 5FMC complex members are necessary for binding to multi-SUMO
    3.2.8 Determination of which sub-units provide multi-SUMO binding activity
    3.2.9 Investigating interactions between multi-SUMO3 and in vitro 5FMC complex components
    3.2.10 Endogenous tagging of the 5FMC complex
  3.3 Discussion
    3.3.1 Binding of the 5FMC complex to multi-SUMO
    3.3.2 Few proteins bind to multi-SUMO1
    3.3.3 Several potential SIM-independent binding proteins could be detected using mass spectrometry
    3.3.4 Summary, limitations and conclusions
4 The effects of 5FMC complex on transcription
  4.1 Introduction
  4.2 Results
    4.2.1 Choice of cell lines
    4.2.2 Establishing depletion conditions for the 5FMC complex
    4.2.3 5FMC complex member knockdown affects gene expression
    4.2.4 Each 5FMC complex member knockdown has a different effect on global gene expression
  4.3 Discussion
    4.3.1 5FMC complex members knockdown
    4.3.2 Summary, Limitations and Conclusions
5 Connection between the 5FMC complex and the SUMOylation profile across the genome
  5.1 Introduction
  5.2 Results
    5.2.1 SENP3 knockdown affects SUMO2/3 binding at known sites
    5.2.2 SENP3 knockdown has little effect on SUMO2/3 presence on chromatin
    5.2.3 SUMO2/3 dynamic chromatin binding
  5.3 Discussion
    5.3.1 Knockdown of SENP3 does not affect SUMO2/3 binding to chromatin
    5.3.2 SUMO2/3 recruitment to chromatin is dynamic
    5.3.3 Summary, limitations and conclusions
6 General Discussion
  6.1 Overview
  6.2 5FMC is recruited to multi-SUMOylated targets only when all complex members are present
  6.3 5FMC complex is possibly comprised of two sub-complexes
  6.4 Dynamic SUMO2/3 binding to chromatin
  6.5 Perspectives

Word count: 32,889

List of figures

FIGURE 1.1 - AN OVERVIEW OF POSSIBLE PTMS ON HISTONE TAILS
FIGURE 1.2 - SUMO PARALOGS
FIGURE 1.3 - THE SUMO CYCLE
FIGURE 1.4 - SUMO CONJUGATED TARGETS AND THEIR INTERACTIONS WITH SIM AND MULTI-SIM CONTAINING READERS
FIGURE 1.5 - 5FMC COMPLEX RECRUITMENT TO CHROMATIN
FIGURE 3.1 - SUMO TRAP DESIGN
FIGURE 3.2 - SUMO PARALOG-SPECIFIC BINDING
FIGURE 3.3 - SUMO-TRAP CONSISTENCY
FIGURE 3.4 - IDENTIFICATION OF MULTI-SUMO BOUND PROTEINS BY MASS SPECTROMETRY
FIGURE 3.5 - MULTI-SUMOYLATION OF THE 5FMC COMPLEX
FIGURE 3.6 - CO-IMMUNOPRECIPITATION OF 5FMC COMPLEX
FIGURE 3.7 - 5FMC COMPLEX BINDING TO MULTI-SUMO TRAPS
FIGURE 3.8 - WDR18 RECRUITMENT TO 5FMC COMPLEX IS NOT SUMO MEDIATED
FIGURE 3.9 - MULTI-SUMO BINDING OF IN VITRO PRODUCED 5FMC COMPLEX COMPONENTS
FIGURE 3.10 - OVERVIEW OF CRISPR EXPERIMENT
FIGURE 4.1 - ANALYSIS OF 5FMC AND SUMO3 GENE EXPRESSION IN OESOPHAGEAL RNA-SEQ SAMPLES
FIGURE 4.2 - SIRNA KNOCKS DOWN 5FMC COMPLEX MEMBERS EFFICIENCY
FIGURE 4.3 - WORK FLOW OF RNA-SEQ DATA ANALYSIS
FIGURE 4.4 - RNA-SEQ SAMPLE DISTRIBUTION
FIGURE 4.5 - VARIABLE KNOCKDOWN PATTERN PRODUCED BY EACH MEMBER OF THE 5FMC COMPLEX
FIGURE 4.6 - COMPARATIVE GO TERM ANALYSIS OF CHANGES IN GENE EXPRESSION FOLLOWING 5FMC COMPLEX MEMBERS’ KNOCKDOWNS
FIGURE 4.7 - COMPARISON OF GENE EXPRESSION DATASETS
FIGURE 4.8 - ANALYSIS OF GENE EXPRESSION CHANGES COMMON TO PELP1, SENP3 AND WDR18 KNOCKDOWNS
FIGURE 5.1 - SUMO2/3 CHIP VERIFICATION
FIGURE 5.2 - WORK FLOW OF CHIP-SEQ DATA ANALYSIS
FIGURE 5.3 - SENP3 KNOCKDOWN HAS NO EFFECT ON CHROMATIN RECRUITMENT OF SUMO2/3
FIGURE 5.4 - CHIP-SEQ TRACKS AT THE SAME LOCATIONS AS THE VERIFICATION PRIMERS
FIGURE 5.5 - SENP3 KNOCKDOWN HAS NO EFFECT ON CHROMATIN RECRUITMENT OF SUMO2/3
FIGURE 5.6 - GENOME WIDE IDENTIFICATION OF SUMO2/3 BINDING TO CHROMATIN
FIGURE 5.7 - GENOME WIDE SUMOYLATION ANALYSIS ACROSS AN EGF TIME COURSE
FIGURE 5.8 - ANALYSIS OF SUMOYLATION DYNAMICS ACROSS AN EGF TIME COURSE
FIGURE 5.9 - SUMO2/3 DYNAMIC BINDING TO CHROMATIN
FIGURE 6.1 - SUMO RECRUITMENT TO CHROMATIN INVOLVEMENT IN GENE EXPRESSION

List of tables

TABLE 1.1 - HISTONE MODIFICATIONS AND THEIR FUNCTIONS
TABLE 2.1 - PLASMID CONSTRUCTS FOR MAMMALIAN EXPRESSION
TABLE 2.2 - PLASMID CONSTRUCTS FOR PROTEIN EXPRESSION IN BACTERIA
TABLE 2.3 - OLIGONUCLEOTIDES - PRIMERS FOR WDR18 MUTAGENESIS
TABLE 2.4 - OLIGONUCLEOTIDES - SEQUENCING PRIMERS
TABLE 2.5 - ANTIBODIES FOR IMMUNOBLOT DETECTION AND IMMUNOPRECIPITATION
TABLE 2.6 - PCR REACTION CONDITIONS: AMPLIFICATION OF DNA FRAGMENTS FOR CLONING PURPOSES
TABLE 2.7 - PCR REACTION CONDITIONS: WHOLE PLASMID PCR FOR SITE DIRECTED MUTAGENESIS
TABLE 2.8 - PCR REACTION CONDITIONS: COLONY PCR
TABLE 2.9 - OLIGONUCLEOTIDES - PRIMERS FOR CHIP-QPCR
TABLE 2.10 - PCR REACTION CONDITIONS: QPCR
TABLE 2.11 - OLIGONUCLEOTIDES - PRIMERS FOR RT-QPCR
TABLE 2.12 - PCR REACTION CONDITIONS: RT-QPCR
TABLE 2.13 - CELL LINES AND CULTURING CONDITIONS
TABLE 2.14 - SIRNAS USED FOR KNOCKDOWN
TABLE 2.15 - GBLOCK DESIGN
TABLE 2.16 - CRISPR GUIDES
TABLE 2.17 - CAS9 GUIDE RNA PLASMIDS
TABLE 2.18 - OLIGONUCLEOTIDES - PRIMERS FOR CRISPR HOMOLOGOUS RECOMBINATION VERIFICATION
TABLE 3.1 - MULTI-SUMO3 AFFINITY PURIFICATION FOLLOWED BY MASS-SPECTROMETRY
TABLE 3.2 - PREDICTION OF SIMS AND SUMOYLATION SITES BY GPS-SUMO
TABLE 4.1 - SUMMARY OF 1.5 FOLD UP AND DOWN REGULATED GENES FOLLOWING EACH OF THE 5FMC COMPLEX KNOCKDOWNS AT FDR <0.05
TABLE 4.2 - GENES AFFECTED BY KNOCKDOWN OF AT LEAST FOUR MEMBERS OF 5FMC COMPLEX
TABLE 5.1 - SUMMARY OF PEAK DISTRIBUTION IN CHIP-SEQ EXPERIMENTS

List of supplementary data

S1 - SUMO3 PULL DOWNS FOLLOWED BY MASS SPECTROMETRY
S2 - DIFFERENTIAL EXPRESSION ANALYSIS OF RNA-SEQ OF OE19 5FMC COMPLEX KNOCKDOWNS
S3 - DIFFERENTIAL BINDING ANALYSIS OF OE19 SUMO2/3 CHIP-SEQ WITH SENP3 KNOCKDOWN
S4 - PEAK AND READ FILES

Abbreviations

5FMC      5 friends of methylated Chtop
aa        Amino acids
ATAC-seq  Assay for Transposase-Accessible Chromatin with high-throughput sequencing
bp        Base pairs
BSA       Bovine Serum Albumin
ChIP      Chromatin immunoprecipitation
ChIP-seq  Chromatin immunoprecipitation followed by high throughput sequencing
CHTOP     Chromatin target of PRMT1
COMP      Cartilage oligomeric matrix protein
CRISPR    Clustered regularly interspaced short palindromic repeats
DMEM      Dulbecco's Modified Eagle Medium
DNA       Deoxyribonucleic acid
dNTP      Deoxynucleoside triphosphate
EBI       European Bioinformatics Institute
EGF       Epidermal growth factor
ER        Endoplasmic reticulum
ESC       Embryonic stem cells
FBS       Fetal bovine serum
FDR       False discovery rate
FPKM      Fragments per kilobase of transcript per million mapped reads
FT        Flow through
GST       Glutathione S-transferase
HEK293T   Human embryonic kidney cell line 293
HeLa      Henrietta Lacks (cell line)
hg19      Human genome assembly 19
IP        Immunoprecipitation
LAS       Lethal in the absence of SSD1-v
LAS1L     LAS1 Like, ribosome biogenesis factor
LB        Lysogeny Broth
MACS2     Model-based Analysis of ChIP-Seq version 2
MCF10A    Michigan cancer foundation-10A cell line
MS        Mass spectrometry
MYOD1     Myogenic Differentiation 1
NE        Nuclear extract
NGS       Next generation sequencing
NIH       National Institutes of Health
NT        Non targeting
OAC       Oesophageal adenocarcinoma
OE19      Oesophageal gastric carcinoma cell line
PBS       Phosphate buffered saline
PCA       Principal component analysis
PCR       Polymerase chain reaction
PDSMs     Phosphorylation-Dependent SUMO Motifs
PELP1     Proline-, glutamic acid- and leucine-rich protein 1
PML       Promyelocytic leukemia
PTM       Post translation modification
QC        Quality control
qPCR      Quantitative PCR
RIP       Reads in peaks
RIPA      Radioimmunoprecipitation assay
RNA       Ribonucleic acid
RPKM      Reads per kilobase of transcript per million mapped reads
RPMI      Roswell Park Memorial Institute
rRNA      Ribosomal RNA
RT        Room temperature
RT-qPCR   Reverse transcriptase quantitative PCR
SD        Standard deviation
SDS       Sodium Dodecyl Sulphate
SDS-PAGE  SDS-polyacrylamide gel electrophoresis
SENP      Sentrin-specific protease/SUMO specific proteases
SIM       SUMO interacting motif
siRNA     Small interfering RNAs
STRING    Search tool for the retrieval of interacting genes/proteins
SUMO      Small ubiquitin-like modifier
TCGA      The cancer genome atlas
TEX10     Testis expressed 10
TF        Transcription factor
TSS       Transcription start site
TTS       Transcription termination site
UTR       Un-translated region
WB        Western blot (Immunoblot)
WDR18     trp-asp (WD) repeat domain 18
WT        Wild type
Zbp-89    Zinc finger DNA binding protein 89, also known as ZNF-148
ZMYM2     Zinc finger MYM-Type containing 2
ZNF198    Zinc finger protein 198, also known as ZMYM2

Abstract

The University of Manchester

Rotem Salmi-Leshem

Molecular Functions of Multi-SUMO-Binding Protein Complexes

2019

Gene expression is regulated in many ways, one of which is SUMOylation of transcription factors and associated complexes. This covalent modification by small ubiquitin-like modifier (SUMO) enables recruitment of reader proteins through their SUMO-interacting motifs (SIMs). Proteins containing multiple SIMs prefer binding to multiple SUMOs, presented either as poly-SUMO chains or together on a multi-SUMOylated target. One protein complex that has emerged as multi-SUMO binding is the five friends of methylated CHTOP (5FMC), previously described as being involved in transcription regulation, possibly through deSUMOylation and activation of transcription factors. The complex contains five core members: PELP1, TEX10, WDR18, LAS1L and SENP3.

In this study we aimed to find reader proteins that specifically bind to multi-SUMOylated targets, and further characterise them. We focused on the multi-SUMO recruitment of the 5FMC complex, its recruitment to chromatin, and the effects of the 5FMC complex on SUMO binding to chromatin and impact on gene expression.

We have confirmed that the 5FMC complex is recruited preferentially to multiple SUMOs, and that when complex members are missing, no such recruitment occurs in vitro. However, individual complex member knockdowns did not uncover the effects of the whole complex on gene expression, and implied instead that active sub-complexes exist, possibly joining together as needed to assemble the 5FMC complex. We found no effect on SUMO binding to chromatin following SENP3 knockdown, possibly indicating high redundancy of the SUMO protease, but also implying the possibility that the 5FMC complex is not necessarily a deSUMOylating complex in that context. We further looked at the dynamics of SUMO recruitment to chromatin, and found complex dynamic binding patterns following EGF induction, further emphasising the importance of SUMO binding in gene regulation.


Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright statement

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420 ), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.library.manchester.ac.uk/about/regulations/ ) and in The University’s policy on Presentation of Theses.


Acknowledgments

Undertaking this PhD was a challenging, life altering mission that would not have been possible without help from many people.

Firstly, I would like to express my sincere gratitude to my supervisor Prof Andy Sharrocks, for taking a chance on me when others wouldn’t and for his continual support, guidance, and occasional sarcastic remarks. Your patience was greatly appreciated.

I would also like to thank my advisor Dr Gino Poulin for the support, motivation and useful advice over the years.

To all of the Sharrocks lab team, thank you for listening to my rants, for commiserating and for providing perspective when I needed it most. A special thank you to Sam Carrera for her friendship and moral support, to Steph Macdonald for artistic insight on life and to Connor Rogerson, who proved you can have my cake and eat it too.

I thank also Ian Donaldson of the Bioinformatics Core Facility for his endless patience and endurance in the face of my torrential questions and to Yaoyong Li for his help and support with bioinformatics analyses.

Lastly, to my husband for having patience when I lacked it and my two wonderful boys for believing I am the answer. You make me better.


1 Introduction

Protein modification with Small Ubiquitin-like MOdifier (SUMO) is a major regulatory event that affects the activities of hundreds of proteins and, through them, a diverse array of biological activities. SUMO conjugation/deconjugation regulates the dynamic assembly and disassembly of multi-subunit protein complexes, such as those involved in transcription regulation and DNA repair networks (Finkbeiner et al., 2011b). To better understand the role of SUMO in transcription regulation, we must begin by understanding chromatin structures and Post Translation Modifications (PTMs).

1.1 Chromatin and gene regulation

Chromatin is a highly plastic structure, a DNA and protein scaffold responsive to various external signals that enables or blocks the use of DNA. In eukaryotes, chromatin is divided into two main structures: the loosely packed, highly active euchromatin, and the conserved, more compacted heterochromatin (Bannister et al., 2011).

Histones are the major chromatin building blocks; modification and removal of histones allow expression, replication, remodelling and repair of specific DNA segments at specific times (Bannister et al., 2011). DNA is packed into the nucleus in the form of nucleosomes, each comprised of 8 histones (two each of H2A, H2B, H3 and H4 or their variants) in an octamer formation that acts as a spool for 146 base pairs of DNA, with histone H1 as a linker between nucleosomes, protecting the exposed DNA (Kornberg, 1974). Histones have been shown to undergo a wide variety of modifications, both covalent and non-covalent. Covalent modifications include but are not limited to acetylation, methylation, phosphorylation, ubiquitination and SUMOylation of different residues, mainly on the tail regions (Figure 1.1). Each of these modifications results in a different outcome (Bannister et al., 2011), and a combination of several modifications on the same histone tail, the so-called ‘histone code’, can control the way other proteins respond to the modification (Hodawadekar et al., 2007, Strahl et al., 2000). For example, acetylation of a lysine side chain on a histone by acetyl CoA cofactor will eliminate the lysine’s positive charge, weaken the DNA-histone connection, and thus render the DNA segment exposed for replication and gene activation (Strahl et al., 2000). Deacetylation is generally related to gene silencing and repression (Turner, 2000). These modifications are dynamic in nature, and change frequently over time (Kouzarides, 2007).

Lysine seems to be a hot-spot for PTMs. It is by far the most modified amino acid, with the widest range of modifications attributed to it, as is evident in Table 1.1 (Azevedo et al., 2015). As well as the synergistic effect of interactions between modifications on the same histone tail (‘cis’ effect), adjacent histone tail modifications have been shown to work together (‘trans’ effect). For example, ubiquitination of H2B seems to be a prerequisite for H3 methylation on two residues; however, this cross-talk goes in one direction, as the elimination of the methylation on these residues does not seem to affect the ubiquitination of H2B (Sun et al., 2002).

Figure 1.1 - An overview of possible PTMs on histone tails. Each modification is represented in no particular order on its relevant amino acid, and an example of a modifying enzyme (green) and a de-modifying enzyme (red) next to it.

Table 1.1 - Histone Modifications and their functions (Azevedo et al., 2015, Kouzarides, 2007). Chromatin modifications are presented each with the modified residue, modified histones, and the functions regulated by the modification. References given are for linking the modification to histones.

Chromatin modification | Residues modified | Histone modified | Function regulated | Reference
Acetylation | K | H2A, H2B, H3, H4 | Transcription, Repair, Replication, Condensation | (Allfrey et al., 1964)
Methylation (lysines) | K (mono-/di-/tri-methylation) | H3, H4 | Transcription, Repair | Mono- (Murray, 1964), di- (Paik et al., 1967), tri- (Hempel et al., 1968)
Methylation (arginines) | R (mono-/di-methylation, symmetric/asymmetric) | H3, H4 | Transcription | Mono- (Byvoet et al., 1972), di- (Borun et al., 1972)
Phosphorylation | S, T | H2B, H3, H4 | Transcription, Repair, Condensation | (Smith et al., 1974)
Ubiquitination | K | H2A, H2B | Transcription, Repair | (Goldknopf et al., 1975, Goldstein et al., 1975)
SUMOylation | K | H1, H2A, H2B, H3, H4 | Transcription | (Shiio et al., 2003)
ADP Ribosylation | E, K | H1, H2A, H2B, H3, H4 | Transcription | (Hilz et al., 1979, Messner et al., 2010)
Deimination (Citrullination) | R>Cit | H1, H2A, H3, H4 | Transcription | (Hagiwara et al., 2002, Rogers, 1962)

These PTMs serve to recruit chromatin-associated complexes, which often contain PTM readers. Methylation readers, such as Tudor domains and chromodomains, can bind a methylated lysine through an aromatic cage. Specificity for a particular target can be achieved by the surrounding residues. Other PTMs have their own specific readers, most of which will recognise multiple adjacent PTMs rather than a single one. Unmodified histone H3 will also be recognised by specific readers. The combinatorial readout of modified histones is responsible for the fine-tuned targeting of genomic sites (Musselman et al., 2012).

Histones have a high turnover rate. As DNA is required to open regularly to allow transcription, the histones involved, either as a whole nucleosome or parts of it, will be degraded and replaced by new histones or other components, as required by the transcription process. However, most PTMs on histones are reversible and facilitated by specific enzymes, which control not only the nature of the modifications, but also how long they might last (Venkatesh et al., 2015).

In addition to chromatin modifiers, the transition between packed chromatin and loose DNA is also controlled by chromatin remodelling complexes. These complexes move, restructure and discard nucleosomes in ATP-dependent reactions, in order to expose promoters, enhancers and origins of replication of the DNA. Thus, chromatin remodelling is vital in allowing DNA transcription, replication, repair, and even recombination. The spacing between nucleosomes may vary between cell types, but it is critical as a control point at promoter regions to allow (or deny) accessibility (Clapier et al., 2009). There are currently four known families of chromatin remodellers conserved in eukaryotes, each with its own specialization. They share several properties: recognition domains for histone modifications, affinity to nucleosomes and DNA, a DNA-dependent ATPase domain and its regulators, and domains for protein interaction. Specialization is achieved by unique association with subunits, and also through unique domains within their catalytic ATPases (Flaus et al., 2006).

The SWI/SNF (switching defective/sucrose nonfermenting) family remodellers can remove nucleosomes for diverse purposes (Tang et al., 2010). The ISWI (imitation switch) family remodellers are characterized by SANT and HAND domains, adjacent to a SLIDE ATPase-flanking domain, that recognize unmodified histone tails. Other proteins belonging to the complex add additional histone and DNA-binding motifs, and together they control nucleosome spacing and activation or repression of transcription (Corona et al., 2004). The CHD (chromodomain, helicase, DNA binding) family remodellers can move or remove nucleosomes to expose promoters; other complexes of the same family contain histone deacetylases and have repressive functions (Marfella et al., 2007). The INO80 (inositol requiring 80) family remodellers have a split ATPase domain which binds helicase-related subunits. Their functions include the promotion of transcription activation and DNA repair. One member of this family, SWR1, is able to remove the H2A-H2B histone dimer and replace it with an H2A.Z-H2B dimer, which marks the nucleosome for gene activation (Bao et al., 2007). The fluidity in chromatin remodelling seems to be the result of an apparent antagonism between assemblers and disassemblers.


Depending on the gene, AT-rich sequences can create a bend in the DNA that prevents nucleosomal binding at a transcription start site, thereby allowing transcription (Segal et al., 2006). These loose regions normally contain binding sites for transcription factors. Nucleosome-wrapped enhancers are released via remodelling as needed. In general, the remodeller complex ATPase (highly similar to DNA-translocase) attaches to DNA and to the nucleosome and translocates the DNA by drawing it from the linker histone towards the nucleosome. This may result in the creation of a DNA loop on the nucleosome, and the loosening of the DNA-histone connection. It is unclear whether the size of the loop has any significance. The nucleosomes around a transcribed gene are not entirely discarded, but chaperoned around the RNAPII, thus preventing DNA damage and uncontrolled transcription (Clapier et al., 2009).


1.2 Transcription factors and coregulators

Transcription of protein-coding genes involves the positioning of RNA polymerase II (Pol II) at the promoter site of a gene, initiation of transcription at the transcription start site (TSS), transcription elongation, termination, transcript processing and export from the nucleus (Fuda et al., 2009). Regulation of transcription is dependent on promoter regions, the proximal promoter, and enhancers; all three are DNA sequences, of which the enhancer may be the furthest away. Transcription is initiated by a preinitiation complex comprised of general transcription factors and Pol II, and regulated by site specific transcription factors. These site specific transcription factor complexes can be found either at promoter or enhancer regions, with enhancer-bound transcription factors thought to contact the promoter via a DNA loop (Calo et al., 2013). In some cases, Pol II along with the general transcription factor complex is positioned and poised ready for transcription on inducible genes (Muse et al., 2007), and is therefore controlled by site specific transcription factors. These transcription factors can either directly recruit Pol II and initiate transcription, or indirectly affect transcription by recruiting coregulators such as chromatin remodellers or histone modifying agents.

Transcription factors generally occupy a 6-12 bp long DNA sequence, which implies low sequence specificity and a requirement for additional factors for more individual control. A specific combination of transcription factors around an enhancer site can result in a precise pattern of transcription (Lettice et al., 2012), and a different combination of interaction partners can determine a different pattern of transcription. Some transcription factors, such as KLF1 during erythrocyte differentiation (Pilon et al., 2011) or MYOD1 during myoblast differentiation (Cao et al., 2010), can only be found bound to enhancers at certain times during development, thus providing temporal control over gene expression.
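
As a rough illustration of why a 6-12 bp site implies low specificity, the sketch below estimates how often one fixed motif of a given length would match by chance on a single strand. It assumes a genome of about 3.1 × 10^9 bp and independent, equiprobable bases; neither assumption comes from the text above, real genomes do not satisfy the second, and the function name is hypothetical.

```python
# Back-of-envelope estimate only. Assumes independent, equiprobable bases,
# which real genomes do not satisfy.
GENOME_BP = 3.1e9  # assumed approximate human haploid genome size


def expected_chance_matches(motif_len: int, genome_bp: float = GENOME_BP) -> float:
    """Expected exact matches to one fixed motif on a single strand."""
    positions = genome_bp - motif_len + 1  # number of possible start sites
    return positions / 4 ** motif_len      # each start matches with p = 4^-k


for k in (6, 12):
    print(f"{k} bp motif: ~{expected_chance_matches(k):,.0f} chance matches")
```

Under these assumptions a 6 bp site would be expected several hundred thousand times by chance, whereas a 12 bp site would be expected fewer than two hundred times, which is in line with the need for additional factors to achieve specific control.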

Additional specificity and control is provided by cooperative factors that bind to common transcription factor conserved components. These transcriptional coregulators can be coactivators, such as the CBP family (p300-CREB binding proteins), or corepressors, such as Groucho; they can activate chromatin remodelling or block repositioning of nucleosomes (reviewed in Spitz et al. (2012)). It is the balance between coactivators and corepressors that regulates gene expression.


Coactivators can act as a bridge between the preinitiation complex and activators, like the large Mediator complex (Allen et al., 2015), or facilitate the opening of chromatin, like histone acetyltransferases (CBP) and chromatin remodelling factors (BAF). The latter two can be parts of the same complexes, and biological specificity is achieved through the combinatorial cooperation of different subunits in these complexes (Ho et al., 2010). On the other hand, deacetylation of histones by the nuclear receptor corepressor (N-CoR) and silencing mediator of retinoic and thyroid hormone receptors (SMRT), for example, will cause a decrease in gene expression (Guenther et al., 2001).

Variance in the expression of transcription regulatory factors can cause a myriad of diseases, including but not limited to various types of cancer, neurological developmental disorders, diabetes and autoimmune disease (Lee et al., 2013b). Overexpression of the transcription factor c-Myc, for example, has been shown to contribute to elevated gene expression levels in most tumour cells (Lin et al., 2012), and a mutation in the MED23 subunit of the Mediator coactivator complex is known to cause dysregulation of transcription of immediate-early genes that affects brain development (Ding et al., 2008). While cofactors are generally ubiquitously expressed, certain mutations can cause a tissue-specific disorder. For example, mutations in the transcription regulator NF-κB, which can be found in most cell types, have been linked to autoimmune disease and impaired inflammatory response (Hayden et al., 2012). The mechanisms for this tissue specificity necessitate further investigation.


1.3 SUMO - Small ubiquitin-like modifier

1.3.1 Introduction to SUMO

Protein modification with Small Ubiquitin-like MOdifier (SUMO) is a major regulatory event that affects the activities of hundreds of proteins. This modification thus affects a diverse array of biological activities. SUMO conjugation or deconjugation can regulate the dynamic assembly and disassembly of multi-subunit protein complexes such as complexes required for transcription regulation, ribosome biogenesis and DNA repair networks (Finkbeiner et al., 2011b). SUMO modification can also alter protein cellular localization, stability, solubility and function (Geiss-Friedlander et al., 2007). SUMO operates by attaching to proteins covalently as a PTM (Silver et al., 2011) on one side, and by non-covalently interacting with other proteins on the other. This way, SUMO controls interactions between hundreds of known target proteins and mediates a myriad of cellular processes.

SUMO proteins are expressed throughout the eukaryotic kingdom. Some organisms have a single SUMO gene, while others, such as plants and vertebrates, have several SUMO genes. The human genome encodes five distinct SUMO proteins: SUMO1–SUMO5. Of these, SUMO1, 2, and 3 are expressed throughout the body. SUMO2 and SUMO3 mature forms are 97% identical, but share only 50% sequence identity with SUMO1 (Figure 1.2A). SUMO1 and SUMO2/3 are conjugated to different target proteins and serve multiple distinct functions in vivo, with a large pool of unconjugated SUMO2/3 present in cells, ready for quick conjugation when required (Saitoh et al., 2000). SUMO4 was discovered in relation to the modification of IκBα and negative regulation of IL12B transcription (Bohren et al., 2004). Mutation in SUMO4 has been associated with susceptibility to diabetes (Song et al., 2012). More recently SUMO4 was implicated in promoting HIV-1 latency in CD4+ T-cells through TRIM28-mediated multiple-SUMOylation of CDK9 (Ma et al., 2019). The recently discovered SUMO5 mainly regulates promyelocytic leukemia (PML) nuclear bodies (Liang et al., 2016). In this work we will look at ubiquitously expressed SUMO moieties only.

1.3.2 Genome wide SUMOylation

Studies of global SUMOylation in cells have uncovered many SUMOylated lysine residues that are also known to be acetylated, methylated or ubiquitinated (Hendriks et al., 2014), indicating possible crosstalk between these modifications. However, most global SUMOylation studies were done under stress conditions such as heat shock (Bruderer et al., 2011, Tammsalu et al., 2014) or oxidative stress (Sahin et al., 2014). Few studies have looked at endogenous SUMOylation of the cellular proteome (Becker et al., 2013, Hendriks et al., 2014).

Whereas other PTMs modify proteins throughout the cell, SUMOylation tends to occur mainly in the nucleus, both at the chromatin level (Stielow et al., 2008a) and in PML bodies (Ishov et al., 1999, Liang et al., 2016). SUMOylation is widely involved in many DNA damage response pathways (Impens et al., 2014, Jackson et al., 2013), and seems to increase under stress conditions (Golebiowski et al., 2009a), occurring at non-consensus lysines on target proteins (Hendriks et al., 2016).

1.3.3 SUMO Structure and functionality:

SUMO proteins are ~11 kDa in size and resemble ubiquitin in their three-dimensional structure. They share less than 20% amino-acid sequence identity with ubiquitin and differ from it in their overall surface-charge distribution and in the presence of an extended flexible N-terminal tail, which is absent in ubiquitin (reviewed in Geiss-Friedlander et al. (2007)). Nuclear magnetic resonance (NMR) analysis of SUMO proteins reveals that the structures of SUMO1 and SUMO2 are very similar, though in sequence they mostly differ in the second β-strand and the α-helix (Huang et al., 2004) (Figure 1.2B,C). Ubiquitin was initially known for marking proteins for degradation, but it is now known that its function depends on the modifications it acquires after ubiquitination (Swatek et al., 2016). SUMO closely resembles ubiquitin in structure, enzymatic reactions and even binding sites. Both have been shown to alter the localization of a modified target, alter protein-protein interactions, alter activity and substrate stability and modulate protein-DNA interactions. These modifications affect cellular processes including mitosis, gene transcription, nucleocytoplasmic transport, sub-nuclear targeting and protein stability (Girdwood et al., 2004, Li et al., 2008, Sekiyama et al., 2008).

[Figure 1.2 image: A. aligned SUMO1, SUMO2 and SUMO3 sequences annotated with β-strand and α-helix elements; B. SUMO1 structure; C. SUMO2 structure.]

Figure 1.2 – SUMO paralogs. A. SUMO paralog alignment performed in MUSCLE (Zimmermann et al., 2018). B. SUMO1 (solution NMR, PDB ID 1A5R (Bayer et al., 1998)) and C. SUMO2 (solution NMR, PDB ID 2N1W (Chang et al., 2011)) 3D structures extracted from PDB (Prlić et al., 2018).

Ubiquitin attaches preferentially to lysines, and less frequently to N-terminal fragment cysteine, serine or threonine residues (Tait et al., 2007, Wang et al., 2007), while SUMO attaches solely to lysines. SUMO1 forms an isopeptide bond between the C-terminus of SUMO1 and a lysine side chain of the target protein (Mahajan et al., 1998). SUMO1 modification of these lysines might stabilize the substrates by blocking ubiquitin modification (Sampson et al., 2001). In fact, SUMO1 has been shown to compete with ubiquitin over the target lysine (Desterro et al., 1998, Hoege et al., 2002). Also, the discovery of SUMO-targeted ubiquitin ligases (STUbLs) that facilitate the addition of ubiquitin to SUMOylated proteins has shown that SUMO can indirectly promote proteasome-dependent, ubiquitin-mediated degradation of proteins (Prudden et al., 2007).

Numerous SUMO targets have been identified through cell-based and biochemical studies, and many of these substrates are involved in key cellular processes and structures (Finkbeiner et al., 2011b, Hendriks et al., 2014, Hendriks et al., 2017, Xiao et al., 2015). A large portion of SUMOylated proteins are nuclear factors (Impens et al., 2014), which suggests that SUMOylation is capable of promoting both transcriptional activation and repression (Garcia-Dominguez et al., 2009). SUMO is involved in nuclear transport, PML nuclear body (NB) formation and recruitment of PML associated proteins including Daxx and SP100 (Muller et al., 2004); it is also needed for recruitment of DNA repair factors (Galanty et al., 2009, Morris et al., 2009). Ubiquitin- and SUMO-dependent signalling mechanisms cooperate to promote retention of genome caretakers at DSB-flanking chromatin (Danielsen et al., 2012).

SUMOylation is an essential process in most organisms, though it is unclear whether all individual SUMO paralogs are essential in organisms that express more than one paralog. However, disruption of SUMO1 (Alkuraya et al., 2006) or SUMO2 (Wang et al., 2014) in mice causes embryonic death, and SUMO1 haploinsufficiency induces a cleft lip and palate developmental defect in mice and possibly in humans (Alkuraya et al., 2006). It has also become abundantly clear that defective SUMO regulation is highly associated with various diseases, such as neurodegenerative diseases (Eckermann, 2013, Lee et al., 2013a), congenital heart defects (Wang et al., 2011), diabetes (Zhao, 2007) and cancer (Baek, 2006, He et al., 2015, Seeler et al., 2007).

1.3.4 Reversible SUMOylation

SUMOylation results in the formation of an isopeptide bond between the C-terminal glycine residue of SUMO and the ε-amino group of a lysine residue in the acceptor protein, the same type of bond that is formed in ubiquitination. SUMO proteins are expressed as an immature proform that carries a stretch of 2–11 amino acids at the C-terminus, after a conserved Gly-Gly motif. SUMO specific proteases (SENPs) recognize and cleave a freshly translated SUMO through a specific catalytic domain, with different SENP proteases acting on different SUMO paralogs (Mukhopadhyay et al., 2007). Removal of the C-terminal extension by SENPs is essential for the conjugation of SUMO to targets.

Like ubiquitination, SUMOylation requires an enzymatic cascade that involves three classes of enzymes (Figure 1.3). First, a mature (cleaved) SUMO protein undergoes ATP-dependent activation at its C-terminus by the SUMO-specific E1 activating enzyme heterodimer AOS1–UBA2. This forms a SUMO–adenylate conjugate which is transferred from UBA2 to the only known E2 conjugating enzyme, UBC9, forming a thioester link between the catalytic cysteine residue of UBC9 and the C-terminal carboxy group of SUMO. UBC9 then transfers SUMO to the target protein, forming an isopeptide bond between the C-terminal glycine residue of SUMO and a lysine side chain of the target protein. This process is usually facilitated by a SUMO E3 ligase, which catalyses the transfer of SUMO from UBC9 to a substrate (Geiss-Friedlander et al., 2007). As opposed to ubiquitin E3 ligases, only a few E3 ligases are known in the SUMO pathway, and they are not obligatory for completion of the SUMOylation process. In addition, while ubiquitin E3 ligases might be characterised by HECT, RING or U-box domains (Ardley et al., 2005), SUMO E3 ligases are often characterised by the presence of SUMO-Interacting Motifs (SIMs), as in RanBP2 (Song et al., 2004), and an SP-RING motif, which is essential for their function. SP-RING ligases, such as the PIAS protein family, bind UBC9 and their targets directly, and bind SUMO non-covalently via a SIM. The SIMs might position the target and UBC9 in a favourable orientation for SUMO transfer (Hochstrasser, 2001).

Figure 1.3 - The SUMO cycle. SUMO is activated via C-terminal cleavage by SENPs, bound by a thioester linkage to its E1 activating enzyme (SAE1/2), transferred to UBC9, which is the only conjugating E2 factor for all SUMO proteins, and together with E3 ligating enzymes transfers the SUMO protein to acceptor lysines (K) in a consensus sequence on the substrate protein. SUMO is then released by SENP and is free to interact with another molecule (Geiss-Friedlander et al., 2007).

SUMOylated proteins might be rapidly deconjugated by SENPs, making SUMOylation a reversible modification (Dou et al., 2011). SUMO proteases therefore play a pivotal role in keeping the balance between SUMOylated and non-SUMOylated proteins. Most SENPs cleave the SUMO terminal glycine from the target protein's lysine. Six SENPs are known in humans: SENP1–3 and SENP5–7. Other deSUMOylating isopeptidases more recently discovered include DeSI-1 and 2 (Shin et al., 2012) and USPL1 (Schulz et al., 2012). Many SUMO specific proteases contain a SIM as well, either in the catalytic region or outside of it (Hickey et al., 2012). The specificity of these proteases in deconjugating SUMOylated targets is determined mostly by N-terminal-directed localization, even though the processing of proform SUMO is isoform specific (Hickey et al., 2012).

Any disruption of this process will affect SUMOylation. For example, bacterial and viral infections such as chicken adenovirus (Boggio et al., 2004, Boggio et al., 2007) and Listeria monocytogenes (Ribet et al., 2010), can counteract SUMOylation by controlling enzymes in the SUMO pathway to facilitate uninterrupted infection.

1.3.5 SUMO consensus binding sites

A SUMO-acceptor site is the lysine residue in a target to which SUMO is coupled by an isopeptide bond. It is frequently found in the sequence motif ψKxE (ψ is a bulky aliphatic amino acid, x is any amino acid and E is an acidic residue) (Rodriguez et al., 2001, Sampson et al., 2001). Similar to the case of ubiquitin having multiple known consensus binding sites such as D-box (Glotzer et al., 1991) and KEN-box (Pfleger et al., 2000), extended SUMO consensus motifs have also been characterized in several SUMO targets, such as Phosphorylation-Dependent SUMO Motifs (PDSMs) that are characterized by the presence of a phosphorylated residue downstream of a ψKxE consensus motif (ψKxExxSP, in which K is the SUMO-conjugated lysine and S the phosphorylated serine) (Mohideen et al., 2009). Negatively charged residues can be found downstream of SUMO-conjugated lysines, which may replace the phosphorylated serine side chain of PDSMs and keep a constitutively active motif for SUMO conjugation (Yang et al., 2006). In addition to those, an inverted consensus motif of [DE]xKx[no DE] has also been identified (Matic et al., 2010).
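The three motifs described above lend themselves to a simple pattern search. The following is a minimal sketch only, not the prediction method used in this thesis (see section 2.13.1 for that); it assumes that ψ can be approximated by the residues I, L, V, M and F and that the acidic position of the core motif accepts D or E. The function name and motif labels are hypothetical.

```python
import re

# Assumption: psi (bulky aliphatic residue) is approximated as I, L, V, M or F.
PSI = "[ILVMF]"

# Lookaheads are used so that overlapping motifs are all reported.
MOTIFS = {
    "consensus": re.compile(rf"(?=({PSI}K[A-Z][DE]))"),        # psi-K-x-E
    "PDSM": re.compile(rf"(?=({PSI}K[A-Z]E[A-Z]{{2}}SP))"),    # psi-K-x-E-x-x-S-P
    "inverted": re.compile(r"(?=([DE][A-Z]K[A-Z][^DE]))"),     # [DE]-x-K-x-[no DE]
}

# 0-based offset of the acceptor lysine inside each matched motif.
LYSINE_OFFSET = {"consensus": 1, "PDSM": 1, "inverted": 2}


def find_acceptor_lysines(seq):
    """Return sorted (1-based lysine position, motif label, matched text) tuples."""
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            position = match.start(1) + LYSINE_OFFSET[name] + 1
            hits.append((position, name, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    for hit in find_acceptor_lysines("AAIKTEPGSPAA"):
        print(hit)
```

A motif match only marks a candidate lysine; as noted above, SUMOylation also occurs at non-consensus lysines, so a search of this kind will miss genuine sites and report sites that are never modified.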

1.3.6 SUMO interacting motifs (SIMs)

The recognition of the SUMO modification on targets by downstream effector proteins (SUMO-readers) is achieved mainly by two SUMO-binding motifs located on the SUMO-reader: a short hydrophobic sequence, (V/I)X(V/I)(V/I), flanked by acidic residues and known as the SUMO interacting motif (SIM) (Song et al., 2004), and specific zinc fingers (ZZ and MYM zinc fingers) (Danielsen et al., 2012, Guzzo et al., 2014). The SIM is usually found in an unstructured area of the protein, forming a short β-strand (Sriramachandran et al., 2014). The three main SUMO isoforms differ mostly in the amino acid composition of the β2-strand and the α-helix (Huang et al., 2004), the same regions that were found to mediate the non-covalent binding to SIMs. The hydrophobic core of the SIM is paired with the β2-strand of the β-sheet of SUMO in a parallel or anti-parallel conformation that extends the SUMO β-sheet, and is embedded in the cleft formed between the β2-strand and the α-helix (Hecker et al., 2006). Several SIM motifs have been characterised; RNF4 alone has four tandemly arranged SIMs, for example, and it prefers to bind poly-SUMO chains (Tatham et al., 2008).
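To illustrate the SIM definition given above, the sketch below searches a sequence for the (V/I)X(V/I)(V/I) hydrophobic core and counts acidic residues in the flanking regions. It is an illustration under stated assumptions rather than the prediction procedure used in this work: the flank width of five residues, the threshold of one acidic residue and the function name are all hypothetical choices.

```python
import re

# Hydrophobic SIM core (V/I)X(V/I)(V/I); the lookahead reports overlapping cores.
SIM_CORE = re.compile(r"(?=([VI][A-Z][VI][VI]))")


def find_sims(seq, flank=5, min_acidic=1):
    """Return (1-based start, core, acidic flank count) for candidate SIMs.

    Assumptions: "flanked by acidic residues" is read as at least
    `min_acidic` D/E residues within `flank` positions on either side.
    """
    seq = seq.upper()
    hits = []
    for match in SIM_CORE.finditer(seq):
        start = match.start(1)
        end = start + 4
        flanks = seq[max(0, start - flank):start] + seq[end:end + flank]
        acidic = flanks.count("D") + flanks.count("E")
        if acidic >= min_acidic:
            hits.append((start + 1, match.group(1), acidic))
    return hits


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_sims("GGDEDVAVIEEDGG"))
```

Counting the hits returned for one protein gives a rough indication of whether it could act as a multi-SIM reader, although a sequence match alone says nothing about whether the region is unstructured or accessible.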

1.3.7 Poly-SUMO chains

Ubiquitin carries seven lysine residues, onto which additional ubiquitin molecules can be ligated, forming a poly-ubiquitin chain (Peng et al., 2003). Different linkages can exist within the same poly-ubiquitin chain, resulting in forked chains (Ben-Saadon et al., 2006, Kim et al., 2007). In addition, linkage can also occur between the C-terminus of one ubiquitin and the N-terminus of another, resulting in the formation of a linear chain (Kirisako et al., 2006). Different linkage types in ubiquitin chains can be recognised by specific receptors, affecting in turn the downstream cellular response. The best described linkages are Lys48 and Lys63. Lys48-linked ubiquitin chains mainly target a substrate to proteasomal degradation (Pickart et al., 2004, Thrower et al., 2000), while Lys63-linked ubiquitin chains can target to DNA damage repair, cellular signalling and ribosomal biogenesis, to name a few (Pickart et al., 2004).

All SUMO proteins carry an unstructured, flexible stretch of 10–25 amino acids at their N-termini that is not found in any other ubiquitin related proteins, and the only function that has been assigned to these N-terminal extensions is the formation of SUMO chains (Figure 1.4B). The formation of poly-SUMO chains (as is regularly found with ubiquitin) has been observed both in vivo and in vitro, and enhances the SUMO signal during stress responses such as heat shock (Golebiowski et al., 2009a). The flexible N-terminal tail of SUMO2/3 contains a lysine residue (K11) within a SUMO consensus motif that is recognized by the SUMO conjugation machinery and is the major acceptor site for SUMO2/3 chain formation. In contrast with its close family members SUMO2 and SUMO3, SUMO1 lacks an internal SUMOylation site (a lysine within a consensus motif), and thus usually does not form polymeric chains (Hendriks et al., 2017, Tatham et al., 2001). The E2 enzyme UBC9 can bind SUMO non-covalently and promote chain formation (Knipscheer et al., 2007). Although SUMO2 and SUMO3 are 97% identical, SUMO2 appears to be a better substrate for chain formation than SUMO3, owing to a 3-amino-acid difference in the N-terminal region which appears to be partially inhibitory to conjugation of SUMO3 in comparison with SUMO2 (Tatham et al., 2001). Several proteins were found to interact with poly-SUMOylated targets, such as the microtubule motor protein CENP-E that can only bind to poly-SUMO chains (Zhang et al., 2008a). SUMO chains are processed quickly by SUMO proteases, and thus have a short life span. Specifically, SUMO proteases SENP6 and 7 each contain multiple SIMs, which might indicate a specialization in acting on SUMO chains (Vertegaal, 2010). Several ubiquitin E3 ligases, such as RNF4, were proposed to have multiple SIMs in order to recognise poly-SUMOylated targets (Ulrich, 2008).

1.3.8 Multi-SUMOylation

Cell damage, such as DNA breakage or heat shock, induces a SUMOylation wave, affecting whole protein complexes by multiple-site SUMOylation. The process requires group SUMOylation, that is SUMOylation of multiple DNA repair factors, to speed repair in a synergistic manner (Psakhye et al., 2012). In this rapid process, rather than presenting one poly-SUMO chain, one target protein/complex can be SUMOylated on multiple lysine residues. In such a scenario, multi-SUMOylation of one target could provide an alternative recruiting platform for proteins containing multiple SIMs, thus helping to recruit coregulatory partners into regulatory complexes. In fact, several proteins have been shown to be SUMO1-modified on several sites; for instance, CREB-binding protein (CBP) is SUMOylated on three sites, which recruits Daxx and thus inhibits transcription (Kuo et al., 2005). Megakaryoblastic leukemia 1 (MKL1) has three SUMOylation sites that serve to inhibit transcription (Nakagawa et al., 2005), whereas multi-SUMOylation of PEA3/ETV4 is required for promoter activation (Guo et al., 2009).

Recently, the construction of a multi-SUMO scaffold (built using the coiled-coil pentamerization domain of cartilage oligomeric matrix protein (COMP)) has allowed identification of several SUMO-readers containing more than one SIM motif. The transcriptional corepressor zinc finger protein ZMYM2 (also known as zinc finger protein 198, ZNF198), for example, has three canonical SIMs, and can recognize both multi- and poly-SUMOylated targets, with a clear preference for multi-SUMOylated ones for anchoring to chromatin. In this work, Aguilar-Martinez et al. (2015) postulated that the recruiting SUMOs could be contributed by either a multi-SUMOylated target (Figure 1.4C) or by several different targets acting together to attract a protein to a complex (Figure 1.4D).


Figure 1.4 - SUMO conjugated targets and their interactions with SIM and multi-SIM containing readers. A. mono-SUMOylated lysine on target protein bound to a single SIM (red) containing protein. B. poly-SUMOylated lysine on target protein bound to a multi-SIM containing protein. C. multiple SUMOylated lysines on target protein bound to a multi-SIM containing protein. D. multiple mono-SUMOylated targets bound by a single effector protein to create a complex (based on Aguilar-Martinez et al. (2015)).

1.3.9 Consequences of SUMOylation

The functional consequences of SUMOylation are hard to predict, as its roles are widespread and varied. SUMOylation may act in several different fashions: it may act as a binding platform for recruitment of other proteins, it may mask a binding site and prevent interaction with other proteins, such as ubiquitin (Pichler et al., 2005), and it can cause conformational change of a target protein (Wilkinson et al., 2010). Histone SUMOylation may exist in a dynamic interplay with histone acetylation and ubiquitination; for example, the recruitment of histone deacetylase (HDAC) and transcriptional repression is promoted by SUMOylation of Elk-1 (Yang et al. 2004).

Many transcription factors are controlled by SUMO modifications. The covalent attachment of SUMO to transcription factors, as mentioned above, has been shown to inhibit transcription. This can be achieved by recruitment of corepressors, and indeed many corepressor complexes have been shown to interact with different SUMOs (reviewed in Rosonina et al. (2017)), or by changing the sub-nuclear localisation of the transcription factor (Gill, 2005). However, though only found in a few cases, SUMO-related activation has also been demonstrated. For instance, the SUMOylation of T-cell transcription factor enables interaction with coactivator p300, thus promoting transcription (Zhang et al., 2012), and the homologous histone acetyltransferase coactivator CBP associates with SUMOylated transcription factors through a zinc finger domain (Lee et al., 2015). In other cases transcription factor SUMOylation can simply disrupt corepressor binding (reviewed in Rosonina et al. (2017)).

In the next section we will explore the role of nuclear deSUMOylation in the transactivation of a specific transcription factor.

1.4 The Five Friends of Methylated CHTOP

Among several groups of proteins discovered as multi-SUMO bound in Aguilar-Martinez et al. (2015), a complex characterised by Fanis et al. (2012) was prominent. This protein complex is recruited by methylated Chromatin Target of Prmt1 (Chtop) in mouse erythroleukemia cells. Upon recruitment the complex probably deSUMOylates the Zbp-89 transcription factor (also known as ZNF-148), and possibly other repressor complexes as well, resulting in transcription stimulation of Zbp-89 dependent genes. This complex was first found in a mass spectrometry analysis of the MLL-WDR5 complex (Dou et al., 2005), and in a large scale affinity purification scan for coregulatory complexes (Malovannaya et al., 2011).

This complex, comprised of five proteins, was named The Five Friends of Methylated CHTOP (5FMC). In addition to its ability to deSUMOylate Zbp-89, the complex was also implicated in large ribosomal RNA subunit maturation (Finkbeiner et al., 2011a).

1.4.1 The 5FMC complex is recruited to multi-SUMOylated targets

Tatham et al. (2008) established that RNF4 binds linear poly-SUMO chains through its multiple SUMO binding motifs. In 2015 Aguilar-Martinez et al. postulated that multi-SIM-containing proteins might be able to recognize multiple SUMOs conjugated to a protein or protein complex rather than one poly-SUMO chain. A mass spectrometry scan for multi-SUMO bound proteins performed by Aguilar-Martinez et al. recovered several dozen multi-SUMO bound proteins related to DNA repair and transcription. Among the proteins found bound to their multi-SUMO3 traps were all the members of the 5FMC complex. The 5FMC complex proteins only rarely appear in published single-SUMO3 mass spectrometry experiments: LAS1L appears as SUMO bound in several different studies, PELP1 has only been found SUMO-bound once, and TEX10, SENP3 and WDR18 only appear in one experiment under stress conditions (Golebiowski et al., 2009a, Hendriks et al., 2014, Tammsalu et al., 2014).

1.4.2 The 5FMC complex members

Little is known about the 5FMC complex members – PELP1, LAS1L, SENP3, TEX10 and WDR18:

PELP1 - Proline-, glutamic acid- and leucine-rich protein 1 (1130aa, chromosome 17) contains several motifs common to transcriptional regulators, but it shows no homology to other proteins. PELP1 is a known coactivator of estrogen receptor-mediated transcription (Vadlamudi et al., 2001), and may have an important functional role in the cross talk between estrogen receptor and growth factor signalling. PELP1 has possible roles in embryonic development, particularly in the brain (reviewed in Girard et al. (2014)).

PELP1 is overexpressed in several types of hormonal cancers and also in cancer cell lines. Specifically in breast cancer, 60-80% of tumours show PELP1 overexpression (Vadlamudi et al., 2005). High expression of PELP1 has been linked to tumour grade, proliferation and invasiveness, and it seems to be inversely proportional to the levels of estrogen receptor in breast cancer (Habashy et al., 2010). It has also been shown that localisation of PELP1 to the cytoplasm in cancer coincides with poor response to tamoxifen treatment (Kumar et al., 2009). PELP1 was also found to have increased expression in endometrial tumours, prostate cancer and astrocytic brain tumours. In other types of cancer, such as colon carcinoma, elevated PELP1 levels were found to be a favourable prognostic factor (reviewed in Girard et al. (2014)).

PELP1 protein contains several SUMOylation sites and SIMs, and is both SUMOylated (Finkbeiner et al., 2011a) and has non-covalent interactions with SUMO through SIMs. PELP1 modification by SUMO allows its release from the nucleus (Rosendorff et al., 2006).

LAS1L – LAS1-like protein (734aa, chromosome X) is a nuclear protein, essential for cell proliferation and ribosomal subunit synthesis and maturation. Depletion of LAS1L results in p53-dependent G1 cell cycle arrest (Castle et al., 2010). LAS1L contains a domain conserved throughout evolution, mutation of which results in spinal muscular atrophy with respiratory distress, a rare autosomal recessive disorder of neonatal weakness and early respiratory failure resulting in neonatal death (Butterfield et al., 2014). LAS1L is SUMO modified in a SENP3-dependent manner, and SUMOylation affects its localisation, just as it does for PELP1 (Castle et al., 2012).

SENP3 – SUMO specific protease (574aa, ), is active mainly against SUMO2/3. It contains a C-terminal conserved domain, and an active cleft within which the isopeptide bond between the SUMO Gly-Gly motif and its associated lysine is captured. By deSUMOylating targets, SENP3 can activate or suppress a myriad of cellular processes. It is redox-sensitive, and the presence of reactive oxygen species stabilises it, and thus increases deSUMOylation; while this is mostly beneficial for immune mediated diseases, it also promotes autoimmune response, arthritis (Gelderman et al., 2006) and tumour-induced immunosuppression (Nathan et al., 2013).

TEX10 – Testis Expressed 10 (929aa, ), is an essential component in the control of gene expression and rRNA processing. The structure of Tex10 indicates that it can interact with DNA, RNA and proteins, which might allow for diverse functionality (Ding et al., 2015). Tex10, Las1L and Wdr18 are all enriched in undifferentiated embryonic stem cells (ESC) (Buganim et al., 2013). In particular, Tex10 is essential for maintenance of ESC pluripotency and promotes cell reprogramming by regulating super enhancer activity through epigenetic modifications (Ding et al., 2015). More recently, TEX10 was found elevated in hepatocellular carcinoma, correlating with poor differentiation and increased drug resistance (Xiang et al., 2018).

WDR18 – a member of the WD repeat protein family (432aa, chromosome 19), contains a conserved core repeat sequence. WDR18 contains six WD repeats. This region may facilitate the formation of multi protein complexes, thus enabling WDR18 to take part in multiple developmental processes, such as left-right determination in zebrafish embryos (Yan et al., 2013).

1.4.3 5FMC complex members involvement in other complexes

The importance of the 5FMC complex becomes apparent when we look into the involvement of each complex member in other biological processes. It seems that every member of the 5FMC complex is involved in more than one other complex as well. PELP1 is the stabilizing component of the 5FMC complex (Fanis et al., 2012), but it has also been shown to interact with many other protein complexes. For example, demethylation of histone H3K9 is mediated by a KDM1–ESR1–PELP1 functional complex (Nair et al., 2010), chromatin remodelling by H3K4 methylation occurs through PELP1 interaction with the MLL1-WDR5 complex along with SENP3 and LAS1L (Kashiwaya et al., 2010), and PELP1–CARM1 interactions enhance estrogen receptor 1 transactivation by modifying H3 acetylation (Mann et al., 2013).

Tex10 was identified as a cofactor of Sox2, working together with Nanog, Oct4, and Sox2 in regulating pluripotency in mouse ESC. Tex10 is essential in recruitment of the p300 coactivator and the Tet1 modifier, which results in elevated enhancer RNA transcription and positive regulation of pluripotency (Ding et al., 2015, Zhao et al., 2017). While not strictly in a complex, Sox2-Tex10-Tet1 directly contribute to open chromatin and super-enhancer mediated transcription activation (Ding et al., 2015).

WDR18 is involved in DNA damage checkpoint signalling, probably mediating TopBP1 and Chk1 activity in Xenopus (Yan et al., 2013). It was also found to collaborate with TEX10, PELP1 or MDN1 in rRNA processing and large ribosomal subunit maturation (Finkbeiner et al., 2011a, Raman et al., 2016). In another study PELP1, TEX10, LAS1L, WDR18, SENP3 and NOL9 were shown to form a ribosomal processing complex (Castle et al., 2012).

1.4.4 5FMC complex and transcription regulation

Fanis et al. (2012) demonstrated that, when working as a complex, the 5FMC is recruited to arginine-methylated Chtop and activates the associated transcription factor Zbp-89, possibly by deSUMOylating it. This results in transcription of Zbp-89 target genes (Figure 1.5). Because the 5FMC mainly localises to the nucleoplasm (van Dijk et al., 2010), and Chtop is strongly associated with chromatin (Fanis et al., 2012), it is possible that the interaction is highly dynamic and controlled by additional factors. The involvement of the 5FMC complex members in other complexes, briefly explored in the previous section, is almost all connected to chromatin remodelling and regulation of gene expression as well.


Figure 1.5 - 5FMC complex recruitment to chromatin. A. The 5FMC complex is attracted by methylated Chtop to a SUMOylated transcription factor Zbp-89. B. 5FMC complex possibly deSUMOylates Zbp-89 and initiates transcription (model according to Fanis et al. (2012)).

Some complex members, as well as the 5FMC complex as a whole, sometimes with other proteins in tow, were described as a part of ribosome biogenesis and maturation (Castle et al., 2012). Depletion or re-localisation of LAS1L or PELP1 results in cell cycle arrest due to stabilisation of transcription factor p53 (Castle et al., 2012). As ribosomes serve as the main sites for translation, complex members thus participate in cellular growth and proliferation. Several of the complex members were also found to be vital to normal embryonic development, embryo maturation and axis determination (Ding et al., 2015, Yan et al., 2013). For example, the targeting of Tex10 to super enhancers during early embryonic development to facilitate epigenetic modification (Ding et al., 2015), and its involvement in ESC maintenance of pluripotency, which allows for continual proliferation, serve to strengthen these observations.

These roles all point to the involvement of 5FMC complex members in epigenetic modifications, affecting transcription factors by deSUMOylating them and resulting in stimulation of transcription. Several of these proteins, which control proliferation, were shown to be elevated in different types of cancer, making them an interesting target for controlling tumorigenesis.

1.5 Aims

SUMO is a key player in many cellular systems and serves a pivotal role in providing a binding platform to different proteins, thus mediating interactions between cellular components. SUMO interactions can be facilitated by target conjugation to a single SUMO, multiple SUMOs, or SUMO chains. SUMO has been implicated in a myriad of stress responses, and damage to SUMO or SUMO-cycle enzymes can cause birth defects or cancer. Cellular damage or stress leads to a SUMOylation wave which results in activation or repression of countless repair processes, and SUMOylation of transcription factors has been shown to either activate or repress gene expression. Recruitment of repressor complexes to transcription factors will inevitably cause disruption in gene expression; thus, the removal of conjugated SUMO will enable initiation of transcription. One previously described deSUMOylating complex, the 5FMC complex, was recently found recruited to multi-SUMOylated targets (Aguilar-Martinez et al., 2015).

In this study we aimed to find reader proteins that specifically bind to multi-SUMOylated targets, and further characterise them. We focused on the multi-SUMO recruitment of the 5FMC complex, its recruitment to chromatin, and the effects of SUMO binding and deSUMOylation by the 5FMC complex on gene expression.

1.5.1 Aim 1: Identify and characterise proteins recruited to different multi-SUMOylated targets.

The importance of SUMOylation and SUMO binding in cellular processes is well established, and the role of SUMO chains has been widely explored as well. However, it has been recently shown that multi-SUMOylation of a protein serves to recruit specific readers, with a clear preference for interaction with multiple SUMO moieties rather than a single poly-SUMO chain (Aguilar-Martinez et al., 2015). Using the same system described in Aguilar-Martinez et al. (2015), this study will aim to characterise multi-SUMO-binding proteins and their properties. This will be pursued in three different directions. First, we will look for multi-SUMO3 specific readers, characterising their interactions with multi-SUMOylated targets as opposed to poly-SUMOylated ones. Second, as SUMO1 does not form chains due to the lack of a lysine in a conserved sequence, we will look for multi-SUMO1 specific readers, characterising their properties. Third, since SUMO proteases are attracted to SUMOylated targets through a binding site we suspect differs from the known SIM, we aim to look for alternative SUMO binding sites, identify a common binding motif, and characterise SUMO proteases and their SUMO-binding properties.

1.5.2 Aim 2: How does the 5FMC complex affect transcription?

The 5FMC complex appeared in the mass spectrometry data of Aguilar-Martinez et al. (2015) as multi-SUMO3 bound. The same complex was identified as a deSUMOylating complex, recruited to chromatin bound elements and removing repressor complexes through deSUMOylation (Fanis et al., 2012). Members of this complex have been described in an abundance of cellular reactions, mostly related to cell proliferation and gene expression; they were found to be elevated in different types of cancer and were shown to be involved in stem cell pluripotency circuitry (Ding et al., 2015). However, there is a lack of information as to which transcription factors the 5FMC complex controls other than Zbp-89, and on the effects of the complex on global gene expression. We therefore propose to explore the effects of complex disruption on global gene expression.

1.5.3 Aim 3: What is the interplay between the 5FMC complex and SUMO on chromatin?

After recruitment by methylated CHTOP, 5FMC deSUMOylates the transcription factor Zbp-89 in order to remove repressor complexes and activate it (Fanis et al., 2012); this is probably performed by the SUMO protease unit of the complex, SENP3. Continuing from the previous section, this study aims to map SUMO on chromatin and to assess the interplay between the 5FMC complex and SUMO on chromatin.


2 Materials and methods

2.1 Bacterial cloning methods

2.1.1 Bacterial transformation

Mammalian expression plasmids: Plasmid DNA (Table 2.1) was added to competent DH5α E. coli cells and incubated on ice for 30 min. Following a 20 second 42°C heat shock and an additional 2 minute incubation on ice, cells were incubated at 37°C for 1 hour in Lysogeny Broth (LB) media. Transformants were spun down and plated on LB agar containing the appropriate antibiotics (Table 2.1) and incubated at 37°C overnight.

Post-ligation mammalian expression plasmids: Ligated plasmid DNA (Table 2.1) was added to NEB® 5-alpha Competent E. coli (High Efficiency) and incubated on ice for 30 min. Following a 40 second 42°C heat shock and an additional 2 minute incubation on ice, cells were incubated at 37°C for 1 hour in SOC outgrowth medium (BioLabs). Transformants were spun down and plated on LB agar containing the appropriate antibiotics (Table 2.1) and incubated at 37°C overnight.

Protein expression plasmids: Plasmids encoding GST-fusion proteins (Table 2.2) were added to BL21-CodonPlus(DE3)RIL E. coli cells, codon optimised for mammalian protein expression, and incubated on ice for 30 min. Following a 30 second 42°C heat shock and an additional 5 minute incubation on ice, cells were incubated at 37°C for 1 hour in Lysogeny Broth (LB) media. Transformants were spun down and plated on LB agar containing the appropriate antibiotics (Table 2.2) and incubated at 37°C overnight.

2.1.2 Glycerol stock preparation

Individual transformed colonies were picked and used to inoculate LB media with the appropriate antibiotics (Tables 2.1 and 2.2) and incubated at 37°C overnight. The cultures were then supplemented with an equal volume of 50% glycerol, and stored at -80°C.


Table 2.1 – Plasmid constructs for mammalian expression. Selection by Ampicillin at 200 µg/ml.

| Protein Encoded | Plasmid number | Vector | Source |
| --- | --- | --- | --- |
| FLAG-SENP1wt | pAS1138 | pcDNA3 | Witty et al. (2010) |
| FLAG-SENP2wt | pAS1140 | pcDNA3 | Witty et al. (2010) |
| FLAG-SENP2C547S | pAS1141 | pcDNA3 | Witty et al. (2010) |
| FLAG-SENP3wt | pAS1142 | pcDNA3 | Witty et al. (2010) |
| FLAG-SENP3C532S | pAS1143 | pcDNA3 | Witty et al. (2010) |
| FLAG-PELP1 | pAS2808 | pCl | Finkbeiner et al. (2011a) |
| FLAG-TEX10 | pAS2809 | pCl | Finkbeiner et al. (2011a) |
| FLAG-WDR18 | pAS2810 | pCl | Finkbeiner et al. (2011a) |
| FLAG-LAS1L | pAS2811 | pCl | Finkbeiner et al. (2011a) |
| 3xFLAG-ZMYM2 | pAS4055 | pcDNA5 | Aguilar-Martinez et al. (2015) |
| WDR18ΔC-3xFLAG | pAS2813 | pcDNA3.1 | This Study |
| WDR18ΔN-3xFLAG | pAS2814 | pcDNA3.1 | This Study |
| WDR18AA-3xFLAG | pAS2826 | pcDNA3.1 | This Study |
| WDR18-3xFLAG | pAS2812 | pcDNA3.1 | This Study |

Table 2.2 – Plasmid constructs for protein expression in bacteria. Selection by Ampicillin (200 μg/ml), Chloramphenicol (34 μg/ml).

| Protein Encoded | Plasmid number | Vector | Source |
| --- | --- | --- | --- |
| GST-SUMO1 | pAS2974 | pGEX-6P1 | Lemercier et al. (2000) |
| GST-SUMO3 | pAS2976 | pGEX-6P1 | Lemercier et al. (2000) |
| GST-COMP | pAS4017 | pGEX-6P1 | Aguilar-Martinez et al. (2015) |
| GST-COMP-SUMO1 | pAS4018 | pGEX-6P1 | Aguilar-Martinez et al. (2015) |
| GST-COMP-SUMO3 | pAS4020 | pGEX-6P1 | Aguilar-Martinez et al. (2015) |
| GST-COMP-SUMO3AAA | pAS4097 | pGEX-6P1 | Aguilar-Martinez et al. (2015) |
| GST-SUMO3x4 | pAS4179 | pGEX-6P1 | Aguilar-Martinez et al. (2015) |


2.1.3 Plasmid DNA purification

Individual transformed colonies were picked and used to inoculate 5 ml or 50 ml LB media with the appropriate antibiotics (Tables 2.1 and 2.2) and incubated at 37°C overnight. Cultures were centrifuged at 6000 g for 15 min at 4°C to pellet the cells. DNA was extracted from the cell pellet using either a miniprep (Qiagen) or a midiprep (Promega) kit, following the protocols provided. The resulting DNA was resuspended in UltraPure water, and the concentration was measured using a NanoDrop (ND 1000).

2.1.4 Restriction digests

DNA was digested using the appropriate restriction enzymes (1 unit per restriction) and buffer according to the manufacturer’s instructions (NEB – New England Biolabs). Diagnostic digests typically contained 100 – 300 ng plasmid DNA in a 20 µl reaction, whereas digests for ligation purposes contained 1 µg plasmid DNA in a 50 µl reaction. Digests were incubated for 2-4 hours, then analysed by 1% gel electrophoresis. If downstream ligation was required, digested plasmid DNA was treated with Alkaline Phosphatase for 15 min at 37°C (to prevent re-ligation), then purified using either the QIAquick Gel Extraction kit or the Polymerase Chain Reaction (PCR) purification kit (Qiagen) according to the manufacturer’s instructions.

2.1.5 Ligation reactions

Ligations were performed using T4 DNA Ligase (NEB), 1 unit, with the appropriate buffer. Reactions typically contained 100 ng cut plasmid DNA with the appropriate ratio of insert DNA. After a 30 minute incubation at room temperature (RT), 5 µl of the total ligation reaction was transformed into super competent cells (as described in section 2.1.1).
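The "appropriate ratio of insert DNA" is not specified further in the text. As an illustration only, the sketch below shows the standard mass calculation for a chosen insert:vector molar ratio. The 3:1 ratio, the 5400 bp vector size and the 900 bp insert size are hypothetical values chosen for the example and are not taken from this study.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA, mass scales with length, so the molar ratio
    is obtained by scaling the vector mass by the length ratio.
    """
    if min(vector_ng, vector_bp, insert_bp, molar_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


# Hypothetical example: 100 ng of cut vector (as in the text), with assumed
# sizes of 5400 bp (vector) and 900 bp (insert) and an assumed 3:1 ratio.
mass = insert_mass_ng(vector_ng=100, vector_bp=5400, insert_bp=900, molar_ratio=3.0)
print(f"Insert required: {mass:.1f} ng")
```

The same function can be reused for any vector and insert pair by substituting the actual fragment lengths.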

2.1.6 Plasmid DNA mutagenesis

Polymerase chain reaction (PCR) was used to amplify the desired section of the WDR18 gene. Primers for gene truncation (Table 2.3) were designed by introducing a start codon (blue) at position 73 coupled with a BamHI restriction site (Table 2.3, marked in red), or by introducing a restriction site at position 934 at the end of the gene coupled with an XhoI restriction site (Table 2.3, marked in green). The stop codon was omitted in order to enable a 3xFLAG tag to be added to the resulting protein when ligating into the pcDNA3.1 plasmid using BamHI/XhoI restriction.


Site directed mutations at the SIM site were introduced using forward and reverse primers replacing Leucine at position 114 and Isoleucine at position 116 with Alanine, creating WDR18 114A,116A (WDR18AA). Primers were used to amplify the whole plasmid as described in section 2.7.2, and the product was then digested with DpnI for 2 hours at 37°C to remove template plasmids.

All constructs were used to transform super competent cells as described in section 2.1.1. Plasmids were purified and verified by restriction digests and sequencing before further analysis was carried out.

Table 2.3 – Oligonucleotides - primers for WDR18 mutagenesis. Start codon marked in blue, BamHI restriction site in red, XhoI restriction site in green.

| Template gene | Name | Sequence 5’ – 3’ | Purpose |
| --- | --- | --- | --- |
| WDR18ΔC_F | ADS5946 | TAAGCAggatccATGgcggcgcccatggaggtg | C terminal truncation |
| WDR18ΔC_R | ADS5947 | TGCTTActcgaggcctttgagggccaccgtccg | C terminal truncation |
| WDR18ΔN_F | ADS5948 | TAAGCAggatccATGgaacttcactcgggcgccaac | N terminal truncation |
| WDR18ΔN_F | ADS5949 | TGCTTActcgagcttggccggccgcgtgatgaa | N terminal truncation |
| WDR18ΔN_R | ADS6095 | TGCTTActcgagcttggccggccgcgtgatgaa | N terminal truncation |
| WDR18 114A,116A_F | ADS6096 | gtctccaccgggaaccttgcggtcgccctgagtcgacactaccag | SIM-site directed mutagenesis |
| WDR18 114A,116A_R | ADS5946 | ctggtagtgtcgactcagggcgaccgcaaggttcccggtggagac | SIM-site directed mutagenesis |
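The primer design described above can be checked programmatically. The following is a minimal sketch, not part of the original workflow, that locates the BamHI (GGATCC) and XhoI (CTCGAG) recognition sites in the truncation primers of Table 2.3 and confirms that the two SIM-site mutagenesis primers are reverse complements of each other, as expected for whole plasmid PCR. The function names are illustrative, and the two mutagenesis primer sequences are written with their line-wrapped fragments joined.

```python
# Recognition sequences of the two enzymes named in the text.
BAMHI = "GGATCC"
XHOI = "CTCGAG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_position(primer, site):
    """0-based position of a restriction site in a primer (-1 if absent)."""
    return primer.upper().find(site.upper())


# Truncation primers from Table 2.3, paired with the site each should carry.
TRUNCATION_PRIMERS = {
    "ADS5946": ("TAAGCAggatccATGgcggcgcccatggaggtg", BAMHI),
    "ADS5947": ("TGCTTActcgaggcctttgagggccaccgtccg", XHOI),
    "ADS5948": ("TAAGCAggatccATGgaacttcactcgggcgccaac", BAMHI),
    "ADS6095": ("TGCTTActcgagcttggccggccgcgtgatgaa", XHOI),
}

# SIM-site mutagenesis primers from Table 2.3.
SIM_FORWARD = "gtctccaccgggaaccttgcggtcgccctgagtcgacactaccag"
SIM_REVERSE = "ctggtagtgtcgactcagggcgaccgcaaggttcccggtggagac"

for name, (sequence, site) in TRUNCATION_PRIMERS.items():
    print(name, site, "found at index", site_position(sequence, site))

print("Mutagenesis primers complementary:",
      reverse_complement(SIM_FORWARD) == SIM_REVERSE)
```

A check of this kind only confirms that the primers are internally consistent; it does not replace verification of the final constructs by restriction digest and sequencing.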


2.1.7 Plasmid sequencing

Sequencing primers were designed using Primer3 (Untergasser et al., 2012). 100 ng plasmid DNA (Tables 2.1 and 2.2) and the appropriate sequencing primers (Table 2.4) were sent to GATC Biotech. Sequencing results were analysed using SnapGene software (GSL Biotech) and aligned to WT sequences using NCBI Blast search.

Table 2.4 - Oligonucleotides - sequencing primers

| Template gene | Name | Sequence 5’ – 3’ | Purpose |
| --- | --- | --- | --- |
| T7 terminator_R | ADS2889 | TATGCTAGTTATTGCTCAG | Sequencing |
| SP6_F | ADS6264 | ATTTAGGTGACACTATAG | Sequencing |
| pcDNA3.1_R | ADS6109 | GCAACTAGAAGGCACAGTCG | Sequencing |
| TEX10_end | ADS6111 | GTAATTCCTGACAGCACGGC | Sequencing |
| TEX10_middle | ADS6110 | TGGCAGATGGATCCAGTAGG | Sequencing |
| PELP1_R | ADS5945 | TTGATGAGAGGCCCACACAT | Sequencing |
| PELP1_F | ADS5944 | TTTGCAGACTGGGAAGCCTA | Sequencing |
| SENP3 mutation site C532S | ADS5411 | TCTCTGTTGATGTGAGGCGA | Sequencing |

2.2 Protein expression methods

2.2.1 Bacterial protein expression

Escherichia coli BL21-CodonPlus(DE3)-RIL (Stratagene) were transformed with 100 ng of plasmids encoding GST-fusion proteins according to the manufacturer’s instructions. Transformed bacteria were grown in high salt Luria Broth (LB) media (1% tryptone, 0.5% yeast extract, 170 mM NaCl) containing Ampicillin (200 µg/ml) at 37°C with shaking until they reached an OD600nm of 0.5-0.6. Expression of recombinant proteins was then induced with isopropyl β-D-1-thiogalactopyranoside (IPTG, 200 µM) for 4 hours with shaking, at 25°C for GST-COMP constructs or 37°C for other GST constructs.


2.2.2 Bacterial protein purification

Pelleted bacteria from above (2.2.1) were lysed by 10 cycles of 30 sec on / 30 sec off high frequency sonication (Bioruptor, Diagenode) in GST lysis buffer (1X phosphate buffered saline, PBS (137 mM NaCl, 10 mM Phosphate, 2.7 mM KCl, pH 7.4), 0.5% Triton-X100). Recombinant GST-fusion proteins were purified by adding the bacterial lysate to 500 µl glutathione agarose beads (Sigma, 1:1 beads/H2O) and incubating for 1 hour at 4°C with gentle agitation. Beads were then washed four times for 5 min at 4°C with gentle agitation in GST-wash buffer (1X PBS, 400 mM NaCl), adding 0.5% Triton-X100 to the last wash (Aguilar-Martínez et al., 2016).

Serial dilutions of bovine serum albumin (BSA) were resolved by 12% SDS-PAGE and stained with Coomassie brilliant blue R-250 stain (0.25% Coomassie R-250, 45% Methanol, 45% Acetic acid), then used as a standard curve for GST-trap assessment.

2.2.3 In-Vitro protein production

Cell free protein production was performed using TnT Quick Coupled Transcription/Translation Systems (Promega) and the appropriate mammalian expression plasmids (Table 2.1), with or without Fluorotect GreenLys (Promega) fluorescent labelling. Protein was produced according to the manufacturer’s instructions.

2.3 Mammalian cell extraction methods

2.3.1 Total cell protein lysates

Cells were scraped off 10 cm plates after two 5 ml cold PBS washes into 500 µl SUMO-binding buffer (50 mM Tris pH 7.5, 250 mM NaCl, 0.1% Igepal, 5% Glycerol) containing complete protease inhibitor at 1:50 dilution (Roche). Cells were lysed by 5 min of high frequency sonication (30 sec on / 30 sec off cycles) (Bioruptor, Diagenode). Following a 10 minute centrifugation, the supernatant was used for further analysis.

2.3.2 Nuclear extracts

For pull-down experiments: Nuclear extracts were produced by first incubating the cells in cytoplasmic lysis buffer (10 mM HEPES pH 7.9, 10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA) for 15 minutes, then adding Igepal to a final concentration of 0.625% and vortexing vigorously for 30 seconds. Nuclei were then pelleted and the supernatant removed. The remaining nuclei were lysed by sonication (Bioruptor, Diagenode) in SUMO-binding buffer as above and centrifuged to remove cellular debris, and the supernatant was retained.

For mass-spectrometry: HeLa nuclear extracts (5x10^9) purchased from Ipracell (05842) were dialysed against pull down buffer (SUMO-lysis buffer) before use.

For immunoprecipitation: Nuclear extracts were produced by first incubating the cells in cytoplasmic lysis buffer (10 mM HEPES pH 7.9, 10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA) on ice. After addition of 2% Igepal and vigorous vortexing, the resulting nuclei were pelleted and the supernatant removed. The remaining nuclei were lysed by sonication (Bioruptor, Diagenode) in IP-lysis buffer (1x PBS, 1% Triton X-100) as above and centrifuged, and the supernatant was retained.

For chromatin immunoprecipitation (ChIP): Cells were cross linked for 10 minutes in 1% Formaldehyde in cell media, then 125 mM Glycine was added to quench the cross linking reaction for 5 min at room temperature with slow rocking. Following two 4°C PBS washes, cells were scraped into PBS containing complete protease inhibitor at 1:50 dilution (Roche). Cells were pelleted and resuspended in ChIP Lysis Buffer I (50 mM Hepes-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% Glycerol, 0.5% Igepal, 0.25% Triton X-100), then pelleted again and resuspended in ChIP Lysis Buffer II (10 mM Tris-HCl pH 8, 200 mM NaCl, 1 mM EDTA pH 8, 0.5 mM EGTA pH 8.0). The resulting nuclei were pelleted and resuspended in ChIP Lysis Buffer III (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0, 0.1% Na-Deoxycholate, 0.5% N-lauroylsarcosine), split into three 1.5 ml tubes and sonicated (Bioruptor, Diagenode) at 4°C for 12 min with 30 second on/off cycles to create chromatin fragments of ~200-500 bp in length. Following the addition of 1% Triton, samples were clarified by centrifugation at top speed, 4°C, for 10 min and the supernatant was collected for ChIP.

2.3.3 DNA extraction

Linear DNA fragments from various reactions, such as PCR reactions and linearised plasmids, were purified using the QIAquick PCR purification kit following the manufacturer’s instructions.


2.3.4 RNA extraction

To extract RNA, cells were washed twice in 4°C PBS, then scraped into RLT buffer (RNeasy kit, Qiagen) and homogenised using QIAshredder homogenisers. Extraction was performed using the RNeasy kit (Qiagen) following the manufacturer’s instructions.

2.4 GST pull down assays

To isolate proteins binding to the GST-tagged SUMO constructs, GST pulldown assays were carried out using purified recombinant GST-proteins (generally 1 µg, see 2.2.2) and either total cell lysates from ~10^7 HEK293T cells transfected with the desired expression construct, or nuclear extracts from HEK293T or HeLa cells, either transfected or not.

As described in Aguilar-Martínez et al. (2016), the total cell lysate or the nuclear extract was mixed with 1 µg glutathione agarose-bound GST-tagged protein and incubated for 2 hours at room temperature with gentle agitation. Protease inhibitors (Roche) at 1:50 dilution were added every hour. After a wash with SUMO lysis buffer (5 min with gentle agitation at 4°C) and two 5 min washes with SUMO wash-buffer (50 mM Tris pH 7.5, 250 mM NaCl, 0.1% Triton-X100, 5% Glycerol) at 4°C, proteins were extracted from the glutathione-agarose beads by the addition of 2xSDS buffer (0.25M Tris-HCl pH 6.8, 20% Glycerol, 0.2% Bromophenol blue, 0.4% SDS, 0.4% β-Mercaptoethanol) and 5 min boiling. Resulting proteins were used in immunoblotting.

2.5 Mass-spectrometry

For mass spectrometry purposes a slightly different pull down was performed: 1 µg GST-COMP purified protein was incubated at room temperature for 2 hours with gentle agitation with 5x10^9 HeLa nuclear extract (Ipracell) in SUMO-binding buffer, adding protease inhibitors (Roche) at 1:50 dilution every hour as above. The supernatant, containing proteins that did not interact with GST or COMP, was collected and added to GST-COMP-SUMO beads, and the pull down was continued as described above.

Following overnight incubation at 4°C with gentle agitation, the supernatant was discarded and bound proteins were released from the glutathione agarose beads by overnight treatment (at 4°C with gentle agitation) with PreScission protease (8 units; GE Healthcare) in cleavage buffer (50 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol (DTT), pH 7.0). The PreScission protease was then captured on fresh glutathione agarose beads washed with cleavage buffer, by transferring the supernatant to the washed beads and incubating for 25 min at 4°C with gentle agitation. The remaining proteins (PreScission protease free) were resolved by 12% SDS-PAGE and visualised by overnight (4°C with agitation) colloidal Coomassie blue stain (ammonium sulphate 10%, Coomassie G-250 0.1%, ortho-phosphoric acid 3%, ethanol 20%) followed by double distilled H2O destain until bands were clearly visible. Bands corresponding to bound proteins were excised from the gel and subjected to mass spectrometry as described before (Aguilar-Martinez et al., 2015).

2.6 Immunoblotting

Immunoblotting was carried out with the primary antibodies listed in Table 2.5, diluted according to the manufacturer’s instructions. Total cell extracts and proteins isolated by pull down or immunoprecipitation were resolved on SDS/PAGE gels (12% for most; 8% was used when screening for larger ˃100 kDa proteins such as PELP1 and TEX10), and then blotted at 250 mA for 1 hour onto a nitrocellulose membrane (Amersham) in cold Towbin buffer (25 mM Tris, 192 mM Glycine) containing 20% methanol. The protein-containing membrane was then stained with Ponceau S solution (Sigma) and photographed, blocked for 1 hour with Odyssey blocking buffer (LiCor) at room temperature and exposed to primary antibodies in 5% bovine serum albumin (BSA) in PBS with 0.03% w/v sodium azide. Following 3 washes in 1x PBS containing 0.25% Tween (Sigma), the proteins were detected using infrared dye-conjugated secondary antibodies (LiCor Bioscience, IRDye 800CW and IRDye 680LT, Table 2.5) in Odyssey buffer containing 10% Triton-X100. The signal was collected with a LiCor Odyssey infrared imager. Data were quantified using Fiji (Schindelin et al., 2012) and plotted using Excel (Microsoft).


Table 2.5 – Antibodies for immunoblot detection and immunoprecipitation

| Antibody | Species | Manufacturer | Serial No | Use |
| --- | --- | --- | --- | --- |
| αFLAG M2 | mouse | Sigma | F3165 | WB-Primary, IP |
| αFLAG | rabbit | Sigma | F7425 | WB-Primary, IP |
| αTEX10 | mouse | Santa Cruz | SC-398384 | WB-Primary |
| αLAS1L | goat | Santa Cruz | SC-132737 | WB-Primary |
| αLAS1L | rabbit | Abcam | ab140656 | WB-Primary, IP |
| αWDR18 | rabbit | Abcam | ab176261 | WB-Primary, IP |
| αPELP1 | rabbit | Abcam | ab200203 | WB-Primary, IP |
| αPELP1 | mouse | Santa Cruz | SC-393534 | WB-Primary |
| αSENP3 | rabbit | Abcam | ab124790 | WB-Primary, IP |
| αSENP2 | rabbit | Abcam | ab124724 | WB-Primary |
| αZMYM2 | rabbit | Bethyl | A301-711A | WB-Primary, IP |
| αSUMO2/3 | rabbit | Abcam | ab3742 | WB-Primary, IP, ChIP |
| αSUMO2/3 | mouse | MBL International | M114-3 | WB-Primary |
| IgG | mouse | Merck | 12-371 | ChIP, IP |
| αMouse IRDye | | LiCor | 680LT/800CW | WB-secondary |
| αRabbit IRDye | | LiCor | 680LT/800CW | WB-secondary |
| αGoat IRDye | | LiCor | 680LT/800CW | WB-secondary |


2.7 Polymerase chain reaction (PCR)

2.7.1 Amplification of DNA fragments for cloning

For each 50 µl reaction, 100 ng of DNA was added to 1x PfuUltra buffer, 200 μM dNTPs, 0.1 μM primer mix and 1 Unit of PfuUltra II Fusion Hotstart DNA polymerase (Agilent). Cycling conditions can be found in Table 2.6.

Table 2.6 – PCR reaction conditions: amplification of DNA fragments for cloning purposes

| Cycle | Temperature | Time (min:sec) | Cycles |
| --- | --- | --- | --- |
| Initial Denaturation | 95°C | 1:30 | 1 |
| Denaturation | 95°C | 0:30 | 25 |
| Annealing | 56-58°C | 0:30 | 25 |
| Extension | 72°C | 2:30 | 25 |
| Final Extension | 72°C | 10:00 | 1 |

2.7.2 Whole plasmid PCR for site directed mutagenesis

For each 50 µl reaction, 50 ng of DNA was added to 1x PfuTurbo buffer, 200 μM dNTPs, 0.125 μM primer mix and 2.5 Units of PfuTurbo DNA polymerase (Agilent). Cycling conditions can be found in Table 2.7.

Table 2.7 – PCR reaction conditions: whole plasmid PCR for site directed mutagenesis

| Cycle | Temperature | Time (min:sec) | Cycles |
| --- | --- | --- | --- |
| Initial Denaturation | 95°C | 0:30 | 1 |
| Denaturation | 95°C | 0:30 | 14 |
| Annealing | 55°C | 1:00 | 14 |
| Extension | 68°C | 2 min per kb | 14 |
| Final Extension | 68°C | 10:00 | 1 |
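Because the extension step in Table 2.7 is given as a rate (2 min per kb) rather than a fixed time, the time has to be worked out for each plasmid. The sketch below shows one way to do this; the 6400 bp plasmid length is a hypothetical value for illustration and is not a construct size reported in this study.

```python
EXTENSION_SECONDS_PER_KB = 120  # 2 min per kb, from Table 2.7


def extension_time(plasmid_bp, seconds_per_kb=EXTENSION_SECONDS_PER_KB):
    """Return the extension time for a whole plasmid as a 'min:sec' string."""
    if plasmid_bp <= 0:
        raise ValueError("plasmid length must be positive")
    total_seconds = round(plasmid_bp / 1000 * seconds_per_kb)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# Hypothetical 6400 bp plasmid (vector plus insert).
print(extension_time(6400))
```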


2.7.3 Colony PCR

Colony PCR was used for ligation verification in both transformed E. coli and CRISPR-modified cells.

From each ligation plate, transformed E. coli colonies were marked and stabbed into PCR reaction mix containing BioMix Red (Bioline) and 0.1 µM of the appropriate primers.

From CRISPR-modified cells DNA was extracted using DNeasy blood and tissue kit (Qiagen) and used in PCR reaction mix containing BioMix Red (Bioline) and 0.1 µM appropriate primers.

Cycling conditions can be found in Table 2.8.

Table 2.8 – PCR reaction conditions: colony PCR

| Cycle | Temperature | Time (min:sec) | Cycles |
| --- | --- | --- | --- |
| Initial Denaturation | 95°C | 5:00 | 1 |
| Denaturation | 95°C | 0:30 | 35 |
| Annealing | 55°C | 1:00 | 35 |
| Extension | 72°C | 0:30 | 35 |
| Final Extension | 72°C | 5:00 | 1 |

2.7.4 qPCR

Following ChIP experiments, input samples were diluted 1:10 and 2 µl used for qPCR reactions. 2 µl of ChIPed samples were used in qPCR. 1x QuantiTect SYBR Green mix (Qiagen) was used with 0.6 µM of the appropriate primer mix (Table 2.9) and 1 mM MgCl2 in a 10 µl reaction. Amplified DNA was detected by the increase in fluorescence and measured by the number of cycles (CT) taken to reach a manually set threshold. Technical replicates were performed in duplicate and appropriate negative controls were used. A standard curve was generated for each primer pair and used to calculate the concentration of amplified DNA in samples. Cycling conditions are described in Table 2.10.
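The standard curve quantification described above can be sketched as follows. This is an illustrative example only, assuming a least-squares linear fit of CT against log10 of the relative template quantity, which is the usual form of a qPCR standard curve; the dilution series and CT values are invented for the example and are not data from this study, and the function names are hypothetical.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of CT = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the relative quantity of a sample from its CT value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Invented ten-fold dilution series (relative quantity) and CT values.
standards = [100.0, 10.0, 1.0, 0.1]
standard_cts = [23.36, 26.68, 30.00, 33.32]
slope, intercept = fit_standard_curve(standards, standard_cts)

# Technical duplicates of one sample are averaged before interpolation.
sample_duplicates = [28.30, 28.38]
mean_ct = sum(sample_duplicates) / len(sample_duplicates)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"relative quantity={quantity_from_ct(mean_ct, slope, intercept):.3f}")
print(f"efficiency={amplification_efficiency(slope):.3f}")
```

Fitting a separate curve for each primer pair, as described in the text, accounts for differences in amplification efficiency between primer pairs.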


Table 2.9 – Oligonucleotides – primers for ChIP-qPCR

| Template gene | Name | Sequence 5’ – 3’ | Purpose | Source |
| --- | --- | --- | --- | --- |
| LOC101929140_R | ADS5975 | CTGTAGCCTTGACTGTTGTG | ChIP-qPCR | Aguilar-Martinez |
| LOC101929140_F | ADS5974 | AGTGCTGTCTTTGTTCTGGG | ChIP-qPCR | Aguilar-Martinez |
| ITPRIP_R | ADS5973 | TTCCTCCAAGAGACTAAATAGGTG | ChIP-qPCR | Aguilar-Martinez |
| ITPRIP_F | ADS5972 | GTAAACTGGAGCGAGTGGAC | ChIP-qPCR | Aguilar-Martinez |
| Negative_Control_R | ADS5943 | GACACGAGTTCTCTAGACCCT | ChIP-qPCR | Zongling Ji |
| Negative_Control_F | ADS5942 | ATTACGTTGACCAGGGGAGG | ChIP-qPCR | Zongling Ji |

Table 2.10 – PCR reaction conditions: qPCR

| Cycle | Temperature | Time (min:sec) | Cycles |
| --- | --- | --- | --- |
| Initial Denaturation | 95°C | 15:00 | 1 |
| Denaturation | 95°C | 0:20 | 40 |
| Annealing | 53°C | 0:30 | 40 |
| Extension | 72°C | 0:30 | 40 |
| Melting | 72°C – 95°C | Pre-melt 1:30, then 1°C rise every 0:05 | 1 |

2.7.5 RT-qPCR

Following RNA extraction, samples were diluted to 20 ng/µl and 2 µl used for reverse transcription (RT) qPCR reactions. 1x QuantiTect SYBR Green mix (Qiagen) was used with 0.6 µM of the appropriate primer mix (Table 2.11) and QuantiTect RT enzyme in a 10 µl reaction. Amplified DNA was detected by the increase in fluorescence and measured by the number of cycles (CT) taken to reach a manually set threshold. Technical replicates were performed in duplicate and a GAPDH positive control was used. A standard curve was generated for each primer pair and used to calculate the concentration of amplified DNA in samples. Cycling conditions are described in Table 2.12.
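Diluting each RNA sample to 20 ng/µl is a simple C1V1 = C2V2 calculation. The sketch below illustrates it; the stock concentration (250 ng/µl) and the final volume (50 µl) are hypothetical values, since the text specifies only the target concentration.

```python
TARGET_NG_PER_UL = 20.0  # working RNA concentration, from the text


def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=TARGET_NG_PER_UL):
    """Return (stock volume, water volume) in µl, using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical sample: 250 ng/µl stock diluted into a final volume of 50 µl.
stock_ul, water_ul = dilution_volumes(stock_ng_per_ul=250.0, final_volume_ul=50.0)
print(f"{stock_ul:.1f} µl RNA + {water_ul:.1f} µl water")
```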


Table 2.11 – Oligonucleotides – primers for RT-qPCR

| Template gene | Name | Sequence 5’ – 3’ | Purpose |
| --- | --- | --- | --- |
| LAS1L_F | ADS6358 | CTACTGCTGACCTGATACGCT | RT-qPCR |
| LAS1L_R | ADS6359 | TTCACAAACCTGACCAATGCC | RT-qPCR |
| TEX10_F | ADS6360 | CATCCAGACACAGCTTTCCC | RT-qPCR |
| TEX10_R | ADS6361 | AGTCAACCCAACCAAATGCT | RT-qPCR |
| PELP1_F | ADS6362 | CTCACAGGAGACTGCATGAC | RT-qPCR |
| PELP1_R | ADS6363 | CAAGGCTATCTTCTCGCTGG | RT-qPCR |
| SENP3_F | ADS6506 | GGTAGTAGAGAAGCTGGAGGA | RT-qPCR |
| SENP3_R | ADS6507 | ATGTTCATCACCTGGTCATTGAG | RT-qPCR |
| WDR18_F | ADS6504 | GGGAACCTTCTGGTCATCCT | RT-qPCR |
| WDR18_R | ADS6505 | AGACCTCCCATAGCTTCACC | RT-qPCR |
| GAPDH_F | ADS1285 | ACAGTCAGCCGCATCTTCTT | RT-qPCR |
| GAPDH_R | ADS1286 | TTGATTTTGGAGGGATCTCG | RT-qPCR |
| LPAR6_F | ADS6366 | CACATTCCCTCATGGCTTCC | RT-qPCR |
| LPAR6_R | ADS6367 | GTCGCTTTCTCACAATTCTTGG | RT-qPCR |
| ELK3_F | ADS6503 | CATCAAAGCAAACACCAAATGG | RT-qPCR |
| ELK3_R | ADS6502 | TCTGAGAGTTTGAAGAAAGCAG | RT-qPCR |
| FOS_F | ADS6508 | AGAATCCGAAGGGAAAGGAA | RT-qPCR |
| FOS_R | ADS6509 | CTTCTCCTTCAGCAGGTTGG | RT-qPCR |

Table 2.12 – PCR reaction conditions: RT-qPCR

| Cycle | Temperature | Time (min:sec) | Cycles |
| --- | --- | --- | --- |
| Reverse Transcription | 50°C | 30:00 | 1 |
| Initial Denaturation | 95°C | 15:00 | 1 |
| Denaturation | 95°C | 0:20 | 40 |
| Annealing | 57°C | 0:30 | 40 |
| Extension | 72°C | 0:30 | 40 |
| Melting | 72°C – 95°C | Pre-melt 1:30, then 1°C rise every 0:05 | 1 |


2.8 Immunoprecipitation

Dynabeads magnetic beads (Invitrogen) for the appropriate antibody species (Protein A for rabbit polyclonal, Protein G for mouse monoclonal antibodies) were washed with block solution (0.5% BSA in PBS), then resuspended in IP-lysis buffer and incubated with 0.5 µg of the desired antibody (Table 2.5) per 10 µl beads, according to the manufacturer’s specification, at 4°C with shaking for 4 hours. An additional non-specific IgG tube was prepared in the same way for each sample as a binding control. Nuclear extracts in IP-lysis buffer containing protease inhibitors (Roche) at 1:50 dilution were then equally divided between antibody and IgG control tubes and incubated with rotation at 4°C overnight.

Following two washes with IP wash Buffer (1x PBS, 1 M NaCl, 0.5% Triton) and another two washes in Tris-buffered saline – TBS (50 mM Tris-base pH 7.6, 150 mM NaCl), beads were resuspended in 20 µl 2xSDS buffer (0.25M Tris-HCl pH 6.8, 20% Glycerol, 0.2% Bromophenol blue, 0.4% SDS, 0.4% β-Mercaptoethanol) and boiled. Resulting proteins were resolved on SDS/PAGE gels along with input and flow through samples, and immunoblotted.


2.9 Chromatin immunoprecipitation

Nuclear extraction was performed as specified in section 2.3.2.

2.9.1 Sonication efficiency test

10 µl of sonicated (Bioruptor, Diagenode) ChIP lysate was boiled for 10 minutes to reverse cross links, and run on a 1% agarose gel.

2.9.2 SUMO2/3 ChIP with protein A beads

Nuclear extract from ~20x10^6 OE19 cells or 4x10^6 MCF10A cells (section 2.3.2) in ChIP Lysis Buffer III (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0, 0.1% Na-Deoxycholate, 0.5% N-lauroylsarcosine) containing protease inhibitors was added to SUMO2/3 antibody (Abcam)-bound magnetic protein A Dynabeads (Invitrogen) prepared as above (section 2.8). Following overnight incubation, Dynabeads were washed at 4°C as follows: 5 times with RIPA wash buffer (50 mM Hepes-KOH pH 7.6, 500 mM LiCl, 1 mM EDTA pH 8.0, 1% Igepal, 0.7% Na-Deoxycholate), then once with TE-NaCl (10 mM Tris-HCl pH 8.0, 1 mM EDTA, 50 mM NaCl). ChIP DNA was eluted from Protein A Dynabeads by adding 150 µl elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0, 1% SDS) and incubating at 65°C for 30 minutes with shaking at 1250 rpm, twice. The eluted ChIP DNA was collected in a 1.5 ml tube and incubated overnight at 65°C. 225 µl elution buffer was also added to input samples, which were likewise incubated overnight at 65°C to reverse the crosslinks. Following incubation, Proteinase K (0.2 mg/ml) was added to the samples, which were incubated for 1 hour at 55°C, 600 rpm. ChIP DNA was purified using the QIAquick PCR Purification kit (Qiagen) following the manufacturer’s instructions. Samples were then either used in ChIP-qPCR experiments or sent for sequencing.


2.10 RNA-seq

Following RNA extraction, a fraction of each sample was diluted to 20 ng/µl and 2 µl used for RT-qPCR reactions. After RT-qPCR verification of samples, the remainder of the undiluted samples was sent for sequencing.

2.11 Mammalian cell culture methods

2.11.1 Sub-culturing cell lines

Cells were cultured in appropriate media and supplements (Table 2.13), and grown at 37°C in a humidified, 5% CO2 incubator. Cells were grown to ~80% confluency and washed in warm PBS before being dissociated with trypsin-EDTA 0.25% and passaged to new flasks in appropriate dilutions.

Table 2.13 – Cell lines and culturing conditions

| Cell Line | Description | Growth media | Supplements |
| --- | --- | --- | --- |
| HEK293T | Embryonic kidney (Transformed) | DMEM (Gibco) | 10% fetal bovine serum |
| OE19 | Oesophageal carcinoma | RPMI (Gibco) | 10% fetal bovine serum |
| MCF10A | Mammary epithelial | DMEM/F12 (Gibco) | Normal Media: 10% horse serum, Insulin 1/1000, cholera toxin 1/10,000, Hydrocortisone 0.5 µg/ml, EGF 20 ng/ml. Starvation Media: 0.5% horse serum, Insulin 1/1000, cholera toxin 1/10,000, Hydrocortisone 0.5 µg/ml |


2.11.2 Freezing and thawing cells

Cells were cultured to 80% confluency, trypsinised (as described in section 2.11.1) and pelleted by centrifugation for 3 minutes at 1000 g. Cells were resuspended in the appropriate media supplemented with 5% dimethyl sulfoxide (DMSO). Cells were frozen at -80°C overnight in a Mr. Frosty freezing container in isopropyl alcohol, in order to achieve a cooling rate of -1°C/minute, the optimal rate for cell preservation. Cells were then transferred to liquid nitrogen for long term storage. When needed, cells were rapidly thawed at 37°C and seeded into flasks with the required volume of growth media.

2.11.3 Cell transfection

Plasmid transfection – HEK293T cells were counted and seeded at 2.5x10^4/cm^2 in appropriate media (Table 2.13) and incubated overnight. The chosen plasmid was mixed with Polyfect transfection reagent (Qiagen) in Optimem - an improved Minimal Essential Medium (MEM) (Gibco) according to the manufacturer’s instructions and incubated for 10 minutes at room temperature. Following incubation, the plasmid-Polyfect mixture was added to the cells, which were incubated for 48 hours before analysis.

siRNA transfection – OE19 cells were counted and seeded at 2.5x10^4/cm^2 in appropriate media (Table 2.13) and incubated overnight. The chosen siRNA (Table 2.14) at the manufacturer’s recommended concentration was mixed with Lipofectamine RNAiMAX (Invitrogen) in fully supplemented media according to the manufacturer’s instructions, incubated for 5 minutes at room temperature and added to cells. Following 48 hours of incubation, the media was replaced with fresh media and the transfection was repeated. After a further 48 hours of incubation, cells were collected for analysis.
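The seeding density of 2.5x10^4 cells per cm^2 translates into a cell number and suspension volume for each culture vessel. The sketch below illustrates the arithmetic; the growth area (55 cm^2, roughly that of a 10 cm dish) and the counted cell concentration (1x10^6 cells/ml) are assumed values for the example and should be replaced with the actual vessel area and cell count.

```python
SEEDING_DENSITY = 2.5e4  # cells per cm^2, from the text


def cells_to_seed(area_cm2, density=SEEDING_DENSITY):
    """Number of cells required to seed a vessel of the given growth area."""
    if area_cm2 <= 0:
        raise ValueError("growth area must be positive")
    return area_cm2 * density


def suspension_volume_ml(cells_needed, cells_per_ml):
    """Volume of counted cell suspension that contains the required cells."""
    if cells_per_ml <= 0:
        raise ValueError("cell concentration must be positive")
    return cells_needed / cells_per_ml


# Assumed values: 55 cm^2 growth area, suspension counted at 1e6 cells/ml.
needed = cells_to_seed(55.0)
volume = suspension_volume_ml(needed, 1.0e6)
print(f"{needed:.3g} cells, i.e. {volume:.3f} ml of suspension")
```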


Table 2.14 – siRNAs used for knockdown

| Target | Product | siRNA Sequence 5’ – 3’ | Manufacturer |
| --- | --- | --- | --- |
| PELP1 | ON-TARGET plus SMARTpool L-004463-00 | GACCAAGGUGUAUGCGAUA; GAGGAUUUGACAGUUAUUA; GUAAUGCACGUCUCAGUUC; GCGAGAAGAUAGCCUUGAG | Dharmacon |
| LAS1L | ON-TARGET plus SMARTpool L-018577-01 | GCUCAGAAGACGUGCGAUG; GUACGGAAAGUGCGUUAAA; ACGGAAUGCUCGCCGAUUU; ACCAUAAGUUGCAGCGGUA | Dharmacon |
| TEX10 | ON-TARGET plus SMARTpool L-020741-02 | CUACACAGCUUAUCGAUAU; UGGGAAUGGUAUAGAACGA; UCAAGAAUAUCACGACAUU; GAUUUGCUUUCUCGGUUAA | Dharmacon |
| WDR18 | ON-TARGET plus SMARTpool L-021451-01 | UGGAGGACGAGGUGCGCAA; GCUGCUCAAUGGCGAGUAU; GGACCUGCACUGCGGCUUU; CCGGGAACCUUCUGGUUCAU | Dharmacon |
| SENP3 | ON-TARGET plus SMARTpool L-006034-00 | ACGAAUUCCUUCAAACGUA; GCACUGAUGAGGUAGUAGA; GAUAAACUCCGUACCAAGG; CAAGUCAGGUGGAGGGUUU | Dharmacon |

Nucleofection – OE19 cells were counted and diluted to 2x10^6 cells in 100 µl Nucleofector solution V (LONZA) prepared according to the manufacturer’s instructions. The desired amount of DNA was added to the mix and the sample was transferred to an Amaxa certified cuvette. The sealed cuvette was inserted into the Amaxa Nucleofector and an appropriate optimised program was chosen (E-032 for OE19 cells). When the program had finished, cells were immediately transferred into 500 µl of pre-warmed RPMI medium and incubated at 37°C for 5 minutes. Additional RPMI media containing 10% fetal bovine serum was added and cells were transferred to a 6 well plate for overnight incubation at 37°C.


2.11.4 EGF stimulation of cells

MCF10A cells were counted and sparsely seeded (2x10^6 cells per 15 cm2 plate) in 20 ml starvation media (Table 2.13) and incubated overnight at 37°C in a humidified, 5% CO2 incubator. Following 48 hour incubation, EGF was added to the media at a final concentration of 25 ng/ml and cells were incubated for 15, 30, 60, 120 or 180 minutes, then collected for analysis. Non-stimulated cells were collected and referred to as the ‘0’ time point.

2.12 Clustered, regularly interspaced, short palindromic repeats (CRISPR)

2.12.1 Construct and homology arm design The 3xFLAG-2PA-Neomycin resistance epitope tagging sequence was designed according to sequence obtained from pFETCh-Donor plasmid (Addgene 63934) with 3xFLAG sequence modified by Dr Zongling Ji to reduce repeats according to Integrated DNA Technologies’ (IDT) instructions. Homology arm sequences for each individual 5FMC complex member were designed to be ~200 base pairs on either side of the stop codon of each gene. Designed sequences were ordered in the form of double stranded gblocks from IDT (Table 2‎ .15).

CRISPR guide RNAs (gRNAs) were designed to target the area near the stop codon at the 3’ end of each 5FMC complex member using Zhang Lab’s online tool www.crispr.mit.edu, and ordered from IDT. gRNAs (Table 2.16) were cloned downstream of a U6 promoter element in the PX458 pSpCas9(BB)-2A-GFP plasmid (Addgene 48138) (Ran et al., 2013), using 2 units of BbsI-HF (NEB) restriction enzyme to cut the plasmid and 1 unit of T4 DNA ligase (NEB) for the ligation reaction, as described in section 2.1. Following transformation of guide plasmids (Table 2.17) into NEB super competent DH5-α E. coli, positive colonies were identified using colony PCR (section 2.7.3) with the U6 forward primer (ADS5216, 5’- GAGGGCCTATTTCCCATGATTCC -3’) and the relevant reverse guide oligonucleotide (Table 2.16).


Table 2.15 – gblock design. Homology arms marked in blue, insert containing modified 3xFLAG (red), 2PA (green) and Neomycin resistance (orange) with stop codon underlined in bold.

Gene gblock design Name PELP1 GCGAGAAGGTGGGGCTGGCGGGAGAAGGGGTGGCAGCTGCTTCTCTTAGGAGAGGTTTGCCCAGCCTGAACCA TTCTTCTGTGTATGTGTGTGTGACTATTCTGAATCTTGTATCTTTGTGCCTAAAAGGAGCAGGATGACACAGCTGC CATGCTGGCCGACTTCATCGATTGTCCCCCTGATGATGAGAAGCCACCACCTCCCACAGAGCCTGACTCCGGGAG CGGAGGAGGTTCCGGTGGAGGTGGTTCTGGAGACTACAAGGACGATGATGATAAGGGCGATTACAAGGATGAC GACGATAAGGGAGATTATAAGGACGACGACGATAAGGTTTCAGGAAGCGGAGCTACTAACTTCAGCCTGCTGAA GCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCA GGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGC CGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGA ACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTG TCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCT CCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTC GACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCT GGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAG GATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTCTGGATTCATC GACTGTGGCCGGCTGGGTGTGGCCGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCT TGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTAT CGCCTTCTTGACGAGTTCTTCTGACCATCTTCTGCACCCCACTCTTTGTTTCCAATAAAGTTATGTCCTTAGATAGC GACTGCTGCTTCTGCCTTTTGCCACACCTGGGTCCCCAGCCAGAAGCTCAAGTGTCTGCTGGGCTTTTCAGGGTGT CCATAGTATTCCCCAGATGTCTCCATAGCTGTCCTCTTTTTGACGATCCAGAGAGGAACAACTGACATTGGTGTAA TGGACTGTCCCATCACTCAGCACGATGGTGTCAGAATT TEX10 CAGATGGGGATACAGTTGCCAAAGCTCTGGGAAATTGTGTTTGGGTGTGAAAAATACTGTCAGGTAGCATAGTG ATTTGGGTGCAGTTCTCAGATTACTCTTGTTTTTCTCTCCAGACATTGAAGAGTGGAAGTGTTCAGGAACAGTGGC TCACAGACTTACATTACTGCTTCAACGTGTATATCACTGGGCATCCCCAAGGGCCCAGTGCACTGGCTACAGTGTA TGGGAGCGGAGGAGGTTCCGGTGGAGGTGGTTCTGGAGACTACAAGGACGATGATGATAAGGGCGATTACAA GGATGACGACGATAAGGGAGATTATAAGGACGACGACGATAAGGTTTCAGGAAGCGGAGCTACTAACTTCAGC CTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTGGATCGTTTCGCATGATTGAACAAGATGGATT 
GCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCT CTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCC TGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTC GACGTTGTCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCA CCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTG CCCATTCGACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGG ATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGAC GGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTCTGGA TTCATCGACTGTGGCCGGCTGGGTGTGGCCGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAA GAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCC TTCTATCGCCTTCTTGACGAGTTCTTCTGAAGAGGCCATAGTACCTCCTGTTTGAAGTTGTTTATTCACATCTATCT TATTTGAAGAAAAAGACTGATGTAATAGATCTTTGTCATTAAAGCTGAACTTTTAAAGAAGTTTACGAACCTTCTT TCTTTGCACTGATATATGAAAATAATGTTTGGGCGTTCCTATTGATGTGTAGGCTGGCTTCCCGGACTAGCACACG ATACATGGTCATAGGCGGTTGTGCATTGTGCATTTCATAGGAGCTAAATATTTACAAAATAGAGAGTGATA


WDR18 GGGGTCAGTCCTGGCCAGTGGGGGTGAGTGGCCCCCCTCCAGCACACCCCAGGCCACTTCTGCCCTCTGACCCCG ACTTCTCCCGCAGAGCGTGCTCGGCGGCCAGGACCAGCTGCGCGTCCGTGTGACGGAGCTGGAGGACGAGGTG CGCAACCTGCGCAAGATCAATCGGGACCTGTTCGACTTCTCCACGCGCTTCATCACGCGGCCGGCCAAGGGGAG CGGAGGAGGTTCCGGTGGAGGTGGTTCTGGAGACTACAAGGACGATGATGATAAGGGCGATTACAAGGATGAC GACGATAAGGGAGATTATAAGGACGACGACGATAAGGTTTCAGGAAGCGGAGCTACTAACTTCAGCCTGCTGAA GCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCA GGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGC CGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGA ACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTG TCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCT CCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTC GACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCT GGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAG GATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTCTGGATTCATC GACTGTGGCCGGCTGGGTGTGGCCGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCT TGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTAT CGCCTTCTTGACGAGTTCTTCTGAGGCCCGGAGACCCCGGCCCGAGGCGCCCAGGCCTGAGCCCCATGCCTCCCA GCAACCAGGGCCCGCGGGTGTGGCCCCCACCAGCCCAGGCCTGGACTCTCCTCAGTTCTGTGTCGTGTTCGGGTT TTTCCTCTGTGACTGGGCCGTCTTGGTGTCTCGTGGCACGCGTCACAGTGGTGCTAGTCTGTTTTTAACAAAAGAG GATGAAAAGCCCCTCCTCTCCGGC LAS1L CCCACAGATACACACATGTCAAGCACTCTTGGCCCCTCCAGAGCCTAAAGCTGCCTCCTTCCCTGACTGACCTCCC ACTCTCTCCTCAGCACCTTGGGCCTGAGCTGTGGTGTCGGCAGTGGCAACTGCAGCAACAGCAGCAGCAGCAAC TTCGAGGGCCTTCTCTGGAGCCAGGGGCAGCTGCATGGGCTCAAAACTGGCCTGCAGCTCTTCGGGAGCGGAGG AGGTTCCGGTGGAGGTGGTTCTGGAGACTACAAGGACGATGATGATAAGGGCGATTACAAGGATGACGACGAT AAGGGAGATTATAAGGACGACGACGATAAGGTTTCAGGAAGCGGAGCTACTAACTTCAGCCTGCTGAAGCAGG CTGGAGACGTGGAGGAGAACCCTGGACCTGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCAGGTTCT CCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGT 
GTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGAACTGCA GGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTG AAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCTCCTGCC GAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCAC CAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGA AGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTC GTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTCTGGATTCATCGACTGT GGCCGGCTGGGTGTGGCCGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGG CGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTT CTTGACGAGTTCTTCTGATGGCCATCCCTGGTGCAAGTGTTCATCCAGCCGTGCCAGGGCAACAGCCCACCCCCT AGTACAACTGATGCTCCCTGAGACAACCTGGGAGACAGCCTGGATCAGCCACATCAACTCAGTTGTCCACCACAG GGGAATTTTGAATGTCTTTTGTTTTTGTTTTGTTTTGAAAAATAATAAACAGGCAACTGTAGTTTTGGTGTTTGTGA GATGAAGAGGGTG SENP3 GCTGTGGGACCCTGGCTCCCTTTGGGGATGTTCTCTGAAGGATGGAGACACATCTCATATGAAATGTGTAGCACA GGTCCTGACACGGGGGGTTTCTCATGGCTTGCTTTGTTAACACCCAGTACTGCAAGCATCTGGCCCTGTCTCAGCC ATTCAGCTTCACCCAGCAGGACATGCCCAAACTTCGTCGGCAGATCTACAAGAAGCTGTGTCACTGCAAACTCAC TGTGGGGAGCGGAGGAGGTTCCGGTGGAGGTGGTTCTGGAGACTACAAGGACGATGATGATAAGGGCGATTA CAAGGATGACGACGATAAGGGAGATTATAAGGACGACGACGATAAGGTTTCAGGAAGCGGAGCTACTAACTTC AGCCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTGGATCGTTTCGCATGATTGAACAAGATG GATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGC TGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGT GCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGT GCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCAT CTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTA CCTGCCCATTCGACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATC AGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCC CGACGGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTC 
TGGATTCATCGACTGTGGCCGGCTGGGTGTGGCCGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGC TGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCAT CGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCCTCGTACCCCAGACCCCAAGCCCATAAATGGGAAGGGAG ACATGGGAGTCCCTTCCCAAGAAACTCCAGTTCCTTTCCTCTCTTGCCTCTTCCCACTCACTTCCCTTTGGTTTTTCA TATTTAAATGTTTCAATTTCTGTATTTTTTTTTCTTTGAGAGAATACTTGTTGATTTCTGATGTGCAGGGGGTGGCT ACAGAAAAGCCCCTTTCTTC


Table 2.16 – CRISPR guides. Designed using Zhang Lab’s online tool www.crispr.mit.edu, with overhang (in lowercase).

Template gene | Name | Sequence 5' - 3'
SENP3gRNA_F | ADS6368 | caccACTTCGTCGGCAGATCTACAAGG
SENP3gRNA_R | ADS6369 | aaacCCTTGTAGATCTGCCGACGAAGT
WDR18gRNA_F | ADS6370 | caccATCACGCGGCCGGCCAAGTGAGG
WDR18gRNA_R | ADS6371 | aaacCCTCACTTGGCCGGCCGCGTGAT
PELP1gRNA_F | ADS6372 | caccGCAGAAGATGGCTAGGAGTCAGG
PELP1gRNA_R | ADS6373 | aaacCCTGACTCCTAGCCATCTTCTGC
TEX10gRNA_F | ADS6376 | caccACTTCAAACAGGAGGTACTATGG
TEX10gRNA_R | ADS6377 | aaacCCATAGTACCTCCTGTTTGAAGT
LAS1LgRNA_F | ADS6374 | caccGCTGGATGAACACTTGCACCAGG
LAS1LgRNA_R | ADS6375 | aaacCCTGGTGCAAGTGTTCATCCAGC
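The oligonucleotide pairs in Table 2.16 follow a regular pattern: the forward oligonucleotide is the target sequence with a lowercase "cacc" overhang, and the reverse oligonucleotide is the reverse complement of the same target with a lowercase "aaac" overhang. The following minimal Python sketch reproduces that pattern. The function names are hypothetical and chosen for illustration only; the overhangs and example sequences are taken from Table 2.16.

```python
# Minimal sketch: derive a forward/reverse guide oligonucleotide pair from a
# target sequence, following the pattern seen in Table 2.16.
# Function names are hypothetical; they are not part of any published tool.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def guide_oligos(target: str) -> tuple[str, str]:
    """Return (forward, reverse) oligonucleotides with lowercase overhangs."""
    target = target.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    forward = "cacc" + target                      # forward overhang
    reverse = "aaac" + reverse_complement(target)  # reverse overhang
    return forward, reverse


# Example using the SENP3 target sequence from Table 2.16.
senp3_forward, senp3_reverse = guide_oligos("ACTTCGTCGGCAGATCTACAAGG")
print(senp3_forward)
print(senp3_reverse)
```

Running the sketch on the other target sequences in Table 2.16 gives a quick consistency check of each forward/reverse pair before ordering.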

2.12.2 Cell culture and nucleofection

OE19 cells were grown under recommended growth conditions (section 2.11.1). As OE19 cells are very hard to transfect, nucleofection (see section 2.11.3) was used to introduce a mix of 9 µg guide RNA plasmid (Table 2.17) and 1 µg of the relevant gblock (Table 2.15). 48 hours after nucleofection, cells were selected with 600 µg/ml Geneticin (G418, Invitrogen), an analogue of Neomycin. 2-3 weeks after nucleofection, individual colonies were picked from the plates and moved to individual flasks without Geneticin for 5-8 media changes to enable faster growth. Geneticin was then re-introduced at the same concentration as before to ensure that the cells in culture contained the inserts.

Table 2.17 – CAS9 guide RNA plasmids.

gRNA coded | Name | Vector | Source
Empty vector | pAS4298 | px458 (pSpCas9(BB)-2A-GFP) | (Ran et al., 2013)
PELP1 | pAS2827 | px458 (pSpCas9(BB)-2A-GFP) | This Study
TEX10 | pAS2828 | px458 (pSpCas9(BB)-2A-GFP) | This Study
WDR18 | pAS2829 | px458 (pSpCas9(BB)-2A-GFP) | This Study
LAS1L | pAS2830 | px458 (pSpCas9(BB)-2A-GFP) | This Study
SENP3 | pAS2831 | px458 (pSpCas9(BB)-2A-GFP) | This Study


2.12.3 Homologous recombination validation

DNA from CRISPR-modified cells was isolated using the DNeasy blood and tissue kit (Qiagen), and tested by PCR as described for colony PCR (section 2.7.3) with the appropriate primers (Table 2.18). Gene-specific primers were designed to anneal outside the homology arms using Primer3 (Untergasser et al., 2012), and two additional primers were designed to anneal inside the insert. DNA was resolved in a 1% agarose gel containing ethidium bromide, and imaged using the Bio-Rad Gel-Doc system.

Protein from CRISPR-modified cells (section 2.3.2) was used for immunoblotting (section 2.6) with αFLAG antibody, and for αFLAG immunoprecipitation (section 2.8) followed by immunoblotting with the relevant antibodies (Table 2.5).

Table 2.18 – Oligonucleotides - primers for CRISPR homologous recombination verification

Template gene | Name | Sequence 5’ – 3’ | Purpose
5'_Rev | ADS6295 | CGTCCTGCAGTTCATTCAGG | PCR verification
3'_For | ADS6296 | ACCGCTTCCTCGTGCTTTAC | PCR verification
SENP3_For | ADS6512 | CCCTGCTAACCTCCAACTCA | PCR verification
SENP3_Rev | ADS6513 | AAGAGACTGGCTGGAAGGTC | PCR verification
WDR18_For | ADS6514 | TGTGTTACAGGGCTCGGAG | PCR verification
WDR18_Rev | ADS6515 | GGCAACAGAGCAGAGTCAGA | PCR verification
PELP1_For | ADS6378 | GAGAGGTGGAGAGGGAAGG | PCR verification
PELP1_Rev | ADS6379 | ACGGACACCCTTTCATCTTG | PCR verification
TEX10_For | ADS6382 | AGAATGGCAAGAGGGAGAGG | PCR verification
TEX10_Rev | ADS6383 | GGTATTTTGGGGGTTGCTTT | PCR verification
LAS1L_For | ADS6380 | TTAACCAGGTGTTGGGTGTG | PCR verification
LAS1L_Rev | ADS6381 | AGGGAAGCCCACTCTCATCT | PCR verification


2.13 Data analysis

2.13.1 SUMOylation and SUMO binding site prediction

To predict SUMOylation consensus sites and SIMs, the GPS-SUMO prediction algorithm (Zhao et al., 2014) was employed, with the prediction threshold set to high.

2.13.2 Protein networks analysis

Following mass-spectrometry, protein networks were screened using CRAPome v1.1 (Mellacheruvu et al., 2013), with high confidence defined as appearing in fewer than 60 out of 411 individual experiments represented in the database. Protein interactions were analysed using Metascape (Tripathi et al., 2015b) and plotted using STRING v.10 (Szklarczyk et al., 2015).

2.13.3 Immunoblot band quantification

To quantify immunoblot bands, images were opened in ImageJ (Schindelin et al., 2012) and a rectangular area was marked around each band, encompassing the whole band width and covering at least three times the area of the band to allow background subtraction. A histogram of each band was used to quantify the area in each rectangle, and the relative percentage of pixel coverage was determined for each.

2.13.4 qPCR analysis

For ChIP-PCR, a standard curve was used, and the DNA concentration calculated by the Rotor-Gene Q software (Qiagen) was multiplied by the amount of sample used in the PCR reaction.

For RT-qPCR, relative transcript abundance was calculated using the ΔΔCT method (Livak et al., 2001) following the formula below, with the housekeeper referring to GAPDH and GoI referring to the gene of interest.

\[ \Delta\Delta C_T = \left(C_T^{\mathrm{GoI}} - C_T^{\mathrm{housekeeper}}\right)_{\mathrm{knockdown}} - \left(C_T^{\mathrm{GoI}} - C_T^{\mathrm{housekeeper}}\right)_{\mathrm{control}} \]

Any relevant fold changes (FC) in mRNA levels between time points were calculated by:

\[ \mathrm{FC} = 2^{-\Delta\Delta C_T} \]
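As a worked illustration of the ΔΔCT calculation, the short Python sketch below applies the two formulas to a single gene of interest. The function names and the CT values are illustrative assumptions and are not measured data.

```python
# Minimal sketch of the ΔΔCT calculation described above.
# Function names and CT values are illustrative assumptions only.

def delta_delta_ct(ct_goi_kd: float, ct_hk_kd: float,
                   ct_goi_ctrl: float, ct_hk_ctrl: float) -> float:
    """(CT GoI - CT housekeeper) in knockdown minus the same term in control."""
    return (ct_goi_kd - ct_hk_kd) - (ct_goi_ctrl - ct_hk_ctrl)


def fold_change(ddct: float) -> float:
    """Fold change in transcript abundance: FC = 2 ** (-ΔΔCT)."""
    return 2.0 ** (-ddct)


# Hypothetical example: the gene of interest crosses the threshold two cycles
# later in the knockdown, while the housekeeper (GAPDH) is unchanged.
ddct = delta_delta_ct(ct_goi_kd=26.0, ct_hk_kd=18.0,
                      ct_goi_ctrl=24.0, ct_hk_ctrl=18.0)
fc = fold_change(ddct)
print(ddct, fc)  # 2.0 0.25
```

In this hypothetical case a ΔΔCT of 2 corresponds to a four-fold reduction in transcript abundance relative to the control.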


2.13.5 Statistical analysis

Statistical significance of image quantification results was determined using an unpaired Student’s t-test (Excel 2010, Microsoft Corporation).

2.13.6 Sequencing data analysis

2.13.6.1 Raw data analysis

Initial raw data analysis was done by Ian Donaldson at the Bioinformatics Core Facility (University of Manchester).

RNA-seq – Following FASTQC quality control (Andrews, 2014), reads were aligned to the human hg19 genome using STAR (Dobin et al., 2012). Reads were then counted per gene using HTSeq (Anders et al., 2015).

ChIP-seq – Following FASTQC quality control (Andrews, 2014), reads were aligned to the human hg19 genome using bowtie2 (Langmead et al., 2012). SAM output files were then sorted, filtered to remove blacklisted regions and converted to BAM files using SAMtools (Genome Project Data Processing Subgroup et al., 2009). Peaks were called using MACS2 (Zhang et al., 2008b) with the broad peaks setting.

2.13.6.2 Normalisation

RNA-seq – The HTSeq count matrix was normalised in DESeq2 (Anders et al., 2010).

ChIP-seq – Normalised MACS2 files were used to count the number of reads in peaks (RIP) using BEDOPS (Reynolds et al., 2012). The resulting mean of bedgraph coverage was used to scale the un-normalised MACS2 files. Peaks were re-called using MACS2 (Zhang et al., 2008b) using broad peaks setting.

2.13.6.3 Differential expression analysis

Following normalisation of RNA-seq reads, the DESeq2-normalised reads were analysed for differential expression in the same program. Data sets from each knockdown were compared to the control to determine up or down regulation, and also to all other knockdown data sets. The analysis threshold was an adjusted p-value of 0.05 or below, with a 1.5-fold change relative to the control samples, in either direction, taken as up or down regulation, respectively.
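The thresholds above can be expressed as a small classification step. The Python sketch below is an illustration under stated assumptions, not the analysis code used here: it assumes that the fold change is supplied as a log2 value, that the 1.5-fold cut-off is applied symmetrically (log2 fold change of at least log2(1.5) in either direction), and that genes without an adjusted p-value are treated as unchanged. The function name and example values are hypothetical.

```python
# Minimal sketch: classify a gene as up, down or unchanged from a log2 fold
# change and an adjusted p-value. Thresholds come from the text; applying the
# 1.5-fold cut-off symmetrically on the log2 scale is an assumption.
import math
from typing import Optional

PADJ_CUTOFF = 0.05
FOLD_CHANGE_CUTOFF = 1.5


def classify(log2_fold_change: float, padj: Optional[float]) -> str:
    """Return 'up', 'down' or 'unchanged' for a single gene."""
    if padj is None or math.isnan(padj) or padj > PADJ_CUTOFF:
        return "unchanged"
    threshold = math.log2(FOLD_CHANGE_CUTOFF)  # about 0.585
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical values for illustration only.
print(classify(1.0, 0.01))   # up
print(classify(-1.0, 0.01))  # down
print(classify(0.3, 0.01))   # unchanged (below the 1.5-fold cut-off)
print(classify(2.0, 0.20))   # unchanged (adjusted p-value above 0.05)
```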


2.13.6.4 Differential binding analysis

Differentially bound peaks in ChIP-seq data were established using the R package DiffBind (Ross-Innes et al., 2012). Correlation between samples was visualised using DeepTools (Ramírez et al., 2016). Peak overlap was defined as a 30% overlap, determined with intersect (BEDtools; Quinlan et al., 2010).
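To make the 30% overlap criterion concrete, the Python sketch below tests whether two peaks overlap by at least a given fraction. It is not the BEDtools implementation. It assumes BED-style half-open coordinates and, as a further assumption, measures the overlap as a fraction of the first peak's length; the function names and coordinates are hypothetical.

```python
# Minimal sketch of a fractional overlap test between two peaks.
# Assumptions: half-open (start, end) coordinates and overlap measured as a
# fraction of the first peak. Names and coordinates are hypothetical.

Peak = tuple[str, int, int]  # (chromosome, start, end)


def overlap_fraction(a: Peak, b: Peak) -> float:
    """Fraction of peak a covered by peak b (0.0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0.0
    length = a[2] - a[1]
    if length <= 0:
        raise ValueError("peak a must have positive length")
    shared = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return shared / length


def peaks_overlap(a: Peak, b: Peak, min_fraction: float = 0.30) -> bool:
    """True if at least min_fraction of peak a is covered by peak b."""
    return overlap_fraction(a, b) >= min_fraction


print(peaks_overlap(("chr1", 100, 200), ("chr1", 170, 300)))  # True (30 of 100 bp)
print(peaks_overlap(("chr1", 100, 200), ("chr1", 171, 300)))  # False (29 of 100 bp)
```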

2.13.6.5 Genome annotation

ChIP-seq peaks were assigned to genomic regions and associated genes (based on the nearest TSS method) using the HOMER annotatePeaks package (Heinz et al., 2010), with default parameters applied.

2.13.6.6 Data-set intersections

RNA-seq datasets were intersected and static plots generated using UpSetR (Conway et al., 2017).

2.13.6.7 Gene ontology analysis

Following the genomic annotation of peaks, lists of genes were assigned to categories based on known biological functions using GREAT (McLean et al., 2010), set to nearest gene, or Metascape (Tripathi et al., 2015b) for gene ontology comparisons.

2.13.6.8 Motif enrichment analysis

Motif enrichment in defined experimental groups was analysed using the HOMER findMotifsGenome package (Heinz et al., 2010).


2.14 Supplementary information

S1 – SUMO3 pull downs followed by mass spectrometry. Data table. http://dx.doi.org/10.17632/5v9mv5yrpy.2

S2 – Differential expression analysis of RNA-seq of OE19 5FMC complex knockdowns. Data table. http://dx.doi.org/10.17632/326vpn2x5y.2

S3 – Differential binding analysis of OE19 SUMO2/3 ChIP-seq with SENP3 knockdown. Data table. http://dx.doi.org/10.17632/ggxs227h4z.3

S4 – Peak and read files. SUMO ChIP-seq OE19 (siSENP3), MCF10A: http://dx.doi.org/10.17632/wzkwr333vv.2 ; SUMO ChIP-seq MCF10A EGF induction time course: http://dx.doi.org/10.17632/zs5p25bn23.3 ; RNA-seq with 5FMC complex knockdowns: http://dx.doi.org/10.17632/6nfpv4rz3v.2


3 Identification and verification of 5FMC complex as a multi-SUMO binding complex

3.1 Introduction

Previous studies searched for multi-SUMO binding proteins and demonstrated that some proteins, such as ZMYM2, preferentially bind to SUMO3 multi-SUMOylated targets rather than to poly-SUMOylated ones (Aguilar-Martinez et al., 2015). In the same study, the recruitment of the complex known as five friends of methylated CHTOP (CHromatin Target Of PRMT1), or 5FMC, was linked to binding to multi-SUMOylated proteins (Aguilar-Martinez et al., 2015). The 5FMC complex contains five proteins, including SENP3 (Nishida et al., 2000), a known SUMO protease that was discovered in direct relation to ribosome biogenesis. This complex binds to CHTOP when it is arginine-methylated by PRMT1. The 5FMC complex comprises SENP3, PELP1, LAS1L, TEX10 and WDR18 (Fanis et al., 2012, Castle et al., 2012). Both LAS1L and PELP1 were identified as SUMOylated, and their deSUMOylation by SENP3 is essential for their localisation (Castle et al., 2012). Recruitment of the 5FMC complex by CHTOP may enhance the ability of SENP3 to identify SUMO2 substrates and deSUMOylate them, resulting in stimulation of transcription of target genes (Fanis et al., 2012).

3.2 Results

3.2.1 Design of SUMO and multi-SUMO traps

Previous studies have established that several multi-SIM-containing proteins preferentially bind to multiple single SUMO3 units conjugated to different sites on a protein rather than to SUMO3 chains (Aguilar-Martinez et al., 2015). Since SUMO-SIM interactions can be paralog specific (Hecker et al., 2006), and SUMO1 mostly does not form poly-SUMO chains, we sought to compare protein binding to singly and multi-SUMOylated targets of SUMO1 and SUMO3. In addition, to facilitate finding proteins that potentially bind to alternative sites on SUMO3, we chose to use a SUMO3 mutated in its SIM recognition site, multi-SUMO3AAA (I33A, K34A, R35A). To establish a system that identifies and compares single-SUMO and multi-SUMO binding, we first employed glutathione-agarose beads to pull down known SUMO binding proteins from HEK293T whole cell protein extracts, using Glutathione S-Transferase (GST) tagged SUMO constructs as traps to represent mono-SUMO. GST-COMP-SUMO constructs, fusion proteins in which SUMO is fused to GST through a linker consisting of the coiled-coil pentamerisation domain of cartilage oligomeric matrix protein (COMP), create a scaffold which presents five copies of SUMO and thus mimics a multi-SUMOylated target (Figure 3.1A) (Aguilar-Martinez et al., 2015). As expected, GST-COMP-SUMO proteins (Figure 3.1B, lanes 4, 6 and 7) migrated at a higher molecular weight (~42kDa) than GST-SUMO proteins (~37kDa) in denaturing polyacrylamide gels (Figure 3.1B).

A poly-SUMO3 trap, containing four covalently linked, linearly arranged SUMO3 moieties with the same GST tag, was also employed in subsequent comparisons to confirm that binding was specific to multi-SUMOylated targets rather than to poly-SUMO chains.

Figure 3.1 - SUMO trap design. A. Schematic representation of GST-SUMO, pentameric GST-COMP-SUMO and GST-poly-SUMO constructs. B. GST-fused proteins were purified by glutathione agarose beads, and resolved by denaturing polyacrylamide gel electrophoresis. Proteins were visualised with Coomassie blue stain. GST-COMP (31kDa, lane 2) serves as the control scaffold, lacking SUMO, to eliminate non-specific binding interactions. Molecular weight marker was loaded to lane 1.


3.2.2 SUMO paralog-specific binding

To test whether GST-COMP-SUMO constructs can bind to specific proteins, we chose to use three FLAG-tagged SUMO specific proteases (SENPs), as they are thought not to use a SIM to bind to SUMO and should therefore bind to all of our GST-SUMO scaffolds. For the same reason, we expected that they would not be affected by the mutation introduced into the SIM binding site of SUMO3AAA. Both wild type SENPs and SENPs mutated in their catalytic site (SENPcs) were tested by pull down with GST-COMP-SUMO traps and visualised by immunoblotting. The catalytic site mutants can still bind SUMO but cannot deSUMOylate proteins, and thus demonstrate that catalytic activity is not necessary for SUMO binding.

Binding reactions between SENPs 1 and 2 and all three multi-SUMO constructs were carried out to test for paralog specificity (Figure 3.2A). SENP2 shows SUMO3 specificity, in accordance with previous studies (Reverter et al., 2004). As SENP1 binds to all three SUMO traps, it was chosen as the control for the remainder of our experiments. A GST-COMP scaffold was used as a control to ensure that no non-specific binding occurred. As expected, SENP binding to the mutated multi-SUMO3AAA was not affected by the SIM recognition site mutation (Figure 3.2B).

Figure 3.2 - SUMO paralog-specific binding. GST pull down assay was performed using GST-COMP-SUMO constructs as traps, pulling down FLAG-tagged SUMO specific proteases (SENPs) from transfected HEK293T cell extracts. Images were captured after immunoblotting with α-FLAG primary antibody. A. Pull down of SENP2 and SENP1 (wild type and catalytic site mutant, cs) with GST-COMP-SUMO1, GST-COMP-SUMO3 and GST-COMP-SUMO3AAA. B. SENP proteins pull down with GST-COMP scaffold and GST-COMP-SUMO3AAA.


3.2.3 SUMO trap system stability

To test the stability of SUMO traps, GST-SUMOs were immobilised on beads, then incubated with nuclear extract from HEK293T cells transfected with FLAG-tagged SENP1 at different time points after trap preparation (Figure 3.3A). Bands were quantified using Fiji (Schindelin et al., 2012) and normalised to the proteins captured by multi-SUMO3, which were consistently high throughout all experiments, for clear visualisation (Figure 3.3B). Normalisation to multi-SUMO3 rather than to the input was chosen because the input was inconsistent in intensity.
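The normalisation described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the quantification workflow used here: it expresses each band as a percentage of a reference trap within each experiment, then reports the mean and sample standard deviation across experiments. The trap labels and intensity values are hypothetical.

```python
# Minimal sketch: express band intensities as a percentage of a reference trap
# within each experiment, then summarise across experiments.
# Trap labels and intensities are hypothetical; the sample standard deviation
# (n - 1 denominator) is an assumption.
from statistics import mean, stdev


def relative_binding(intensities: dict[str, float], reference: str) -> dict[str, float]:
    """Percentage of each band relative to the reference band (taken as 100%)."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {trap: 100.0 * value / ref for trap, value in intensities.items()}


def summarise(replicates: list[dict[str, float]], reference: str) -> dict[str, tuple[float, float]]:
    """Mean and sample SD of the relative binding for each trap."""
    normalised = [relative_binding(r, reference) for r in replicates]
    return {
        trap: (mean([n[trap] for n in normalised]), stdev([n[trap] for n in normalised]))
        for trap in normalised[0]
    }


# Three hypothetical experiments with raw band intensities.
experiments = [
    {"COMP-SUMO3": 200.0, "COMP-SUMO1": 100.0},
    {"COMP-SUMO3": 100.0, "COMP-SUMO1": 60.0},
    {"COMP-SUMO3": 50.0, "COMP-SUMO1": 20.0},
]
summary = summarise(experiments, reference="COMP-SUMO3")
print(summary["COMP-SUMO1"])  # mean 50%, SD 10%
```

Normalising within each experiment before averaging removes differences in overall signal between blots, which is the reason given above for preferring the multi-SUMO3 band over the input as the reference.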

In these experiments, SENP1 shows higher affinity to multi-SUMO1 (lane 4) than to mono-SUMO1 (lane 3). Multi-SUMO3 pulls down significantly higher amounts of SENP1 than both multi-SUMO3AAA (P=0.002) and mono-SUMO3 (P=0.0005). Even when transfection efficiency is relatively low, as observed for the input on day 4, the traps still pull down SENP1 and maintain the same pattern. In contrast to the previous experiment in Figure 3.2A, SENP1 bound to multi-SUMO3 more than to multi-SUMO3AAA.

The efficiency of the traps remained good for up to 4 days after protein purification, the longest time tested. Although the efficiency of the multi-SUMO1 trap deteriorated more than that of multi-SUMO3 over time, it still showed good overall binding (Figure 3.3C). Traps were therefore deemed usable for up to four days after production; in practice, however, traps were used no later than day 2.


[Figure 3.3, panels A-C. B: bar chart, y-axis "Binding Intensity"; P values shown: P=0.12, P=0.0005, P=0.002; bar value labels: 100.0, 86.6, 32.8, 27.6, 19.5, 5.3, 4.0, 2.0. C: bar chart, y-axis "Binding Intensity"; series: Day 0, Day 1, Day 4.]

Figure 3.3 - SUMO-trap consistency. A. Top: Nuclear extract containing overexpressed FLAG-tagged SENP1 from transfected HEK293T cells was incubated with different SUMO traps at day 0 (day of preparation), and at 1 and 4 days after preparation of traps. Proteins were visualised by western blot using α-FLAG M2 primary antibody. Bottom: Ponceau S stain of the GST-tagged traps, day 4. Molecular weight marker was loaded to lane 1. B. Band quantification of the binding of SENP1 to each of the GST-fusion proteins is shown as a percentage relative to binding to GST-COMP-SUMO3 (taken as 100%). Data are the average of the three experiments presented in panel A, and error bars represent SD. P values for unpaired t-test are indicated. C. Comparison of bands by day of use after trap production. Band quantification was done as in B. Multi-SUMO3AAA trap was marked in graphs B and C as COMP-SUMO3A due to Excel (Microsoft) software limitations.


3.2.4 Identifying multi-SUMO bound proteins

To identify novel multi-SUMO-binding proteins, a pull down assay was carried out with HeLa nuclear extracts, using GST-COMP-SUMO1 (multi-SUMO1), GST-COMP-SUMO3 (multi-SUMO3) and mutated GST-COMP-SUMO3AAA (multi-SUMO3AAA) as baits. Multi-SUMO3AAA, a multi-SUMO3 mutated in its SIM recognition site, was employed in order to find proteins binding to alternative sites on the SUMO3 protein.

Specific interactions with multi-SUMOs were identified by first removing non-specifically binding proteins. This was done by incubating the nuclear extract with GST-COMP immobilised on glutathione-agarose beads, and then incubating the proteins remaining in the supernatant with multi-SUMO traps bound to agarose beads (Figure 3.4A). After extensive washing, the remaining bound proteins were eluted from the beads by cleaving the GST tag, run briefly on an SDS/PAGE gel for sample purification, and then eluted from the gel. Bound protein identities were then determined by mass spectrometry, with a positive identification requiring at least 2 unique peptides, and screened for background proteins using the Contaminant Repository for Affinity Purification, CRAPome (Mellacheruvu et al., 2013). CRAPome aggregates bait-independent negative controls from multiple affinity purification coupled with mass spectrometry experiments. We chose a threshold of 100 out of the 411 experiments represented in CRAPome, and plotted specifically binding proteins, defined as proteins that appeared in fewer than 100 of those experiments, using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) (Szklarczyk et al., 2015).
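The filtering criteria described above (at least 2 unique peptides, and presence in fewer than 100 of the 411 CRAPome experiments) can be sketched as a simple filter. The Python example below is illustrative only: the protein names and counts are hypothetical placeholders, the data structure is an assumption, and intersecting the filtered sets of two repeats is shown as one possible way of finding reproducibly bound proteins.

```python
# Minimal sketch of the specificity filter described above.
# Thresholds come from the text; protein names and counts are hypothetical.

MIN_UNIQUE_PEPTIDES = 2
CRAPOME_CUTOFF = 100  # out of the 411 experiments represented in CRAPome


def specific_binders(hits: dict[str, tuple[int, int]]) -> set[str]:
    """hits maps protein -> (unique peptide count, CRAPome experiment count)."""
    return {
        protein
        for protein, (peptides, crapome_count) in hits.items()
        if peptides >= MIN_UNIQUE_PEPTIDES and crapome_count < CRAPOME_CUTOFF
    }


# Two hypothetical repeats.
repeat_1 = {"PROT_A": (12, 5), "PROT_B": (30, 250), "PROT_C": (1, 0), "PROT_D": (4, 99)}
repeat_2 = {"PROT_A": (9, 5), "PROT_D": (1, 99), "PROT_E": (3, 100)}

# Proteins passing the filter in both repeats.
shared = specific_binders(repeat_1) & specific_binders(repeat_2)
print(sorted(shared))  # ['PROT_A']
```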

In two separate mass spectrometry repeats (see supplementary table S1), one done with a GST-COMP clearing column and one without, a total of 369 proteins were identified as bound to multi-SUMO3, of which 206 were specifically binding, and 40 of those appeared in both repeats. We combined our results with previously published multi-SUMO3 mass spectrometry results (Aguilar-Martinez et al., 2015) and found 19 proteins (including SUMO3) that appear in all three (Table 3.1, Figure 3.4B). Eleven of these have been detected in at least three published SUMO-binding experiments (Golebiowski et al., 2009b, Hendriks et al., 2014, Hendriks et al., 2017, Impens et al., 2014, Tammsalu et al., 2014, Xiao et al., 2015). Several members of the SUMO conjugation/deconjugation machinery, namely PIAS1 (a SUMO E3 ligase) and SENP2 (a SUMO protease), as well as PML, which is known to be SUMO modified, all appear in our experiments. ZMYM2 and all members of the 5FMC complex (Figure 3.4B, highlighted in orange) were found bound to multi-SUMOylated targets. A Metascape analysis (Tripathi et al., 2015a) indicated that multi-SUMO3 interacts with proteins involved mainly in DNA metabolic processes. SETDB1, a known chromatin regulator, was also found to be multi-SUMO bound, though it has been reported as SUMO-bound in several mono-SUMO-binding studies as well. We could find single-SUMO-bound TEX10, SENP3 and WDR18 in only one other publication, and only under stress conditions (Hendriks et al., 2017). PELP1 was found to be SUMO-bound but not SUMOylated in both heat-shock and control samples (Golebiowski et al., 2009b), and an RNF4 pull down of poly-SUMO conjugates followed by mass spectrometry found PELP1 to be the only member of the 5FMC complex that was poly-SUMOylated (Bruderer et al., 2011). LAS1L was indicated as SUMO-bound in several studies under various conditions (Becker et al., 2013, Golebiowski et al., 2009b, Hendriks et al., 2015a, Hendriks et al., 2014, Hendriks et al., 2017, Hendriks et al., 2015b, Impens et al., 2014, Lamoliatte et al., 2014, Schimmel et al., 2014, Tammsalu et al., 2014, Xiao et al., 2015). To the best of our knowledge, Desmoglein-1 (DSG1), a major component of the desmosome cell-cell junction, is the only protein found uniquely in multi-SUMO binding studies.

362 proteins were identified as bound to multi-SUMO1, of which 177 proteins were specifically binding (CRAPome), but only 2 proteins appeared in both experiments (Figure 3.4C). These were PIAS2, a known E3 SUMO-protein ligase, and Filaggrin-2 (FLG2), which contains a SIM site (GPS-SUMO analysis) and is involved in epithelial homeostasis. This suggests either that few proteins bind to multi-SUMO1, or that binding exists but is of brief duration.

To find proteins that potentially bind to an alternative binding site on SUMO3, we examined binding to multi-SUMO3AAA. 395 proteins were identified as bound to multi-SUMO3AAA, of which 224 were high confidence and 50 appeared in both repeats (Figure 3.4D). As we postulated that proteins bound to alternative binding sites should bind to both wild type and mutated SUMO3, we looked for proteins that appeared in both experiments. Of the 50 proteins found bound to multi-SUMO3AAA, 15 were found bound to multi-SUMO3 as well. Within these 15 proteins, along with DNA repair proteins such as RFC4 and RAD50, we found proteins related to mitosis (NUMA1), replication (RFC1 and RIF1) and proliferation (MKI67). Since the mutated SUMO3AAA does not bind proteins through their SIM, we performed scans for common motifs among the 15 multi-SUMO3AAA bound proteins. Such a motif would represent a potential alternative SUMO-binding site. However, a motif search using MEME (Bailey et al., 2009) produced no common feature within these proteins. GPS-SUMO analysis revealed that all but 2 of these proteins (MKI67 and ABCF2) are predicted to contain a SIM site.

Table 3.1 – Multi-SUMO3 affinity purification followed by mass-spectrometry. Summary of two repeats and previously published multi-SUMO3 binding data with at least 2 unique peptides per experiment. Mass spectrometry results were screened for specific binding using CRAPome. Right column represents presence of protein in other mono-SUMO3 binding studies, as collected by Hendriks et al. (2017). Members of the 5FMC complex highlighted in yellow.

Official Gene Symbol | Peptide count, Repeat 1 | Peptide count, Repeat 2 | Peptide count, Aguilar-Martinez et al. (2015) | Presence in other SUMO3 binding studies (study count)
MDN1 | 85 | 96 | 199 | 1
ZMYM2 | 53 | 29 | 32 | 8
RAD54L2 | 19 | 18 | 3 | 7
KDM1A | 16 | 4 | 10 | 5
LAS1L | 13 | 14 | 41 | 5
PELP1 | 12 | 12 | 55 | 1
ATF7IP | 12 | 5 | 9 | 3
TEX10 | 11 | 10 | 58 | 1
WDR18 | 8 | 7 | 21 | 1
RCOR1 | 8 | 2 | 2 | 6
BLM | 7 | 16 | 10 | 7
SETDB1 | 7 | 5 | 4 | 10
SENP3 | 6 | 9 | 62 | 1
PML | 6 | 3 | 13 | 10
SUMO3 | 5 | 2 | 40 | 11
SENP2 | 5 | 2 | 19 | 1
NOL9 | 5 | 2 | 17 | 1
PIAS1 | 4 | 3 | 4 | 8
DSG1 | 2 | 9 | 3 | 0


[Figure 3.4, panel titles: A. B. Multi-SUMO3. C. Multi-SUMO1. D. Multi-SUMO3AAA that were also found in multi-SUMO3.]

Figure 3.4 - Identification of multi-SUMO bound proteins by mass spectrometry. A. Schematic representation of the experimental procedure. Nuclear extract was first incubated with GST (purple)-tagged COMP (orange) immobilised to glutathione-agarose beads (green) to remove non-specific binding. Remaining proteins were then transferred to GST-COMP-SUMO (blue) traps. Bound proteins were resolved with 12% SDS/PAGE gel, eluted and determined by mass spectrometry. Mass spectrometry data was screened by CRAPome v.1.1 with specific binding defined as appearing in less than 100 different IP experiments out of 411 documented in the program. B-D. Network analysis of B. Specifically bound multi-SUMO3 proteins with 5FMC complex proteins highlighted in orange (proteins appearing in both repeats and in Aguilar-Martinez et al. (2015)), C. multi-SUMO1 bound proteins (2 repeats) and D. proteins bound to multi-SUMO3AAA that were also found bound to multi-SUMO3 (2 repeats). Protein interactions were plotted by STRING v.10, with line thickness indicating strength of data support.


Due to the high confidence of our results and the recurrent appearance of the 5FMC complex among multi-SUMO3 bound proteins, we decided to focus on this complex. Proteins from the multi-SUMO3 data set, such as the 5FMC complex proteins and ZMYM2, were also analysed with GPS-SUMO to identify SUMOylation consensus sites and SIMs. The discovery threshold was set to high, and the predicted SIMs and SUMOylation sites are presented in Table 3.2. The SUMOylation consensus sequence was defined as ψKxE. All 5FMC complex members except WDR18 contain both consensus SUMOylation sites and SIMs; WDR18 contains two predicted SIMs but only a partial SUMOylation site.

Table 3.2 - Prediction of SIMs and SUMOylation sites by GPS-SUMO. Proteins pulled down with multi-SUMO traps were analysed by GPS-SUMO with the discovery threshold set to high. Proteins of the 5FMC complex are highlighted; predicted SIMs are marked in blue and predicted SUMOylated lysines in red.

Protein   Position    Sequence                P-Value   Interaction type
LAS1L     151 - 155   EVNIPDW IVDLR HELTHKK   0.041     SUMO Interaction
LAS1L     217 - 221   DQEEDKN IVVDD ITEQKPE   0.138     SUMO Interaction
LAS1L     241         KSTESDVKADGDSKG         0.027     SUMOylation Consensus
LAS1L     565         QGSVNDVKEEEKEEK         0.002     SUMOylation Consensus
PELP1     498         PSAPKKLKLDVGEAM         0.031     SUMOylation Consensus
PELP1     541         LMCGPLIKEETHRRL         0.002     SUMOylation Consensus
PELP1     716 - 720   LGLSVPG LVSVP PRLLPGP   0.092     SUMO Interaction
PELP1     788 - 792   ESDSDDS VVIVP EGLPPLP   0.003     SUMO Interaction
PELP1     826         ASPPVPAKEEPEELP         0.027     SUMOylation Consensus
PELP1     880 - 884   ALEEDLT VININ SSDEEEE   0.046     SUMO Interaction
TEX10     45          IHLPEQLKEDGTLPT         0.019     SUMOylation Consensus
TEX10     607 - 611   YDPQEGA VVVLP ADSQQRL   0.043     SUMO Interaction
TEX10     740 - 744   TEAVFHS LLVIP ARSQNFD   0.075     SUMO Interaction
WDR18     114 - 118   EVSTGNL LVILS RHYQDVS   0.093     SUMO Interaction
WDR18     144 - 148   ISGGKDC LVLVW SLCSVLQ   0.089     SUMO Interaction
WDR18     432         RFITRPAK********        0.05      SUMOylation Partial
SENP3     455 - 459   VDIFNKE LLLIP IHLEVHW   0.073     SUMO Interaction
SENP3     505         YLQAEAVKKDRLDFH         0.021     SUMOylation Consensus
ZMYM2     483 - 487   SKGAGNN VLVID GQQKRFC   0.042     SUMO Interaction
ZMYM2     1107        SSKSVKLKEDLLSHT         0.035     SUMOylation Consensus
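To make the notion of a consensus SUMOylation site concrete, the following sketch scans a peptide for a ψKx(E/D)-type motif with a regular expression. It is only an illustration and does not reproduce GPS-SUMO, which scores sites rather than matching a fixed pattern. Two points are assumptions made for the sketch: the set of residues accepted as ψ (A, V, I, L, M, F, P), and the acceptance of aspartate as well as glutamate at the last position, which was chosen because several consensus sites in Table 3.2 end in D. The function name is hypothetical.

```python
import re

# Assumption: residues accepted at the hydrophobic (psi) position.
HYDROPHOBIC = "AVILMFP"

# Lookahead so that overlapping motifs are all reported.
# Assumption: acidic position accepts E or D (psi-K-x-[E/D]).
CONSENSUS = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z][ED]))")


def find_consensus_sites(sequence):
    """Return (lysine_position, motif) pairs; positions are 1-based
    and relative to the start of the supplied sequence."""
    hits = []
    for match in CONSENSUS.finditer(sequence.upper()):
        lysine_position = match.start(1) + 2  # psi is at start(1), K follows it
        hits.append((lysine_position, match.group(1)))
    return hits


# 15-residue windows taken from Table 3.2 (central lysine at position 8).
print(find_consensus_sites("QGSVNDVKEEEKEEK"))  # LAS1L, site 565
print(find_consensus_sites("LMCGPLIKEETHRRL"))  # PELP1, site 541
print(find_consensus_sites("RFITRPAK"))         # WDR18, partial site 432
```

The WDR18 window ends at the lysine, so no full motif can be matched there, which is in line with GPS-SUMO reporting that site as partial.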


3.2.5 5FMC complex binds to multi-SUMOylated targets

We proceeded to verify the recruitment of 5FMC complex members to multi-SUMOylated targets using the pull-down method described above. HEK293T cells were transfected with individual FLAG-tagged members of the 5FMC complex for ease of detection, and extracts were pulled down with SUMO3, multi-SUMO3 and poly-SUMO3 traps. In three separate repeats, members of the complex bound preferentially to multi-SUMO3 traps rather than to single SUMO3 (Figure 3.5A, B). While some complex members also bind to poly-SUMOylated targets, the individual complex members show a visible preference for multi-SUMO3. TEX10 is difficult to overexpress, as has been observed by other groups (Fanis et al., 2012). With that in mind, we could still see some multi-SUMO binding in one repeat. When we quantified the bands, WDR18 and SENP3 stood out with the strongest binding to multi-SUMO3, and to some extent poly-SUMO3, compared to the control (Figure 3.5B). LAS1L binds more consistently to multi-SUMOylated targets than to poly-SUMOylated ones. WDR18 runs at a similar size to the COMP-SUMO3 trap and is therefore hard to visualise when multi-SUMO3 is present, because of cross-reaction with the GST-fused protein. However, it binds to multiple SUMOs more than to the mono-SUMO trap. PELP1 binding was inconsistent. SENP3, the active unit of the complex, binds to all SUMO entities, and more strongly to multiple SUMOs. We presume that the ability to bind to multi-SUMOylated targets depends on the proteins being bound together as a complex, with the possible exception of SENP3 in its capacity as a SUMO protease. It therefore seems that the overexpressed complex members are incorporated successfully into the complex. We will further examine this possibility in the following chapters.

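[Figure 3.5: A. αFLAG immunoblots of PELP1, TEX10, WDR18, LAS1L and SENP3 pull-downs with the GST fusions (lanes 1-5). B. Bar chart for PELP1, TEX10, WDR18, LAS1L and SENP3; y-axis: % of input (0-12); x-axis: COMP, SUMO3, multi-SUMO3, poly-SUMO3.]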

Figure 3.5 - Multi-SUMOylation of the 5FMC complex. A. αFLAG antibody immunoblot of SUMO-trap pull-down experiment using FLAG-tagged overexpressed 5FMC complex proteins. B. Band intensity quantification of the binding of each member of the 5FMC complex to each of the GST-fusion proteins, shown as a percentage of input (lane 1 contains 3% of the input). Data are the average of three repeats for WDR18, LAS1L and SENP3, and of two repeats for PELP1 and TEX10. Error bars represent SD for three repeats. Proteins were visualised by western blot using FLAG antibodies.
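The "percentage of input" quantification described in the legend can be written out as a short calculation. The sketch below assumes that band intensity scales linearly with protein amount and that the input lane holds 3% of the material used per pull-down, as stated in the legend; the intensity values are hypothetical, the function names are introduced here for illustration, and the use of the sample standard deviation is an assumption, since the legend does not state which SD was calculated.

```python
from statistics import mean, stdev

INPUT_FRACTION = 0.03  # lane 1 holds 3% of the input (from the legend)


def percent_of_input(band_intensity, input_band_intensity,
                     input_fraction=INPUT_FRACTION):
    """Express a pull-down band as a percentage of the total input."""
    total_input_signal = input_band_intensity / input_fraction
    return 100.0 * band_intensity / total_input_signal


def summarise(repeats):
    """Mean percentage of input over repeats; SD only for three or more repeats.

    `repeats` is a list of (pull-down band, input band) intensity pairs.
    """
    values = [percent_of_input(band, inp) for band, inp in repeats]
    sd = stdev(values) if len(values) >= 3 else None  # sample SD (assumption)
    return mean(values), sd


# Hypothetical intensities in arbitrary densitometry units.
print(percent_of_input(2000, 1000))
print(summarise([(2000, 1000), (1000, 1000)]))
```

Reporting no SD for proteins with only two repeats mirrors the legend, where error bars are shown only for proteins quantified in three repeats.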


3.2.6 5FMC behaves as a complex

To determine whether PELP1, LAS1L, TEX10, SENP3 and WDR18 behave as a complex, we used immunoprecipitation. We observed that all overexpressed members of the complex co-precipitate in an αPELP1 IP (Figure 3.6A, boxed in red). As LAS1L and TEX10 run at the same size, they are hard to distinguish. The same was verified with LAS1L and SENP3 antibodies. WDR18 runs at the same size as the antibody heavy chain, which makes it hard to distinguish when antibodies from the same species are used for detection. We could not find a reliable TEX10 antibody that functioned well in immunoprecipitation experiments (Figure 3.6B, lanes 3-6). As SENP3 is one of several known SUMO proteases of the same family, we decided to extend our IP experiment to include SENP2, which we had identified in our mass spectrometry experiments (Figure 3.4B and Table 3.1). We could see three of the complex members when immunoprecipitating with SENP2, but could not detect SENP2 in the IPs of any of the 5FMC complex members. We therefore concluded that a connection between SENP2 and the 5FMC complex might be possible. A knockdown of SENP3 may allow for its replacement by SENP2; however, of the SENP family, SENP5 is the closest in function to SENP3 (Kunz et al., 2018) and thus also merits further exploration, though we opted not to pursue this matter further at this point.

[Figure 3.6, panels A and B: immunoblot images.]

Figure 3.6 - Co-immunoprecipitation of 5FMC complex. A. αPELP1 antibody immunoprecipitation of overexpressed FLAG-tagged 5FMC complex proteins co-transfected in HEK293T cells, immunoblotted with αFLAG antibody. 5FMC complex members are boxed in red. B. Immunoprecipitation with each of the 5FMC complex member antibodies and with SENP2 antibody, visualised by immunoblot with the same antibodies.

3.2.7 All 5FMC complex members are necessary for binding to multi-SUMO

Having verified that the complex members preferentially bind to multi-SUMOylated targets, we proceeded to perform pull-downs from nuclear extracts in order to validate the binding capabilities of the endogenous 5FMC complex. However, owing to low expression levels we could not visualise the results, and we resorted to co-transfecting all five members of the complex into the cells. This also allowed us to find out what happens when some subunits are missing. FLAG-tagged members of the complex were co-transfected into HEK293T cells either as a mix of all five or with one member missing. Figure 3.7A shows a schematic representation of the experiment, with SENP3 as the missing unit. All combinations of four complex members were tested. A pull-down experiment was performed as before. In this series of experiments we observed that the complex members are only pulled down together when all of them are overexpressed (Figure 3.7B, lanes 2-4). With SENP3 missing, for example, no pull-down is observed for any of the complex components (Figure 3.7B, lanes 6-8), whereas SENP3 itself is pulled down in almost all experiments and in every protein combination, as previously observed. As we had previously seen that overexpression of a single complex member results in multi-SUMO binding (Figure 3.5A), we found this result surprising. We speculate that the interaction between overexpressed and endogenous complex members is disrupted by the imbalance in the quantities of the complex members.

A complete penta-transfection is difficult to achieve, sometimes resulting in only partial transfections and inconclusive results. We therefore decided not to pursue this line of investigation. Nevertheless, these results suggest that 5FMC complex integrity is essential for its binding to multi-SUMO; however, this could only be seen with tagged and overexpressed proteins, owing to the low endogenous expression levels in the cells.

[Figure 3.7, panels A (schematic) and B (immunoblot).]

Figure 3.7 - 5FMC complex binding to multi-SUMO traps. A. Schematic representation of the pull-down experiment procedure. All 5 FLAG-tagged members of the 5FMC complex, or a combination of 4 of them (here with SENP3 missing), were transfected into HEK293T cells, followed by pull-down with GST-fused SUMO3 constructs and immunoblotting with αFLAG antibody. B. Immunoblot of the experiment described in A. Proteins were visualised by western blot using αFLAG M2 primary antibody.


3.2.8 Determination of which sub-units provide multi-SUMO binding activity

In order to determine which 5FMC complex unit is responsible for multi-SUMO binding, and to assess whether complex recruitment to multi-SUMOylated targets is affected by specific SIMs, we decided to manipulate the SUMO-binding and SUMOylation capabilities of complex members. We chose to begin with WDR18, as it is small, easy to overexpress and detect, and has a highly conserved WD repeat domain (Figure 3.8A). To that end, we created three different mutants, all designed to keep the WD domain intact.

The first mutation truncated the C-terminal part of the protein (WDR18ΔC), since it contains a lysine in a partially conserved SUMOylation site (predicted by GPS-SUMO, Table 3.2). This was achieved by using PCR to amplify the region encoding aa 1-307 of WDR18 with appropriate primers (Table 2.3). The second mutation truncated the beginning of WDR18 (WDR18ΔN, region encoding aa 23-432). This was achieved by copying WDR18 from position 73 onwards and adding a start codon in the appropriate location. The last construct contained full-length WDR18 into which we introduced mutations at a putative SIM (Table 3.2), taking care not to affect the β-sheet configuration in the process. We chose to replace leucine at position 114 and isoleucine at position 116 with alanine, creating WDR18114A,116A. We did not mutate the second SIM at this stage.
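The SIM substitutions can be checked against the WDR18 window reported in Table 3.2. The sketch below applies the two alanine substitutions to that 19-residue window and verifies the wild-type residue at each position first. It assumes that the window consists of the SIM core (residues 114-118) with seven flanking residues on each side, so that it starts at residue 107; this numbering is inferred from the table layout rather than stated in the text, and the function name is hypothetical.

```python
def substitute(fragment, fragment_start, substitutions):
    """Apply point substitutions to a protein fragment.

    fragment_start is the 1-based protein position of the first residue.
    substitutions maps position -> (expected wild-type residue, new residue).
    """
    residues = list(fragment)
    for position, (expected, new) in substitutions.items():
        index = position - fragment_start
        if residues[index] != expected:
            raise ValueError(
                f"position {position}: expected {expected}, found {residues[index]}"
            )
        residues[index] = new
    return "".join(residues)


# WDR18 SIM window from Table 3.2; assumed to span residues 107-125.
wild_type = "EVSTGNLLVILSRHYQDVS"
mutant = substitute(wild_type, 107, {114: ("L", "A"), 116: ("I", "A")})
print(mutant)
```

The wild-type check guards against off-by-one errors in residue numbering, which are easy to introduce when moving between nucleotide and amino acid coordinates during primer design.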

FLAG-tagged mutated WDR18 constructs and wild-type WDR18 were transfected into HEK293T cells, pulled down with GST-fused SUMO3 constructs and blotted with αFLAG antibody as before (Figure 3.8B). In six repeats, analysis of immunoblot bands showed reduced binding when this SIM was mutated. In addition, although C-terminal truncation had little effect, removal of the start of the protein improved multi-SUMO binding in some cases. Both truncations slightly decreased poly-SUMO binding.

We next tried to co-transfect each of the WDR18 constructs in combination with the other 5FMC plasmids; however, co-transfections were inconsistent, and the pull-down results were therefore inconclusive. We decided to discontinue this course of investigation.

[Figure 3.8, panels A (structure models) and B (immunoblot).]

Figure 3.8 - WDR18 recruitment to 5FMC complex is not SUMO mediated. A. Phyre2 (Kelley et al., 2015) analysis of WDR18 constructs (modelling mode set to normal), presented through Pymol (Schrödinger, 2015). SIM predicted by GPS-SUMO marked in black. B. αFLAG antibody immunoblot of SUMO trap pull-down experiment using FLAG-tagged overexpressed WDR18 proteins in HEK293T cells.


3.2.9 Investigating interactions between multi-SUMO3 and in vitro 5FMC complex components

To investigate SUMO binding further, we chose to express the FLAG-tagged constructs in vitro. In this manner, a mammalian protein can be produced quickly and efficiently in a cell-free rabbit reticulocyte system.

In the first series of experiments all proteins were well expressed and tested for binding to GST-fused multi-SUMO3 traps; however, no SUMO binding could be seen. We speculated that the large fluorescently tagged lysine residue used for detection might be interfering with either protein folding or SUMO binding, and decided to omit the FluoroTect™ GreenLys tRNA and use the FLAG tag for immunoblot detection, as this had worked in cellular environments (Figure 3.5). There was no change in the results: while protein was produced, no SUMO binding could be seen, except occasionally with SENP3 (Figure 3.9A). In 12 experiments, either co-producing all possible combinations of the complex members or producing the same proteins separately and then mixing them together, no SUMO3 pull-down could be achieved, though protein production was fairly consistent (example combinations in Figure 3.9B). Noting that, once again, we see no SUMO binding without a complete 5FMC complex, we opted to discontinue this direction of investigation.

[Figure 3.9, panels A and B: immunoblot images.]

Figure 3.9 – Multi-SUMO binding of in vitro produced 5FMC complex components. A. Immunoblot of GST fusion multi-SUMO trap pull down experiment with in vitro FLAG tagged 5FMC complex proteins. B. Immunoblot of GST fusion multi-SUMO trap pull down experiment with in vitro FLAG tagged 5FMC complex proteins co-produced in two combinations. Lanes 1-3 PELP1, SENP3 and WDR18 co-produced. Lanes 4-6 PELP1, LAS1L and SENP3 co-produced. Proteins were visualised by western blot using α-FLAG M2 primary antibody.


3.2.10 Endogenous tagging of the 5FMC complex

As endogenous expression of 5FMC complex members is very low and detection antibodies proved problematic, we decided to use the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system to add a FLAG tag to the complex proteins, thus making detection easier owing to the good affinity of the αFLAG antibody, and possibly also enabling chromatin immunoprecipitation experiments down the line. CRISPR technology for genome editing enables direct manipulation of genomic sequences in a relatively simple way (Cong et al., 2013, Savic et al., 2015). A 3xFLAG tag was modified to reduce sequence repeats and targeted to the 3' end of the endogenous locus of each of the 5FMC complex components using Cas9 and appropriate gRNAs (designed using the online software tool CRISPR.mit.edu (Hsu et al., 2013)) (Figure 3.10A). We chose to use a gBlock containing three FLAG epitope sequences, followed by a 2A self-cleaving peptide sequence (P2A) and a neomycin resistance gene (adapted from Savic et al. (2015)). We chose to tag the 5FMC complex in OE19, an oesophageal carcinoma cell line that expresses all 5FMC complex members (Figure 4.1B); we elaborate on this choice of cell line in section 4.2.1.

Neomycin selection for transfected cells resulted in several colonies of SENP3-targeted cells. These were tested by PCR amplification with specific forward primers (Table 2.18) designed to anneal outside the homology sequence (Figure 3.10A, marked in green) and an insert-specific reverse primer. Two colonies showed an insert of the expected size (Figure 3.10B). The colonies were then tested for the presence of FLAG by immunoblotting with αFLAG antibody (Figure 3.10C), and the positive colony was used in an αFLAG immunoprecipitation followed by detection with a SENP3-specific antibody (Figure 3.10D), confirming the presence of the FLAG tag on the SENP3 protein.

At this point, despite attempts to tag all complex members, only one of the 5FMC complex members has been successfully tagged and verified. However, this experiment is still ongoing at the time of writing this thesis. As OE19 cells are notoriously hard to transfect with large inserts, we resorted to nucleofection, which affected the survival rates of the cells. This makes the transfection and selection process more time-consuming. However, we believe that stable FLAG-tagged cell lines would prove invaluable for continued investigation of the 5FMC complex, and as this method has proved successful, we will continue to pursue it.

[Figure 3.10, panels A (schematic), B (PCR gel), C and D (immunoblots).]

Figure 3.10 - Overview of CRISPR experiment. A. Schematic of CRISPR experiment design, showing the gBlock insert (top) containing the 3xFLAG insert, P2A self-cleavage region and neomycin resistance sequences flanked by gene-specific homology arms, and CRISPR-mediated recombination targeted to the stop codon region of the gene (bottom). Verification primers are marked in green. B. PCR of 3 CRISPR-edited SENP3 OE19 cell colonies using a SENP3-specific forward primer and an insert-specific reverse primer. C. Anti-FLAG immunoblot of SENP3 CRISPR cells. D. Anti-FLAG immunoprecipitation followed by immunoblot detection with αSENP3 antibody.


3.3 Discussion

3.3.1 Binding of the 5FMC complex to multi-SUMO

Previous mass spectrometry (MS)-based studies have found over 6000 SUMO2/3 targets, over half of which contain more than one SIM (data collated by Hendriks et al. (2017)). Some proteins were shown to be SUMOylated on multiple sites, with as many as 40 lysines on a single protein (Hendriks et al., 2017). The possibility of specific recognition of multi-SUMOylated proteins has also been explored (Aguilar-Martinez et al., 2015), and preferential multi-SUMO binding was identified for some proteins.

Setting out to find proteins that bind multi-SUMOylated targets, we found a large number of proteins bound to multi-SUMO3 (Figure 3.4B), several of which were relatively unique to multi-SUMOylation. We chose to continue with one complex, the five friends of methylated CHTOP (5FMC), a deSUMOylating complex implicated in ribosome biogenesis and transcription regulation (Fanis et al., 2012). 5FMC complex members are only seldom found in mono-SUMO2/3 pull-downs (data collected by Hendriks et al. (2017)), yet they appear together in two different HeLa nuclear extract pull-downs with our multi-SUMOylated constructs and in a previously published multi-SUMOylation screen (Aguilar-Martinez et al., 2015). We scanned for predicted SIMs and found at least two of them on three of the complex members. We speculated that, when the proteins are assembled into a complex, these SIMs account for the preference for multi-SUMOylation. Having verified the mass spectrometry results and compared binding to multi-SUMO3 with binding to both single SUMO3 and poly-SUMOylated scaffolds, we observed that in most cases 5FMC complex members do preferentially bind to multi-SUMOylated targets (Figure 3.5). Moreover, we observed that multi-SUMO3 binding is affected by the integrity of the complex, with reduced binding when parts of the complex are missing.

We created several mutated WDR18 constructs in order to disturb either complex formation or complex recruitment to multi-SUMO3. While removal of a section at the start of the protein, before the WD repeat, sometimes increased multi-SUMO3 binding, we did not see similar effects with the other WDR18 mutations. As the GST-SUMO traps are of a similar size to the WDR18 proteins, it was at times hard to quantify the effects of the introduced mutations on SUMO binding. We believe that revisiting these mutations is in order, but did not do so in the course of this project.

We also attempted to synthesise the complex members individually and form the whole complex in a rabbit reticulocyte cell-free environment, but found that while we could produce the proteins, we could not demonstrate multi-SUMO binding. As individual transfection in a cellular environment had produced binding before (Figure 3.5A), we speculate that the presence of endogenous complex members in the cellular environment allowed SUMO binding in the previous experiments. We therefore opted to try to co-produce the complex as a whole, and also all the possible combinations of complex members, in the same system, in order to find which composition of the complex is essential for multi-SUMO binding. We could not detect any SUMO binding in these experiments, suggesting either a lack of proper folding in vitro, or that additional proteins or PTMs are needed for 5FMC complex formation. 5FMC complex formation could have been tested by immunoprecipitation with one of the 5FMC complex member antibodies followed by αFLAG immunoblot, in which co-precipitation of the 5FMC complex members would show whether the complex can be formed in vitro.

In the interest of time we chose not to continue with complex manipulation outside the cellular system, and to address the question of 5FMC complex integrity using siRNA knockdown instead.

3.3.2 Few proteins bind to multi-SUMO1

In the same mass spectrometry experiment (Figure 3.4C), only two proteins were found to be significantly bound to multi-SUMO1 targets: FLG2, an epithelial protein, and PIAS2, a SUMO E3 protein ligase. As epithelial proteins might be introduced as contamination during the experimental process, we chose not to investigate FLG2, even though it was not excluded by the stringent CRAPome threshold we introduced. PIAS2, as part of the SUMOylation cascade, is a likely candidate for further study. As binding to SUMO1 in the context of multi-SUMOylation seems very rare, or possibly non-existent, we decided not to pursue this line of investigation further.


3.3.3 Several potential SIM-independent binding proteins could be detected using mass spectrometry

We also looked for proteins binding to mutated multi-SUMO3. This SUMO3 is mutated in its SIM interaction site, and therefore should not bind proteins that recognise the SIM-binding surface of SUMO. Thus, any protein identified with this construct is likely to be bound through an alternative SUMO-interacting motif. Such proteins might be SUMO proteases, for example, as these anchor to SUMO at a site that differs from the SIM recognition site (reviewed in Mukhopadhyay et al., 2007). Applying stringent screening, to ensure we only find alternative sites, we performed a scan for common motifs among the proteins found bound to both multi-SUMO3AAA and WT multi-SUMO3, as a multi-SUMO bound protein appearing in both might contain an alternative binding site. We could not find a common motif among these proteins. However, as many of these proteins are implicated in DNA repair and cell growth, we suspect that this merits further investigation. In addition, proteins that bind through alternative sites might bind to single SUMO moieties, in which case the multi-SUMO arrangement of our constructs might interfere with their detection. A mono-SUMO3AAA pull-down followed by mass spectrometry might therefore have increased the number of proteins we could find, though we did not pursue this for lack of time.

3.3.4 Summary, limitations and conclusions

In summary, the results from this study indicate that some proteins can preferentially bind to multiple SUMO3 entities, either in the form of poly-SUMO chains or multi-SUMOylated targets. While we found only little evidence for multi-SUMO1 bound proteins, we believe this merits further investigation.

In the search for alternative SUMO-binding sites, using a mutation in the SIM recognition site, we found several proteins with no common motif to group them. We deduced that alternative binding sites possibly do exist, since we found proteins bound to multi-SUMO3AAA. As we noticed that multi-SUMO trap pull-downs did not yield as many proteins as the single-SUMO traps reported in the literature, we thought that the presence of multiple SUMOs might interfere with SUMO binding. This interference might be similar in the case of alternative binding sites. We therefore thought that repeated MS experiments with a single mutated SUMO3AAA trap, possibly even replacing the GST with a His tag as GST forms a dimer, would be needed to further explore alternative binding sites. However, we decided this was a matter for a different project, and did not investigate it further.

Of the proteins we found bound to multi-SUMO3 we chose to explore the 5FMC complex as its members rarely appear in connection with single SUMO moieties. We have established that the 5FMC complex is only attracted to multi-SUMOylated targets when it is fully assembled, only pulling down with multi-SUMO targets if all parts of the complex are present in the cell. We reasoned that the complex might need additional modifications in order to fold or to form, as we were not able to create the complex in a cell-free environment.

We also looked at the ability of another member of the SENP family to replace SENP3, the SUMO protease normally associated with the complex. SENP2-IP resulted in three complex members appearing on the blot, which might indicate that SENP2 can replace the SENP unit of the complex. A SENP3 knockdown experiment followed by a SENP2-IP might provide further information regarding this possibility.

Despite these observations, several limitations of this study can be pointed out. We could only compare the results of our multi-SUMO pull-downs followed by mass spectrometry to mono-SUMO studies found in the literature (Hendriks et al., 2017), as we did not conduct these experiments ourselves. Also, despite our best efforts, we could not find any mass spectrometry pull-down experiments performed with poly-SUMO chains with which to compare our results. While we made sure to verify the difference in SUMO binding in our experiments, a full database of proteins bound to poly-SUMOylated targets would have helped our initial analyses.

We began manipulating complex members by transfecting mutated WDR18 into cells; however, the methods we employed did not allow for reliable co-transfections, and we decided not to continue these experiments. Further mutations should be studied in a cellular environment, in order to see the effects of each mutation on the overall formation and function of the complex within the cell.


Detection of endogenous 5FMC complex members proved difficult, as not all members have reliable antibodies, and some are expressed at very low levels. In the absence of good antibodies we had to work mostly with overexpressed proteins in cells, which might change the balance of regular cellular function. Also, penta-transfection was hard to achieve, and required many repetitions and long transfection times. We believe that creating cell lines with endogenously tagged 5FMC proteins might make researching this complex more manageable.

We have begun endogenously tagging the complex members, in order to create stable cell lines and help with complex detection and possibly also with chromatin immunoprecipitation (ChIP) experiments in the future.

In conclusion, to better understand the importance of complex integrity for recruitment to multi-SUMOylated targets, the following experiments are required:

1. Continued efforts to endogenously tag 5FMC complex proteins, to enable better visualisation of the complex in a cellular environment.
2. In vivo manipulation of complex members, including the introduction of mutations in different SIMs, to enable better understanding of the importance of multi-SUMOylation to complex recruitment.

Another way of looking at the effects of 5FMC complex recruitment to the genome is to examine the effects of the complex on transcription. As we have established that disruption of complex integrity affects its ability to be recruited to multi-SUMOylated targets, we chose to map the effects of a knockdown of members of the complex on the whole transcriptome. Results from this experiment are discussed in the next chapter.


4 The effects of 5FMC complex on transcription

4.1 Introduction

It has previously been established that SUMOylation of transcription factors often has a negative effect on their activation, as it attracts repressive complexes (Stielow et al., 2008b). More specifically, the recruitment of 5FMC to CHTOP was predicted to deSUMOylate transcription factors, thus removing repressive elements and activating transcription (Fanis et al., 2012). Among the 5FMC complex members, PELP1, a nuclear receptor coregulator (Vadlamudi et al., 2007), was suggested to be the core of the complex (Castle et al., 2012). SENP3 is a SUMO-specific protease (Nishida et al., 2000), and LAS1L is a nucleolar protein involved in 60S ribosomal subunit synthesis and maturation of 28S rRNA (Castle et al., 2010). TEX10 (testis expressed 10) is a protein whose depletion results in G1 arrest and upregulation of P53 and P21 (Ding et al., 2015). WDR18 is less well characterised as yet.

Fanis et al. (2012) demonstrated that 5FMC is recruited to a methylated CHTOP. We found that the complex is recruited by a multi-SUMOylated target in vitro. CHTOP does not contain any SUMOylation sites, but it seems to recruit 5FMC to a SUMOylated ZNF148, containing multiple lysines in conserved sequences, in order to deSUMOylate it and activate transcription. As we have shown in the previous section, 5FMC complex seems to only be active when all members are present in the complex; we now turned to knocking down each of them in turn, and mapping the resulting transcriptome. In this way we could demonstrate the importance of each of the components of the complex, and also find out which processes this complex is regulating.


4.2 Results

4.2.1 Choice of cell lines

Up to this point we have used HEK293T cells to overexpress the 5FMC complex proteins and test the SUMO-binding capacity of complex members, as these cells are easy to transfect and maintain. We decided to continue our investigation with cancer cells.

To identify suitable cell lines that express all members of the 5FMC complex, we examined expression within cancer subtypes. The RNA-seq gene expression data generated by the TCGA Research Network (http://cancergenome.nih.gov/) were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/; Grossman et al., 2016) on 06/01/2017. The data contain the FPKM of all the genes in the human genome (hg19) in cancer and normal tissue samples. After observing that 5FMC gene expression appeared elevated in many cancer types, we selected all the samples from the oesophagus site and divided them into three sets, namely 80 adenocarcinoma samples, 80 squamous cell carcinoma samples, and 11 normal samples. The box plot in Figure 4.1A, which was drawn in R (R Core Team, 2017), shows the distribution of the FPKM in log2 scale of the TEX10, LAS1L, PELP1, WDR18, SENP3 and SUMO3 genes in the three sets. In the plot, one or more asterisks indicate a statistically significant difference in the expression of a gene between any two of the three sample sets. In the cancer samples, expression of the six genes was elevated compared to the normal samples.
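The transformation underlying such a box plot can be sketched as follows. The original plot was drawn in R; this Python sketch only illustrates how FPKM values are log2-transformed and summarised per sample set. The FPKM values are hypothetical, the pseudocount of 1 added before taking logarithms is an assumption (the text does not state how zero values were handled), and the function names are introduced here for illustration.

```python
import math
from statistics import quantiles

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0) for unexpressed genes


def log2_fpkm(values, pseudocount=PSEUDOCOUNT):
    """Log2-transform FPKM values after adding a pseudocount."""
    return [math.log2(v + pseudocount) for v in values]


def box_summary(values):
    """Five-number summary of log2 FPKM, as displayed in a box plot."""
    logged = sorted(log2_fpkm(values))
    q1, q2, q3 = quantiles(logged, n=4, method="inclusive")
    return {"min": logged[0], "q1": q1, "median": q2, "q3": q3, "max": logged[-1]}


# Hypothetical FPKM values for one gene in two sample sets.
groups = {
    "normal": [1.0, 3.0, 7.0, 15.0],
    "adenocarcinoma": [7.0, 15.0, 31.0, 63.0],
}
summaries = {name: box_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Comparing the medians of the summaries gives a quick numerical counterpart to the visual shift between boxes, although it says nothing about statistical significance on its own.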

Oesophageal adenocarcinoma (OAC) is one of the most rapidly increasing forms of cancer worldwide. It is 3-4 times more common in males, and is currently the 5th most common cancer in the UK. With obesity and acid reflux as the dominant risk factors in developed countries and a 5-year survival rate of about 20% (Coleman et al., 2018), we decided it was a worthy target for investigation. The oesophageal carcinoma cell line OE19, which expresses all 5FMC complex members (Figure 4.1B), was derived from an adenocarcinoma of the gastric cardia/oesophageal gastric junction of a 72-year-old male patient. To add the perspective of normal cells, we decided to also include MCF10A, a non-tumorigenic human breast epithelial cell line that responds to epidermal growth factor (EGF) treatment, thus enabling dynamics to be studied.

[Figure 4.1: A. Box plot; y-axis: FPKM (log2); genes: TEX10, PELP1, LAS1L, WDR18, SENP3, SUMO3. B. Bar chart for PELP1, TEX10, WDR18, LAS1L and SENP3; x-axis: expression level in FPKM (0-25).]

Figure 4.1 - Analysis of 5FMC and SUMO3 gene expression in oesophageal RNA-seq samples. A. Gene expression data from 80 squamous cell carcinoma samples, 80 adenocarcinoma samples, and 11 normal samples were obtained from the TCGA network and compared using R. Results are presented as FPKM (log2) distribution. Significant p-values are marked as follows: one asterisk: p value < 0.05, two asterisks: p value < 0.01, three asterisks: p value < 0.001. B. Bar graph of OE19 cell line gene expression data deposited by the National Institutes of Health (NIH) Genomic Data Commons, obtained from the European Bioinformatics Institute (EBI).
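The asterisk convention given in the legend corresponds to a simple mapping from p-value to annotation, sketched below. The thresholds are taken from the legend; the function name is hypothetical, and the statistical test that produces the p-values is not specified here and is outside the scope of the sketch.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk annotation used in Figure 4.1A."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: no annotation


for p in (0.0002, 0.004, 0.02, 0.05):
    print(p, repr(significance_stars(p)))
```

Because the thresholds are strict inequalities, a p-value of exactly 0.05 receives no asterisk.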

4.2.2 Establishing depletion conditions for the 5FMC complex

We have shown that the 5FMC complex needs all of its components to bind multi-SUMO. Next, we asked whether each complex member is functionally equivalent. To that end, we used small interfering RNAs (siRNAs) to knock down complex members and then tested whether gene expression was affected. We used pools of four siRNAs per gene (ON-TARGETplus, Dharmacon, Table 2.14). We transfected OE19 cells with siRNAs twice within 96 hours (at 0 and 48 hours) and collected both RNA and cell extracts at the end of the 96 hour period.

Following the 96 hour siRNA transfection, we tested the efficiency of each knockdown by RT-qPCR, to demonstrate a reduction in mRNA levels of the relevant genes under the same conditions as above. Knockdown was over 95% effective for all complex members. We also tested whether knockdown of one complex member affects the expression of the other complex members, to assess the extent of the relationship between them (Figure 4.2A). SENP3 knockdown caused overexpression of PELP1 and WDR18, and possibly of LAS1L as well, but the variation in results was too great to determine this with certainty. Interestingly, knockdown of WDR18 caused a highly significant (p ≈ 0.0002) reduction in SENP3 expression. Knockdown of LAS1L caused slight, though significant, reductions in expression of all complex members except SENP3. PELP1 knockdown possibly reduced expression of TEX10. PELP1 itself showed reduced expression after knockdown of LAS1L and, to a lesser extent, of TEX10 and WDR18, whereas SENP3 knockdown caused PELP1 expression to rise.
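Knockdown efficiency of this kind is commonly derived from RT-qPCR Ct values with the 2^-ΔΔCt method. The Python sketch below assumes that method, a single reference gene and 100% primer efficiency; it is not presented as the exact procedure used here (the analysis applied is described in section 2.13.4), and all Ct values are invented for illustration.

```python
def fold_change_ddct(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (knockdown vs control) by the 2^-ddCt method.

    Assumes 100% amplification efficiency for both primer pairs.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # normalise to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_kd - delta_ctrl)


def knockdown_percent(fold_change):
    """Percentage reduction in mRNA relative to the non-targeting control."""
    return (1.0 - fold_change) * 100.0


# Hypothetical Ct values: the target amplifies 5 cycles later after knockdown
fc = fold_change_ddct(ct_target_kd=29.0, ct_ref_kd=18.0,
                      ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(f"fold change = {fc:.5f}, knockdown = {knockdown_percent(fc):.3f}%")
```

Under these assumptions, a shift of about 4.3 cycles or more in the normalised Ct corresponds to a knockdown of over 95%.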

When verified by immunoblot, knockdown was visible to some extent for all complex members apart from WDR18, which we could not visualise by immunoblot at these low cell counts (Figure 4.2B).

Overall, knockdown of each member causes only small, mostly insignificant changes in expression of the other complex members, with the exception of WDR18 knockdown, which causes a large reduction in SENP3 expression.

[Figure 4.2. Panel A: bar graph of fold change in mRNA expression (y-axis, 0 to 3) of PELP1, TEX10, WDR18, LAS1L and SENP3 after each knockdown (x-axis: siPELP1, siTEX10, siWDR18, siLAS1L, siSENP3), with asterisks marking significance. Panel B: immunoblots.]

Figure 4.2 - siRNA efficiently knocks down 5FMC complex members. OE19 cells were transfected with siRNAs against each of the 5FMC complex members for 96 hours. Knockdown efficiency was verified by RT-qPCR (A) and immunoblot (B). A. Fold change in mRNA expression of each complex member relative to non-targeting control after siRNA knockdown. Graph represents 3 biological repeats, with error bars depicting SD. Significant p-values compared to control are marked as follows: one asterisk: p value < 0.05, two asterisks: p value < 0.01, three asterisks: p value < 0.001. B. Proteins were visualised by immunoblot using the relevant endogenous primary antibody. In SENP3 samples the bottom band represents SENP3 (marked with arrow).

4.2.3 5FMC complex member knockdown affects gene expression

After verifying the knockdowns by both immunoblotting and RT-qPCR (Figure 4.2), we sent three biological replicates of each knockdown and of the siNT control for whole transcriptome sequencing (RNA-seq). All samples passed FastQC quality checks (Andrews, 2014). Reads were then aligned to the human hg19 genome and assigned to genes. The resulting count matrix was input into DEseq2 (Anders et al., 2010), where the reads were normalised across samples and differential expression was analysed (see supplementary data table S2). Further downstream analyses were done using UpSetR (Conway et al., 2017), as we found Venn diagrams inadequate for efficiently analysing five sets of data. This process is summarised in Figure 4.3 (see supplementary data S4 and tracks at http://genome-euro.ucsc.edu/s/Rotem/Thesis1).

Principal component analysis (PCA) of the differences in expression, performed by DEseq, shows that samples of the same condition mostly cluster together, with the first principal component accounting for most of the variance (67%, compared to 14% for the second). While LAS1L and TEX10 knockdowns show little variance from the control, PELP1 knockdown is responsible for most of the variance within the second principal component, and SENP3 and WDR18 knockdowns account for the variance in the first (Figure 4.4A). WDR18 knockdown samples group closely together with SENP3 knockdown samples.

After normalisation, we compared each set of samples to the controls by plotting the fold change of each gene against its mean normalised count. The resulting MA plots are used to check the quality of normalisation and to visualise the relationship between samples and controls. The plots emphasise that TEX10 knockdown correlates highly with the control, and that LAS1L knockdown also differs little from it (Figure 4.4B). The effects of PELP1, WDR18 and SENP3 knockdowns are much more visible, and show no tendency towards either up or down regulation at an adjusted p-value < 0.1.
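To make the two axes of an MA plot concrete, the Python sketch below computes, for a single gene, the mean of normalised counts (A) and the log2 fold change (M) between knockdown and control. It is a simplified illustration: DESeq-type tools estimate fold changes from a fitted model rather than from a raw ratio, and the pseudocount and count values here are assumptions.

```python
import math


def ma_point(mean_ctrl, mean_kd, pseudocount=1.0):
    """Return (A, M) for one gene.

    A is the mean of the normalised counts across the two conditions and
    M is the log2 fold change (knockdown over control). The pseudocount is
    an assumption that keeps the ratio defined when a count is zero.
    """
    a = (mean_ctrl + mean_kd) / 2.0
    m = math.log2((mean_kd + pseudocount) / (mean_ctrl + pseudocount))
    return a, m


# Hypothetical normalised mean counts for three genes: (control, knockdown)
genes = {"geneA": (99.0, 399.0), "geneB": (255.0, 63.0), "geneC": (50.0, 50.0)}
for name, (ctrl, kd) in genes.items():
    a, m = ma_point(ctrl, kd)
    print(f"{name}: A = {a:.1f}, M = {m:+.2f}")
```

A well-normalised comparison is expected to scatter symmetrically around M = 0 across the whole range of A, which is the property checked visually in the plots.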

Figure 4.3 - Work flow of RNA-seq data analysis. Following next generation sequencing the analysis of RNA-seq data involved Quality control by FastQC, alignment to hg19 genome using STAR, counting reads into genes by HTseq, then normalisation and differential expression by DEseq. Subsequent downstream analysis followed.

[Figure 4.4. Panel A: PCA plot of PC1 (67% variance) against PC2 (14% variance), with samples labelled control, siPELP1, siTEX10, siWDR18, siLAS1L and siSENP3. Panel B: MA plots for PELP1, LAS1L, TEX10, SENP3 and WDR18 knockdowns; x-axis: mean of normalised counts (1e-01 to 1e+05), y-axis: log fold change (-2 to 2).]

Figure 4.4 - RNA-seq sample distribution. A. Principal component analysis of RNA-seq data from OE19 cells after siRNA knockdown of each of the indicated 5FMC complex members and control, using the top 500 most variable genes indicated in the DEseq analysis. PC1 represents 67% of the total variance in the experiment data, and PC2 represents 14% of the total variance. B. MA-plots of each knockdown vs control. Genes with adjusted P-value < 0.1 marked in red.

Next, we looked at all of the knockdown transcriptomes together and compared their differences against the controls. For this, we plotted 10,000 genes, chosen by DEseq2 as the most variable genes across samples, as a heatmap (Figure 4.5A). Here we could see the resemblance between LAS1L and TEX10 knockdowns and the control, with slightly different patterns emerging for the other complex members. To investigate these differences further, we narrowed our plot to the top 200 genes that show the greatest difference in expression between samples, as determined by row Z-score analysis (Figure 4.5B). This showed a similar pattern: the knockdowns of LAS1L and TEX10 have little effect on gene expression and look similar to controls, whereas WDR18 and SENP3 knockdowns give an almost reversed pattern of gene expression compared to control. PELP1 knockdown shows a unique pattern of up and down regulation, sometimes matching the SENP3/WDR18 pattern and sometimes no different from the control.
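The row Z-score used to colour such heatmaps, and the selection of the most variable genes, can be sketched as follows. This Python example is illustrative only: the heatmaps in this chapter were produced in R, and the gene names, expression values and the use of the sample standard deviation are assumptions.

```python
import statistics


def row_z_scores(row):
    """Scale one gene's expression values to mean 0 and SD 1 across samples."""
    mean = statistics.fmean(row)
    sd = statistics.stdev(row)  # sample standard deviation (an assumption)
    if sd == 0:
        return [0.0] * len(row)  # a flat row carries no pattern
    return [(x - mean) / sd for x in row]


def top_variable_genes(matrix, n):
    """Names of the n genes with the highest variance across samples."""
    ranked = sorted(matrix, key=lambda gene: statistics.variance(matrix[gene]),
                    reverse=True)
    return ranked[:n]


# Hypothetical normalised expression values: gene -> one value per sample
matrix = {
    "geneA": [2.0, 4.0, 6.0],
    "geneB": [5.0, 5.0, 5.0],
    "geneC": [1.0, 10.0, 1.0],
}
for gene in top_variable_genes(matrix, 2):
    print(gene, row_z_scores(matrix[gene]))
```

Because each row is scaled independently, the colours in the heatmap show the direction of change relative to that gene's own mean rather than absolute expression.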

We previously noted a connection between SENP3 and WDR18 knockdowns (Figure 4.2A): knockdown of SENP3 increased WDR18 expression, whereas knockdown of WDR18 significantly reduced SENP3 expression. However, the effect of both knockdowns on gene expression is similar, suggesting either that the two proteins work together or that the similarity is due to the effect of WDR18 knockdown on SENP3 expression. The partial overlap of the effects of PELP1 knockdown on the transcriptome with those of SENP3 and WDR18 knockdowns suggests that in some cases PELP1 is also involved in controlling the same genes.

[Figure 4.5. Panels A and B: heatmaps coloured by row Z-score.]

Figure 4.5 – Variable knockdown pattern produced by each member of the 5FMC complex. Hierarchical clustering of samples showing the variance across samples determined by row z-score and arranged by adjusted p-value. A. Top 10K genes. B. Top 200 genes. Heatmaps produced by DEseq2.

4.2.4 Each 5FMC complex member knockdown has a different effect on global gene expression

We compared the differential gene expression levels produced by DEseq (each knockdown compared to control) in the five expression data sets using UpSetR (Lex et al., 2014). With a cutoff of 0.05 FDR, we looked at all genes with at least a 1.5 fold change, either up or down, after each of the knockdowns (Table 4.1). Far fewer genes change in expression following TEX10 and LAS1L knockdown than following knockdown of the other complex members. SENP3 and WDR18 knockdowns each cause upregulation of over 2000 genes, and downregulation of a similar number of genes. PELP1 knockdown results in upregulation of 829 genes and downregulation of 1345 genes.

Table 4.1 - Summary of 1.5 fold up and down regulated genes following each of the 5FMC complex knockdowns at FDR < 0.05.

Direction      siPELP1  siTEX10  siWDR18  siLAS1L  siSENP3
Downregulated  1345     107      2397     222      2147
Upregulated    829      26       2105     81       2312
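The filtering behind the counts in Table 4.1 can be sketched as below: a gene is counted only if its adjusted p-value is below the FDR cutoff and its fold change is at least 1.5 in either direction. This Python example is an illustration with invented values; the function names and the representation of results as (log2 fold change, adjusted p-value) pairs are assumptions, not the format of the actual output.

```python
import math


def classify(log2_fc, padj, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Label one gene as 'up', 'down' or 'unchanged'."""
    # A missing adjusted p-value (None) is treated as not significant
    if padj is None or padj >= fdr_cutoff:
        return "unchanged"
    threshold = math.log2(fc_cutoff)  # 1.5 fold is about 0.585 on the log2 scale
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


def count_directions(results):
    """Count genes per label; results maps gene -> (log2 fold change, padj)."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for log2_fc, padj in results.values():
        counts[classify(log2_fc, padj)] += 1
    return counts


# Hypothetical differential expression results for one knockdown
results = {
    "geneA": (1.0, 0.01),    # significant, 2 fold up
    "geneB": (-2.0, 0.001),  # significant, 4 fold down
    "geneC": (0.3, 0.001),   # significant, but below 1.5 fold
    "geneD": (3.0, 0.2),     # large change, not significant
    "geneE": (-0.7, None),   # no adjusted p-value available
}
print(count_directions(results))
```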

We identified significant GO terms of the up and down regulated genes, then clustered them according to the subunit depleted. GO term analysis of 1.5 fold downregulated genes shows little similarity between the effects of TEX10 and LAS1L knockdowns and those of the other complex members' knockdowns (Figure 4.6A), which is to be expected considering the lack of differences between these samples and the control. The wide variety of pathways common to the other three knockdowns (PELP1, WDR18 and SENP3) implies that this is a general activator/repressor complex, though many of the pathways indicated relate to differentiation and development. As Tex10 is known to be active in mouse embryonic development and enriched in superenhancers (Ding et al., 2015), it is possible that the complex is more active during the embryonic stages, and that a knockdown of TEX10 in embryonic stem cells would produce a greater effect than we have seen thus far.

[Figure 4.6.
Panel A: Downregulated genes. Knockdowns compared: siLAS1L, siTEX10, siPELP1, siSENP3, siWDR18. GO terms shown: Actin cytoskeleton organisation; Response to wounding; Regulation of cell adhesion; Regulation of secretion; Tissue morphogenesis; Negative regulation of cellular component organisation; Chordate embryonic development; Head development; Blood vessel development; Transmembrane tyrosine kinase signalling pathway; Cellular response to hormone stimulus; Cell morphogenesis involved in differentiation; Regulation of vesicle-mediated transport; Organelle localisation; Cell-substrate adhesion; Extracellular structure organisation; Metabolism of RNA; rRNA metabolic process; DNA-templated transcription initiation; Regulation of small molecule metabolic process.
Panel B: Upregulated genes. Knockdowns compared: siLAS1L, siTEX10, siPELP1, siWDR18, siSENP3. GO terms shown: Mitotic spindle checkpoint; Respiratory electron transport; Mitochondrial RNA metabolic process; Neddylation; Proteasomal protein catabolic process; Mitochondrion organisation; Nucleoside bisphosphate metabolic process; Intraciliary transport; Lipid biosynthetic process; Cholesterol biosynthesis; tRNA metabolic process; Macromolecule depalmitoylation; Lipoprotein metabolic process; Carbohydrate derivative biosynthetic process; Glycolipid biosynthetic process; Transforming growth factor beta activation; Membrane lipid metabolic process; Acyl chain remodelling of PE; TP53 regulates metabolic genes; Lysosome.]

Figure 4.6 – Comparative GO term analysis of changes in gene expression following 5FMC complex members’ knockdowns. A. Downregulated genes B. Upregulated genes. Metascape analysis performed with cutoff of FDR<0.05 and up/down regulation set at 1.5 fold change.

An analysis of the upregulated genes under the same restrictions does not show highly significant enrichment for any pathway in any of the knockdowns, and no significant pathways could be found for the TEX10 or LAS1L knockdown samples (Figure 4.6B). However, there is again clustering of GO terms, mostly metabolic processes, for SENP3 and WDR18 knockdowns, some of which are also implicated in PELP1 knockdown. Taken together, these results might indicate that the 5FMC complex acts in general gene regulation rather than in any specific pathway.

A look at known ZBP-89 target genes presented us with a curious conundrum. Lymphocyte-specific protein tyrosine kinase (LCK), for example, is known to be activated by ZBP-89 (Yamada et al., 2001) and is thus expected to be downregulated when ZBP-89 is SUMOylated; it indeed shows an over 5 fold decrease when either PELP1, SENP3 or WDR18 is knocked down. However, no change is observed with the knockdowns of LAS1L or TEX10, indicating that perhaps these units are not essential for 5FMC complex activity in OE19 cells, even though we have previously shown that all members are necessary for complex formation. A similar result, though less pronounced, can be observed for P21WAF1, which is normally activated by ZBP-89 (Bai et al., 2000) and shows a 2 fold decrease with PELP1 knockdown and a slightly smaller decrease with SENP3 knockdown. In contrast to this result stands Vimentin (VIM), known to be repressed by the interaction of ZBP-89 with Sp1 protein (Zhang et al., 2003), which shows a 3 fold increase with SENP3 and WDR18 knockdowns, but an almost 10 fold decrease with LAS1L knockdown. No effect on Vimentin expression was seen for the other 5FMC complex member knockdowns (supplementary table S2).

We continued to compare the samples by looking for genes that were commonly up or down regulated following each of the knockdowns. 19 genes were found to be downregulated in response to knockdown of four members of the 5FMC complex (Figure 4.7A), only one of which also reacted to TEX10 knockdown. Two genes, NOX1 and GYLTL1B, were upregulated in four out of five knockdowns (Figure 4.7B). We then looked at a combined view of both the up and down regulation responses to each of the five knockdowns (Figure 4.7C), in order to see whether genes react differently to the absence of different complex members. Two genes reacted to all five knockdowns, and an additional 17 show either up or downregulation in response to four out of five knockdowns in different combinations (see gene list in Table 4.2). Out of a set of about 7000 genes with an over 1.5 fold change either up or down from control, the majority changed in expression in reaction to either WDR18 or SENP3 knockdown, closely followed by those responding to PELP1 knockdown. Though some genes react differently to different knockdowns, for example 67 genes that are downregulated following PELP1 knockdown but upregulated following both SENP3 and WDR18 knockdowns (Figure 4.7C), the majority of genes are either downregulated or upregulated in the same way in response to all knockdowns, possibly pointing to the relationship within the complex. Only 10 of the genes affected by four or more knockdowns (Table 4.2, marked with *) appear in the top 200 most variable genes (Figure 4.5B), and only 3 of those (Table 4.2, **) are in the top 30.
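The type of comparison shown in an UpSet plot can be sketched as counting, for every combination of knockdowns, the genes that respond in exactly that combination and in no other. The Python example below is a minimal illustration; the analysis here was performed with the UpSetR package in R, and the gene identifiers and set contents are placeholders.

```python
from collections import Counter


def exclusive_intersections(gene_sets):
    """Count genes by the exact combination of sets they belong to.

    gene_sets maps a knockdown name to the set of genes it affects.
    Returns a Counter keyed by frozenset of knockdown names.
    """
    membership = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Placeholder gene identifiers, not real results
downregulated = {
    "siPELP1": {"g1", "g2", "g3"},
    "siSENP3": {"g2", "g3", "g4"},
    "siWDR18": {"g3", "g4", "g5"},
}
counts = exclusive_intersections(downregulated)
for combo, size in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(combo)), "->", size)
```

Each gene is counted once, in the single combination that describes it completely, so the counts sum to the total number of affected genes.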

These observations, together with the pathway enrichment analysis, possibly indicate that this complex is only active as a whole at certain times, possibly mainly during embryonic development. Some of its members, namely PELP1, WDR18 and SENP3, may also remain active at later stages in other configurations, possibly including a partial complex.

[Figure 4.7. Panel A: Downregulated genes. Panel B: Upregulated genes. Panel C: combined plot of intersection size for upregulated and downregulated gene sets of TEX10, LAS1L, PELP1, SENP3 and WDR18 knockdowns.]

Figure 4.7 - Comparison of gene expression datasets. A. 1.5 fold downregulated genes after each knockdown. B. 1.5 fold upregulated genes after each knockdown. C. Combination of up and downregulated gene expression, at 1.5 fold. Graphs created using R package UpSetR.

Table 4.2 - Genes affected by knockdown of at least four members of the 5FMC complex. Arrows mark gene response to knockdown relative to control. Rows containing genes downregulated by four knockdowns marked in light blue, rows containing genes upregulated by four knockdowns marked in light red. Downregulation arrows coloured for ease of viewing. Genes screened by FDR < 0.05 and gene expression at least 1.5 fold different from control. Genes marked with * are in the 200 most variable group as sorted by adjusted p-value. Genes marked with ** are among the 30 with the lowest adjusted p-values.

Gene        PELP1  TEX10  WDR18  LAS1L  SENP3  Gene Description                                           Gene Type
MYH4        ↑      ↓      ↓      ↑      ↓      myosin heavy chain 4                                       protein-coding
AC007969.5  ↓      ↑      ↓      ↑      ↓      Ribosomal Protein S15 4                                    lncRNA
AQP2        ↓      -      ↓      ↓      ↓      aquaporin 2                                                protein-coding
ARC*        ↓      -      ↓      ↓      ↓      activity regulated cytoskeleton associated protein         protein-coding
B3GALT5     ↓      -      ↓      ↓      ↓      beta-1,3-galactosyltransferase 5                           protein-coding
BRINP2      ↓      -      ↓      ↓      ↓      BMP/retinoic acid inducible neural specific 2              protein-coding
CCAT1*      ↓      -      ↓      ↓      ↓      colon cancer associated transcript 1 (non-protein coding)  ncRNA
CEACAM6*    ↓      -      ↓      ↓      ↓      carcinoembryonic antigen related cell adhesion molecule 6  protein-coding
CLDN1       ↓      -      ↓      ↓      ↓      claudin 1                                                  protein-coding
DENND2C     ↓      -      ↓      ↓      ↓      DENN domain containing 2C                                  protein-coding
HSPB6       ↓      -      ↓      ↓      ↓      heat shock protein family B (small) member 6               protein-coding
NCF2        ↓      -      ↓      ↓      ↓      neutrophil cytosolic factor 2                              protein-coding
PIEZO2      ↓      -      ↓      ↓      ↓      Piezo Type Mechanosensitive Ion Channel Component 2        protein-coding
PMP22       ↓      -      ↓      ↓      ↓      Peripheral Myelin Protein 22                               protein-coding
PPP1R15A*   ↓      -      ↓      ↓      ↓      protein phosphatase 1 regulatory subunit 15A               protein-coding
SERPIND1    ↓      -      ↓      ↓      ↓      serpin family D member 1                                   protein-coding
SLC45A3     ↓      -      ↓      ↓      ↓      solute carrier family 45 member 3                          protein-coding
SPRR1B**    ↓      -      ↓      ↓      ↓      small proline rich protein 1B                              protein-coding
SPRR3*      ↓      -      ↓      ↓      ↓      small proline rich protein 3                               protein-coding
VGF**       ↓      -      ↓      ↓      ↓      VGF nerve growth factor inducible                          protein-coding
DHRS9       ↓      ↓      ↓      ↓      -      dehydrogenase/reductase 9                                  protein-coding
NOX1        ↑      -      ↑      ↑      ↑      NADPH oxidase 1                                            protein-coding
GYLTL1B     -      ↑      ↑      ↑      ↑      LARGE xylosyl- and glucuronyltransferase 2                 protein-coding
PLA2G2A*    ↑      -      ↓      ↑      ↓      phospholipase A2 group IIA                                 protein-coding
F10         ↑      -      ↑      ↑      ↓      Coagulation Factor X                                       protein-coding
TCN2*       ↓      -      ↑      ↑      ↑      transcobalamin 2                                           protein-coding
C9orf41     ↑      -      ↑      ↓      ↑      C9orf41 antisense RNA 1                                    ncRNA
CCNG2       ↑      -      ↑      ↓      ↑      cyclin G2                                                  protein-coding
METTL18     ↑      -      ↑      ↓      ↑      methyltransferase like 18                                  protein-coding
MOSPD2      ↑      -      ↑      ↓      ↑      motile sperm domain containing 2                           protein-coding
SMIM13      ↑      -      ↑      ↓      ↑      Small Integral Membrane Protein 13                         protein-coding
CCDC68      ↑      ↓      ↑      -      ↑      coiled-coil domain containing 68                           protein-coding
LONRF1      ↑      ↓      ↑      -      ↑      LON Peptidase N-Terminal Domain And Ring Finger 1          protein-coding
MZT1        ↑      ↓      ↑      -      ↑      Mitotic Spindle Organizing Protein 1                       protein-coding
PHTF2       ↑      ↓      ↑      -      ↑      putative homeodomain transcription factor 2                protein-coding
TMEM170B    ↑      ↓      ↑      -      ↑      transmembrane protein 170B                                 protein-coding
TMEM56**    ↑      ↓      ↑      -      ↑      TMEM56-RWDD3 readthrough                                   protein-coding
CTGF        ↓      -      ↑      ↓      ↑      connective tissue growth factor                            protein-coding
HDAC9       ↓      -      ↑      ↓      ↑      histone deacetylase 9                                      protein-coding
PMEPA1      ↓      -      ↑      ↓      ↑      prostate transmembrane protein, androgen induced 1         protein-coding
RAB8B       -      ↓      ↑      ↓      ↑      RAB8B, member RAS oncogene family                          protein-coding
AKR7A3      ↓      ↑      ↓      -      ↓      aldo-keto reductase family 7 member A3                     protein-coding

A stronger connection exists between the effects of SENP3 and WDR18 knockdowns. Gene expression is affected in the same direction by both knockdowns in more than 2000 genes, with PELP1 knockdown affecting ~800 more in common with the pair. LAS1L and TEX10 have a much smaller effect on gene expression when knocked down. WDR18, SENP3 and PELP1 knockdowns have 336 genes in common in the downregulated group (Figure 4.8A) and 423 genes in the upregulated group (Figure 4.8B). We therefore continued to look for common enrichment within these two groups. GO-term enrichment analysis using GREAT (McLean et al., 2010) for downregulated genes common to all three knockdowns resulted in only three significant cellular component terms (Figure 4.8C), and no enrichment was found for the combination of the upregulated genes. This is in contrast to the myriad of pathways we could see when each individual knockdown effect was examined and compared to other complex member knockdowns (Figure 4.6). The only pathway appearing in both analyses is related to the actin cytoskeleton. It may be that different complex components control different members of the same pathways rather than all affecting the same genes. This again suggests a more general role in gene regulation rather than an association with any specific biological process.
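One common way to test whether a GO term is over-represented in a gene list is a one-sided hypergeometric test, sketched below in Python. This is given only to illustrate the principle of an enrichment p-value: it is not a description of the statistics implemented in GREAT or Metascape, and the numbers in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(k, n, annotated, population):
    """P(X >= k) for a hypergeometric draw.

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    n:          size of the gene list being tested
    k:          genes in the list carrying the GO term
    """
    upper = min(n, annotated)
    total = comb(population, n)
    tail = sum(comb(annotated, i) * comb(population - annotated, n - i)
               for i in range(k, upper + 1))
    return tail / total


# Invented example: 10 background genes, 4 annotated, list of 3 with 2 hits
p = hypergeom_enrichment_p(k=2, n=3, annotated=4, population=10)
print(f"p = {p:.4f}")
```

With many GO terms tested at once, the resulting p-values would additionally need correction for multiple testing before being compared to an FDR cutoff.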

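[Figure 4.8. Panel A: Venn diagram of downregulated genes for siPELP1, siSENP3 and siWDR18. Panel B: Venn diagram of upregulated genes for siPELP1, siSENP3 and siWDR18. Panel C: GO term enrichment of downregulated genes.]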

Figure 4.8 - Analysis of gene expression changes common to PELP1, SENP3 and WDR18 knockdowns. A. Venn diagram of 1.5 fold downregulated genes after each knockdown. B. Venn diagram of 1.5 fold upregulated genes after each knockdown. C. GO term enrichment of 336 downregulated genes common to all three knockdowns.

4.3 Discussion

4.3.1 5FMC complex members knockdown

We chose to continue our experiments using an oesophageal cancer cell line, OE19. We knocked down each of the complex members separately to investigate the effect each knockdown would have on gene expression. We noted that knockdown of some members of the complex causes an increase or decrease in gene expression of the other members. Specifically, SENP3 knockdown causes an increase in expression of WDR18, while WDR18 knockdown significantly lowers SENP3 expression. This suggested to us a cross talk between complex members. It also makes it difficult to distinguish between the effects that a WDR18 knockdown has on gene expression on its own and those it has through SENP3, and helps explain the large correlation between the effects of WDR18 and SENP3 knockdowns on gene expression. We continued to map the entire transcriptome of the cells after each knockdown.

We were surprised to discover that only one gene and one long non-coding RNA reacted to all five knockdowns. However, there was little difference between TEX10 knockdown and control, so we decided to also look for genes that reacted to four of the knockdowns. A cross comparison of up and down regulated genes in response to each of the knockdowns showed that genes tended to react in the same direction to every knockdown, either by upregulating or by downregulating. We could find only a few that were upregulated in response to one knockdown and downregulated in response to another. This strengthens our observation that these proteins work as a complex. However, the lack of shared GO terms among the genes downregulated following PELP1, SENP3 and WDR18 knockdowns taken together seems to imply that this complex is a general activator/repressor.

4.3.2 Summary, Limitations and Conclusions

Our chosen siRNAs were effective in knocking down the 5FMC complex genes; however, we did not eliminate the possibility of off-target effects. Differential expression analysis of complex member knockdowns did not result in the identification of specific gene targets that might be affected by 5FMC complex activity. Parts of this complex are functional in other complexes as well (Castle et al., 2012, Ding et al., 2015, Gonugunta et al., 2014), thus the picture we see is that of a variety of different effects on cellular processes. We would also like to point out that the low expression levels and the quality of some of the antibodies may have affected our results. Despite that, the RT-qPCR and immunoblot results show a significant reduction in mRNA and protein of the knocked down genes.

In conclusion, we found only one protein coding gene (MYH4) and one long-non-coding RNA affected by all five knockdowns. To better decipher the role of 5FMC complex in gene expression, the following experiments should be performed:

1. Another RNA-seq experiment, using a second siRNA group for each of the knockdowns, will help verify that the effects we observed are not the result of off-target effects of the siRNAs used. At the time of writing this thesis, this experiment is already under way.
2. Knockdown of some or all five complex members together, followed by RNA-seq, might enable us to find genes that are specifically controlled by the 5FMC complex. Double knockdowns that capture two of the 5FMC sub-complexes together might reveal functional redundancy, and help identify where one sub-complex compensated for the lack of the other.
3. Use of embryonic stem cells in the same series of experiments to establish the role of 5FMC in gene expression during development, as the complex might be active only at specific times, and as members of the complex have previously been demonstrated to be active in the pluripotency circuitry (Ding et al., 2015).

As this complex is recruited to chromatin-bound methylated CHTOP (Fanis et al., 2012) and we have shown that it is attracted by multi-SUMOylated targets, and assuming SENP3 is the active member of the complex, serving to deSUMOylate its targets, another possible experiment is to explore the effects that SENP3 knockdown has on complex function through SUMO2/3 dependent recruitment to chromatin, which we describe in the next section.

5 Connection between the 5FMC complex and the SUMOylation profile across the genome

5.1 Introduction

The 5FMC complex is known to be recruited to chromatin bound elements, possibly to deSUMOylate them, thus enabling activation of transcription by removing repressive complexes (Fanis et al., 2012). After establishing that 5FMC is recruited only to multi-SUMOylated targets, we continued to investigate the effects that SENP3, the active unit of the 5FMC complex, has on SUMOylation of chromatin bound elements. SENP3 is a SUMO protease, which deSUMOylates proteins and recycles SUMO proteins for continual use. In the previous chapters we have shown that without SENP3 we did not see complex recruitment to multi-SUMOylated targets, and that SENP3 knockdown affects a large number of genes. Thus, SENP3 was chosen for knockdown and examination by SUMO2/3 ChIP-seq.

5.2 Results

5.2.1 SENP3 knockdown affects SUMO2/3 binding at known sites

Castle et al. (2012) asserted that both LAS1L and PELP1 are SUMOylated in a SENP3 dependent manner and that their deSUMOylation affects their nuclear localisation. We observed (Figure 3.5C) that with SENP3 missing, no SUMO-binding of 5FMC can be seen, though SENP3 itself was present in every SUMO binding experiment we performed. We also established that an acceptable siRNA knockdown of SENP3 has little or no effect on gene expression of other complex members, with the exception of WDR18, which is upregulated in response to SENP3 knockdown. SENP3 is downregulated in response to WDR18 knockdown (Figure 4.2A). We now employed SENP3 knockdown in an attempt to disrupt the deSUMOylating activity of the complex in a SUMO2/3 ChIP-seq experiment. When compared with the control, this might help identify potential SUMOylated chromatin-bound targets of the 5FMC complex.

We initially performed chromatin immunoprecipitation (ChIP) in OE19 cells, with and without SENP3 knockdown. In preliminary checks, we screened several SUMO2/3 antibodies using the same method as the ChIP protocol, followed by immunoblotting. As we were looking to find protein-bound SUMO, our main interest was in finding a smear of different sized proteins bound covalently to SUMO. We chose to use an αSUMO2/3 rabbit polyclonal antibody (Abcam ab3742) previously used for SUMO2/3 ChIP (Chang et al., 2013). An αSUMO2/3 immunoblot of the SUMO2/3 immunoprecipitation experiment shows a number of distinct high molecular weight bands with a smear in the eluted protein lane (no. 7), corresponding to SUMOylated proteins of various sizes. The same smear is not seen in the IgG control (lane 4) (Figure 5.1A). We next verified DNA shearing, with the desired fragment size at ~200bp, and settled on 12 sonication cycles (Figure 5.1B).

We then performed chromatin immunoprecipitation with our chosen αSUMO2/3 antibody (Abcam) and IgG control (Merck) on OE19 cells following 96 hour siRNA knockdown of SENP3 and a non-targeting control. We chose two locations previously tested by our group in the MCF10A cell line (Figure 5.1C) and performed ChIP-PCR. SUMO2/3 binding at previously described chromatin-bound-SUMO regions (LOC101929140 - Homo sapiens uncharacterized long non-coding RNA and ITPRIP - Inositol 1,4,5-Trisphosphate Receptor Interacting Protein), along with a negative control oligonucleotide set on a gene desert on chromosome 12, was measured relative to percentage input (Figure 5.1D). In three separate repeats we observed an average increase in SUMO2/3 binding in the SENP3 knockdown samples relative to control at both chosen locations, with each individual repeat showing an increase, indicating an increase in bound SUMO2/3 when the SUMO-protease is missing.
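ChIP-PCR signal expressed as a percentage of input is commonly calculated from Ct values as sketched below in Python. The input Ct is first adjusted for the fraction of chromatin that was set aside as input, and the immunoprecipitated sample is then compared to that adjusted value. The input fraction and Ct values here are assumptions chosen for illustration, not the values from these experiments, and 100% primer efficiency is assumed.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """ChIP signal as a percentage of input DNA.

    input_fraction is the share of chromatin kept as the input sample
    (0.01 means 1%); this value is an assumption for the example.
    """
    # Shift the input Ct to what 100% of the chromatin would have given
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values at one locus, with 12.5% of chromatin kept as input
control = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.125)
knockdown = percent_input(ct_ip=26.0, ct_input=25.0, input_fraction=0.125)
print(f"siNT: {control:.3f}% of input, siSENP3: {knockdown:.3f}% of input")
```

In this invented example, a one cycle earlier amplification in the knockdown sample corresponds to a doubling of the percentage of input recovered.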

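[Figure 5.1. Panel A: immunoblot showing SUMO conjugates. Panel B: DNA shearing gel. Panel C: ChIP-seq tracks. Panel D: bar graph of % of input (y-axis, 0 to 12) for siNT and siSENP3 at Neg ctrl, LOC101929140 and ITPRIP.]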

Figure 5.1 - SUMO2/3 ChIP verification. A. ChIP-antibody verification. Immunoblot of OE19 SUMO2/3 IP using αSUMO2/3 rabbit polyclonal antibody (lanes 6-8) compared to α-rabbit IgG control (lanes 3-5). Lanes show protein marker (lane 1), OE19 nuclear extract input (lane 2), flow through (FT, lanes 3 and 6), proteins eluted from the antibody beads (Elute, lanes 4 and 7) and proteins left on the beads (Beads, lanes 5 and 8). Proteins were visualised by western blot using αSUMO2/3 primary antibody and the relevant IRDye secondary antibody. Images acquired by LiCor Odyssey Infrared Imager. B. OE19 (lane 2) and MCF10A (lane 3) DNA shearing at 12 sonication cycles. C. MCF10A SUMO2/3 ChIP-seq tracks from Aguilar-Martines unpublished data, showing location of ChIP verification primers boxed in red. D. ChIP-PCR analysis of SUMO2/3 binding in OE19 cells as a percentage of input DNA in SENP3 knockdown samples vs. non targeting control. Results represent 3 separate experiments, with error bars representing SD.

5.2.2 SENP3 knockdown has little effect on SUMO2/3 presence on chromatin

Having demonstrated that there is a difference between samples lacking the SUMO protease and the controls, we prepared two separate biological repeats, each containing an siRNA knockdown of SENP3 and a non-targeting control. We performed ChIP on them using a rabbit polyclonal antibody specific for SUMO2/3 (Abcam) and sequenced them separately by Illumina HiSeq 4000 next generation sequencing. All samples passed FastQC quality checks (Andrews, 2014). Reads were then mapped to the human hg19 genome using bowtie2 (Langmead et al., 2012), followed by sorting and intersection with the hg19 blacklist using SAMtools/1.3.1 (Genome Project Data Processing Subgroup et al., 2009) to remove regions that give unstructured or anomalously high read counts in NGS experiments. We then called peaks using the MACS2 broad peak setting (Zhang et al., 2008b), with and without --SPMR normalisation. To better compare different experiments we needed to take read depth into account. We used the MACS2 --SPMR normalised files to count the number of reads in peaks (RIP) using BEDOPS map (Reynolds et al., 2012), in order to determine the weighted mean of bedgraph coverage using the union of SUMO3 peaks as a reference. We then scaled the un-normalised MACS2 files to RIP and re-called peaks. This process is summarised in Figure 5.2. See S4 for additional files, and tracks at http://genome-euro.ucsc.edu/s/Rotem/Thesis1.
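The idea of scaling coverage tracks to reads in peaks (RIP) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pipeline used here: the choice of the smallest RIP value as the common reference, the sample names and all numbers are hypothetical, and the actual analysis used MACS2 and BEDOPS as described above.

```python
def rip_scale_factors(reads_in_peaks):
    """Per-sample factors that bring every sample to a common RIP value.

    reads_in_peaks maps sample name -> signal counted within the shared
    peak set. Scaling down to the smallest value is an assumption.
    """
    reference = min(reads_in_peaks.values())
    return {sample: reference / rip for sample, rip in reads_in_peaks.items()}


def scale_bedgraph(intervals, factor):
    """Multiply the coverage value of each (chrom, start, end, value) row."""
    return [(chrom, start, end, value * factor)
            for chrom, start, end, value in intervals]


# Hypothetical RIP totals for one repeat
rip = {"siNT_rep1": 2_000_000, "siSENP3_rep1": 4_000_000}
factors = rip_scale_factors(rip)

# Hypothetical coverage rows for the deeper sample
track = [("chr1", 0, 100, 8.0), ("chr1", 100, 200, 2.0)]
print(factors)
print(scale_bedgraph(track, factors["siSENP3_rep1"]))
```

After scaling, a difference in signal at a given peak between two samples is less likely to reflect only a difference in how much signal each library placed within peaks overall.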

Figure 5.2 - Work flow of ChIP-seq data analysis. Following next generation sequencing the analysis of ChIP data involved Quality control by FastQC and subsequent trimming when necessary, alignment to hg19 genome using bowtie2, sorting and blacklist removal using SAMtools, Peak calling using MACS2 and RIP normalisation. Subsequent downstream analysis followed. Flow chart created by Lucidchart.


After siRNA knockdown was verified by immunoblot (Figure 5.3A), we commenced chromatin immunoprecipitation. In two separate repeats, each containing siRNA knockdown of SENP3 and a non-targeting control, the correlation between treatment and control within repeats (95% in the first repeat and 93% in the second) was at least as high as that between similarly treated samples from different repeats, which stood at 89% for treated samples and 93% for control samples (Figure 5.3B). Reads in peak normalisation did not affect the relationship between samples. Plotting the profile of all peaks using the transcription start site (TSS) as a reference point (deepTools (Ramírez et al., 2016)) showed that SUMO2/3 clustered mostly at the TSS regions of the same genes in both treatment and control (Figure 5.3C).
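For readers unfamiliar with the sample correlation reported above, the following minimal Python sketch computes a Spearman correlation between two vectors of per-region read counts. It is an illustration only: deepTools was used for the actual analysis, and the function names and toy values here are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-region counts for a knockdown sample and its control.
knockdown = [12, 40, 7, 55, 23]
control = [10, 38, 9, 60, 20]
rho = spearman(knockdown, control)
```

Because only ranks enter the calculation, the coefficient is insensitive to differences in sequencing depth that preserve the ordering of regions.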

We looked at the same locations we chose for verification of immunoprecipitation earlier (Figure 5.1C, D), and compared those to our ChIP-seq results. We could see peaks in each of our MCF10A samples, both induced and starved cells (blue), under the same conditions as were used in the experiment presented in Figure 5.1C. In addition, both of our OE19 repeats, the control samples (grey) and the siSENP3 samples (orange), show peaks in the same locations (Figure 5.4).

Looking at the OE19 SENP3 knockdown vs. control, SUMO2/3 presence at different regions of the chromatin, as annotated by HOMER (Heinz et al., 2010) set to nearest peak, shows that on average about 45% of the peaks are located between genes, and a slightly lower percentage are located in introns, ranging from about 38% in the control samples to 43% in the treated ones. About 10% of SUMO2/3 peaks can be found at promoter-TSS regions in both sets, with promoter defined as +500 to -1000bp from the TSS. The remaining peaks (~3%) divide more or less equally between exons and transcription termination sites (TTS) (Figure 5.5A). In fact, differential binding analysis (DiffBind (Ross-Innes et al., 2012), supplementary table S3) showed only one significantly different peak, at the TTS region of FKBP3 (Peptidyl-Prolyl Cis-Trans Isomerase); the track is shown in Figure 5.5B, along with KDM3A as an example of a peak that shows no difference. While this region is also indicated as a transcription factor binding site (Encode v2), the difference, at FDR=0.0412, was not significant enough to merit further investigation.
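The promoter-TSS window used above (+500 to -1000bp from the TSS) has to be applied relative to gene orientation. The short Python sketch below shows one way to express this; it is not HOMER's implementation, and the function name and the use of the peak centre as the reference position are assumptions.

```python
def is_promoter_tss(peak_start, peak_end, tss, strand, upstream=1000, downstream=500):
    """True if the peak centre lies within -upstream/+downstream bp of the TSS.

    Assumption: the peak is represented by its centre. On the minus strand
    the offset is mirrored so that 'upstream' always means 5' of the TSS.
    """
    centre = (peak_start + peak_end) // 2
    offset = centre - tss if strand == "+" else tss - centre
    return -upstream <= offset <= downstream


# Hypothetical peaks around a TSS at position 10,000.
is_promoter_tss(9000, 9200, 10000, "+")     # centre is 900 bp upstream: True
is_promoter_tss(10700, 10900, 10000, "+")   # centre is 800 bp downstream: False
is_promoter_tss(10700, 10900, 10000, "-")   # same peak is upstream on the minus strand: True
```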


Figure 5.3 - SENP3 knockdown has no effect on chromatin recruitment of SUMO2/3. SUMO2/3 ChIP-seq with SENP3 knockdown compared to non-targeting control. A. Spearman sample correlation of two RIP normalised SUMO2/3 ChIP-seq experiments (deepTools). B. SENP3 knockdown of ChIPed samples visualised by immunoblot, compared to ERK2 loading control. C. Heatmap of ChIP-seq samples from two experiments normalised to reads in peak and arranged by distance from TSS (deepTools). Colour bar represents normalised read counts.


Figure 5.4 – ChIP-seq tracks at the same locations as the verification primers. Starved and induced MCF10A SUMO2/3 ChIP-seq tracks (blue), and OE19 SENP3 knockdowns (orange) and siNT controls (grey) showing location of ChIP verification primers boxed in red.


Figure 5.5 - SENP3 knockdown has no effect on chromatin recruitment of SUMO2/3. A. Distribution of peaks by genomic annotation of genes in samples (HOMER). B. KDM3A and FKBP3 tracks from both experiments, significantly different peak as determined by DiffBind boxed in blue.


5.2.3 SUMO2/3 dynamic chromatin binding

We noticed that SUMO binding patterns across the genome were mostly similar to those of open chromatin regions, as mapped by ATAC-seq, a method that probes DNA accessibility and is used to map transcription factor binding and nucleosome positioning (Buenrostro et al., 2015). When we compared OE19 ATAC-seq results (Sam Ogden, Sharrocks lab unpublished data) side by side with our control OE19 and MCF10A SUMO2/3 ChIP-seq data (Table 5.1), we noticed an accumulation of SUMO peaks at transcription start sites in both cell lines, amounting to about 10K TSS peaks for each of the cell lines, which is about half the number of open chromatin peaks (about 20K TSS peaks) found on promoter-TSS regions in OE19 cells. Out of these 10K promoter-TSS peaks, ~7500 peaks in OE19 and ~6800 in MCF10A cell lines co-inhabit the same regions as OE19 open chromatin. Overall, SUMO occupation of open chromatin is divided almost equally between intergenic regions, gene bodies (including introns, exons and TTS regions), and promoter regions. The latter, representing a large amount of SUMO concentrated in small regulatory regions, indicates the significance of SUMO in regulating gene expression. Overlapping peaks between the OE19 untreated samples and the MCF10A starved samples were also determined by intersect (at least 30% peak overlap), and showed that about 75% of the identified SUMO peaks at promoter-TSS regions in either cell line were at the same genomic locations in both. This emphasises both the similarities and differences between cell lines.

We divided the untreated OE19 SUMO2/3 and the MCF10A starved SUMO2/3 detected peaks into three genomic region groups: promoter-TSS regions defined as +500/-1000bp from the TSS, intragenic areas (including intron, exon and TTS peaks) and intergenic areas. Division was done according to HOMER annotation set to nearest TSS (Heinz et al., 2010). We could see that recruitment of SUMO2/3 to chromatin coincides with areas of open chromatin in the OE19 cell line (Figure 5.6A). We could also detect distribution of SUMO binding along gene bodies and in intergenic areas, some of it in areas where chromatin is not necessarily open, but which possibly contain enhancer regions. To further visualise the differences and similarities between the two cell lines we plotted the SUMO binding profiles of the same locations. We plotted both on the highest peaks (top 30% of the dataset) in OE19 SUMO2/3 (Figure 5.6B) and on the highest detected SUMO2/3 peaks in MCF10A starved, non-induced cells (Figure 5.6C), aligning to peak centre. We could see that the pattern of SUMO2/3 recruitment to chromatin is almost identical between treated and non-treated samples within each cell line, though different between cell lines. Promoter-TSS peaks detected in both cell lines look similar, though the peaks are slightly wider in OE19 than in MCF10A. In addition, MCF10A signal is higher when plotted on MCF10A peaks (Figure 5.6C), and OE19 signal is higher when plotted on OE19 peaks (Figure 5.6B). The difference between cell lines is mainly visible in intragenic areas, and in areas between genes, possibly containing enhancers. Comparison of the overlap of peaks between untreated cell lines (Table 5.1) shows that about 7400 peaks (out of ~9.5K peaks) overlap in promoter-TSS regions between the two cell lines. Thus, SUMO2/3 binding to chromatin differs to some extent in pattern as well as location between cell lines. Both these similarities (highlighted in red) and differences (highlighted in blue) are exemplified when looking at the genomic tracks of these experiments at a SUMO-rich region (Figure 5.6D) extracted from UCSC (Kent et al., 2002). All tracks are available on http://genome-euro.ucsc.edu/s/Rotem/Thesis1.

Table 5.1 - Summary of peak distribution in ChIP-seq experiments. Peaks were annotated by HOMER with promoter-TSS region set at +500 to -1000bp, with additional introns, exons, transcription termination sites (TTS) and intergenic regions. Peak overlap (last two lines) was determined (intersect BEDtools, at least 30% overlap) between OE19 siNT samples and starved MCF10A samples, and between OE19 SUMO-ChIP and OE19 open chromatin (ATAC-seq).

Sample                     promoter-TSS  intron  exon   TTS  intergenic  Total peaks
OE19 siNT rep1                    10419   32690  1355  1070       38734        84268
OE19 siNT rep2                     9922   33711  1227  1119       40700        86679
OE19 siSENP3 rep1                  9738   24604  1257   877       28389        64865
OE19 siSENP3 rep2                 10593   52664  1544  1607       59769       126177
ATAC-seq rep1                     19561   22586   878   858       22621        66504
ATAC-seq rep2                     22854   32050  1267  1287       32130        89588
MCF10A starved rep1               10382   53698  1469  1544       63550       130643
MCF10A starved rep2                8592   35898  1087  1295       34979        81851
MCF10A 30minEGF rep1               8456   30178  1041   858       43135        83668
MCF10A 30minEGF rep2              10337   51768  1455  1479       66466       131505
Peak overlap OE19/MCF10A           7404   14075   686   612       17389        40166
Peak overlap OE19/ATAC             7462    9472   493   412       10398        28237
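The peak overlap rows of Table 5.1 rely on a minimum-overlap criterion. A minimal Python sketch of such a criterion is given below. It is an illustration and not the BEDtools command used here: the function names are hypothetical, and measuring the 30% relative to the first peak only is an assumption, since the direction of the requirement is not specified in the caption.

```python
def overlap_fraction(a, b):
    """Fraction of interval a = (start, end) covered by interval b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    return max(overlap, 0) / (a[1] - a[0])


def overlapping_peaks(peaks_a, peaks_b, min_fraction=0.3):
    """Peaks from A with at least min_fraction of their length covered by a peak in B.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    kept = []
    for chrom, start, end in peaks_a:
        for b_chrom, b_start, b_end in peaks_b:
            if chrom == b_chrom and overlap_fraction((start, end), (b_start, b_end)) >= min_fraction:
                kept.append((chrom, start, end))
                break  # count each A peak once
    return kept


# Hypothetical peaks: only the first A peak has at least 30% of its length covered.
oe19 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
mcf10a = [("chr1", 150, 400), ("chr1", 590, 700)]
shared = overlapping_peaks(oe19, mcf10a)   # [("chr1", 100, 200)]
```

This nested loop is quadratic in the number of peaks and is meant only to make the criterion explicit; dedicated interval tools are far more efficient on real peak sets.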


Figure 5.6 - Genome wide identification of SUMO2/3 binding to chromatin. A. Heatmap of SUMO2/3 ChIP-seq in OE19 cells (red) and MCF10A cells (blue) and ATAC-seq in OE19 cells (green), clustered to 3 genomic regions (promoter-TSS, intragenic and intergenic) identified from SUMO2/3 peaks in untreated OE19 ChIP-seq (deepTools). Colour scale represents normalised read counts. B. Profile plot of the top 30% highest peaks in OE19 genomic regions and C. in MCF10A genomic regions, showing peak distribution in OE19 and MCF10A SUMO2/3 reads, centred on SUMO peaks in OE19 and MCF10A respectively. D. Example of genomic tracks of OE19 with and without SENP3 knockdown and MCF10A cells with and without 30 min EGF induction, showing SUMO2/3 chromatin binding. Different peaks highlighted in blue and similar peaks highlighted in red.

The promoter site accumulation of SUMO2/3, together with the 5FMC complex recruitment to chromatin bound elements in order to remove repressive complexes from transcription factors (Fanis et al., 2012), led us to query the role of SUMO2/3 in gene activation. We therefore decided to look into the dynamics of SUMO2/3 chromatin recruitment over time after EGF induction. For this purpose we chose to use the MCF10A cell line, a spontaneously immortalized, non-malignant breast epithelial cell line (Soule et al., 1990). MCF10A cells can be induced by epidermal growth factor (EGF), causing a cascade of downstream early gene activation, therefore making them ideal for chromatin-binding dynamics studies.

We cultured MCF10A cells in starvation media for 48 hours in order to enable induction, followed by addition of EGF to the media. Cells were collected at 0, 15, 30, 60, 120 and 180 minutes post induction, with the ‘0’ time point represented by a non-induced sample. DNA shearing was tested as with OE19 cells, with the desired fragment size at ~200bp, and we settled on 12 sonication cycles, the same as with OE19 cells (Figure 5.1B). Chromatin immunoprecipitation was performed as described before; however, MCF10A response to EGF induction requires sparsely seeded plates. Therefore, we only used 4 million cells for each time point.

We verified our immunoprecipitation using qPCR (Figure 5.7A) for the same locations previously chosen (Figure 5.1C). Each of the samples showed acceptable binding compared to input at known SUMO2/3 binding locations; we could also see a reduction in the ITPRIP peak after induction, and changes in LOC101929140 binding as well. ChIP samples were deemed of good quality and sent for NGS. Library production was successful and sequencing depth and complexity were not affected by the reduction in cell number. After sequencing, normalisation and ChIP analysis were performed as previously described (Figure 5.2). Correlation between MCF10A samples (Figure 5.7B) was high, ranging from 78% to 87%. We looked at our primer locations in our ChIP-seq samples (Figure 5.4), and while peaks are present in the same locations, the ChIP-seq signal at ITPRIP does not match the ChIP-PCR result for the same samples (Figure 5.7A). However, these experiments need more repeats.
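Binding as a percentage of input is commonly derived from qPCR Ct values as sketched below. This is the widely used general formulation, given as an assumption rather than as the exact calculation applied in this work; the function name, the default input fraction and the Ct values are hypothetical, and 100% amplification efficiency is assumed.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """ChIP signal as a percentage of input DNA.

    The input Ct is first adjusted for the fraction of chromatin that was
    kept as input (a 1% input is log2(100) cycles 'behind' the full sample),
    then compared with the Ct of the immunoprecipitated sample.
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values with a 12.5% input: the IP lags the adjusted input by 6 cycles.
signal = percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.125)   # 1.5625
```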


A comparison of SUMO2/3 presence at different regions of the chromatin, as annotated by HOMER (Heinz et al., 2010), showed that at 15 minutes post induction there is a higher proportion of SUMO-bound genes at promoter-TSS regions compared to any other time point we measured (Figure 5.7C). However, the 15 min dataset has the lowest counts, which might tip the balance, as the promoter peaks would be the highest and thus most visible in a dataset. The higher promoter proportion points to SUMO2/3 binding as a potential participant in gene activation, though more repeats are necessary to verify this data. In order to analyse the differences between time course sequencing data points, and to cluster and visualise the temporal patterns, we used the TCseq R package. Using z-score transformation and focusing on the change pattern in the time course, we used fuzzy c-means soft clustering to divide SUMO-bound peaks into four main groups (Figure 5.8).
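The two ideas behind this clustering, z-score transformation of each peak's time profile and soft (fuzzy) membership in clusters, can be illustrated with the short Python sketch below. It is not the TCseq implementation: the function names, the use of the sample standard deviation, the fuzzifier m=2 and the toy cluster centres are assumptions, and only the membership step of fuzzy c-means is shown, not the iterative update of the centres.

```python
import math


def zscore(profile):
    """Standardise one peak's time profile to mean 0 and sample SD 1."""
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / (len(profile) - 1))
    return [(v - mean) / sd for v in profile]


def memberships(profile, centres, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster centre.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance to centre i. Memberships sum to 1. Centres are assumed distinct.
    """
    dists = [math.dist(profile, c) for c in centres]
    if 0.0 in dists:
        # The profile coincides with a centre: full membership there.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical peak signal at three time points and two hypothetical centres.
z = zscore([10.0, 20.0, 30.0])                        # [-1.0, 0.0, 1.0]
u = memberships(z, [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]])   # [1.0, 0.0]
```

Standardising first means that peaks are grouped by the shape of their change over time rather than by absolute signal, which matches the focus on change pattern described above.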

15 minutes after induction, some chromatin regions accumulate SUMO2/3, and remain SUMO-bound throughout the experiment time frame (Figure 5.8A cluster 2). Others take 30 minutes to reach maximum SUMO2/3 binding, and do not maintain the same level of binding over time (Figure 5.8A cluster 3). The other two clusters start the time course bound to SUMO2/3, and lose it either gradually throughout the time course (Figure 5.8A cluster 4) or by 30 minutes (Figure 5.8A cluster 1), in the latter case gaining it again at later times. Very similar patterns emerged when we ran the same TCseq analysis on a list of genes known to be activated within the first 15 minutes of EGF induction, rather than on the whole genome (Figure 5.8B). The resulting figure takes into account all the peaks associated with these genes. In this group we note either a gain or loss of SUMO within the first 30 minutes after induction, followed by a return to the starting point after three hours. This concentration on the fast responding immediate early genes is what causes the loss of the slow response pattern seen in cluster 4 in Figure 5.8A. This similarity indicated to us that SUMO2/3 can quickly bind and potentially cause either activation or inhibition of transcription; SUMO recruitment to chromatin can produce both results regardless of the speed of recruitment.


Figure 5.7 - Genome wide SUMOylation analysis across an EGF time course. SUMO2/3 ChIP-seq for MCF10A starved cells following EGF induction. A. SUMO2/3 binding as a percentage of input DNA at the negative control, LOC101929140 and ITPRIP loci in MCF10A EGF induced samples (0, 15, 30, 60, 120 and 180 min). Results represent one experiment. B. Spearman sample correlation of normalised reads from SUMO2/3 ChIP-seq (deepTools). C. Pie charts representing genomic annotation of genes in time course samples (HOMER).


Figure 5.8 - Analysis of SUMOylation dynamics across an EGF time course. Change pattern of SUMO peaks on A. all genes, and B. immediate early genes within time course ChIP-sequencing data. Graphs produced using fuzzy c-means soft clustering of z-score normalised reads (TCseq R package, Wu M (2019)). Colour bar represents the degree to which each data point belongs to a cluster; percentages indicated in B. represent the fraction of immediate early genes in each cluster.

Looking at the spread of chromatin bound SUMO2/3 in early gene activation (Figure 5.9) we could see that in a group of immediate response genes such as FOS (FBJ Murine Osteosarcoma) and EGR1 (Early Growth Response 1), SUMO2/3 is spread over the whole gene body 15 minutes after induction, then accumulates at the 3’ end at 180 minutes. Other early genes take longer to respond, and do not show as much SUMO2/3 binding along the gene body (as exemplified by HES1 gene tracks). This division into immediate early genes and delayed primary response genes is well documented (reviewed in Bahrami et al. (2016)); however, the involvement of SUMO in creating the differences between these groups is less documented and bears further investigation.

Figure 5.9 – SUMO2/3 dynamic binding to chromatin. MCF10A starved cells induced by EGF at several time points. SUMO2/3 ChIP-seq UCSC tracks presented for FOS, EGR1 and HES1 early genes.


5.3 Discussion

5.3.1 Knockdown of SENP3 does not affect SUMO2/3 binding to chromatin

The 5FMC complex has been shown to be recruited to transcription factors and deSUMOylate them, possibly removing repressive complexes and allowing transactivation of genes (Fanis et al., 2012). As we have shown in the previous chapter, we could not see complex recruitment to multi-SUMO without the entire complex being present. We chose to knock down SENP3, a SUMO protease member of the complex, which we predicted to be an active member in SUMO removal. Having successfully done so, we sequenced SUMO2/3 bound chromatin. Differential binding analysis of the normalised results showed that there was no difference between knocked-down samples and the control. As these results could not be explained by insufficient knockdown, we speculated that the role of SENP3 is of such importance as to merit a high number of redundancies. However, as SENP3 might also be a part of the 5FMC complex, and as we could not see complex binding to SUMO without its presence, we did expect to see an increase in SUMO presence on chromatin in the genes regulated by the complex. It is possible that SENP3 is redundant and can be replaced by another SUMO protease within the 5FMC complex, or that the complex is redundant in itself. Another possibility is that the 5FMC complex does not regulate gene expression by removing chromatin-bound SUMO.

5.3.2 SUMO2/3 recruitment to chromatin is dynamic

Cross comparison of SUMO2/3 ChIP-seq data with ATAC-seq data from our lab has shown that while SUMO2/3 maps to open TSS regions, in some cases we can see SUMO2/3 bound to regions along the gene bodies, and also to areas between the genes, possibly corresponding to regulatory elements in the genome. Taken together with 5FMC complex recruitment to chromatin by multi-SUMOylated targets, we reasoned that looking at the dynamics of SUMO2/3 binding to chromatin could provide information about gene binding. In an induction time-course experiment, two main groups of genes emerged with relation to SUMO occupancy over the experiment time. In one group SUMO occupancy is very low at the beginning of the experiment and increases over time, and in the other, genes begin the experiment with SUMO present and occupancy decreases. These two groups can each be subdivided into two. Increased occupancy can happen within 15 or 30 minutes: genes that acquire SUMO faster seem to retain it throughout the experiment time frame, while genes that take longer to accumulate SUMO also lose it at the same rate. The second group has SUMO at the beginning of the experiment and loses it either gradually or within the first 30 minutes, the latter gaining SUMO again over time. The question remains as to whether these genes are activated or repressed, as the role of SUMO is so diverse, and SUMOylation can either activate or repress, depending on the SUMOylated protein.

Interestingly, known early activated genes accumulated SUMO all along the gene body after 15 minutes, only to have it aggregated at the TTS region 15 minutes later. Delayed early genes accumulated SUMO either in the TSS or TTS regions of the gene, and only at later times post activation. In the former case SUMOylation and/or SUMO recruitment seems to be a part of the gene activation process, though we did not have a chance to research this further in this work.

With respect to 5FMC recruitment to SUMOylated targets and subsequent deSUMOylation and removal of repressive complexes, we can assume these will belong to the group of genes found SUMO-bound at the beginning of the experiment. However, due to the nature of the data attained through ChIP-seq, we cannot determine which gene in these groups is activated by SUMO, and which is repressed. Without further experiments we cannot be sure of the dynamics of this specific complex recruitment.

5.3.3 Summary, limitations and conclusions

In this chapter we found that knockdown of the SUMO protease SENP3 is not sufficient to affect SUMO binding to chromatin. It is possible that, while we know of no other SUMO protease that associates with this complex, redundancies might still exist. One candidate for SENP3 replacement might be SENP5, a highly similar SUMO protease with a function close to that of SENP3.

We opted not to use deSUMOylation inhibitors when working with cross-linked chromatin extracts, as the cell population was large enough and signal levels were sufficient for our purposes, and as the cross-linking of chromatin might be sufficient to avoid losing the SUMO signal. It is possible, though, that the deSUMOylation and detachment process by 5FMC is fast enough for us to miss it. As we did not use a synchronised cell population for this set of experiments, we presumably see cells at different stages, and pick up signals from all of those. Perhaps using synchronised cells would enable us to pinpoint SUMO recruitment in a better way. This problem is partly addressed by the induced cell time course we performed, as the induction causes a relatively uniform response that we could easily visualise in our results.

In conclusion, to better understand the dynamics of SUMO recruitment to chromatin and determine the role of the 5FMC complex in SUMO-chromatin binding, the following experiments are required:

1. Another SUMO pull-down experiment with siRNA knockdown of SENP3, and with a combination of multiple 5FMC complex member knockdowns, followed by mass spectrometry, will possibly reveal the SUMO protease replacing SENP3 in the complex.
2. Testing SUMO levels after 5FMC complex member knockdowns, either individual or multiple knockdowns, will reveal whether 5FMC is indeed a deSUMOylating complex.
3. Endogenous knockdown of each 5FMC complex protein (apart from SENP3) followed by SUMO ChIP-seq will enable better understanding of which complex protein is required for SUMO turnover on chromatin.
4. ChIP-seq of each of the complex members. Combination of data from these experiments could provide information on the recruitment to chromatin of each member individually and in complex.


6 General Discussion

6.1 Overview

The reversible post-translational modification SUMO is a crucial regulatory mechanism for many cellular processes, including DNA damage repair, transcription regulation, cellular proliferation and ribosome biogenesis. There are five SUMO paralogs known in humans to date, with SUMO2 and SUMO3 almost identical, and SUMO1 sharing about 50% homology with them. SUMO4 and SUMO5 are less well characterised as yet. SUMO is conjugated to consensus binding sites in target proteins following an E1, E2, E3 enzymatic cascade, similar to that of ubiquitin. Once a target is SUMOylated, a reader protein can be recruited to it through a SUMO interacting motif known as SIM (reviewed in Geiss-Friedlander et al. (2007)). A target protein can be SUMOylated in a number of ways: it can be SUMOylated on a single site, presenting one SUMO; it can be SUMOylated by a poly SUMO2/3 chain (Tatham et al., 2001); or it can present several SUMO moieties on different lysines on its surface (Aguilar-Martinez et al., 2015).

In this study, we found reader proteins and complexes that preferentially bind to multi-SUMOylated targets (Table 3.1, Figure 3.4B). We concentrated on the five friends of methylated CHTOP complex, as the whole complex is not found bound to single SUMO conjugated targets, but several subunits are, as previously indicated (Hendriks et al., 2016). The 5FMC complex is recruited to chromatin bound transcription factors and possibly deSUMOylates them, thus removing repressive complexes and driving transcription forward (Fanis et al., 2012), as represented in Figure 6.1A. The same proteins constituting the complex are involved in other complexes as well, participating in numerous other processes, the common denominator of which is maintaining pluripotency and driving proliferation (Castle et al., 2012, Ding et al., 2015, Yan et al., 2010). We confirmed that the 5FMC complex preferentially binds to multi-SUMOylated targets (Figure 3.5) and that all the members of the complex are necessary for that recruitment (Figure 3.7). This study aimed not only to assess the recruitment of the 5FMC complex to multi-SUMOylated targets, but also to specifically localise its potential recruitment to chromatin-bound SUMO, and its effects on global gene expression.


Cross comparison of the effects of knocking down each member of the 5FMC complex has uncovered possible inter-complex relations (Figure 4.7). Though our previous results indicated that recruitment of the complex to chromatin requires the presence of all complex members, we could not see any inhibition of gene expression following disassembly of the 5FMC complex, neither genome wide nor specifically in genes activated by Zbp-89, the transcription factor known to be activated by 5FMC. The results did indicate that the complex is possibly comprised of two associated sub-complexes, one consisting of SENP3, WDR18 and possibly PELP1, and the other consisting of LAS1L and TEX10. We therefore supplemented our approach by mapping the recruitment of SUMO to chromatin. By knocking down the assumed deSUMOylating unit of the complex, the same sub-unit we found to affect complex recruitment to multi-SUMOylated targets, we should have been able to show the 5FMC complex activity on chromatin. While the involvement of SENP3 in complex recruitment to SUMO on chromatin appears to be either highly gene specific, so important that it has ample redundancies in place, or utterly unnecessary, we did notice that SUMO recruitment to chromatin seems to be global, and correlates well with open chromatin patterns in the same cells (Figure 5.6). As chromatin opens in response to stimuli (reviewed in Clapier et al. (2017)), we chose to look at SUMO recruitment to chromatin in response to EGF induction over time (Figure 5.8). SUMO recruitment to chromatin was found to be highly dynamic in early gene activation, changing chromatin occupation pattern within the first 30 minutes from activation. The change from promoter/enhancer location to a gene-wide spread (Figure 5.9) might indicate that the presence of SUMO is necessary for transcription and RNA processing, either by direct binding to RNA pol II or to accompanying factors.
These findings are supported by previous observations indicating SUMO involvement in many mRNA processing events (reviewed in Richard et al. (2017)).


6.2 5FMC is recruited to multi-SUMOylated targets only when all complex members are present

Only one previously published mass spectrometry study of SUMO-bound proteins found all 5FMC complex members bound to SUMO (Aguilar-Martinez et al., 2015). This study utilised a multi-SUMOylated trap rather than the single SUMO trap used by others. By creating this trap, the novel notion of the multi-SUMOylation of proteins and the specific recruitment of readers to such protein targets has been successfully explored. In the current study we aimed to exemplify the recruitment of protein complexes to multi-SUMOylated targets through investigation of the 5FMC complex. The 5FMC complex is comprised of five proteins: PELP1, TEX10, LAS1L, SENP3 and WDR18. Members of this complex have been demonstrated to participate in transcription and ribosome biogenesis in more than one study, both in the 5FMC complex form and in other complexes (Castle et al., 2012, Ding et al., 2015, Fanis et al., 2012, Finkbeiner et al., 2011b).

While recruitment of the 5FMC complex to methylated CHTOP is already known (Fanis et al., 2012), the exclusive recruitment of the 5FMC complex to multi-SUMOylated targets adds another piece to the puzzle. In addition, the 5FMC complex recruitment to multi-SUMOylated targets can only occur when all of the complex members are present.

We encountered several problems while trying to decipher the 5FMC complex mode of action. Antibody quality for some of the complex members was less than ideal, with the only TEX10 antibody in existence working only sporadically, and the WDR18 antibody binding SUMO3 traps as well as WDR18, a fact that made it very hard to quantify. This prompted our use of epitope tagged proteins both in the cellular environment and outside of it. We did not manage to isolate the complex members and create the complex in vitro, which could have served to investigate its structure and assembly. The use of epitope tagged penta-transfections to determine the complex recruitment to multi-SUMOylated targets, however, has proven extremely informative, though hard to achieve consistently. These experiments should be repeated with endogenously tagged cell lines, if possible for two or more of the 5FMC complex proteins, which will enable detection and capture of the 5FMC complex members in a more efficient way.


Figure 6.1 - Involvement of SUMO recruitment to chromatin in gene expression. A. The 5FMC complex is recruited to multi-SUMOylated targets, possibly to deSUMOylate them and drive transcription forward by removing repressive complexes. B. EGF induction results in the addition of SUMO to chromatin, and stimulation of transcription in early genes.


6.3 5FMC complex is possibly comprised of two sub-complexes

The 5FMC complex stimulates transcription, possibly by deSUMOylating the transcription factor Zbp-89 (Fanis et al., 2012), a transcription modulator which had been established as multi-SUMOylated (Chupreta et al., 2007). ZBP-89 has a dual role in gene expression. It has been shown to activate genes such as Stromelysin (MMP), T-Cell receptor, and type I collagen (COL1A1/2), or inhibit others by competing with Sp proteins on GC-rich sequences of promoters (Law et al., 1998) or by independently binding to promoters (Zhang et al., 2003). In accordance with the model described by Fanis et al. (2012) (Figure 1.5), where 5FMC complex recruitment to chromatin causes deSUMOylation of Zbp-89, both roles seem to correlate with the SUMOylation status of ZBP-89; namely, activation of gene expression is achieved through deSUMOylation of ZBP-89. Therefore, when knocking down a member of the 5FMC complex, we expected to see lower expression levels of ZBP-89 controlled genes, and possibly of others, which would indicate whether there are other transcription factors regulated by the 5FMC complex. Our preliminary experiments indeed indicated that the recruitment of the 5FMC complex to multi-SUMOylated targets requires the presence of all the members of the complex, and as each of these is also active in other processes, we reasoned that a cross reference of whole transcriptome data from individual complex member knockdowns would supply us with the desired information. However, the cross reference of knockdowns gave quite a different result. Knockdown of LAS1L and especially TEX10 had only minor effects on gene expression, while SENP3 and WDR18 knockdowns gave pronounced effects, causing either up- or down-regulation of genes in a highly correlated fashion. PELP1, the supposed core of the complex, was seen to sometimes act in conjunction with WDR18 and SENP3, but not always.
As we have mentioned before, the complex members are all involved in a myriad of different cellular processes. We were hoping to obtain information about the specific collaboration of the 5FMC complex members and the complex effects on gene expression, and possibly to find which other transcription factors the 5FMC complex affects, but could find no evidence of such collaboration. It is possible that the complex is more specialised than we expected, and only works as a complex in the presence of very specific genes. The possibility of 5FMC complex member redundancy also still needs to be explored. A further scan of other cell lines, especially embryonic cells, and perhaps synchronisation to cell cycle stage may be in order. It is probable that not all members of the complex are necessary at any given time, which might well allow for other functions to be performed by complex members at different stages in the cell cycle. Such events would mask the information we are seeking in a mixed population of cells.

6.4 Dynamic SUMO2/3 binding to chromatin

SUMOylation and deSUMOylation of transcription factors are an important part of regulating gene expression. In recent studies, over 300 transcription factors associated with RNA Pol II have been identified as SUMO conjugates (Hendriks et al., 2017), and many of those have been shown to be inhibited by SUMOylation through recruitment of repressor complexes (Gill, 2005). The 5FMC complex discussed in this study might deSUMOylate such a transcription factor to remove repressor complexes and activate transcription (Fanis et al., 2012). We therefore knocked down a member of the complex that we found to be crucial for 5FMC complex recruitment to multi-SUMOylated targets, and mapped SUMO localisation on chromatin. We expected that a SENP3 knockdown would provide a different set of SUMO-bound peaks than the control. We would then have been able to look for common motifs in these differential peaks and to find the associated transcription factors. This would have led to continued investigation of the multi-SUMOylated state of these transcription factors and their ability to recruit the 5FMC complex. However, knockdown of SENP3, the supposedly active deSUMOylating unit of the 5FMC complex, produced no difference in SUMO2/3 binding to chromatin. This result potentially emphasises the importance of SUMO in cellular processes, as it is possible that other SUMO proteases compensate for the missing one. Alternatively, the assumption that this complex serves to deSUMOylate proteins may be wrong, and SENP3 may only serve to recruit the complex to multi-SUMO-bound targets.

We also looked at the dynamics of SUMO binding to chromatin over a 3-hour time course following an external stimulus (Figure 5.8). The dynamics of SUMO binding were evident in two major groups of loci: some that are bound by SUMO and lose it after induction, and others that gain SUMO after induction. Within these groups, subdivisions were observed, further dividing the genomic loci into slow and fast response genes, with some that return to the original SUMO occupancy over time and some that do not. We were particularly interested to see the fast response genes gaining SUMO across gene bodies at 15 minutes after induction, only to lose it minutes later (Figure 5.9). It is possible that SUMO is deposited across gene bodies, or on the polymerase itself (Figure 6.1B). This indicated to us the importance of fast SUMO recruitment to chromatin, and a potential role for SUMO in immediate early gene activation.

6.5 Perspectives

The 5FMC complex members are active in several complexes. However, when assembled into the 5FMC complex, they are recruited specifically to multi-SUMOylated targets. The effects of the 5FMC complex members on gene expression seem to be widespread and diverse, and it would be interesting to investigate whether the complex is a general activator/repressor or has a more localised effect, and to discern the nature of that effect. While it is known that the SUMOylation status of transcription factors is controlled by SUMO proteases, it is not entirely clear that SENP3, the SUMO protease in the 5FMC complex, acts as a deSUMOylating agent rather than as a recruiter to SUMOylated proteins. It would thus be interesting to investigate how the 5FMC complex relates to the SUMOylation status of proteins and, if it is not a deSUMOylating complex, what it does and by what mechanism it operates. As mentioned before, parts of the 5FMC complex are involved in maintaining pluripotency during development (Ding et al., 2015), and so expanding these experiments to embryonic stem cells might be in order.

With regard to the dynamics of SUMO binding to chromatin, our experiments raise more questions than they answer and open several avenues for further investigation. For one, the role of transcription in generating SUMO dynamics could be investigated by applying transcription elongation inhibitors, such as CDK9 inhibitors, in the same time course. In addition, the contribution of SUMOylation dynamics to the control of individual genes and gene groups is another interesting area to investigate.

References

Aguilar-Martinez, E., Chen, X., Webber, A., Mould, A. P., Seifert, A., Hay, R. T. & Sharrocks, A. D. 2015. Screen for Multi-SUMO–Binding Proteins Reveals a Multi-Sim–Binding Mechanism for Recruitment of the Transcriptional Regulator Zmym2 to Chromatin. Proceedings of the National Academy of Sciences, 112, E4854-E4863.

Aguilar-Martínez, E. & Sharrocks, A. D. 2016. The Use of Multimeric Protein Scaffolds for Identifying Multi-SUMO Binding Proteins. In: Rodriguez, M. S. (ed.) SUMO: Methods and Protocols. New York, NY: Springer New York.

Alkuraya, F. S., Saadi, I., Lund, J. J., Turbe-Doan, A., Morton, C. C. & Maas, R. L. 2006. Sumo1 Haploinsufficiency Leads to Cleft Lip and Palate. Science, 313, 1751.

Allen, B. L. & Taatjes, D. J. 2015. The Mediator Complex: A Central Integrator of Transcription. Nature reviews Molecular cell biology, 16, 155.

Allfrey, V. G., Faulkner, R. & Mirsky, A. E. 1964. Acetylation and Methylation of Histones and Their Possible Role in the Regulation of Rna Synthesis. Proceedings of the National Academy of Sciences, 51, 786-794.

Anders, S. & Huber, W. 2010. Differential Expression Analysis for Sequence Count Data. Genome Biology, 11, R106.

Anders, S., Pyl, P. T. & Huber, W. 2015. Htseq--a Python Framework to Work with High-Throughput Sequencing Data. Bioinformatics (Oxford, England), 31, 166-169.

Andrews, S. 2014. Fastqc a Quality Control Tool for High Throughput Sequence Data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

Ardley, H. C. & Robinson, P. A. 2005. E3 Ubiquitin Ligases. Essays In Biochemistry, 41, 15-30.

Azevedo, C. & Saiardi, A. 2015. Why Always Lysine? The Ongoing Tale of One of the Most Modified Amino Acids. Advances in Biological Regulation, 60, 144-150.

Baek, S. H. 2006. A Novel Link between SUMO Modification and Cancer Metastasis. Cell Cycle, 5, 1492-1495.

Bahrami, S. & Drabløs, F. 2016. Gene Regulation in the Immediate-Early Response Process. Advances in Biological Regulation, 62, 37-49.

Bai, L. & Merchant, J. L. 2000. Transcription Factor Zbp-89 Cooperates with Histone Acetyltransferase P300 During Butyrate Activation of P21 Waf1 Transcription in Human Cells. Journal of Biological Chemistry, 275, 30725-30733.

Bailey, T. L., Boden, M., Buske, F. A., Frith, M., Grant, C. E., Clementi, L., Ren, J., Li, W. W. & Noble, W. S. 2009. Meme Suite: Tools for Motif Discovery and Searching. Nucleic Acids Research, 37, W202-W208.

Bannister, A. J. & Kouzarides, T. 2011. Regulation of Chromatin by Histone Modifications. Cell Research, 21, 381-395.

Bao, Y. & Shen, X. 2007. Ino80 Subfamily of Chromatin Remodeling Complexes. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 618, 18-29.

Bayer, P., Arndt, A., Metzger, S., Mahajan, R., Melchior, F., Jaenicke, R. & Becker, J. 1998. Structure Determination of the Small Ubiquitin-Related Modifier SUMO-1. Journal of Molecular Biology, 280, 275-286.

Becker, J., Barysch, S. V., Karaca, S., Dittner, C., Hsiao, H.-H., Diaz, M. B., Herzig, S., Urlaub, H. & Melchior, F. 2013. Detecting Endogenous SUMO Targets in Mammalian Cells and Tissues. Nature Structural & Molecular Biology, 20, 525.

Ben-Saadon, R., Zaaroor, D., Ziv, T. & Ciechanover, A. 2006. The Polycomb Protein Ring1b Generates Self Atypical Mixed Ubiquitin Chains Required for Its in Vitro Histone H2a Ligase Activity. Molecular Cell, 24, 701-711.

Boggio, R., Colombo, R., Hay, R. T., Draetta, G. F. & Chiocca, S. 2004. A Mechanism for Inhibiting the SUMO Pathway. Molecular Cell, 16, 549-561.

Boggio, R., Passafaro, A. & Chiocca, S. 2007. Targeting SUMO E1 to Ubiquitin Ligases: A Viral Strategy to Counteract Sumoylation. The Journal of Biological Chemistry, 282, 15376-15382.

Bohren, K. M., Nadkarni, V., Song, J. H., Gabbay, K. H. & Owerbach, D. 2004. A M55v Polymorphism in a Novel SUMO Gene (SUMO-4) Differentially Activates Heat Shock Transcription Factors and Is Associated with Susceptibility to Type I Diabetes Mellitus. Journal of Biological Chemistry, 279, 27233-27238.

Borun, T. W., Pearson, D. & Paik, W. K. 1972. Studies of Histone Methylation During the Hela S-3 Cell Cycle. The Journal of Biological Chemistry, 247, 4288-4298.

Bruderer, R., Tatham, M. H., Plechanovova, A., Matic, I., Garg, A. K. & Hay, R. T. 2011. Purification and Identification of Endogenous Polysumo Conjugates. EMBO reports, 12, 142-148.

Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. 2015. Atac-Seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Current protocols in molecular biology, 109, 21.29.1-21.29.9.

Buganim, Y., Faddah, D. A. & Jaenisch, R. 2013. Mechanisms and Models of Somatic Cell Reprogramming. Nature Reviews Genetics, 14, 427.

Butterfield, R. J., Stevenson, T. J., Xing, L., Newcomb, T. M., Nelson, B., Zeng, W., Li, X., Lu, H.-M., Lu, H., Farwell Gonzalez, K. D., Wei, J.-P., Chao, E. C., Prior, T. W., Snyder, P. J., Bonkowsky, J. L. & Swoboda, K. J. 2014. Congenital Lethal Motor Neuron Disease with a Novel Defect in Ribosome Biogenesis. Neurology, 82, 1322-1330.

Byvoet, P., Shepherd, G. R., Hardin, J. M. & Noland, B. J. 1972. The Distribution and Turnover of Labeled Methyl Groups in Histone Fractions of Cultured Mammalian Cells. Archives of Biochemistry and Biophysics, 148, 558-567.

Calo, E. & Wysocka, J. 2013. Modification of Enhancer Chromatin: What, How, and Why? Molecular Cell, 49, 825-837.

Cao, Y., Yao, Z., Sarkar, D., Lawrence, M., Sanchez, G. J., Parker, M. H., Macquarrie, K. L., Davison, J., Morgan, M. T., Ruzzo, W. L., Gentleman, R. C. & Tapscott, S. J. 2010. Genome-Wide Myod Binding in Skeletal Muscle Cells: A Potential for Broad Cellular Reprogramming. Developmental Cell, 18, 662-674.

Castle, C. D., Cassimere, E. K. & Denicourt, C. 2012. LAS1L Interacts with the Mammalian Rix1 Complex to Regulate Ribosome Biogenesis. Molecular biology of the cell, 23, 716-728.

Castle, C. D., Cassimere, E. K., Lee, J. & Denicourt, C. 2010. Las1l Is a Nucleolar Protein Required for Cell Proliferation and Ribosome Biogenesis. Molecular and Cellular Biology, 30, 4404-4414.

Chang, C. C., Naik, M. T., Huang, Y. S., Jeng, J. C., Liao, P. H., Kuo, H. Y., Ho, C. C., Hsieh, Y. L., Lin, C. H. & Huang, N. J. 2011. Structural and Functional Roles of Daxx Sim Phosphorylation in SUMO Paralog-Selective Binding and Apoptosis Modulation. Mol Cell, 42, 62-74.

Chang, P.-C., Cheng, C.-Y., Campbell, M., Yang, Y.-C., Hsu, H.-W., Chang, T.-Y., Chu, C.-H., Lee, Y.-W., Hung, C.-L., Lai, S.-M., Tepper, C. G., Hsieh, W.-P., Wang, H.-W., Tang, C.-Y., Wang, W.-C. & Kung, H.-J. 2013. The Chromatin Modification by SUMO-2/3 but Not SUMO-1 Prevents the Epigenetic Activation of Key Immune-Related Genes During Kaposi’s Sarcoma Associated Herpesvirus Reactivation. BMC Genomics, 14, 824.

Chupreta, S., Brevig, H., Bai, L., Merchant, J. L. & Iñiguez-Lluhí, J. A. 2007. Sumoylation-Dependent Control of Homotypic and Heterotypic Synergy by the Krüppel-Type Zinc Finger Protein Zbp-89. Journal of Biological Chemistry, 282, 36155-36166.

Clapier, C. R. & Cairns, B. R. 2009. The Biology of Chromatin Remodeling Complexes. Annual Review of Biochemistry, 78, 273-304.

Clapier, C. R., Iwasa, J., Cairns, B. R. & Peterson, C. L. 2017. Mechanisms of Action and Regulation of Atp-Dependent Chromatin-Remodelling Complexes. Nature reviews Molecular cell biology, 18, 407.

Coleman, H. G., Xie, S.-H. & Lagergren, J. 2018. The Epidemiology of Esophageal Adenocarcinoma. Gastroenterology, 154, 390-405.

Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A. & Zhang, F. 2013. Multiplex Genome Engineering Using Crispr/Cas Systems. Science, 339, 819-823.

Conway, J. R., Gehlenborg, N. & Lex, A. 2017. Upsetr: An R Package for the Visualization of Intersecting Sets and Their Properties. Bioinformatics, 33, 2938-2940.

Corona, D. F. V. & Tamkun, J. W. 2004. Multiple Roles for Iswi in Transcription, Chromosome Organization and DNA Replication. Biochimica et Biophysica Acta, 1677, 113-119.

Danielsen, J. R., Povlsen, L. K., Villumsen, B. H., Streicher, W., Nilsson, J., Wikström, M., Bekker-Jensen, S. & Mailand, N. 2012. DNA Damage–Inducible Sumoylation of Herc2 Promotes Rnf8 Binding Via a Novel SUMO-Binding Zinc Finger. The Journal of Cell Biology, 197, 179-187.

Desterro, J. M., Rodriguez, M. S. & Hay, R. T. 1998. SUMO-1 Modification of Ikappabalpha Inhibits Nf-Kappab Activation. Molecular Cell, 2, 233-239.

Ding, J., Huang, X., Shao, N., Zhou, H., Lee, D.-F., Faiola, F., Fidalgo, M., Guallar, D., Saunders, A. & Shliaha, P. V. 2015. Tex10 Coordinates Epigenetic Control of Super-Enhancer Activity in Pluripotency and Reprogramming. Cell stem cell, 16, 653-668.

Ding, N., Zhou, H., Esteve, P.-O., Chin, H. G., Kim, S., Xu, X., Joseph, S. M., Friez, M. J., Schwartz, C. E., Pradhan, S. & Boyer, T. G. 2008. Mediator Links Epigenetic Silencing of Neuronal Gene Expression with X-Linked Mental Retardation. Molecular Cell, 31, 347-359.

Dobin, A., Davis, C. A., Zaleski, C., Schlesinger, F., Drenkow, J., Chaisson, M., Batut, P., Jha, S. & Gingeras, T. R. 2012. Star: Ultrafast Universal Rna-Seq Aligner. Bioinformatics, 29, 15-21.

Dou, H., Huang, C., Van Nguyen, T., Lu, L.-S. & Yeh, E. T. H. 2011. Sumoylation and De-Sumoylation in Response to DNA Damage. FEBS Letters, 585, 2891-2896.

Dou, Y., Milne, T. A., Tackett, A. J., Smith, E. R., Fukuda, A., Wysocka, J., Allis, C. D., Chait, B. T., Hess, J. L. & Roeder, R. G. 2005. Physical Association and Coordinate Function of the H3 K4 Methyltransferase Mll1 and the H4 K16 Acetyltransferase Mof. Cell, 121, 873-885.

Eckermann, K. 2013. SUMO and Parkinson’s Disease. NeuroMolecular Medicine, 15, 737-759.

Fanis, P., Gillemans, N., Aghajanirefah, A., Pourfarzad, F., Demmers, J., Esteghamat, F., Vadlamudi, R. K., Grosveld, F., Philipsen, S. & Van Dijk, T. B. 2012. Five Friends of Methylated Chromatin Target of Protein-Arginine-Methyltransferase[Prmt]-1 (Chtop), a Complex Linking Arginine Methylation to Desumoylation. Molecular & Cellular Proteomics, 11, 1263-1273.

Finkbeiner, E., Haindl, M. & Muller, S. 2011a. The SUMO System Controls Nucleolar Partitioning of a Novel Mammalian Ribosome Biogenesis Complex. The EMBO journal, 30, 1067-1078.

Finkbeiner, E., Haindl, M., Raman, N. & Muller, S. 2011b. SUMO Routes Ribosome Maturation. Nucleus, 2, 527-532.

Flaus, A., Martin, D. M. A., Barton, G. J. & Owen-Hughes, T. 2006. Identification of Multiple Distinct Snf2 Subfamilies with Conserved Structural Motifs. Nucleic Acids Research, 34, 2887-2905.

Fuda, N. J., Ardehali, M. B. & Lis, J. T. 2009. Defining Mechanisms That Regulate Rna Polymerase Ii Transcription in Vivo. Nature, 461, 186.

Galanty, Y., Belotserkovskaya, R., Coates, J., Polo, S., Miller, K. M. & Jackson, S. P. 2009. Mammalian SUMO E3-Ligases Pias1 and Pias4 Promote Responses to DNA Double-Strand Breaks. Nature, 462, 935-939.

Garcia-Dominguez, M. & Reyes, J. C. 2009. SUMO Association with Repressor Complexes, Emerging Routes for Transcriptional Control. Biochimica et Biophysica Acta, 1789, 451-459.

Geiss-Friedlander, R. & Melchior, F. 2007. Concepts in Sumoylation: A Decade On. Nature reviews Molecular cell biology, 8, 947-956.

Gelderman, K. A., Hultqvist, M., Holmberg, J., Olofsson, P. & Holmdahl, R. 2006. T Cell Surface Redox Levels Determine T Cell Reactivity and Arthritis Susceptibility. Proceedings of the National Academy of Sciences, 103, 12831-12836.

Genome Project Data Processing Subgroup, Wysoker, A., Handsaker, B., Marth, G., Abecasis, G., Li, H., Ruan, J., Homer, N., Durbin, R. & Fennell, T. 2009. The Sequence Alignment/Map Format and Samtools. Bioinformatics, 25, 2078-2079.

Gill, G. 2005. Something About SUMO Inhibits Transcription. Current Opinion in Genetics & Development, 15, 536-541.

Girard, B. J., Daniel, A. R., Lange, C. A. & Ostrander, J. H. 2014. PELP1: A Review of PELP1 Interactions, Signaling, and Biology. Molecular and cellular endocrinology, 382, 642-651.

Girdwood, D. W. H., Tatham, M. H. & Hay, R. T. 2004. SUMO and Transcriptional Regulation. Seminars in Cell & Developmental Biology, 15, 201-210.

Glotzer, M., Murray, A. W. & Kirschner, M. W. 1991. Cyclin Is Degraded by the Ubiquitin Pathway. Nature, 349, 132-138.

Goldknopf, I. L., Taylor, C. W., Baum, R. M., Yeoman, L. C., Olson, M. O., Prestayko, A. W. & Busch, H. 1975. Isolation and Characterization of Protein A24, a "Histone-Like" Non-Histone Chromosomal Protein. The Journal of Biological Chemistry, 250, 7182-7187.

Goldstein, G., Scheid, M., Hammerling, U., Schlesinger, D. H., Niall, H. D. & Boyse, E. A. 1975. Isolation of a Polypeptide That Has Lymphocyte-Differentiating Properties and Is Probably Represented Universally in Living Cells. Proceedings of the National Academy of Sciences, 72, 11-15.

Golebiowski, F., Matic, I., Tatham, M. H., Cole, C., Yin, Y. & Nakamura, A. 2009a. System-Wide Changes to SUMO Modifications in Response to Heat Shock. Science Signaling, 2, ra24.

Golebiowski, F., Matic, I., Tatham, M. H., Cole, C., Yin, Y., Nakamura, A., Cox, J., Barton, G. J., Mann, M. & Hay, R. T. 2009b. System-Wide Changes to SUMO Modifications in Response to Heat Shock. Science Signaling, 2, ra24-ra24.

Gonugunta, V. K., Miao, L., Sareddy, G. R., Ravindranathan, P., Vadlamudi, R. & Raj, G. V. 2014. The Social Network of PELP1 and Its Implications in Breast and Prostate Cancers. Endocrine-Related Cancer, 21, T79-T86.

Grossman, R. L., Heath, A. P., Ferretti, V., Varmus, H. E., Lowy, D. R., Kibbe, W. A. & Staudt, L. M. 2016. Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine, 375, 1109-1112.

Guenther, M. G., Barak, O. & Lazar, M. A. 2001. The Smrt and N-Cor Corepressors Are Activating Cofactors for Histone Deacetylase 3. Molecular and Cellular Biology, 21, 6091-6101.

Guo, B. & Sharrocks, A. D. 2009. Extracellular Signal-Regulated Kinase Mitogen-Activated Protein Kinase Signaling Initiates a Dynamic Interplay between Sumoylation and Ubiquitination to Regulate the Activity of the Transcriptional Activator Pea3. Molecular and Cellular Biology, 29, 3204-3218.

Guzzo, C. M., Ringel, A., Cox, E., Uzoma, I., Zhu, H., Blackshaw, S., Wolberger, C. & Matunis, M. J. 2014. Characterization of the SUMO-Binding Activity of the Myeloproliferative and Mental Retardation (Mym)-Type Zinc Fingers in Znf261 and Znf198. PLoS ONE, 9, e105271.

Habashy, H. O., Powe, D. G., Rakha, E. A., Ball, G., Macmillan, R. D., Green, A. R. & Ellis, I. O. 2010. The Prognostic Significance of PELP1 Expression in Invasive Breast Cancer with Emphasis on the Er-Positive Luminal-Like Subtype. Breast Cancer Research and Treatment, 120, 603-612.

Hagiwara, T., Nakashima, K., Hirano, H., Senshu, T. & Yamada, M. 2002. Deimination of Arginine Residues in Nucleophosmin/B23 and Histones in Hl-60 Granulocytes. Biochemical and Biophysical Research Communications, 290, 979-983.

Hayden, M. S. & Ghosh, S. 2012. NF-κB, the First Quarter-Century: Remarkable Progress and Outstanding Questions. Genes & Development, 26, 203-234.

He, X., Riceberg, J., Pulukuri, S. M., Grossman, S., Shinde, V., Shah, P., Brownell, J. E., Dick, L., Newcomb, J. & Bence, N. 2015. Characterization of the Loss of SUMO Pathway Function on Cancer Cells and Tumor Proliferation. PLoS ONE, 10, e0123882.

Hecker, C.-M., Rabiller, M., Haglund, K., Bayer, P. & Dikic, I. 2006. Specification of Sumo1- and Sumo2-Interacting Motifs. The Journal of Biological Chemistry, 281, 16117-16127.

Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., Cheng, J. X., Murre, C., Singh, H. & Glass, C. K. 2010. Simple Combinations of Lineage-Determining Transcription Factors Prime Cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell, 38, 576-589.

Hempel, K., Lange, H. W. & Birkofer, L. 1968. ε-N-Trimethyllysin, Eine Neue Aminosäure in Histonen. Naturwissenschaften, 55, 37-37.

Hendriks, I. A. & Vertegaal, A. C. O. 2015a. SUMO in the DNA Damage Response.

Hendriks, I. A., D'souza, R. C., Yang, B., Verlaan-De Vries, M., Mann, M. & Vertegaal, A. C. 2014. Uncovering Global Sumoylation Signaling Networks in a Site-Specific Manner. Nature Structural & Molecular Biology, 21, 927-936.

Hendriks, I. A., Lyon, D., Young, C., Jensen, L. J., Vertegaal, A. C. O. & Nielsen, M. L. 2017. Site-Specific Mapping of the Human SUMO Proteome Reveals Co-Modification with Phosphorylation. Nature Structural & Molecular Biology, 24, 325.

Hendriks, I. A., Treffers, L. W., Verlaan-De Vries, M., Olsen, J. V. & Vertegaal, A. C. O. 2015b. SUMO-2 Orchestrates Chromatin Modifiers in Response to DNA Damage. Cell Reports, 10, 1778-1791.

Hendriks, I. A. & Vertegaal, A. C. O. 2016. A Comprehensive Compilation of SUMO Proteomics. Nature reviews Molecular cell biology, 17, 581-595.

Hickey, C. M., Wilson, N. R. & Hochstrasser, M. 2012. Function and Regulation of SUMO Proteases. Nature reviews Molecular cell biology, 13, 755-766.

Hilz, H., Adamietz, P., Bredehorst, R. & Wielckens, K. 1979. Adp-Ribosylation of Nuclear Proteins. Advances in Enzyme Regulation, 17, 195-211.

Ho, L. & Crabtree, G. R. 2010. Chromatin Remodelling During Development. Nature, 463, 474.

Hochstrasser, M. 2001. Sp-Ring for SUMO: New Functions Bloom for a Ubiquitin-Like Protein. Cell, 107, 5-8.

Hodawadekar, S. C. & Marmorstein, R. 2007. Chemistry of Acetyl Transfer by Histone Modifying Enzymes: Structure, Mechanism and Implications for Effector Design. Oncogene, 26, 5528-5540.

Hoege, C., Pfander, B., Moldovan, G. L., Pyrowolakis, G. & Jentsch, S. 2002. Rad6-Dependent DNA Repair Is Linked to Modification of Pcna by Ubiquitin and SUMO. Nature, 419, 135-141.

Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., Li, Y., Fine, E. J., Wu, X., Shalem, O., Cradick, T. J., Marraffini, L. A., Bao, G. & Zhang, F. 2013. DNA Targeting Specificity of Rna-Guided Cas9 Nucleases. Nature biotechnology, 31, 827.

Huang, W.-C., Ko, T.-P., Li, S. S. L. & Wang, A. H. J. 2004. Crystal Structures of the Human SUMO-2 Protein at 1.6 Å and 1.2 Å Resolution: Implication on the Functional Differences of SUMO Proteins. European Journal of Biochemistry, 271, 4114-4122.

Impens, F., Radoshevich, L., Cossart, P. & Ribet, D. 2014. Mapping of SUMO Sites and Analysis of Sumoylation Changes Induced by External Stimuli. Proceedings of the National Academy of Sciences, 111, 12432-12437.

Ishov, A. M., Sotnikov, A. G., Negorev, D., Vladimirova, O. V., Neff, N., Kamitani, T., Yeh, E. T. H., Strauss, J. F. & Maul, G. G. 1999. Pml Is Critical for Nd10 Formation and Recruits the Pml-Interacting Protein Daxx to This Nuclear Structure When Modified by Sumo-1. The Journal of Cell Biology, 147, 221-234.

Jackson, S. P. & Durocher, D. 2013. Regulation of DNA Damage Responses by Ubiquitin and SUMO. Molecular Cell, 49, 795-807.

Kashiwaya, K., Nakagawa, H., Hosokawa, M., Mochizuki, Y., Ueda, K., Piao, L., Chung, S., Hamamoto, R., Eguchi, H., Ohigashi, H., Ishikawa, O., Janke, C., Shinomura, Y. & Nakamura, Y. 2010. Involvement of the Tubulin Tyrosine Ligase-Like Family Member 4 Polyglutamylase in PELP1 Polyglutamylation and Chromatin Remodeling in Pancreatic Cancer Cells. Cancer Research, 70, 4024-4033.

Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. E. 2015. The Phyre2 Web Portal for Protein Modeling, Prediction and Analysis. Nat. Protocols, 10, 845-858.

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M. & Haussler, D. 2002. The Human Genome Browser at Ucsc. Genome Research, 12, 996-1006.

Kim, H. T., Kim, K. P., Lledias, F., Kisselev, A. F., Scaglione, K. M., Skowyra, D., Gygi, S. P. & Goldberg, A. L. 2007. Certain Pairs of Ubiquitin-Conjugating Enzymes (E2s) and Ubiquitin-Protein Ligases (E3s) Synthesize Nondegradable Forked Ubiquitin Chains Containing All Possible Isopeptide Linkages. Journal of Biological Chemistry, 282, 17375-17386.

Kirisako, T., Kamei, K., Murata, S., Kato, M., Fukumoto, H., Kanie, M., Sano, S., Tokunaga, F., Tanaka, K. & Iwai, K. 2006. A Ubiquitin Ligase Complex Assembles Linear Polyubiquitin Chains. The EMBO journal, 25, 4877-4887.

Knipscheer, P., Van Dijk, W. J., Olsen, J. V., Mann, M. & Sixma, T. K. 2007. Noncovalent Interaction between Ubc9 and SUMO Promotes SUMO Chain Formation. The EMBO journal, 26, 2797-2807.

Kornberg, R. D. 1974. Chromatin Structure: A Repeating Unit of Histones and DNA. Science, 184, 868-871.

Kouzarides, T. 2007. Chromatin Modifications and Their Function. Cell, 128, 693-705.

Kumar, R., Zhang, H., Holm, C., Vadlamudi, R. K., Landberg, G. & Rayala, S. K. 2009. Extranuclear Coactivator Signaling Confers Insensitivity to Tamoxifen. Clinical Cancer Research, 15, 4123-4130.

Kunz, K., Piller, T. & Müller, S. 2018. SUMO-Specific Proteases and Isopeptidases of the SENP Family at a Glance. Journal of Cell Science, 131, jcs211904.

Kuo, H.-Y., Chang, C.-C., Jeng, J.-C., Hu, H.-M., Lin, D.-Y., Maul, G. G., Kwok, R. P. S. & Shih, H.-M. 2005. SUMO Modification Negatively Modulates the Transcriptional Activity of Creb-Binding Protein Via the Recruitment of Daxx. Proceedings of the National Academy of Sciences of the United States of America, 102, 16973-16978.

Lamoliatte, F., Caron, D., Durette, C., Mahrouche, L., Maroui, M. A., Caron-Lizotte, O., Bonneil, E., Chelbi-Alix, M. K. & Thibault, P. 2014. Large-Scale Analysis of Lysine Sumoylation by SUMO Remnant Immunoaffinity Profiling. Nature Communications, 5, 5409.

Langmead, B. & Salzberg, S. L. 2012. Fast Gapped-Read Alignment with Bowtie 2. Nature Methods, 9, 357.

Law, G. L., Itoh, H., Law, D. J., Mize, G. J., Merchant, J. L. & Morris, D. R. 1998. Transcription Factor Zbp-89 Regulates the Activity of the Ornithine Decarboxylase Promoter. Journal of Biological Chemistry, 273, 19955-19964.

Lee, L., Sakurai, M., Matsuzaki, S., Arancio, O. & Fraser, P. 2013a. SUMO and Alzheimer’s Disease. NeuroMolecular Medicine, 15, 720-736.

Lee, T. I. & Young, R. A. 2013b. Transcriptional Regulation and Its Misregulation in Disease. Cell, 152, 1237-1251.

Lee, Y., Chun, S. K. & Kim, K. 2015. Sumoylation Controls Clock-Bmal1-Mediated Clock Resetting Via Cbp Recruitment in Nuclear Transcriptional Foci. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research, 1853, 2697-2708.

Lemercier, C., Verdel, A., Galloo, B., Curtet, S., Brocard, M.-P. & Khochbin, S. 2000. Mhda1/Hdac5 Histone Deacetylase Interacts with and Represses Mef2a Transcriptional Activity. Journal of Biological Chemistry, 275, 15594-15599.

Lettice, L. A., Williamson, I., Wiltshire, J. H., Peluso, S., Devenney, P. S., Hill, A. E., Essafi, A., Hagman, J., Mort, R., Grimes, G., Deangelis, C. L. & Hill, R. E. 2012. Opposing Functions of the Ets Factor Family Define Shh Spatial Expression in Limb Buds and Underlie Polydactyly. Developmental Cell, 22, 459-467.

Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. 2014. Upset: Visualization of Intersecting Sets. IEEE transactions on visualization and computer graphics, 20, 1983-1992.

Li, W. & Ye, Y. 2008. Polyubiquitin Chains: Functions, Structures, and Mechanisms. Cellular and Molecular Life Sciences, 65, 2397-2406.

Liang, Y.-C., Lee, C.-C., Yao, Y.-L., Lai, C.-C., Schmitz, M. L. & Yang, W.-M. 2016. Sumo5, a Novel Poly-SUMO Isoform, Regulates Pml Nuclear Bodies. Scientific reports, 6, 26509.

Lin, C. Y., Lovén, J., Rahl, P. B., Paranal, R. M., Burge, C. B., Bradner, J. E., Lee, T. I. & Young, R. A. 2012. Transcriptional Amplification in Tumor Cells with Elevated C-Myc. Cell, 151, 56-67.

Livak, K. J. & Schmittgen, T. D. 2001. Analysis of Relative Gene Expression Data Using Real-Time Quantitative Pcr and the 2−ΔΔCT Method. Methods, 25, 402-408.

Ma, X., Yang, T., Luo, Y., Wu, L., Jiang, Y., Song, Z., Pan, T., Liu, B., Liu, G., Liu, J., Yu, F., He, Z., Zhang, W., Yang, J., Liang, L., Guan, Y., Zhang, X., Li, L., Cai, W., Tang, X., Gao, S., Deng, K. & Zhang, H. 2019. Trim28 Promotes Hiv-1 Latency by Sumoylating Cdk9 and Inhibiting P-Tefb. eLife, 8, e42426.

Mahajan, R., Gerace, L. & Melchior, F. 1998. Molecular Characterization of the SUMO-1 Modification of Rangap1 and Its Role in Nuclear Envelope Association. The Journal of Cell Biology, 140, 259-270.

Malovannaya, A., Lanz, R. B., Jung, S. Y., Bulynko, Y., Le, N. T., Chan, D. W., Ding, C., Shi, Y., Yucer, N., Krenciute, G., Kim, B.-J., Li, C., Chen, R., Li, W., Wang, Y., O'malley, B. W. & Qin, J. 2011. Analysis of the Human Endogenous Coregulator Complexome. Cell, 145, 787-799.

Mann, M., Cortez, V. & Vadlamudi, R. 2013. PELP1 Oncogenic Functions Involve Carm1 Regulation. Carcinogenesis, 34, 1468-1475.

Marfella, C. G. A. & Imbalzano, A. N. 2007. The Chd Family of Chromatin Remodelers. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 618, 30-40.

Matic, I., Schimmel, J., Hendriks, I. A., Van Santen, M. A., Van De Rijke, F., Van Dam, H., Gnad, F., Mann, M. & Vertegaal, A. C. O. 2010. Site-Specific Identification of SUMO-2 Targets in Cells Reveals an Inverted Sumoylation Motif and a Hydrophobic Cluster Sumoylation Motif. Molecular Cell, 39, 641-652.

Mclean, C. Y., Bristor, D., Hiller, M., Clarke, S. L., Schaar, B. T., Lowe, C. B., Wenger, A. M. & Bejerano, G. 2010. Great Improves Functional Interpretation of Cis-Regulatory Regions. Nature biotechnology, 28, 495-501.

Mellacheruvu, D., Wright, Z., Couzens, A. L., Lambert, J.-P., St-Denis, N. A., Li, T., Miteva, Y. V., Hauri, S., Sardiu, M. E., Low, T. Y., Halim, V. A., Bagshaw, R. D., Hubner, N. C., Al-Hakim, A., Bouchard, A., Faubert, D., Fermin, D., Dunham, W. H., Goudreault, M., Lin, Z.-Y., Badillo, B. G., Pawson, T., Durocher, D., Coulombe, B., Aebersold, R., Superti-Furga, G., Colinge, J., Heck, A. J. R., Choi, H., Gstaiger, M., Mohammed, S., Cristea, I. M., Bennett, K. L., Washburn, M. P., Raught, B., Ewing, R. M., Gingras, A.-C. & Nesvizhskii, A. I. 2013. The Crapome: A Contaminant Repository for Affinity Purification-Mass Spectrometry Data. Nat Methods, 10, 730-736.

Messner, S., Altmeyer, M., Zhao, H., Pozivil, A., Roschitzki, B., Gehrig, P., Rutishauser, D., Huang, D., Caflisch, A. & Hottiger, M. O. 2010. Parp1 Adp-Ribosylates Lysine Residues of the Core Histone Tails. Nucleic Acids Research, 38, 6350-6362.

Mohideen, F., Capili, A. D., Bilimoria, P. M., Yamada, T., Bonni, A. & Lima, C. D. 2009. A Molecular Basis for Phosphorylation-Dependent SUMO Conjugation by the E2 Ubc9. Nature Structural & Molecular Biology, 16, 945-952.

Morris, J. R., Boutell, C., Keppler, M., Densham, R., Weekes, D., Alamshah, A., Butler, L., Galanty, Y., Pangon, L., Kiuchi, T., Ng, T. & Solomon, E. 2009. The SUMO Modification Pathway Is Involved in the Brca1 Response to Genotoxic Stress. Nature, 462, 886-890.

Mukhopadhyay, D. & Dasso, M. 2007. Modification in Reverse: The SUMO Proteases. Trends in Biochemical Sciences, 32, 286-295.

Muller, S., Ledl, A. & Schmidt, D. 2004. SUMO: A Regulator of Gene Expression and Genome Integrity. Oncogene, 23, 1998-2008.

Murray, K. 1964. The Occurrence of ε-N-Methyl Lysine in Histones. Biochemistry, 3, 10-15.

Muse, G. W., Gilchrist, D. A., Nechaev, S., Shah, R., Parker, J. S., Grissom, S. F., Zeitlinger, J. & Adelman, K. 2007. Rna Polymerase Is Poised for Activation across the Genome. Nature Genetics, 39, 1507.

Musselman, C. A., Lalonde, M.-E., Cote, J. & Kutateladze, T. G. 2012. Perceiving the Epigenetic Landscape through Histone Readers. Nature Structural & Molecular Biology, 19, 1218-1227.

Nair, S. S., Nair, B. C., Cortez, V., Chakravarty, D., Metzger, E., Schüle, R., Brann, D. W., Tekmal, R. R. & Vadlamudi, R. K. 2010. PELP1 Is a Reader of Histone H3 Methylation That Facilitates Oestrogen Receptor-α Target Gene Activation by Regulating Lysine Demethylase 1 Specificity. EMBO reports, 11, 438-444.

Nakagawa, K. & Kuzumaki, N. 2005. Transcriptional Activity of Megakaryoblastic Leukemia 1 (Mkl1) Is Repressed by SUMO Modification. Genes to Cells, 10, 835-850.

Nathan, C. & Cunningham-Bussel, A. 2013. Beyond Oxidative Stress: An Immunologist's Guide to Reactive Oxygen Species. Nature Reviews Immunology, 13, 349.

Nishida, T., Tanaka, H. & Yasuda, H. 2000. A Novel Mammalian Smt3-Specific Isopeptidase 1 (Smt3ip1) Localized in the Nucleolus at Interphase. European Journal of Biochemistry, 267, 6423-6427.

Paik, W. K. & Kim, S. 1967. ε-N-Dimethyllysine in Histones. Biochemical and Biophysical Research Communications, 27, 479-483.


Peng, J., Schwartz, D., Elias, J. E., Thoreen, C. C., Cheng, D., Marsischky, G., Roelofs, J., Finley, D. & Gygi, S. P. 2003. A Proteomics Approach to Understanding Protein Ubiquitination. Nature biotechnology, 21, 921-926.

Pfleger, C. M. & Kirschner, M. W. 2000. The Ken Box: An Apc Recognition Signal Distinct from the D Box Targeted by Cdh1. Genes & Development, 14, 655-665.

Pichler, A., Knipscheer, P., Oberhofer, E., Van Dijk, W. J., Korner, R., Olsen, J. V., Jentsch, S., Melchior, F. & Sixma, T. K. 2005. SUMO Modification of the Ubiquitin-Conjugating Enzyme E2-25k. Nature Structural & Molecular Biology, 12, 264-269.

Pickart, C. M. & Fushman, D. 2004. Polyubiquitin Chains: Polymeric Protein Signals. Current Opinion in Chemical Biology, 8, 610-616.

Pilon, A. M., Ajay, S. S., Kumar, S. A., Steiner, L. A., Cherukuri, P. F., Wincovitch, S., Anderson, S. M., Mullikin, J. C., Gallagher, P. G., Hardison, R. C., Margulies, E. H. & Bodine, D. M. 2011. Genome-Wide ChIP-Seq Reveals a Dramatic Shift in the Binding of the Transcription Factor Erythroid Kruppel-Like Factor During Erythrocyte Differentiation. Blood, 118, e139-e148.

Prlić, A., Bradley, A. R., Duarte, J. M., Rose, P. W., Rose, A. S. & Valasatava, Y. 2018. Ngl Viewer: Web-Based Molecular Graphics for Large Complexes. Bioinformatics, 34, 3755-3758.

Prudden, J., Pebernard, S., Raffa, G., Slavin, D. A., Perry, J. J. P., Tainer, J. A., Mcgowan, C. H. & Boddy, M. N. 2007. SUMO‐Targeted Ubiquitin Ligases in Genome Stability. The EMBO journal, 26, 4089-4101.

Psakhye, I. & Jentsch, S. 2012. Protein Group Modification and Synergy in the SUMO Pathway as Exemplified in DNA Repair. Cell, 151, 807-820.

Quinlan, A. R. & Hall, I. M. 2010. Bedtools: A Flexible Suite of Utilities for Comparing Genomic Features. Bioinformatics, 26, 841-842.

R Core Team 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Raman, N., Weir, E. & Müller, S. 2016. The Aaa Atpase Mdn1 Acts as a SUMO-Targeted Regulator in Mammalian Pre-Ribosome Remodeling. Molecular Cell, 64, 607-615.

Ramírez, F., Ryan, D. P., Grüning, B., Bhardwaj, V., Kilpert, F., Richter, A. S., Heyn, S., Dündar, F. & Manke, T. 2016. Deeptools2: A Next Generation Web Server for Deep-Sequencing Data Analysis. Nucleic Acids Research, 44, W160-W165.


Ran, F. A., Hsu, P. D., Wright, J., Agarwala, V., Scott, D. A. & Zhang, F. 2013. Genome Engineering Using the CRISPR-Cas9 System. Nature Protocols, 8, 2281.

Reverter, D. & Lima, C. D. 2004. A Basis for SUMO Protease Specificity Provided by Analysis of Human Senp2 and a Senp2-SUMO Complex. Structure, 12, 1519-1531.

Reynolds, A. P., Johnson, A. K., Haugen, E., Rynes, E., Vierstra, J., Stamatoyannopoulos, J. A., Kuehn, M. S., Maurano, M. T., Humbert, R., Sandstrom, R., Thurman, R. E., Thomas, S. & Neph, S. 2012. Bedops: High-Performance Genomic Feature Operations. Bioinformatics, 28, 1919-1920.

Ribet, D., Hamon, M., Gouin, E., Nahori, M.-A., Impens, F., Neyret-Kahn, H., Gevaert, K., Vandekerckhove, J., Dejean, A. & Cossart, P. 2010. Listeria Monocytogenes Impairs Sumoylation for Efficient Infection. Nature, 464, 1192-1195.

Richard, P., Vethantham, V. & Manley, J. L. 2017. Roles of Sumoylation in mRNA Processing and Metabolism. Advances in experimental medicine and biology, 963, 15-33.

Rodriguez, M. S., Dargemont, C. & Hay, R. T. 2001. SUMO-1 Conjugation in Vivo Requires Both a Consensus Modification Motif and Nuclear Targeting. The Journal of Biological Chemistry, 276, 12654-12659.

Rogers, G. E. 1962. Occurrence of Citrulline in Proteins. Nature, 194, 1149-1151.

Rosendorff, A., Sakakibara, S., Lu, S., Kieff, E., Xuan, Y., Dibacco, A., Shi, Y., Shi, Y. & Gill, G. 2006. Nxp-2 Association with SUMO-2 Depends on Lysines Required for Transcriptional Repression. Proceedings of the National Academy of Sciences, 103, 5308-5313.

Rosonina, E., Akhter, A., Dou, Y., Babu, J. & Sri Theivakadadcham, V. S. 2017. Regulation of Transcription Factors by Sumoylation. Transcription, 8, 220-231.

Ross-Innes, C. S., Stark, R., Teschendorff, A. E., Holmes, K. A., Ali, H. R., Dunning, M. J., Brown, G. D., Gojis, O., Ellis, I. O., Green, A. R., Ali, S., Chin, S.-F., Palmieri, C., Caldas, C. & Carroll, J. S. 2012. Differential Oestrogen Receptor Binding Is Associated with Clinical Outcome in Breast Cancer. Nature, 481, 389.

Sahin, U., Ferhi, O., Jeanne, M., Benhenda, S., Berthier, C., Jollivet, F., Niwa-Kawakita, M., Faklaris, O., Setterblad, N., De Thé, H. & Lallemand-Breitenbach, V. 2014. Oxidative Stress–Induced Assembly of Pml Nuclear Bodies Controls Sumoylation of Partner Proteins. The Journal of Cell Biology, 204, 931-945.

Saitoh, H. & Hinchey, J. 2000. Functional Heterogeneity of Small Ubiquitin-Related Protein Modifiers SUMO-1 Versus SUMO-2/3. The Journal of Biological Chemistry, 275, 6252-6258.


Sampson, D. A., Wang, M. & Matunis, M. J. 2001. The Small Ubiquitin-Like Modifier-1 (SUMO-1) Consensus Sequence Mediates Ubc9 Binding and Is Essential for SUMO-1 Modification. The Journal of Biological Chemistry, 276, 21664-21669.

Savic, D., Partridge, E. C., Newberry, K. M., Smith, S. B., Meadows, S. K., Roberts, B. S., Mackiewicz, M., Mendenhall, E. M. & Myers, R. M. 2015. CETCh-Seq: CRISPR Epitope Tagging ChIP-Seq of DNA-Binding Proteins. Genome Research, 25, 1581-1589.

Schimmel, J., Eifler, K., Sigurðsson, J. O., Cuijpers, S. A. G., Hendriks, I. A., Verlaan-De Vries, M., Kelstrup, C. D., Francavilla, C., Medema, R. H., Olsen, J. V. & Vertegaal, A. C. O. 2014. Uncovering Sumoylation Dynamics During Cell-Cycle Progression Reveals Foxm1 as a Key Mitotic SUMO Target Protein. Molecular Cell, 53, 1053-1066.

Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., Tinevez, J.-Y., White, D. J., Hartenstein, V., Eliceiri, K., Tomancak, P. & Cardona, A. 2012. Fiji: An Open-Source Platform for Biological-Image Analysis. Nature Methods, 9, 676-682.

Schrödinger, L. 2015. The Pymol Molecular Graphics System. 1.8 ed.

Schulz, S., Chachami, G., Kozaczkiewicz, L., Winter, U., Stankovic-Valentin, N., Haas, P., Hofmann, K., Urlaub, H., Ovaa, H., Wittbrodt, J., Meulmeester, E. & Melchior, F. 2012. Ubiquitin-Specific Protease-Like 1 (Uspl1) Is a SUMO Isopeptidase with Essential, Non-Catalytic Functions. EMBO reports, 13, 930-938.

Seeler, J. S., Bischof, O., Nacerddine, K. & Dejean, A. 2007. SUMO, the Three Rs and Cancer. In: Pandolfi, P. & Vogt, P. (eds.) Acute Promyelocytic Leukemia. Springer Berlin Heidelberg.

Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thastrom, A., Field, Y., Moore, I. K., Wang, J.-P. Z. & Widom, J. 2006. A Genomic Code for Nucleosome Positioning. Nature, 442, 772-778.

Sekiyama, N., Ikegami, T., Yamane, T., Ikeguchi, M., Uchimura, Y., Baba, D., Ariyoshi, M., Tochio, H., Saitoh, H. & Shirakawa, M. 2008. Structure of the Small Ubiquitin-Like Modifier (SUMO)-Interacting Motif of Mbd1-Containing Chromatin-Associated Factor 1 Bound to SUMO-3. The Journal of Biological Chemistry, 283, 35966-35975.

Shiio, Y. & Eisenman, R. N. 2003. Histone Sumoylation Is Associated with Transcriptional Repression. Proceedings of the National Academy of Sciences, 100, 13225-13230.

Shin, E. J., Shin, H. M., Nam, E., Kim, W. S., Kim, J. H., Oh, B. H. & Yun, Y. 2012. Desumoylating Isopeptidase: A Second Class of SUMO Protease. EMBO reports, 13, 339-346.


Silver, H. R., Nissley, J. A., Reed, S. H., Hou, Y.-M. & Johnson, E. S. 2011. A Role for SUMO in Nucleotide Excision Repair. DNA Repair, 10, 1243-1251.

Smith, D. L., Chen, C.-C., Bruegger, B. B., Holtz, S. L., Halpern, R. M. & Smith, R. A. 1974. Characterization of Protein Kinases Forming Acid-Labile Histone Phosphates in Walker-256 Carcinosarcoma Cell Nuclei. Biochemistry, 13, 3780-3785.

Song, G. G., Choi, S. J., Ji, J. D. & Lee, Y. H. 2012. Association between the Sumo4 M55v (A163g) Polymorphism and Susceptibility to Type 1 Diabetes: A Meta-Analysis. Human Immunology, 73, 1055-1059.

Song, J., Durrin, L. K., Wilkinson, T. A., Krontiris, T. G. & Chen, Y. 2004. Identification of a SUMO-Binding Motif That Recognizes SUMO-Modified Proteins. Proceedings of the National Academy of Sciences, 101, 14373-14378.

Soule, H. D., Maloney, T. M., Wolman, S. R., Peterson, W. D., Brenz, R., Mcgrath, C. M., Russo, J., Pauley, R. J., Jones, R. F. & Brooks, S. C. 1990. Isolation and Characterization of a Spontaneously Immortalized Human Breast Epithelial Cell Line, Mcf-10. Cancer Research, 50, 6075-6086.

Spitz, F. & Furlong, E. E. M. 2012. Transcription Factors: From Enhancer Binding to Developmental Control. Nature Reviews Genetics, 13, 613.

Sriramachandran, A. M. & Dohmen, R. J. 2014. SUMO-Targeted Ubiquitin Ligases. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research, 1843, 75-85.

Stielow, B., Sapetschnig, A., Kruger, I., Kunert, N., Brehm, A., Boutros, M. & Suske, G. 2008a. Identification of SUMO-Dependent Chromatin-Associated Transcriptional Repression Components by a Genome-Wide RNAi Screen. Mol Cell, 29, 742-754.

Stielow, B., Sapetschnig, A., Wink, C., Krüger, I. & Suske, G. 2008b. SUMO-Modified Sp3 Represses Transcription by Provoking Local Heterochromatic Gene Silencing. EMBO reports, 9, 899-906.

Strahl, B. D. & Allis, C. D. 2000. The Language of Covalent Histone Modifications. Nature, 403, 41-45.

Sun, Z.-W. & Allis, C. D. 2002. Ubiquitination of Histone H2b Regulates H3 Methylation and Gene Silencing in Yeast. Nature, 418, 104-108.

Swatek, K. N. & Komander, D. 2016. Ubiquitin Modifications. Cell Research, 26, 399.

Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K. P., Kuhn, M., Bork, P., Jensen, L. J. & Von Mering, C. 2015. String V10: Protein–Protein Interaction Networks, Integrated over the Tree of Life. Nucleic Acids Research, 43, D447-D452.

Tait, S. W. G., De Vries, E., Maas, C., Keller, A. M., D'santos, C. S. & Borst, J. 2007. Apoptosis Induction by Bid Requires Unconventional Ubiquitination and Degradation of Its N-Terminal Fragment. The Journal of Cell Biology, 179, 1453-1466.

Tammsalu, T., Matic, I., Jaffray, E. G., Ibrahim, A. F., Tatham, M. H. & Hay, R. T. 2014. Proteome-Wide Identification of Sumo2 Modification Sites. Science Signaling, 7, rs2.

Tang, L., Nogales, E. & Ciferri, C. 2010. Structure and Function of Swi/Snf Chromatin Remodeling Complexes and Mechanistic Implications for Transcription. Progress in biophysics and molecular biology, 102, 122-128.

Tatham, M. H., Geoffroy, M. C., Shen, L., Plechanovova, A., Hattersley, N. & Jaffray, E. G. 2008. Rnf4 Is a Poly-SUMO-Specific E3 Ubiquitin Ligase Required for Arsenic-Induced Pml Degradation. Nature Cell Biology, 10, 538-546.

Tatham, M. H., Jaffray, E., Vaughan, O. A., Desterro, J. M. P., Botting, C. H., Naismith, J. H. & Hay, R. T. 2001. Polymeric Chains of SUMO-2 and SUMO-3 Are Conjugated to Protein Substrates by Sae1/Sae2 and Ubc9. Journal of Biological Chemistry, 276, 35368-35374.

Thrower, J. S., Hoffman, L., Rechsteiner, M. & Pickart, C. M. 2000. Recognition of the Polyubiquitin Proteolytic Signal. The EMBO journal, 19, 94-102.

Tripathi, S., Pohl, M. O., Zhou, Y., Rodriguez-Frandsen, A., Wang, G., Stein, D. A., Moulton, H. M., Dejesus, P., Che, J., Mulder, L. C. F., Yángüez, E., Andenmatten, D., Pache, L., Manicassamy, B., Albrecht, R. A., Gonzalez, M. G., Nguyen, Q., Brass, A., Elledge, S., White, M., Shapira, S., Hacohen, N., Karlas, A., Meyer, T. F., Shales, M., Gatorano, A., Johnson, J. R., Jang, G., Johnson, T., Verschueren, E., Sanders, D., Krogan, N., Shaw, M., König, R., Stertz, S., García-Sastre, A. & Chanda, S. K. 2015a. Meta- and Orthogonal Integration of Influenza "Omics" Data Defines a Role for Ubr4 in Virus Budding. Cell Host & Microbe, 18, 723-735.

Tripathi, S., Pohl, M. O., Zhou, Y., Rodriguez-Frandsen, A., Wang, G., Stein, D. A., Moulton, H. M., Dejesus, P., Che, J., Mulder, L. C. F., Yángüez, E., Andenmatten, D., Pache, L., Manicassamy, B., Albrecht, R. A., Gonzalez, M. G., Nguyen, Q., Brass, A., Elledge, S., White, M., Shapira, S., Hacohen, N., Karlas, A., Meyer, T. F., Shales, M., Gatorano, A., Johnson, J. R., Jang, G., Johnson, T., Verschueren, E., Sanders, D., Krogan, N., Shaw, M., König, R., Stertz, S., García-Sastre, A. & Chanda, S. K. 2015b. Meta- and Orthogonal Integration of Influenza "Omics" Data Defines a Role for Ubr4 in Virus Budding. Cell Host & Microbe, 18, 723-735.

Turner, B. M. 2000. Histone Acetylation and an Epigenetic Code. BioEssays, 22, 836-845.

Ulrich, H. D. 2008. The Fast-Growing Business of SUMO Chains. Molecular Cell, 32, 301-305.


Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M. & Rozen, S. G. 2012. Primer3--New Capabilities and Interfaces. Nucleic Acids Research, 40, e115-e115.

Vadlamudi, R. K. & Kumar, R. 2007. Functional and Biological Properties of the Nuclear Receptor Coregulator PELP1/Mnar. Nuclear Receptor Signaling, 5, e004.

Vadlamudi, R. K., Manavathi, B., Balasenthil, S., Nair, S. S., Yang, Z., Sahin, A. A. & Kumar, R. 2005. Functional Implications of Altered Subcellular Localization of PELP1 in Breast Cancer Cells. Cancer Research, 65, 7724-7732.

Vadlamudi, R. K., Wang, R.-A., Mazumdar, A., Kim, Y.-S., Shin, J., Sahin, A. & Kumar, R. 2001. Molecular Cloning and Characterization of PELP1, a Novel Human Co-Regulator of Estrogen Receptor Alpha. Journal of Biological Chemistry, 276, 38272-38279.

Van Dijk, T. B., Gillemans, N., Stein, C., Fanis, P., Demmers, J., Van De Corput, M., Essers, J., Grosveld, F., Bauer, U.-M. & Philipsen, S. 2010. Friend of Prmt1, a Novel Chromatin Target of Protein Arginine Methyltransferases. Molecular and Cellular Biology, 30, 260-272.

Venkatesh, S. & Workman, J. L. 2015. Histone Exchange, Chromatin Structure and the Regulation of Transcription. Nature reviews Molecular cell biology, 16, 178-189.

Vertegaal, A. C. O. 2010. SUMO Chains: Polymeric Signals. Biochemical Society Transactions, 38, 46-49.

Wang, J., Chen, L., Wen, S., Zhu, H., Yu, W., Moskowitz, I. P., Shaw, G. M., Finnell, R. H. & Schwartz, R. J. 2011. Defective Sumoylation Pathway Directs Congenital Heart Disease. Birth Defects Research Part A: Clinical and Molecular Teratology, 91, 468-476.

Wang, L., Wansleeben, C., Zhao, S., Miao, P., Paschen, W. & Yang, W. 2014. Sumo2 Is Essential While Sumo3 Is Dispensable for Mouse Embryonic Development. EMBO reports, 15, 878-885.

Wang, X., Herr, R. A., Chua, W.-J., Lybarger, L., Wiertz, E. J. H. J. & Hansen, T. H. 2007. Ubiquitination of Serine, Threonine, or Lysine Residues on the Cytoplasmic Tail Can Induce Erad of Mhc-I by Viral E3 Ligase Mk3. The Journal of Cell Biology, 177, 613-624.

Wilkinson, K. A. & Henley, J. M. 2010. Mechanisms, Regulation and Consequences of Protein Sumoylation. Biochemical Journal, 428, 133-145.

Witty, J., Aguilar-Martinez, E. & Sharrocks, A. D. 2010. Senp1 Participates in the Dynamic Regulation of Elk-1 Sumoylation. Biochemical Journal, 428, 247-254.

Wu, M. & Gu, L. 2019. Tcseq: Time Course Sequencing Data Analysis. R package version 1.6.1.


Xiang, X., Deng, L., Xiong, R., Xiao, D., Chen, Z., Yang, F., Liu, K. & Feng, G. 2018. Tex10 Is Upregulated and Promotes Cancer Stem Cell Properties and Chemoresistance in Hepatocellular Carcinoma. Cell Cycle, 17, 1310-1318.

Xiao, Z., Chang, J. G., Hendriks, I. A., Sigurðsson, J. O., Olsen, J. V. & Vertegaal, A. C. 2015. System-Wide Analysis of Sumoylation Dynamics in Response to Replication Stress Reveals Novel Small Ubiquitin-Like Modified Target Proteins and Acceptor Lysines Relevant for Genome Stability. Molecular & Cellular Proteomics, 14, 1419-1434.

Yamada, A., Takaki, S., Hayashi, F., Georgopoulos, K., Perlmutter, R. M. & Takatsu, K. 2001. Identification and Characterization of a Transcriptional Regulator for the Lck Proximal Promoter. Journal of Biological Chemistry, 276, 18082-18089.

Yan, S., Sun, X., Xiang, B., Cang, H., Kang, X., Chen, Y., Li, H., Shi, G., Yeh, E. T., Wang, B., Wang, X. & Yi, J. 2010. Redox Regulation of the Stability of the SUMO Protease Senp3 Via Interactions with Chip and Hsp90. The EMBO journal, 29, 3773-3786.

Yan, S. & Willis, J. 2013. Wd40-Repeat Protein Wdr18 Collaborates with Topbp1 to Facilitate DNA Damage Checkpoint Signaling. Biochemical and Biophysical Research Communications, 431, 466-471.

Yang, S. H., Galanis, A., Witty, J. & Sharrocks, A. D. 2006. An Extended Consensus Motif Enhances the Specificity of Substrate Modification by SUMO. The EMBO journal, 25, 5083-5093.

Zhang, L. J., Vogel, W. K., Liu, X., Topark-Ngarm, A., Arbogast, B. L., Maier, C. S., Filtz, T. M. & Leid, M. 2012. Coordinated Regulation of Transcription Factor Bcl11b Activity in Thymocytes by the Mitogen-Activated Protein Kinase (Mapk) Pathways and Protein Sumoylation. The Journal of Biological Chemistry, 287, 26971-26988.

Zhang, X.-D., Goeres, J., Zhang, H., Yen, T. J., Porter, A. C. G. & Matunis, M. J. 2008a. SUMO-2/3 Modification and Binding Regulate the Association of Cenp-E with Kinetochores and Progression through Mitosis. Molecular Cell, 29, 729-741.

Zhang, X., Diab, I. H. & Zehner, Z. E. 2003. Zbp-89 Represses Vimentin Gene Transcription by Interacting with the Transcriptional Activator, Sp1. Nucleic Acids Research, 31, 2900-2914.

Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Nusbaum, C., Myers, R. M., Brown, M., Li, W. & Liu, X. S. 2008b. Model-Based Analysis of ChIP-Seq (MACS). Genome Biology, 9, R137-R137.

Zhao, J. 2007. Sumoylation Regulates Diverse Biological Processes. Cellular and Molecular Life Sciences, 64, 3017-3033.


Zhao, Q., Xie, Y., Zheng, Y., Jiang, S., Liu, W., Mu, W., Liu, Z., Zhao, Y., Xue, Y. & Ren, J. 2014. Gps-SUMO: A Tool for the Prediction of Sumoylation Sites and SUMO-Interaction Motifs. Nucleic Acids Research, 42, W325-W330.

Zhao, W., Huang, Y., Zhang, J., Liu, M., Ji, H., Wang, C., Cao, N., Li, C., Xia, Y., Jiang, Q. & Qin, J. 2017. Polycomb Group Ring Finger Protein 3/5 Activate Transcription Via an Interaction with the Pluripotency Factor Tex10 in Embryonic Stem Cells. Journal of Biological Chemistry, 292, 21527-21537.

Zimmermann, L., Stephens, A., Nam, S.-Z., Rau, D., Kübler, J., Lozajic, M., Gabler, F., Söding, J., Lupas, A. N. & Alva, V. 2018. A Completely Reimplemented Mpi Bioinformatics Toolkit with a New Hhpred Server at Its Core. Journal of Molecular Biology, 430, 2237-2243.
