Understanding the Architecture, Assembly, and Interactome of the Nucleosome Remodelling and Deacetylase (NuRD) Complex

Mehdi Sharifi Tabar

A thesis submitted in fulfilment of the requirements for the Degree of Doctor of Philosophy

School of Life and Environmental Sciences
Faculty of Science
University of Sydney
Sydney, Australia

June 2019

DECLARATION

The work described in this Thesis was performed between August 2015 and December 2018 in the School of Life and Environmental Sciences at the University of Sydney. The experiments were carried out by the author, unless stated otherwise. This work has not been submitted, in part or in full, for a higher degree at any other institution.

Mehdi Sharifi Tabar

June 2019

ABSTRACT

This thesis presents data that enhance our understanding of the composition, assembly, and structure of the nucleosome remodelling and deacetylase (NuRD) complex, a transcriptional coregulator that is strongly linked to cancer. Understanding the structure and function of such regulatory complexes will lead to a better understanding of the mechanisms of gene regulation. This should ultimately provide insight into disease processes that originate from the malfunction of gene regulatory networks, and might enable reprogramming for application in health, agriculture and industry. The Thesis is divided into six chapters: Chapter 1 is a general introduction to gene regulation, and Chapter 2 describes the materials and methods. Chapter 3 is a manuscript on the stoichiometry and composition of the complex in cancer cells. Chapters 4 and 5 are results chapters that investigate the structure of the subcomplexes. Chapter 6 discusses the findings of this thesis.

Chapter 1 presents a general introduction to gene regulation and the NuRD complex. The NuRD complex is a ~1 MDa multisubunit transcriptional coregulator complex, which is widely expressed and harbours both nucleosome remodelling and histone deacetylase activities. Abnormalities in a number of its components are associated with various types of cancer and with neurodegenerative diseases. Although the core components of the complex are well established, its structural basis has been poorly studied. Chapter 2 describes the materials and methods used in this research. Chapter 3 investigates NuRD complex stoichiometry and interactome using a label-free quantitative proteomics method and provides biochemical evidence for two new direct interactors of the complex. Chapters 4 and 5 investigate recombinantly produced subcomplexes using biochemical methods and a mass spectrometry-based hybrid method for structural modelling of the NuRD complex. Chapter 6 discusses the findings and outlines future directions.

ACKNOWLEDGEMENT

Three and a half years after starting on my PhD adventure I have finally reached the end. This journey would not have been possible without the support of my supervisor Joel Mackay and lab mates.

First of all, I would like to sincerely thank Joel Mackay, whose guidance has been invaluable in all aspects of the production of this thesis. His generosity with both time and knowledge is deeply appreciated. I am very grateful for his patience and encouragement in the face of the many bewilderments of research. It was a real privilege and an honour not only to undertake a PhD under his supervision but also to benefit from his extraordinary human qualities.

Enormous thanks are owed to Dr. Jason Low for teaching me mass spectrometry-based proteomics and for being an inexhaustible source of useful discussions. I have no doubt that without Joel and Jason I would not have been able to complete my PhD so smoothly.

Special thanks also go to Dr. Mario Torrado Delray, for his abundant assistance with protein-protein interaction and pulldown experiments, and to Dr. Ana Silva, for providing advice and help in various forms and at various stages of my PhD.

I would also like to thank all members of the Mackay & Matthews labs, everybody in the Structural Biology group, and the Mass Spectrometry core facility at CPC. The lab has been a fantastic place to work, with everyone always willing to help each other out.

Last, but definitely not least, thank you very much to my lovely and kind wife, Diba, for her tremendous help and for always putting a smile on my face and making this journey stress free. I also extend my gratitude to my wonderful family for their unconditional support throughout my PhD, and always.

CONTENTS

Chapter 1. Introduction
1.1 Gene regulation
1.2 Chromatin
1.3 The histone code hypothesis and chromatin remodelling
1.4 Histone acetylation and deacetylation
1.5 Chromatin remodelling protein complexes
1.6 Our structural understanding of chromatin remodellers
1.7 The NuRD complex
1.8 Recruitment of the NuRD to target sites
1.9 Molecular function of the NuRD
1.10 NuRD in pluripotency
1.11 NuRD in disease
1.12 NuRD subunits
1.12.1 MTA1-3
1.12.2 HDAC1/2
1.12.3 CHD3-5
1.12.4 GATAD2A/B
1.12.5 MBD2/3
1.12.6 RBBP4/7
1.12.7 CDK2AP1/DOC-1
1.13 Aims of this thesis

Chapter 2: Materials and Methods
2.1 Materials
2.2 Methods
2.2.1 Cloning
2.2.1.1 Primer design
2.2.1.2 RNA isolation & cDNA synthesis
2.2.1.3 Polymerase chain reaction (PCR)
2.2.1.4 Gibson assembly
2.2.1.5 Colony PCR
2.2.2 Cell culture
2.2.2.1 Expi293 cells
2.2.2.2 NTERA2, PC3 and MCF7 cell culture
2.2.3 siRNA knock down experiment
2.2.4 Quantitative real-time PCR (qRT-PCR, qPCR)
2.2.5 Protein production and FLAG-affinity purification
2.2.5.1 Protein production in mammalian EXPI293 cells
2.2.5.2 Protein production and purification in cell free system (Rabbit Reticulocyte Lysate)
2.2.5.3 Preparing sucrose gradients (tilting method)
2.2.5.4 Protein concentration and buffer exchange
2.2.6 Native NuRD complex pull down
2.2.6.1 Preparation of bead-bound-GST-FOG
2.2.6.2 NuRD extraction from nucleus
2.2.6.3 NuRD pulldown experiment
2.2.7 Protein analysis
2.2.7.1 SDS-PAGE
2.2.7.2 Western blot
2.2.8 Proteomics studies
2.2.8.1 Sample preparation for LC-MS/MS
2.2.8.1.1 Cross-linking the protein complexes
2.2.8.1.2 Size exclusion chromatography for crosslinked peptide enrichment
2.2.8.1.3 Peptide labeling with iTRAQ reagents
2.2.8.1.4 Reversed Phase HPLC (rpHPLC)
2.2.8.1.5 Acetylated peptide enrichment
2.2.8.1.6 Sample preparation for absolute protein quantification
2.2.9 LC-MS/MS
2.2.9.1 Shotgun proteomics method
2.2.9.2 SWATH mass spectrometry method
2.2.10 Data Analysis
2.2.10.1 Converting RAW files to Mascot Generic Format Files (MGF files)
2.2.10.2 XL-MS/MS data analysis using pLink software
2.2.10.3 Annotating tandem mass spectra of the crosslinked peptides
2.2.10.4 Quantitative proteomics data analysis
2.2.10.4.1 MaxQuant
2.2.10.4.2 SWATH-MS data using Skyline software tool

Chapter 3. Comparison of the composition and interaction partners of the Nucleosome Remodeling and Deacetylase (NuRD) complex in cancer cells
3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 NuRD subunit expression across several cell lines
3.3.2 Affinity purification followed by sucrose gradient centrifugation allows isolation of specific species
3.3.3 NuRD complex stoichiometry is conserved across NT2, PC3, and MCF7 cells
3.3.4 ZNF219 and SLC25A5 can interact with NuRD subunits
3.3.5 ZNF219 contacts NuRD via its four N-terminal zinc fingers
3.6 Discussion
3.4 Materials and methods
3.4.1 Constructs
3.4.2 Cell culture
3.4.3 Nuclear extract isolation and NuRD complex affinity purification
3.4.4 Sucrose density gradient ultracentrifugation
3.4.5 Sample preparation and nano-flow liquid chromatography-tandem mass spectrometry
3.4.6 Analysis of MS/MS data
3.4.7 Co-immunoprecipitation analysis in HEK293 cells and rabbit reticulocyte lysate
Acknowledgments

Chapter 4. Understanding the assembly and stoichiometry of MMH and MMHG complexes
4.1 Introduction
4.1.1 What are DIA-MS and SWATH-MS?
4.1.2 Why do we need to perform absolute protein quantification?
4.1.3 What is XL-MS/MS and what can it tell us?
4.2 Results
4.2.1 Experimental design
4.2.2 MMH and MMHG complexes were successfully expressed and purified from EXPI293F™ cells
4.2.3 High salt washing buffer seems to be ideal to get a pure complex
4.2.4 Large scale overexpression and purification of MMH and MMHG complexes
4.2.5 Proteotypic peptide selection
4.2.6 Stoichiometry analysis
4.2.6.1 The MMH complex showed an unexpected 2:2:2 stoichiometry
4.2.6.2 Co-immunopurification followed by western blotting confirmed two copies of MBD in MMH complex
4.2.6.3 GATAD2 subunit may prevent binding of a second MBD subunit in the MMHG complex
4.2.7 Crosslinking mass-spectrometry data analysis
4.2.8 Modeling the MMH complex with PyMOL
4.2.9 Modeling of the MMH complex using SPEM and XL-MS/MS data
4.3 Discussion

Chapter 5. The MTA1 subunit can recruit 2 copies of RBBP4
5.1 Introduction
5.2 Results
5.2.1 Multiple alignment of a portion of MTA1 protein across eukaryotes shows two possible binding sites
5.2.2 Biochemical analysis to confirm sequence alignment results
5.2.3 Large scale expression and purification of MTA:RBBP subcomplexes
5.2.4 Stoichiometry analysis
5.2.5 Crosslinking mass-spectrometry data analysis
5.2.6 Structural evaluation of the crosslinks in PyMOL
5.2.7 Modeling of the MR complex using SPEM data
5.3 Discussion

Chapter 6. Discussion
6.1 High level expression of a subunit does not guarantee its integration into the NuRD complex
6.2 Composition and stoichiometry of the NuRD complex
6.3 NuRD complexes with different paralogue compositions are likely to have distinct functions
6.4 NuRD may contact its substrate through multiple interfaces
6.5 Interacting partners of the complex
6.5.1 ZNF219 may target NuRD to specific loci
6.5.2 NuRD subcomplexes can be found in the cytoplasm and/or cytosol
6.5.3 RSRC2 as a possible NuRD partner
6.6 The N-terminal part of the MTA may bring the DNA translocase moiety into the deacetylase core
6.7 RBBP4/7 appears to have a dynamic interaction with the MTA protein
6.8 The MR complex gives specificity to NuRD to bind histone H3 but not H4
6.9 Concluding remarks

ABBREVIATIONS

D3  Ampicillin
ATP  Adenosine triphosphate
BS3  Bis(sulfosuccinimidyl)suberate
BSA  Bovine serum albumin
CAF1  Chromatin assembly factor-1
CDK2AP1  Cyclin-dependent kinase 2-associated protein 1
CHD3, 4 and 5  Chromodomain and helicase-domain containing 3, 4 and 5
CR1 and CR2  Conserved regions 1 and 2
D. melanogaster  Drosophila melanogaster
DIA-MS  Data-independent acquisition mass spectrometry
DDA-MS  Data-dependent acquisition mass spectrometry
DMEM  Dulbecco's Modified Eagle's medium
DNA  Deoxyribonucleic acid
dNTP  Deoxyribonucleotide triphosphate
DSS  Disuccinimidyl suberate
DTT  Dithiothreitol
E. coli  Escherichia coli
EDTA  Ethylenediaminetetraacetic acid
ELM2  Egl-27 and MTA1 homology 2
FOG1  Friend of GATA protein 1
GATAD2A and B  GATA zinc finger domain containing 2A and 2B
GSH  Reduced glutathione
GST  Glutathione S-transferase
H3  Histone H3
H4  Histone H4
HA  Haemagglutinin (tag)
HAT  Histone acetyltransferase
HDAC1 and 2  Histone deacetylase 1 and 2
HEK293  Human embryonic kidney 293
INO80  Inositol-requiring 80
KD  Knock down
kDa  Kilodalton
KDAC  Lysine deacetylase
MBD2 and 3  Methylated-CpG binding domain containing proteins 2 and 3
MEL  Mouse erythroid leukaemia
MQW  Milli-Q water
mRNA  Messenger RNA
MS  Mass spectrometry
MTA1, 2 and 3  Metastasis-associated factors 1, 2 and 3
NMR  Nuclear magnetic resonance
NuRD  Nucleosome remodelling and deacetylase
OD  Optical density
PDB  Protein Data Bank
PEI  Polyethyleneimine
PHD  Plant homeodomain
pI  Isoelectric point
PPM  Parts per million
PVDF  Polyvinylidene difluoride
Rb  Retinoblastoma
RBBP4 and 7  Retinoblastoma-associated proteins 4 and 7
RPM  Revolutions per minute
SANT  Swi3, Ada2, N-CoR, and TFIIIB
SDS-PAGE  Sodium dodecyl sulphate polyacrylamide gel electrophoresis
SEC  Size exclusion chromatography
SWATH-MS  Sequential window acquisition of all theoretical fragment ion spectra
WT  Wild type
XL-MS/MS  Cross-linking mass spectrometry
Znf  Zinc finger

Publications

Submitted and published works are listed below.

Journal article

1. Sharifi Tabar, M., Mackay J, Low JKK, The stoichiometry and interactome of the Nucleosome Remodeling and Deacetylase (NuRD) complex are conserved across multiple cell lines. (2019), FEBS J.

2. Silva APG, Low JKK, Sharifi Tabar M., Torrado M, Sarah RW, Brillault L, Landsberg M, Mackay J, The Nucleosome Remodelling and Deacetylase (NuRD) complex has an asymmetric, dynamic and modular architecture, in preparation.

3. Schmidberger, J. W.,* Sharifi Tabar, M.,* Torrado, M., Silva, A. P., Landsberg, M. J., Brillault, L., AlQarni, S., Zeng, Y. C., Parker, B. L., Low, J. K. & Mackay, J. P. The MTA1 subunit of the nucleosome remodeling and deacetylase complex can recruit two copies of RBBP4/7. (2016), Protein Sci. *equally contributed

4. Torrado, M., Low, J. K. K., Silva, A. P. G., Schmidberger, J. W., Sana, M., Sharifi Tabar, M., Isilak, M. E., Winning, C. S., Kwong, C., Bedward, M. J., Sperlazza, M. J., Williams, D. C., Jr., Shepherd, N. E. & Mackay, J. P. Refinement of the subunit interaction network within the nucleosome remodelling and deacetylase (NuRD) complex. (2017), FEBS J.

Poster Presentations

1. Sharifi Tabar, M, Low JKK, Schmidberger, Torrado M, Mackay J. Comparison of the Composition and Interaction Partners of the Nucleosome Remodeling and Deacetylase (NuRD) Complex Stoichiometry Across Cancer Cells. (2018), EMBL conference; Transcription and Chromatin. Heidelberg, Germany.

2. Sharifi Tabar, M, Low JKK, Torrado M, Silva A, Mackay J. The stoichiometry and interaction map of the MBD3-MTA2-HDAC1 (MMH) subcomplex of the NuRD complex. (2017), 41st Lorne conference; Protein structure and function. Lorne, Australia.

3. Sharifi Tabar, M, Low JKK, Torrado M, Schmidberger J, Mackay J. The stoichiometry of the RBBP4-MTA1 subcomplex of the NuRD complex. (2016), 42nd Lorne conference; Protein structure and function. Lorne, Australia.

Chapter 1: Introduction

Chapter 1. Introduction

The complex molecular networks of mammalian cells are modulated in part through spatiotemporal regulation of gene expression. Gene regulation occurs at many different levels, including at the level of transcription. The NuRD complex is a multisubunit protein complex in higher organisms that directly regulates gene transcription through its dual enzyme activities: (i) nucleosome remodelling and (ii) protein deacetylation. Our understanding of how (and which) NuRD subunits assemble to form the complex and how this molecular machine functions remains limited. This thesis seeks to shed more light on the structure and function of the NuRD complex using biochemical, molecular & cellular biology and proteomics approaches.

1.1 Gene regulation

How does a gene, which is composed simply of a sequence of DNA, ‘know’ when it should expose itself to the transcription machinery and as a result be transcribed into mRNA? How does a gene know how to escape from that machinery? How does the transcribed mRNA undergo appropriate processing to produce a string of amino acids called a protein? How do cells in different tissues know which genes they must express and at what times? The answers to such questions lie in the study of the mechanisms and factors involved in gene regulation.

The extent to which proteins are expressed depends on rates of gene transcription, the stability of mRNA, translation rates and protein stability [1-3]. In prokaryotes, several genes are often transcribed together from a single promoter within blocks of genes named “operons”, so that a single transcribed mRNA can encode several proteins with different functions [4-6]. Eukaryotic genomes, however, are bigger and more complex than prokaryotic ones, and gene regulation mechanisms in eukaryotes are therefore more intricate [7-10]. Eukaryotic gene transcription is controlled by mechanisms that include the following: (i) transcription factors (TFs) and co-regulating proteins that bind to specific regulatory DNA sequences (e.g., promoters and enhancers) and consequently modulate the activity of RNA polymerase [11]; (ii) reversible covalent modifications of histone proteins and DNA sequences (e.g., histone lysine acetylation and DNA cytosine methylation), which together have been referred to as epigenetic modifications and which are used as a platform to recruit various regulators and chromatin remodelling enzymes (the latter are herein termed remodellers) [12, 13]; and (iii) changes in chromatin composition arising from replacement of canonical histones with histone variants. The coordination between epigenetic modifications and transcriptional regulators can guide remodellers to alter the structure of the chromatin and activate or repress transcription of nearby genes [14-16].

1.2 Chromatin

In humans, each cell contains 6.4 billion base pairs (bp) of DNA sequence, which stretched end-to-end would exceed one meter in length [17, 18]. Not only must each human cell fit all this genetic information within a nucleus that is ~10 µm across, but the cell must also properly replicate and express that genetic information [18]. To achieve these goals, cells have developed a nucleoprotein structure, named chromatin, which greatly facilitates the storage of the eukaryotic genome within cells. Chromatin fibres are made of a series of nucleosome modules, each of which comprises 147 bp of DNA wrapped around a histone octamer that contains two copies of each core histone protein (i.e., H2A, H2B, H3 and H4) [18, 19]. In general, DNA is found in either a transcriptionally active (open) euchromatin state or a repressed (tightly packed) heterochromatin state. The conversion between euchromatin and heterochromatin is controlled by a range of mechanisms, including the remodelling of chromatin structure by remodellers using energy derived from ATP hydrolysis and the effects of specific combinations of covalent modifications of both histones (e.g., acetylation, methylation, phosphorylation, and ubiquitylation) and DNA [20, 21].
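The scale of the packaging problem described above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the genome size, nuclear diameter and 147 bp core length are taken from the text, whereas the 0.34 nm rise per base pair (the textbook value for B-form DNA) and the ~200 bp nucleosome repeat length (core plus linker) are assumptions introduced here, and the function names are hypothetical.

```python
# Rough estimate of how much DNA a human nucleus has to package.
# Values marked "assumed" are not from the thesis text.

GENOME_BP = 6.4e9            # base pairs per cell, as quoted in the text
NUCLEUS_DIAMETER_UM = 10.0   # approximate nuclear diameter, as quoted in the text
RISE_PER_BP_NM = 0.34        # assumed: axial rise per bp of B-form DNA
NUCLEOSOME_REPEAT_BP = 200   # assumed: 147 bp core plus ~50 bp linker DNA


def dna_length_m(n_bp, rise_nm=RISE_PER_BP_NM):
    """Contour length in metres of n_bp base pairs of linear DNA."""
    return n_bp * rise_nm * 1e-9


def compaction_ratio(length_m, diameter_um=NUCLEUS_DIAMETER_UM):
    """Ratio of DNA contour length to nuclear diameter."""
    return length_m / (diameter_um * 1e-6)


def nucleosome_count(n_bp, repeat_bp=NUCLEOSOME_REPEAT_BP):
    """Approximate number of nucleosomes if the genome is fully packaged."""
    return int(n_bp // repeat_bp)


if __name__ == "__main__":
    length = dna_length_m(GENOME_BP)
    print(f"DNA contour length: {length:.2f} m")
    print(f"Length / nuclear diameter: {compaction_ratio(length):,.0f}")
    print(f"Approximate nucleosomes per cell: {nucleosome_count(GENOME_BP):,}")
```

Under these assumptions the DNA is roughly two metres long, about five orders of magnitude longer than the nucleus is wide, and would be organised into tens of millions of nucleosomes.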

1.3 The histone code hypothesis and chromatin remodelling

Histone proteins are highly conserved from yeast to human, a measure of their invaluable structural and functional roles in regulating genome structure and gene expression [22, 23]. Histones are highly positively charged proteins of 15-20 kDa in size that feature unstructured N-terminal tails that protrude out of the nucleosome. The positive charge of histones facilitates their interaction with negatively charged DNA. The histone tails contain numerous residues, including basic lysines and arginines, that are subject to a variety of reversible posttranslational modifications (PTMs) by specific enzymes (Fig 1-1) [24]. Particular combinations of PTMs on a range of histones in a nucleosome appear to create a signal to the cell that the transcriptional status of a gene should be altered in a particular way, seemingly via the recruitment of enzymes and other proteins that recognize these PTMs. For example, the methylation of arginine and acetylation of lysine residues in the histone tails are both associated with gene activation. In contrast, lysine methylation and serine phosphorylation are associated with either activation or repression, depending on the location of these modifications [25]. Thus, trimethylation of Lys4 of histone H3 (referred to as H3K4me3) promotes gene activation, whereas the H3K9me3 modification is more prevalent at repressed genes [26]. It has also been shown that crosstalk between histone modifications is necessary to fine-tune the transcriptional state of a particular gene. As one example, the methylation of H3K4 by the COMPASS protein complex and of H3K79 by the methyltransferase Dot1 are completely dependent upon the ubiquitylation of H2BK123 by scRad6/Bre1 [26].

Fig 1-1. Histone codes. Schematic representation of histone modifications on histone tails. The posttranslational modifications are shown with different colour codes.

1.4 Histone acetylation and deacetylation

Histone lysine acetylation was first reported in 1964 by Allfrey et al. [27]. Since then, it has been shown that this PTM is highly dynamic and regulated by the opposing actions of histone acetyltransferases (HATs) [27, 28] and deacetylases (HDACs) [28]. These enzymes also act on many proteins other than histones, and so have been renamed lysine acetyltransferases (KATs) and lysine deacetylases (KDACs).

In humans, there are 18 proteins annotated as KDACs, which are grouped into four classes: I, II (subdivided into IIa and IIb), III and IV [28-30]. Classes I, II and IV together contain 11 enzymes (HDAC1-11); these are known as the classical deacetylases because their activity depends on a metal ion such as zinc and is inhibited by the small-molecule inhibitor trichostatin A (TSA). Crystallographic studies of mammalian HDACs have shown that the active-site amino acid sequences are conserved in classes I and II and that the structures of the catalytic domains are very similar [31]. Class III contains 7 deacetylases (SIRT1-7); these are not inhibited by TSA and are known as sirtuins.

On the other hand, the KATs are composed of three major protein families: GNAT, MYST, and p300 enzymes. Each of these families contains several members that share a very similar structure containing a core region [32-35].
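The KDAC classification described above can be summarised as a small lookup table. This is a minimal sketch for illustration: the text gives only the class names and the totals (11 classical enzymes and 7 sirtuins), so the assignment of individual enzymes to classes below follows the commonly used classification and is an assumption added here, as are the names `KDAC_CLASSES`, `classical_kdacs` and `kdac_class`.

```python
# Human lysine deacetylases (KDACs) grouped by class.
# Per-enzyme assignments follow the commonly used classification and are an
# assumption here; the thesis text states only the class names and totals.
KDAC_CLASSES = {
    "I": ["HDAC1", "HDAC2", "HDAC3", "HDAC8"],
    "IIa": ["HDAC4", "HDAC5", "HDAC7", "HDAC9"],
    "IIb": ["HDAC6", "HDAC10"],
    "III": [f"SIRT{i}" for i in range(1, 8)],  # sirtuins, not inhibited by TSA
    "IV": ["HDAC11"],
}

# Class III enzymes are the only ones not inhibited by trichostatin A (TSA).
TSA_INSENSITIVE_CLASSES = {"III"}


def classical_kdacs(classes=KDAC_CLASSES):
    """Return the zinc-dependent, TSA-sensitive ("classical") deacetylases."""
    return sorted(
        enzyme
        for cls, members in classes.items()
        if cls not in TSA_INSENSITIVE_CLASSES
        for enzyme in members
    )


def kdac_class(enzyme, classes=KDAC_CLASSES):
    """Look up the class of a given enzyme; raises KeyError if unknown."""
    for cls, members in classes.items():
        if enzyme in members:
            return cls
    raise KeyError(enzyme)


if __name__ == "__main__":
    total = sum(len(members) for members in KDAC_CLASSES.values())
    print(f"Total KDACs: {total}")
    print(f"Classical (TSA-sensitive) KDACs: {len(classical_kdacs())}")
    print(f"HDAC1 belongs to class {kdac_class('HDAC1')}")
```

The two NuRD deacetylases discussed later in this chapter, HDAC1 and HDAC2, both fall in class I under this grouping.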

Histone acetylation contributes to maintaining a transcriptionally permissive euchromatic state. In part, this outcome might be achieved by the reduction in positive charge that results from acetylation. Primarily, though, acetylated lysines are thought to serve as recognition signals for protein domains known as bromodomains, which are found in ~40 human transcriptional coactivators and chromatin remodelling complexes [36].

For example, Ferreira et al. showed that acetylation of Lys14 of H3 (H3K14ac) can enhance the binding affinity of a remodeller named Remodelling the Structure of Chromatin (RSC) for nucleosomes, while acetylated H4 reduces the activity of the Chd1 and Isw2 remodellers by reducing catalytic activity without affecting recruitment [37]. In an in vitro study, Chandy et al. demonstrated that the SWI/SNF complex can preferentially displace acetylated histones, demonstrating the necessity of histone acetylation in chromatin remodelling [38]. Moreover, the Bdf1 protein, a bromodomain-containing subunit of the TFIID complex, is able to recognize acetylated H4 and consequently regulate the transcription of target genes [39]. Most recently, it has been demonstrated that the activity of the CoREST complex (a chromatin silencing complex) is limited in the presence of the H3K4ac mark, highlighting the functional importance of even a single lysine modification in gene regulation [40].

1.5 Chromatin remodelling protein complexes

There are numerous enzymes that have the ability to directly alter nucleosome structure and composition, hinting at the importance of this process for the correct expression of the genome [20]. In many cases, these remodellers are embedded in multi-subunit protein complexes, giving rise to remodelling complexes.

Chromatin remodelling complexes slide, assemble, remove and substitute histones on DNA using energy derived from ATP hydrolysis [15, 41]. This activity takes place at DNA elements such as enhancers and promoters, which control gene transcription. Remodellers are, moreover, indispensable regulators of almost every chromosomal process (e.g., DNA replication and repair), and remodeller deregulation leads to a variety of diseases, including cancer and neurodegenerative diseases [41].

Phylogenetic studies have defined four families of chromatin remodellers: CHD (chromodomain helicase DNA binding, Mi-2) [42], SWI/SNF (switching defective/sucrose non-fermenting) [43], ISWI (imitation switch) [44] and INO80 (inositol requiring 80) [45]. Interestingly, the same remodeller can promote both activation and repression of gene transcription under different circumstances. For instance, the Brahma-associated factor (BAF) complex, the human homologue of SWI/SNF, is capable of switching between these two modes of action even at the same gene [46]. It is thus likely that the interaction of chromatin remodelling complexes with various transcription factors across tissues and cells allows them to take on context-dependent functions. The mechanisms responsible for these seemingly opposed activities are not yet well understood.

1.6 Our structural understanding of chromatin remodellers

Structural and biochemical studies on ATP-dependent chromatin remodellers reveal that they prefer different nucleosomes as substrates to carry out nucleosome assembly, editing, and organization, suggesting that different mechanisms of action are employed by various remodellers. For instance, the INO80 remodeller is well known for its histone exchange function. INO80 exchanges canonical H2A and H3 with H2A.Z and H3.3 variants, respectively [45, 47]. It has also been shown that Chd1 and ISWI enzymes are involved in nucleosome assembly and spacing. They are also involved in the maturation of pre-nucleosomes into octameric nucleosomes [48, 49]. The SWI/SNF subfamily regulates the accessibility of the chromatin to transcription factors and other regulating molecules by sliding or ejecting histone octamers from chromatin [50].

Despite the abovementioned diversity in mechanisms of action, remodellers share particular properties. For example, they show higher affinity for nucleosomes than for naked DNA. All remodeller subfamilies contain a catalytic motor ATPase/translocase domain that encompasses two RecA-like lobes (lobes 1 and 2). Despite sharing this ATPase domain, each remodeller has its own specific partner proteins and/or domains that regulate the ATPase activity. As one example, a published crystal structure of Chd1 suggests that the activity of the ATPase domain is blocked by its double chromodomain, which prevents the enzyme from binding to DNA [51]. The chromodomains also allow for discrimination between nucleosomes and naked DNA. In addition, the basal ATPase activity of the enzyme is inhibited by an N-terminal auxiliary domain, named dCD [51]. Similarly, the crystal structure of the ISWI enzyme reveals that the catalytic function of the enzyme is regulated by the AutoN domain. The AutoN domain binds lobe 1, leading to inactivation of the enzyme. Conversely, histone H4 can compete away the AutoN domain and activate the ATPase domain [52]. Unlike Chd1 and ISWI, the catalytic activity of Snf2 is not inhibited by regulatory elements. Snf2 has evolved a distinct mechanism whereby the two lobes are stacked together and keep the necessary elements away from each other, inhibiting ATP hydrolysis [53].

Recently published cryo-EM structures of the INO80 complex bound to a nucleosome show that it makes multivalent contacts with DNA and with the histones of one gyre of the nucleosome [54] (Fig 1-2A). The translocase domain binds to nucleosomal DNA at superhelical location (SHL) -6 and disrupts the H2A-DNA contacts, unwrapping 15 bp of DNA [54]. The structural data suggest a distinct ratchet mechanism of histone editing and sliding by the INO80 complex.

Cryo-EM data for the Snf2 remodeller indicate that, upon interaction with the nucleosome, the two lobes are realigned and their relative orientation is rotated by 80°, leading to activation of the translocase domain (Fig 1-2B) [55]. Snf2 interacts with the first DNA gyre through its helicase motifs in the major domain and with the second gyre via a positively charged surface. The structure of Snf2 bound to the nucleosome shows that Snf2 binds the nucleosome at SHL+2 and SHL+6. In addition to the interaction of the translocase domain with nucleosomal DNA at SHL+2, the basic patch of the H4 tail interacts with an acidic surface of Snf2. A “wave-ratchet-wave” model has been suggested for the Snf2 remodeller.

Similar to Snf2, the Chd1 translocase domain can bind the DNA at SHL+2 and anchor to the H4 tail (Fig 1-2C) [56]. Chd1 is able to detach two turns of DNA from the histone octamer and bind between the two DNA gyres, facilitating the catalytic activity. The DNA-binding domains (SANT and SLIDE) grab the detached DNA at SHL-7. The structural data suggest a translocation model in which ratcheting between lobe 1 and lobe 2 of the ATPase underlies the remodelling process. In this model, when one ATP molecule is present, binding of the ATPase to DNA in a partially closed conformation leads to complete closure of the ATPase and movement of lobe 2, which initiates DNA translocation by one base pair. ATP hydrolysis then resets the ATPase to its previous state at the new DNA position [56].

Taken together, although structures are available for the abovementioned remodellers, no structural data are available for the NuRD complex, even at low resolution. It is therefore important to investigate the structure and function of the NuRD complex.

Fig 1-2. Comparison of Cryo-EM structures of chromatin remodellers bound to a nucleosome. (A) Structure of Ino80 translocase bound to SHL -6. (B) Snf2 translocase bound to SHL + 6. (C) Chd1 translocase bound to SHL + 2. Different colours were used to show the structure; cyan (remodellers) and pink (histones).

1.7 The NuRD complex

The Nucleosome Remodelling and Deacetylase (NuRD) complex is perhaps the most enigmatic of the major remodellers present in mammals. NuRD is found throughout the animal kingdom and plays a critical role in gene silencing and activation during normal development, reprogramming of pluripotent stem cells, neuronal connectivity, and cancer progression [53, 57-60]. The NuRD complex was first purified two decades ago from both human and amphibian cells and was found to exhibit both histone deacetylase and ATP-dependent nucleosome remodelling activities [61, 62]. In addition, proteins with homology to individual NuRD components have been identified in organisms as diverse as nematode worms and plants, although it is currently unclear whether the full complex exists in plants.

In mammalian cells, the canonical NuRD complex appears to consist of six core subunits (Fig 1-3), each of which exists as two or more closely related paralogues. Two of the subunits have enzymatic activities: (i) CHD3, CHD4, and CHD5, which utilize the energy of ATP hydrolysis to move histone octamers relative to DNA; and (ii) HDAC1 and HDAC2, which remove acetyl groups from lysine residues of histones and non-histone proteins. The non-enzymatic subunits are MBD2/MBD3, GATAD2A/GATAD2B, MTA1/MTA2/MTA3, RBBP4/RBBP7, and CDK2AP1 (also known as DOC-1). The HDAC and RBBP subunits are also found in other transcriptional regulatory complexes, including the SIN3 [63] and CoREST complexes [64, 65], whereas the MTA, MBD and GATAD2 proteins are largely confined to the NuRD complex.

Fig 1-3. NuRD complex subunits. The NuRD complex is composed of MTA1-3 (metastasis-associated proteins), HDAC1/2 (histone deacetylases), RBBP4/7 (retinoblastoma-associated proteins), MBD2/3 (methyl-CpG-binding domain proteins), GATAD2A/B, and CHD3-5 (chromodomain-helicase-DNA-binding proteins). All components have been repeatedly identified as part of the NuRD complex. Question marks on the figure indicate that the interaction, assembly and stoichiometry of the subunits are not well understood. A homodimeric SANT-ELM2 domain interacts with the HDAC protein, leading to the formation of a 2 × MTA-HDAC heterodimer. Some data suggest that the BAH domain is the contact surface for the MBD protein, but this has yet to be characterized. The C-terminal half of MTA interacts with RBBP4 through a single α-helix. MBD and GATAD2 interact with each other through antiparallel heterodimeric helices.

1.8 Recruitment of the NuRD to target sites


The canonical subunits of the NuRD complex have no sequence-specific DNA-binding domains, but they contain domains that mediate interactions with sequence-specific transcription factors, chromatin chaperones and other chromatin-associated proteins. These properties enable selective action of the complex on nucleosomes at specific locations. Site-specific transcription factors guide the complex to specific regions of the genome during development and under different physiological conditions. For example, the interaction between ZMYND8 and the NuRD complex can recruit NuRD to sites of DNA damage during homologous recombination [57]. In addition, the zinc-finger protein of the cerebellum 2 (Zic2) has been shown to recruit NuRD to transcriptional enhancers in mouse embryonic stem cells (mESCs), where it regulates the chromatin state and transcriptional output of genes responsible for differentiation [66, 67]. WDR5 and UpSET have been shown to recruit NuRD to promoter regions in ESCs [68, 69]. Various transcription factors, such as friend of GATA protein 1 (FOG-1), Snail, Ikaros, and TIF1β, have been shown to recruit NuRD to regulate subsets of genes in different signaling pathways [70-72]. Similarly, the association of chromatin-interacting proteins with NuRD mediates NuRD recruitment to regions with varying epigenetic modifications. For instance, BEND3 recruits NuRD to H3K9me3-enriched heterochromatin [73]. Most recently, an H2A.Z-specific chromatin-binding protein, PWWP2A/B, has been shown to form a variant NuRD complex by competing away the MBD2/3, GATAD2A/B, and CHD3/4 proteins. The PWWP2A/B-NuRD module is most enriched at actively transcribed genes [74]. In summary, molecular machines and remodellers such as NuRD need to interact with various proteins to exert their function spatially and temporally genome-wide.

1.9 Molecular function of the NuRD

Because HDAC1/2 are present in the NuRD complex, several studies published in 1998 reported that NuRD is a transcriptional repressor complex [61, 62, 75]. The notion that NuRD is primarily a co-repressor has persisted since that time. However, as noted above, ATP-dependent chromatin remodelling complexes have been demonstrated to regulate transcriptional activation, and NuRD is no exception. In 2004 [76] and 2007 [77], CHD4 was reported to be a transcriptional activator because it occupies the regulatory region of the CD4 gene and its removal leads to a significant reduction in CD4 expression. However, whether CHD4 acted in the context of the entire NuRD complex remained unclear until 2010, when Miccio et al. used a combination of ChIP-Seq and knock-out studies to show for the first time that the NuRD components MTA2, RBBP4, and CHD4 are present at both active and repressed FOG-1 target genes in vivo [58]. They concluded that NuRD is widely used by transcription factors to fine-tune the transcription program in a given cellular context.

More recently, Bornelov et al. made use of an inducible transgenic cell line in which MBD3 was fused at both its N- and C-terminal ends to the ligand-binding domain of the murine estrogen receptor (ER). This domain retains MBD3 in the cytoplasm and thus disrupts NuRD in the nucleus. Addition of tamoxifen to the cell culture medium translocates MBD3 into the nucleus and therefore restores NuRD function. Consistent with previous data [78, 79], they demonstrated that both Chd4 and Mbd3 occupy active promoters and enhancers. Despite the presence of NuRD at active sites, loss of Mbd3/NuRD resulted in only a moderate change in gene expression. They therefore concluded that NuRD fine-tunes the gene transcription program in embryonic stem (ES) cells by either activating or silencing genes [80].

They further investigated the role of NuRD in chromatin structure and nucleosome positioning (the localization of nucleosomes with respect to the genomic DNA sequence) at two genes responsive to NuRD activity: Ppp2r2c and Bmp4. NuRD target genes showed positioned nucleosomes at enhancer regions following tamoxifen treatment, suggesting that changes in transcription levels of the target genes were due to the nucleosome remodelling activity of NuRD. They also examined the binding pattern of pluripotency-associated transcription factors (e.g., Klf4 and Nanog) upon tamoxifen treatment and found that NuRD does not globally decrease the binding of TFs but in some cases enhances their binding to target sites.

1.10 NuRD in pluripotency

Embryonic stem cells (ESCs) are both pluripotent (they can differentiate into any tissue type) and capable of self-renewal [81, 82]. These abilities are controlled by complex mechanisms that involve epigenetic regulators, a network of transcription factors and a range of signaling pathways [82, 83]. ESCs constantly need to silence or activate particular sets of genes to either maintain their pluripotency or differentiate towards specific lineages. The NuRD complex has been repeatedly implicated as an important player in ESC biology [84, 85]. For example, Hendrich and colleagues showed that a NuRD complex containing MBD3 but not MBD2 silences pluripotency genes and thus allows ESCs to differentiate in a defined medium [86]. In apparent contrast, however, Zhu et al. have shown that NuRD is involved in maintaining the pluripotent state of mouse ESCs by repressing genes responsible for trophectoderm differentiation [87]. In 2013, Rais et al. conducted a loss-of-function study of several epigenetic repressors in mouse and human ESCs and noted that MBD3 inhibition in the presence of the Oct4, Sox2, Klf4 and Myc transcription factors (the OSKM reprogramming factors, which convert terminally differentiated cells back into pluripotent cells) stopped somatic cell proliferation and enabled reversion of somatic cells into pluripotent cells with near 100% efficiency, a feat that had not been achieved before [60]. In 2018, the same group demonstrated that depletion of GATAD2A, but not GATAD2B, has an even more striking effect: it enables deterministic reprogramming without affecting the proliferation of somatic cells [88]. These findings may help to answer questions such as “Why is NuRD absent in less complex eukaryotes (e.g., yeast) but present in eukaryotes with a complex gene regulatory program?”. It appears that the NuRD complex needs to be present to fine-tune the transcriptional program of higher eukaryotes throughout embryogenesis and development.

1.11 NuRD in disease

As one might expect for a transcriptional regulatory complex, dysregulation of the activity of NuRD complex subunits has been associated with a number of human pathologies, including different types of cancer and neurodegenerative diseases [89-91]. Regarding cancer, there are lines of evidence implicating the complex both as a tumor co-suppressor and as a promoter of cancer [92-94]. NuRD is unique in that it carries both MTA1, a tumor-promoting protein, and CDK2AP1, a tumor suppressor protein. The MTA family proteins are the subunits most frequently associated with cancer. They are linked with poor prognosis and are widely overexpressed in tumors derived from different tissues, including breast, pancreas, ovary, and prostate [94-97]. The MTA family proteins appear to modulate a range of cancer-promoting processes, including invasion, epithelial-mesenchymal transition (EMT), DNA damage, metastasis, and inflammation [97, 98]. For example, MTA1 and MTA2 have been shown to promote breast cancer progression and metastasis, whereas MTA3 has an opposing function: it inhibits EMT by repressing expression of the Snail gene, a key transcriptional regulator of the EMT pathway, and consequently restores the expression of E-cadherin [99]. Expression of E-cadherin increases epithelial cell adhesion, which potentially leads to the suppression of metastasis.

The MTA proteins regulate the expression and activity of target genes by recruiting NuRD to target genomic sites. In addition to mediating transcriptional repression, MTAs can interact directly with transcription factors involved in cancer and promote their deacetylation, and thus regulate their stability or localization post-translationally [100]. For example, deacetylation of p53, a tumor suppressor transcription factor, by MTA1- and MTA2-NuRD leads to apoptosis and cell death [101]. Interaction of MTA1/HDAC1 with ATF4 leads to deacetylation and increased stability and activity of ATF4, which promotes osteosarcoma progression [100]. In addition, MTA1/HDAC1 interacts with the transcription factor hypoxia-inducible factor 1α (HIF-1α), leading to an increase in its stability and an increase in tumor angiogenesis in a mouse model [102].

In addition to the studies implicating MTA subunits in cancer progression, exome sequencing of endometrial tumors has revealed a high frequency of somatic mutations in CHD4 (17%) [103]. Likewise, a study of the expression pattern of CHD family members in 28 gastric cancer and 35 colorectal cancer (CC) samples revealed that the CHD4 protein is depleted in >50% of both types of cancer [104]. However, the role of CHD4 in cancer is not clear-cut, and it may be involved in either tumor progression or suppression under different circumstances. For example, in 2015 Sperlazza and colleagues showed that acute myeloid leukemia (AML) cells lacking CHD4 lost their tumorigenic behavior, in that they were unable to form xenografts in mice [105].


The function of CHD family remodellers has been linked to transcription, which is the first step of the central dogma of molecular biology. It can therefore be concluded that dysregulation of these proteins can lead to aberrations in gene expression and cause various diseases. For example, the Cancer Gene Census (CGC) consortium has catalogued CHD4 as an oncogene, meaning that CHD4 is mutated in cancerous tissues [106]. Several studies have investigated the role of CHD4 in genome integrity and DNA damage repair. For example, Burd et al. showed that the expression of CHD4 is increased upon UV irradiation, and that the protein is subsequently localized to DNA damage sites [107]. It is likely that, in this situation, CHD4 assists in DNA repair by making the surrounding DNA more accessible to the repair machinery, but there are no clear data that define its role.

1.12 NuRD subunits

As noted above, each NuRD subunit has several paralogues, potentially giving rise to variants of the complex in different cells and tissues or at different times. As shown in Fig 1-4, some of these paralogues share very high sequence identity. For example, RBBP4/7 and HDAC1/2 are 89% and 85% identical, respectively. In contrast, some subunit paralogues share identities as low as 45%.


Fig 1-4. NuRD subunit homology. The graph represents the protein sequence identity between paralogues. BLAST was used to determine identity [108].
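The identity values summarised in Fig 1-4 are, in essence, the percentage of aligned positions at which two paralogues carry the same residue. The sketch below illustrates that calculation on a pre-aligned pair of sequences. It is not the BLAST procedure itself: the function name and the toy sequences are hypothetical, and the choice of dividing by the total number of alignment columns (gaps included) is an assumption about the convention being reported.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical residues, not taken from any NuRD subunit).
toy_a = "MKT-AYLL"
toy_b = "MKSQAY-L"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

In the toy alignment, five of the eight columns are identical. Real paralogue comparisons would first require an alignment step, which this sketch deliberately leaves out.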

1.12.1 MTA1-3

The MTA family includes three genes and multiple alternatively spliced variants. MTA1, MTA2 and MTA3 share ~60% sequence identity with each other and are 80 kDa, 75 kDa, and 65 kDa, respectively (Fig 1-5A). All three MTA paralogues display fundamentally the same domain structure [109], with the exception that MTA3 lacks one RBBP4-binding site. The N-terminal bromo-adjacent homology (BAH), Egl-27 and MTA1 homology 2 (ELM2), Swi3, Ada2, N-CoR, and TFIIIB (SANT) and zinc finger (ZnF) domains of these proteins are highly conserved, whereas the C-terminal regions are more divergent [110]. Phylogenetic analysis of the MTA family proteins indicates that the BAH, ELM2, SANT, and ZnF domains are also well conserved across evolution [108].

All of these domains have been implicated in chromatin recognition. The BAH domain is generally found in proteins that interact with nucleosomes; for example, a crystal structure of the yeast Sir3 BAH domain bound to a nucleosome is available (Fig 1-5B) [111]. Interestingly, most ELM2 domain-containing proteins also have a SANT domain, which suggests a functional and/or structural association between these two domains [112]. It has also been shown that the ELM2-SANT domains together interact with HDAC proteins [112, 113], and the structure of ELM2-SANT has been determined in complex with HDAC1 (Fig 1-5C). The structure of the MTA-HDAC complex is a critical aspect of this thesis because we normalize the determined stoichiometry based on this structure (see Results) [111].
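To make the idea of normalizing stoichiometry to a structural reference concrete, the following is a minimal sketch and not the procedure used in the Results chapters. It assumes that the summed MTA paralogues are set to two copies, as the 2:2 MTA-HDAC arrangement would suggest, and rescales every other subunit accordingly. The function name and all abundance values are arbitrary placeholders rather than measurements.

```python
def normalise_to_reference(abundance, reference, reference_copies=2.0):
    """Rescale abundances so the reference subunits sum to reference_copies.

    abundance: dict of subunit name -> relative abundance (arbitrary units)
    reference: names of the subunits used as the structural reference
    """
    reference_total = sum(abundance[name] for name in reference)
    if reference_total <= 0:
        raise ValueError("reference abundance must be positive")
    scale = reference_copies / reference_total
    return {name: value * scale for name, value in abundance.items()}


# Placeholder values for illustration only (hypothetical, not thesis data).
toy_abundance = {
    "MTA1": 6.0,
    "MTA2": 4.0,
    "HDAC1": 7.0,
    "HDAC2": 3.0,
    "RBBP4": 20.0,
}
copies = normalise_to_reference(toy_abundance, reference=["MTA1", "MTA2"])
for subunit, value in copies.items():
    print(f"{subunit}: {value:.1f}")
```

With these placeholder inputs the MTA paralogues sum to two copies by construction, and the remaining values can be read as copies per complex relative to that reference.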


Fig 1-5. Structural domains of MTA family proteins. (A) The BAH, ELM2, SANT, and ZnF domains located at the N-terminus are highly similar, whereas the C-terminal half of the MTA proteins is divergent. (B) High-resolution crystal structure of the Sir3 BAH domain, shown in two views. Colours: orange (H2A), green (H4), light pink (H2B), blue (H3), and yellow (H2A); panel (B) was adapted from [111]. (C) Crystal structure of the MTA1-HDAC1 complex. Subunit colours are as in Fig 1-3.

1.12.2 HDAC1/2

HDAC1 and HDAC2 are highly conserved (~85% sequence identity) proteins and belong to the family of class I histone deacetylases (Fig 1-6). It has been shown that HDACs cannot act independently but function as subunits of multiprotein complexes [113, 114]. HDAC1/2 are highly and widely expressed across fetal and adult tissues.


In addition to the NuRD complex, they are found in other multisubunit protein complexes, including Sin3 [63], CoREST [65], and NCoR [113, 115]. HDAC1/2 use a zinc-containing deacetylase catalytic domain to remove acetyl groups (O=C-CH3) from ε-N-acetyl lysine residues in histones, transcription factors and other proteins. They remove the acetyl mark with the aid of the cofactor inositol tetraphosphate [Ins(1,4,5,6)P4] [112].

There are lines of evidence indicating that HDAC1 and HDAC2 may have distinct functions [116]. For example, during oogenesis HDAC2 has been suggested to play a more important role than HDAC1, whereas HDAC1 may play a more critical role than HDAC2 in oocyte development at the preimplantation stage [117]. In addition, it has been shown that HDAC2 expression is dysregulated significantly compared to HDAC1 in schizophrenia patients [118].

Fig 1-6. A schematic of HDAC1 domain structure. HDAC1 and HDAC2 share a highly conserved N-terminal HDAC domain that is essential for homo- and heterodimerization. HDAC2 contains a predicted coiled-coil domain at the C-terminus.

1.12.3 CHD3-5

In humans, the CHD family of ATP-dependent DNA translocase proteins comprises three subfamilies that display specific domain architectures in addition to the DNA translocase and a pair of chromodomains: (i) CHD1/2, which contain a DNA-binding domain in the C-terminal region that is capable of binding to AT-rich DNA motifs [51, 119]; (ii) CHD3/4/5, which appear to lack the C-terminal DNA-binding domain but harbour a pair of PHD finger-like domains (the classification of CHD5 is debated because it contains both the PHD fingers and a SANT-like domain in the C-terminus [120]); and (iii) CHD6-9, whose signature is a SANT-like domain in the C-terminus [120, 121].

The plant homeodomains (PHDs) are found exclusively in CHD3/4/5, which might indicate a mode of action specific to these remodellers [120]. Structural studies have demonstrated that PHD1 and PHD2 bind methylated Lys9 in peptides corresponding to the N-terminal tail of histone H3 [122, 123]. It was concluded that CHD4 can coordinate the binding of the NuRD complex to two separate H3 N-terminal tails on the same nucleosome or, alternatively, on a dinucleosome. Therefore, the PHD1 and PHD2 fingers might be critical for the remodelling activity of the NuRD complex. Our laboratory determined the structure of an N-terminal region of CHD4. This N-terminal domain is a four-helix bundle that is structurally similar to the high mobility group box (HMG-box), a domain that is well known as a DNA-binding module. Unlike other regions, the C-terminal part of the CHD4 protein has not yet been well studied. Mutational analysis of the C-terminal domain (CTD) of CHD4 shows that it is important for NuRD complex function but not for remodelling activity [124].

Structural studies using a combination of small-angle X-ray scattering (SAXS) and crosslinking mass spectrometry (XL-MS/MS) indicate that the CHD4 PHD and CHD domains make extensive contacts with each other [125]. This interaction appears to limit both the activity of the ATPase domain and the interaction of CHD4 with nucleosomes. It therefore appears that the tandem PHD-CHD region can play a regulatory role and that a rearrangement in the structure of CHD4 might be required to stimulate its enzymatic activity (Fig 1-7).

Genome-wide expression studies show that CHD3 and CHD4 are ubiquitously expressed across human tissues but at different levels; CHD4 is found at higher levels than CHD3 [126]. In contrast, CHD5 expression is restricted to the and cerebellum. Consistent with expression data, biochemical studies demonstrate that CHD4 is the dominant NuRD associated translocase, whereas CHD5 is only purified with NuRD isolated from brain tissue [127].

Fig 1-7. Structural domains of CHD4. CHD family proteins contain an HMG-box domain involved in DNA binding [128], two PHD domains, two chromo domains (CDs), and a SNF2-type DNA translocase domain. C-terminal domains (CTDs) of unknown structure or function also exist.

1.12.4 GATAD2A/B

GATA zinc finger domain-containing 2A and 2B (GATAD2A/B) share 52% sequence identity and are widely expressed in various tissues [87]. They share two highly conserved coiled-coil regions, named CC1 and CC2, as well as a GATA-type zinc finger (GATA ZnF) [129] (Fig 1-8A, B). The CC1 region of GATAD2A/B has been shown to be responsible for direct interaction with the MBD2/3 proteins [130, 131]. MBD2 and MBD3 contain the same coiled-coil (CC) region, as do GATAD2A and GATAD2B.

Biochemical assays have demonstrated that the CC2 region and GATA ZnF mediate the binding of GATAD2 proteins to all histone tails, but detailed structural analysis is needed to validate these findings [130]. In addition, a single amino acid mutation (K149R) in GATAD2A disrupts the interaction with MBD2 and consequently results in loss of its repressive function [130]. Similarly, the depletion of GATAD2 proteins interferes with the repression mediated by the MBD2 protein. In 2011, Gnanapragasam et al. solved the structure of the coiled coil formed between GATAD2A and MBD2 [132].

Interestingly, two recent studies have pointed out a paralogue specific function for GATAD2-NuRD. Spruijt et al. have shown that GATAD2A but not GATAD2B interacts with ZMYND8 via a short motif (PPPL in GATAD2A) [57], suggesting that the two paralogues define mutually exclusive NuRD subcomplexes. Furthermore, as noted above, Mor et al. have demonstrated that GATAD2A but not GATAD2B is a determining factor for the production of iPS cells from somatic cells [88].


Fig 1-8. Structural domains of GATAD2A/B. (A) GATAD2A/B contain two distinct coiled-coil domains, of which CC1 interacts with MBD2/3, and a GATA domain. The structure of CC1 is available and is indicated in green, but the structure of CC2 has not yet been solved. (B) Solution structure of the p66α (green)–MBD2 (brown) coiled-coil complex. Subunit colours are as in Fig 1-3.

1.12.5 MBD2/3

MBD2 and MBD3 belong to the methyl-CpG-binding domain (MBD) protein family (e.g., MBD1-4) [133]. MBD proteins have been reported to bind methylated DNA (Fig 1-9). MBD3 lacks the N-terminal part of MBD2, which contains glycine-arginine repeats. Very recently, Liu et al. showed that, in addition to mCG sites, the MBD domain can also bind mCG, mCAT, mCAC as well as unmethylated CA or TG sites in dsDNA. Like GATAD2A and GATAD2B, the MBD2 and MBD3 proteins are mutually exclusive in NuRD, defining different subpopulations of the complex [134]. MBD3 knock-out in mice is embryonic lethal, whereas MBD2-deficient mice are viable with only minor defects [59]. It thus appears that the MBD3-NuRD complex plays a more important role in embryogenesis than MBD2-NuRD. One of the epigenetic mechanisms that cells have developed to suppress transcription of genes is the addition of methylation marks to promoter and enhancer DNA [135, 136]. The NuRD complex has been proposed to read DNA carrying a 5-methylcytosine modification through the action of the methyl-CpG-binding domains (MBDs) of the MBD2/3 proteins [136].

Fig 1-9. Schematic overview of the MBD2/3 proteins. MBD2 and MBD3 both contain MBD and coiled-coil (CC) domains. MBD2 contains an additional N-terminal region of unknown function.

1.12.6 RBBP4/7

RBBP4 and RBBP7 are highly similar in amino acid sequence (~92% identity) and belong to a structural class of proteins known as WD40 repeat proteins (Fig 1-10). They were first characterized through their interaction with the retinoblastoma protein [137]. Structural studies have shown that RBBP4 makes a strong contact, through its negatively charged binding pocket, with the 15 N-terminal amino acids of the FOG-1 transcription factor [138]. The interaction between RBBP4 and FOG-1 is important here because we make use of it to purify endogenous NuRD complexes. In addition, the first crystal structure showing the interaction of MTA1 with RBBP4 was published in 2014 by Alqarni et al. [139]. Negative-stain electron microscopy (EM) structures reveal an extensive interface between the MTA1 and RBBP4 proteins and show that the C-terminal half of MTA1 contains two RBBP4-binding sites (Fig 1-10). RBBPs are found in several chromatin-modifying complexes, including NuRD and Sin3 [63], NURF (nucleosome remodelling factor) [140] and PRC2 [141]. Although RBBP4 and RBBP7 are highly similar, they might carry out at least partially distinct functions because they are, in some cases, found in distinct complexes. For example, the CAF-1 complex contains RBBP4 but not RBBP7 [142], whereas the HAT1 complex contains only RBBP7 [143].

Fig 1-10. RBBP4/7 domain topology and structure. (A) Domain structure of RBBP4. (B) Both RBBP4 and RBBP7 form seven-bladed beta-propellers. The structure of RBBP4 (blue) bound to the R2 region of MTA1 (magenta) is shown. Subunit colours are as in Fig 1-3.


1.12.7 CDK2AP1/DOC-1

In 1995, Todd et al. discovered DOC-1, a small 115-residue protein, in oral cancer cells [144]. The structure of DOC-1 determined by NMR shows a homodimer that includes an intrinsically disordered 60-residue N-terminal region and a dimeric four-helix bundle with reduced Cys105 in the C-terminal part [145]. DOC-1 was characterized as a tumor suppressor protein that is downregulated or absent in the majority of cancers. For example, DOC-1 is highly downregulated in oral and colon cancers. In 2006, Le Guezennec and colleagues demonstrated that DOC-1 is a bona fide member of the MBD3-NuRD and MBD2-NuRD complexes [134]. Recently, Mohd-Sarip and colleagues showed that DOC-1-NuRD reverses the epithelial to mesenchymal transition (EMT) in human squamous cell carcinomas by initiating epigenetic reprogramming and transcriptional silencing [146].

1.13 Aims of this thesis

Since the discovery of NuRD in 1998, the biological function of the complex has been investigated extensively. Structural and biochemical data have also been reported for individual subunit domains and for several subcomplexes. However, the overall architecture of the complex and the biochemical details underlying its function have not been addressed, largely because of difficulties in purifying a homogeneous complex and because the complex is not found in yeast (the protein source for almost all structural studies of chromatin remodelling complexes). The goal of this thesis is to begin to address these deficits in our knowledge of NuRD complex structure and interactions.

Chapter 2 describes the materials and methods used in this thesis. Chapter 3 aims to determine the stoichiometry of the intact complex, which remains open for discussion because previous studies have reported different stoichiometries, and to compare this stoichiometry across multiple cell lines. We also aim to search for cell type-specific or new interacting partners of the NuRD complex. Chapters 4 and 5 aim to draw a better picture of the structure and assembly of core subcomplexes of the NuRD complex by probing subunit interactions, stoichiometry and topology using chemical crosslinking combined with mass spectrometry (XL-MS/MS), DIA-MS, and single particle electron microscopy (SPEM). Chapter 6 contains a final discussion that brings the experimental data into context.


Chapter 2: Materials and Methods

2.1 Materials

This section covers materials used for Chapters 3, 4 and 5. All buffers and solutions were made up in Milli-Q water (MQW) unless otherwise stated. Table 2-1 lists the chemicals, reagents and consumables used, together with their suppliers, and Table 2-2 lists the buffers and solutions.

Table 2-1. Chemicals, reagents and consumables

Reagent | Supplier
2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS) | Fluka
Piperazineethanesulfonic acid (HEPES) | Gymea
2-mercaptoethanol | Sigma-Aldrich
2-log DNA standard | NEB
Phenylmethanesulfonyl fluoride (PMSF) | Sigma
Anti-FLAG® M2 Antibody | Cell Signalling
HA-Tag (C29F4) Rabbit mAb | Cell Signalling
IGEPAL | Sigma
Agar | Amyl media
Amylose resin | NEB
Bovine serum albumin (BSA, monomeric) | Sigma
Bovine serum albumin, multimeric (BSA multimeric) | Sigma
3× FLAG peptide (MDYKDHDGDYKDHDIDYKDDDDK) | Sigma
Complete® EDTA-free protease inhibitor cocktail tablet | Roche
Deoxyribonucleotide triphosphates (dNTP for PCRs) | Bioline
Dithiothreitol (DTT) | GOLD Biotechnology
Deoxyribonuclease I (DNaseI) | Roche
Glutathione resin | GE Healthcare
RevertAid First Strand cDNA synthesis kit | Thermo Fisher Scientific
Gibco Dulbecco’s Modified Eagle Medium | Sigma
Lysozyme | Sigma
Mark 12 Protein standard | Life Technologies
Syringe driven filter (0.22 µm) | Millipore
Thermosensitive shrimp alkaline phosphatase | Promega
Tris(2-carboxyethyl)phosphine hydrochloride (TCEP-HCl) | Sigma
Glutathione Sepharose® 4B beads | Sigma
Anti-FLAG M2 Affinity Resin | Sigma
PowerUp™ SYBR™ Green Master Mix | Thermo Fisher Scientific
SMARTpool: ON-TARGETplus MTA1 siRNA | Dharmacon
SMARTpool: ON-TARGETplus MTA2 siRNA | Dharmacon
SMARTpool: ON-TARGETplus MTA3 siRNA | Dharmacon
Glutathione Sepharose 4B GST-tagged protein purification resin | GE Healthcare

Table 2-2. Buffers and solutions

Buffer | Constituents
SEC mobile phase | 30% acetonitrile (300 mL of 100% ACN per liter), 0.1% trifluoroacetic acid (1 mL of 100% TFA per liter)
3× FLAG Peptide Elution Buffer | 150 mg/mL 3× FLAG peptide, 10 mM HEPES-KOH, 150 mM NaCl
5% Sucrose Buffer | 50 mM HEPES-KOH, 150 mM NaCl, 5% v/v sucrose
35% Sucrose Buffer | 50 mM HEPES-KOH, 150 mM NaCl, 35% v/v sucrose
Binding Buffer | 50 mM HEPES-KOH pH 7.4, 150 mM NaCl, 1% Triton X-100
Expi293™ Expression Media | Purchased from Thermo Fisher Scientific (Scoresby, VIC)
Bead Recovery Buffer | 0.1 M glycine-HCl pH 3.5
GSH Bead Wash Buffer | 20 mM HEPES-Na pH 7.6, 300 mM NaCl, 10% v/v glycerol, 1 mM DTT, 0.5 mM PMSF
High Salt Binding Buffer | 50 mM HEPES-KOH pH 8.0, 500 mM NaCl, 1% v/v Triton X-100, 1 mM DTT
High Salt IP Wash Buffer | 50 mM Tris pH 7.9, 500 mM NaCl, 0.5% v/v IGEPAL
IP Wash Buffer | 50 mM Tris pH 7.9, 150 mM NaCl, 0.5% v/v IGEPAL
Tris Buffer | 20 mM Tris-HCl pH 7.9
Luria-Bertani broth (LB) Media | 1% w/v peptone, 0.5% w/v yeast extract, 0.5% w/v NaCl
Lysis Buffer | 50 mM Tris-HCl pH 7.9, 150 mM NaCl, 1% v/v Triton X-100, 1 mM PMSF, 0.2 mM DTT
PBST | 100 ml PBS 10×, 1 ml Tween 20, pH 7.4
PBS (1×) | 100 ml PBS 10×, pH 7.4
PreScission Cleavage Buffer | 50 mM HEPES-KOH, 500 mM NaCl, 1% Triton X-100, pH 8.0, 1 mM DTT (added right before use)
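Buffers such as those in Table 2-2 are typically assembled from concentrated stocks using the dilution relationship C1V1 = C2V2. The sketch below applies that relationship to the IP Wash Buffer as an illustration. The stock concentrations (1 M Tris pH 7.9, 5 M NaCl, 10% v/v IGEPAL) and the 100 mL batch size are assumptions made for the example and are not stated in the thesis; the function name is likewise hypothetical.

```python
def stock_volumes(final_volume_ml, components):
    """Volumes of each stock (mL) needed for a buffer, via C1*V1 = C2*V2.

    components: name -> (final_concentration, stock_concentration),
    with both values in the same unit for a given component.
    """
    volumes = {}
    for name, (final_conc, stock_conc) in components.items():
        if final_conc > stock_conc:
            raise ValueError(f"stock of {name} is too dilute")
        volumes[name] = final_volume_ml * final_conc / stock_conc
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stock volumes exceed the final volume")
    volumes["water"] = water
    return volumes


# IP Wash Buffer: 50 mM Tris pH 7.9, 150 mM NaCl, 0.5% v/v IGEPAL.
# Stock concentrations below are assumed for illustration.
ip_wash = stock_volumes(
    100.0,
    {
        "Tris pH 7.9 (mM)": (50.0, 1000.0),
        "NaCl (mM)": (150.0, 5000.0),
        "IGEPAL (% v/v)": (0.5, 10.0),
    },
)
for component, volume in ip_wash.items():
    print(f"{component}: {volume:.1f} mL")
```

Under these assumed stocks, the water volume is simply whatever remains after the three stock additions, which is why the function rejects recipes whose stocks would overfill the batch.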

2.2 Methods

2.2.1 Cloning

2.2.1.1 Primer design

Primers were designed using the online OligoAnalyzer 3.1 tool from Integrated DNA Technologies. Regions complementary to the template had melting temperatures of ~60 °C. Universal sequences encoding FLAG and HA tags were also added to the primers. The forward primer was the same for the HA and FLAG constructs and contained a stop codon (TGATCAGAATTCTGCAGATATCCATCAC). The reverse primers were tag-specific: HA reverse primer, CCT ATT ATG TGT TCT AGT GGA TCC TCT AGA AGC GTA ATC; FLAG reverse primer, CCT ATT ATG TGT TCT AGT GGA TCC CTT GTC ATC GTC. Nucleotides shown in italics are linker regions for Gibson assembly [147], while the underlined sequences are the HA- and FLAG-specific sequences, respectively.
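As a quick consistency check on the reverse primers listed above, their 3' ends can be reverse-complemented and translated to confirm that they encode the end of the corresponding tag. The sketch below is illustrative only: it assumes that the last 9 nt of the HA primer and the last 12 nt of the FLAG primer are the tag-specific portions (the original underlining is not preserved in this text), and the helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (spaces are ignored)."""
    cleaned = seq.upper().replace(" ", "")
    return "".join(COMPLEMENT[base] for base in reversed(cleaned))


def translate(seq: str) -> str:
    """Translate complete codons from the start of the sequence."""
    cleaned = seq.upper().replace(" ", "")
    usable = len(cleaned) - len(cleaned) % 3
    return "".join(CODON_TABLE[cleaned[i:i + 3]] for i in range(0, usable, 3))


HA_REVERSE = "CCT ATT ATG TGT TCT AGT GGA TCC TCT AGA AGC GTA ATC"
FLAG_REVERSE = "CCT ATT ATG TGT TCT AGT GGA TCC CTT GTC ATC GTC"

# Assumed tag-specific 3' portions: 9 nt (HA) and 12 nt (FLAG).
ha_tail = HA_REVERSE.replace(" ", "")[-9:]
flag_tail = FLAG_REVERSE.replace(" ", "")[-12:]

ha_peptide = translate(reverse_complement(ha_tail))
flag_peptide = translate(reverse_complement(flag_tail))
print(ha_peptide, flag_peptide)  # DYA DDDK
```

The two peptides correspond to the C-terminal residues of the HA epitope (YPYDVPDYA) and of the FLAG epitope (DYKDDDDK), which is what a reverse primer annealing to the end of each tag would be expected to encode.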

2.2.1.2 RNA isolation & cDNA synthesis

RNA was extracted from 1 million fresh or frozen cells using the PureLink RNA Mini Kit (Thermo Fisher Scientific). All materials and reagents used for RNA extraction were RNase-free and sterile. In brief, cells were lysed in the lysis buffer containing 1% (v/v) 2-mercaptoethanol (Sigma Aldrich). Binding, washing and elution were then performed using Spin Cartridges according to the manufacturer’s protocol. Two micrograms of total RNA was treated with DNase I at 37 °C for 30 min, and the DNase was then inactivated by incubation with 1 µL of 25 mM EDTA at 65 °C for 10 min. RNA was then reverse transcribed into complementary DNA (cDNA) using the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific) with oligo(dT) primers and incubation at 25 °C for 5 min, 42 °C for 60 min, and 70 °C for 5 min. The cDNA was further diluted 6× with dH2O for use in real-time polymerase chain reactions (RT-PCR).

2.2.1.3 Polymerase chain reaction (PCR)

PCR was conducted in 25 μL reactions containing 2 U Phusion DNA polymerase in Phusion buffer (20 mM Tris pH 8.8, 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgSO4, 0.1% (v/v) Triton X-100), 10% (v/v) DMSO, 0.4 μM each of forward and reverse primers (Appendix A) and ~15–50 ng plasmid template (pcDNA3.1) or 100 ng cDNA template. PCR programs consisting of 30 cycles of 35 s denaturation (98 °C), 30 s annealing (60 °C, depending on primer set) and 30 s/kb extension (72 °C) were run on an Eppendorf Mastercycler Nexus Thermal Cycler (Sigma Aldrich).
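The cycling parameters above scale with amplicon length through the 30 s/kb extension step, and the primer volumes follow from a simple dilution. The sketch below shows both calculations. The 2 kb amplicon and the 10 µM primer stock are assumptions chosen for illustration, the function names are hypothetical, and the time estimate ignores ramping and any initial denaturation or final extension steps.

```python
def extension_seconds(amplicon_bp, seconds_per_kb=30.0):
    """Extension time per cycle for a given amplicon length."""
    return seconds_per_kb * amplicon_bp / 1000.0


def cycling_minutes(amplicon_bp, cycles=30, denature_s=35.0, anneal_s=30.0):
    """Total time spent in the three-step cycles (hold times only)."""
    per_cycle_s = denature_s + anneal_s + extension_seconds(amplicon_bp)
    return cycles * per_cycle_s / 60.0


def primer_volume_ul(reaction_ul=25.0, final_um=0.4, stock_um=10.0):
    """Volume of primer stock per reaction (stock concentration is assumed)."""
    return reaction_ul * final_um / stock_um


# Hypothetical 2 kb amplicon.
example_extension = extension_seconds(2000)
example_cycling = cycling_minutes(2000)
example_primer = primer_volume_ul()
print(f"extension: {example_extension:.0f} s per cycle")
print(f"cycling:   {example_cycling:.1f} min for 30 cycles")
print(f"primer:    {example_primer:.1f} µL of each 10 µM stock")
```

For longer inserts the extension term dominates the run time, which is worth bearing in mind when several amplicons of different lengths are run in the same block.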

2.2.1.4 Gibson assembly

A vector backbone with overhangs matched to the PCR products was produced by PCR. The PCR products were DpnI digested and column purified (DpnI digests the methylated parental vector used as the template and thereby increases the cloning efficiency). PCR products amplified from cDNA with gene-specific primers were not DpnI digested, as there was no parental vector. For PCRs that yielded non-specific products, the band of the correct size was cut from an agarose gel; otherwise, PCR products were column purified. The purified gene fragments and vector backbones were combined in a molar ratio of 3:1 and assembled in a one-step isothermal reaction at 50 °C for 4 h with buffer (25% (v/v) PEG-8000, 500 mM Tris-HCl pH 7.5, 50 mM MgCl2, 50 mM DTT, 4 mM dNTPs, 5 mM nicotinamide adenine dinucleotide) and Gibson master mix containing a cocktail of three enzymes: (i) T5 exonuclease, (ii) Phusion DNA polymerase, and (iii) Taq DNA ligase.
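The 3:1 insert:vector molar ratio can be converted into a mass of insert as sketched below. This is a minimal illustration that assumes double-stranded DNA with the same average mass per base pair for both fragments, so that the molar ratio depends only on mass and length; the function name and the example fragment sizes are hypothetical.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Moles are proportional to mass / length, so the average mass per
    base pair cancels out of the calculation.
    """
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical example: 50 ng of a 5.4 kb backbone and a 1.8 kb insert.
print(f"{insert_mass_ng(50, 5400, 1800):.1f} ng insert")
```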

2.2.1.5 Colony PCR

Colony PCR is a convenient high-throughput method for rapidly screening for the presence or absence of insert DNA in the plasmid of interest following a cloning procedure. Following bacterial transformation, seven colonies were randomly selected for PCR screening to find correct clones for further analysis. Primers complementary to the pcDNA3.1 vector backbone on either side of the insert location were used for screening because this prevents false positives arising from amplification of free insert. Colonies were picked with white crystal tips and resuspended in PCR reaction buffer by pipetting and vortexing. The PCR program was the same as described above, with the exception that preheating at 98 °C was carried out for 5 min to ensure bacterial cell wall disruption.

2.2.2 Cell culture

2.2.2.1 Expi293 cells

Expi293 cells are HEK293 cells that have been adapted to grow to a higher density than normal HEK293 cells and give a higher expression yield when used for overexpression of a transiently transfected construct. Expi293 cells were grown in either autoclave-sterilized conical flasks or sterile 6-well plates, depending on the scale of the experiment. Maintenance cultures were grown in Expi293™ Expression Medium with the addition of 1:1000 (U/mL) penicillin and streptomycin. They were incubated at 37 °C with 5% CO2 on a shaking platform rotating at 130 rpm.

2.2.2.2 NTERA2, PC3 and MCF7 cell culture

NTERA2 cells were cultured in Dulbecco's modified Eagle medium (DMEM)/F12 (Thermo Fisher Scientific), whereas PC3 and MCF7 cells were cultured and maintained in RPMI-1640 medium (Thermo Fisher Scientific). All media were supplemented with 200 μM L-glutamine, 50 U/mL penicillin, 50 μg/mL streptomycin, and 10% FBS. Cells were cultured in 75-cm² flasks at 37 °C and 5% CO2, and subcultured twice per week at a dilution of 1:5–1:10, depending on the cell line. For subculture, cells were detached from the flask surface by trypsinization. All cell culture work was performed under sterile conditions. All transfection mixes were prepared using the relevant medium without antibiotics and serum.

2.2.3 siRNA knock down experiment

For siRNA knock down, all reagents, tips and tubes were RNase-free and sterile. NTERA-2 cells were used for MTA1/2/3 knock down because western blotting of the cell lysate using MTA1/2/3-specific antibodies showed that these genes are expressed at a high level. SMARTpool ON-TARGETplus siRNA reagents containing a pool of four siRNA duplexes specific for each MTA gene were purchased (Dharmacon Inc). The number of cells seeded was calculated such that the cells would be ~50% confluent on the day of transfection. In brief, for transfection of a 6-well plate, 214 µL of 5 µM siRNA was mixed with 1045 µL serum-free medium in one tube. In a second tube, 50 µL of dFECT1 reagent was mixed with 1210 µL serum-free medium. The two tubes were incubated separately for 5 min at room temperature (Table 2-4). The contents of tubes 1 and 2 were then mixed and incubated for 20 min at room temperature. The mixed solution was added to 8.2 mL complete medium (FBS+, P/S-) and mixed well. About 2 mL of the siRNA solution was added to each well of a 6-well plate. One day after transfection, the medium was changed to reduce the toxicity of the transfection reagents. Cells were collected 72 h post-transfection, washed twice with cold PBS and stored at -80 °C.

Table 2-4. Amount of siRNA and reagents used for knock down in a 6-well plate (volumes are in µL).

            Tube 1                            Tube 2
siRNA       siRNA (5 µM)   Serum-free medium  dFECT   Serum-free medium  Mix of Tubes 1 & 2  Complete medium (P/S-)
MTA1        214            1045               50      1210               2520                8200
MTA2        214            1045               50      1210               2520                8200
MTA3        214            1045               50      1210               2520                8200
MTA1/2/3    214 of each    1045               50      1210               2945                7800
Mock        -              -                  50      1210               1260                9440
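The volumes in Table 2-4 imply a final siRNA concentration of roughly 100 nM for the single knock downs. The sketch below shows the arithmetic; the function name is hypothetical, and the calculation assumes that the volumes are simply additive.

```python
def final_conc_nM(stock_uM, stock_ul, total_ul):
    """Final concentration (nM) after diluting a stock into a total volume."""
    return stock_uM * 1000 * stock_ul / total_ul


# Volumes (µL) for a single MTA knock down, taken from Table 2-4.
tube1_ul = 214 + 1045     # siRNA + serum-free medium
tube2_ul = 50 + 1210      # dFECT + serum-free medium
total_ul = tube1_ul + tube2_ul + 8200   # plus complete medium

conc = final_conc_nM(5, 214, total_ul)
print(f"Final siRNA concentration: {conc:.1f} nM")
```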

2.2.4 Quantitative real-time PCR (qRT-PCR, qPCR)

For qPCR analysis, a 20 µL PCR reaction mixture was prepared [10 µL PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific; contains Dual-Lock™ DNA polymerase and dNTPs), 2 µL of 6× diluted cDNA, and 5 µM each of forward and reverse primers (Appendix B)]. The following PCR protocol was used for gene expression analysis: denaturation for 2 min at 95 °C, annealing for 30 s at 60 °C, and amplification at 72 °C for 20 s. This series was repeated for 40 cycles. For qRT-PCR data analysis, the ΔΔCt method (2^(-ΔΔCt)) was used, with GAPDH as the internal control.
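The ΔΔCt calculation can be sketched as follows. This is a minimal illustration that assumes ~100% amplification efficiency for both the target and the reference (GAPDH) amplicons; the function name and the Ct values in the example are hypothetical and are not data from this thesis.

```python
def fold_change_ddct(ct_target_kd, ct_ref_kd, ct_target_mock, ct_ref_mock):
    """Relative expression (knock down vs mock) by the 2^(-ddCt) method."""
    d_ct_kd = ct_target_kd - ct_ref_kd        # normalise to reference gene
    d_ct_mock = ct_target_mock - ct_ref_mock
    dd_ct = d_ct_kd - d_ct_mock               # compare with mock sample
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles later
# in the knock-down sample, i.e. expression drops to 25% of mock.
print(fold_change_ddct(26, 18, 24, 18))
```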

2.2.5 Protein production and FLAG-affinity purification

2.2.5.1 Protein production in mammalian EXPI293 cells

For transfection, Expi293 cells at a density of ~1.5–2 × 10⁶ cells/mL were used. The cells were cultured at 37 °C, 130 rpm and 5% CO2. For transfection of 1 mL of culture, plasmid DNA (2 µg) and PBS (104 µL) were first mixed. Then PEI (4 µg, Polyscience Inc) was added and the mixture was vortexed for 10 s. After incubation for 20 min, the mixture was added to the cells. The transfected cells were incubated for 72 h at 37 °C, 130 rpm and 5% CO2. At collection time, the cells were counted, transferred to Falcon tubes and centrifuged at 300 g and 4 °C for 5 min. The pellet was washed twice with cold PBS and then flash frozen in liquid nitrogen for storage at –80 °C. For pull down experiments, 20 µL of anti-FLAG Sepharose 4B beads [Biotool, pre-equilibrated with 50 mM Tris-HCl, 150 mM NaCl, 0.1% (v/v) Triton X-100, 1× complete EDTA-free protease inhibitor, 0.2 mM DTT, pH 7.5] was added to cleared HEK cell lysate. The mixtures were incubated overnight at 4 °C with orbital rotation. Post-incubation, the beads were washed with 1 mL wash buffer (50 mM Tris-HCl, 150 mM NaCl, 0.5% (v/v) IGEPAL CA630, 0.2 mM DTT, pH 7.5). Bound proteins were eluted three times, for 1 h in each round, at 4 °C using 20 µL elution buffer (10 mM HEPES, 150 mM NaCl, 300 µg 3× FLAG peptide, pH 7.5). Elution fractions were pooled for downstream analyses.
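The per-millilitre transfection recipe given above (2 µg DNA, 104 µL PBS and 4 µg PEI per mL of culture) can be scaled to other culture volumes as sketched below. Linear scaling is an assumption made here for illustration, and the function name is hypothetical.

```python
def transfection_mix(culture_ml):
    """Scale the per-mL PEI transfection recipe to a given culture volume."""
    return {
        "dna_ug": 2 * culture_ml,     # plasmid DNA
        "pbs_ul": 104 * culture_ml,   # PBS used to dilute the DNA
        "pei_ug": 4 * culture_ml,     # PEI (1:2 DNA:PEI mass ratio)
    }


# Example: a 30 mL culture.
print(transfection_mix(30))
```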

2.2.5.2 Protein production and purification in cell free system (Rabbit Reticulocyte Lysate)

Each rabbit reticulocyte lysate (RRL) reaction was set up by adding 1.5 µg plasmid, 2 U RNase inhibitor (Bioline), and 2 µL methionine to 80 µL RRL (Promega) and incubating the mixture for 3 h at 30 °C. Twenty microliters of anti-FLAG Sepharose 4B beads [Biotool, pre-equilibrated with 50 mM Tris-HCl, 150 mM NaCl, 0.1% (v/v) Triton X-100, 1× complete EDTA-free protease inhibitor, 1 mM DTT, pH 7.5] was added to the RRL reactions. Then, 400 µL of mixing buffer (200 mM NaCl, 70 mM Tris-HCl, pH 7.5) was added to the RRL and bead mixture. The mixtures were incubated overnight at 4 °C with end-over-end rotation at 25 rpm. Post-incubation, the beads were washed 3 times with 800 µL wash buffer (50 mM Tris-HCl, 150 mM NaCl, 0.5% (v/v) IGEPAL CA630, and 1 mM DTT, pH 7.5). Bound proteins were eluted three times with 20 µL of elution buffer (50 mM Tris-HCl, 150 mM NaCl, 10% (v/v) glycerol, 1 mM DTT, 300 µg/mL 3× FLAG peptide, pH 7.5) for 30 min to 1 h on a shaking platform at 150 rpm in a cold room. Elution fractions were pooled for downstream analyses.

2.2.5.3 Preparing sucrose gradients (tilting method)

In two separate Falcon tubes, buffers (50 mM HEPES-KOH pH 8.2 and 150 mM NaCl) were made containing either 5% (w/v) or 35% (w/v) sucrose; both buffers were filtered. The 5% sucrose buffer was added to a 12 mL ultracentrifugation tube (Beckman Coulter) using a syringe with a Pasteur pipette attached to it. Then, using the same syringe, 6 mL of 35% sucrose buffer was added at the bottom of the tube very gently, taking care not to introduce any bubbles, and the tube was sealed with parafilm. The tube was slowly tilted onto its side (exactly horizontal) and left for 4 h at room temperature, and was then left standing at 4 °C for at least 1 h prior to use. The NuRD complex sample (200 µL) was then layered on top of the gradient and centrifuged (186,000 g, 4 °C, 18 h). The gradients were fractionated as 200-µL aliquots.

2.2.5.4 Protein concentration and buffer exchange

For protein concentration and buffer exchange, samples from pull down experiments were concentrated using Amicon Ultra Centrifugal cellulose filters (Millipore) with molecular weight cut-offs of 50 kDa or 100 kDa, depending on the size of the protein complex. Protein concentration was done according to the manufacturer’s protocol but at low speed to prevent precipitation (700–1000 g). For larger volumes, 4-mL falcon tubes were used.

2.2.6 Native NuRD complex pull down

2.2.6.1 Preparation of bead-bound-GST-FOG

An aliquot of 5 × 10⁸ bacterial cells was used to purify GST-FOG1 (1-45) protein to be used as a handle/bait protein. Complete protease inhibitor cocktail (1×), lysozyme (100 µg/mL) and DNase I (10 µg/mL) were added prior to sonication (2 min at 25% output) using an Ultrasonic Homogenizer. The sonicated lysate was cleared by centrifugation (16,000 g, 20 min, 4 °C). The soluble fraction was applied to 200 µL GSH beads (Sigma Aldrich) that had been pre-equilibrated with 5 column volumes of binding buffer (50 mM HEPES-KOH pH 7.4, 150 mM NaCl, 1% Triton X-100). The beads and soluble fraction were incubated together for 2 h with end-over-end rotation. Beads were collected at 800 g at 4 °C and washed three times with five column volumes of wash buffer (20 mM HEPES-Na pH 7.6, 300 mM NaCl, 10% glycerol, 1 mM DTT, 0.5 mM PMSF).

2.2.6.2 NuRD extraction from nucleus

About 200 × 10⁶ NTERA-2, PC3, and MCF7 cells were resuspended in lysis buffer (10 mM HEPES-KOH pH 7.9, 1.5 mM MgCl2, 10 mM KCl) and swelled on ice for 10 min before addition of 0.6% (v/v) IGEPAL and further swelling for 15 min with occasional gentle pipetting. After swelling, nuclei were pelleted by centrifugation at 3000 g for 5 min at 4 °C, and the cytoplasmic fraction (the supernatant) was collected into a new Eppendorf tube. The pelleted nuclei were resuspended in a further 10 mL lysis buffer with 0.6% IGEPAL and pelleted again by centrifugation, and the supernatant was removed as before to remove residual cytoplasmic material. The nuclei were then resuspended in 1 mL binding buffer (50 mM HEPES-KOH pH 7.4, 150 mM NaCl, 1% Triton X-100) and sonicated on ice at 30% output for 1 min. The disrupted nuclei were incubated on ice for 1 h before centrifugation at 20,000 g for 10 min at 4 °C to remove insoluble components (the chromatin fraction). The resulting supernatant was the nuclear fraction.

2.2.6.3 NuRD pulldown experiment

The cleared nuclear lysate was incubated with bead-bound GST-FOG1 bait protein for 2 h to overnight at 4 °C with rotation to trap the NuRD complexes. After incubation, the beads were washed with 1 mL high-salt binding buffer and once with 1 mL binding buffer without Triton X-100. NuRD complex was eluted with GSH elution buffer (50 mM HEPES-KOH pH 8, 150 mM NaCl, 1 mM DTT, 50 mM glutathione (0.015 g/mL)). The resulting eluates were analyzed by SDS-PAGE to confirm the presence of NuRD components.

2.2.7 Protein analysis

2.2.7.1 SDS-PAGE

SDS-PAGE analysis was used to check the purity and concentration of the purified proteins. Before running SDS-PAGE, all samples were mixed with 4× LDS (Life Technologies) and heated at 90 °C for 5–10 min. The gels (Bolt® Bis-Tris Plus) and buffer (Bolt® MES SDS) were pre-made by Life Technologies. Generally, 10–15% of the eluted material was loaded onto the gel alongside 50 ng BSA, and the gel was run at 165 V for 45 min. Gels were stained with 40 mL SYPRO Ruby overnight at room temperature in the dark, and scanned at 550 V with 100 dpi resolution using a Typhoon 9400 scanner.

2.2.7.2 Western blot

After SDS-PAGE, the protein was blotted from the gel onto a PVDF membrane (Millipore) using a Bolt™ Mini Blot Module (Life Technologies) according to the manufacturer’s instructions, with transfer buffer (25 mM Bicine, 25 mM Bis-Tris, 1 mM EDTA, 10% methanol, pH 7.2). Transfer was carried out at 20 V for 1 h. After the transfer, the membrane was washed five times for 5 min on a rocking table in PBS-T (1× PBS (8 g/L NaCl, 0.2 g/L KCl, 1.42 g/L Na2HPO4, 0.24 g/L KH2PO4, pH 7.4), 0.1% Tween 20) and then blocked in 5% skim milk in PBS-T on a rocking table for 1 h at room temperature. Next, the membrane was incubated overnight at 4 °C on a rocking table with primary antibodies (Appendix C) in 5% skim milk in PBS-T. Secondary antibody was added for 2 h to overnight at 4 °C. After the antibody incubation(s), the membrane was washed in PBS-T on a rocking table for 40 min at room temperature. The membrane was then treated with 800 µL Western Lightning Chemiluminescence Reagent (Perkin Elmer) and scanned with a LI-COR C-DiGit Blot Scanner for chemiluminescent western blots.

2.2.8 Proteomics studies

2.2.8.1 Sample preparation for LC-MS/MS

After SDS-PAGE analysis and calculation of protein concentration using ImageJ, protein samples were digested in solution for MS analysis. All proteolytic digestions in this thesis were performed in solution by dissolving the dried protein samples in 8 M urea. The dissolved, denatured proteins were then reduced (5 mM TCEP, 37 °C, 30 min) and alkylated (10 mM iodoacetamide, 20 min, room temperature in the dark). Before tryptic digestion, the samples were diluted to 6 M urea with 200 mM NH4HCO3. Trypsin/LysC mix (Promega) was added to an enzyme:substrate ratio of 1:25 (w/w). The solution was incubated at 37 °C for 4 h. Following this digestion step, the sample was further diluted to 0.75 M urea using 50 mM Tris-HCl pH 8 and additional trypsin (Promega) was added at an enzyme:substrate ratio of 1:50 (w/w). The sample was then incubated at 37 °C overnight (16 h). Following overnight digestion, the samples were acidified by addition of formic acid to a final concentration of 2% (v/v) and centrifuged at 16,000 g for 10 min. Peptide clean-up was done using homemade stage-tips. For concentrated samples such as crosslinked peptides, the supernatants were desalted using 50 mg Sep-Pak C18 cartridges (Waters), eluted in 50:50:0.1 acetonitrile:water:formic acid (v/v/v), and dried in a vacuum centrifuge.
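The two urea dilution steps (8 M to 6 M, then 6 M to 0.75 M) and the enzyme amounts follow from simple C1V1 = C2V2 arithmetic, sketched below. The function names, the 30 µL starting volume and the 50 µg protein amount are hypothetical, and the small volumes contributed by TCEP, iodoacetamide and enzyme are ignored.

```python
def diluent_volume_ul(start_vol_ul, start_conc_m, target_conc_m):
    """Volume of urea-free buffer needed to dilute urea to the target molarity."""
    return start_vol_ul * (start_conc_m / target_conc_m - 1)


def enzyme_ug(protein_ug, substrate_per_enzyme):
    """Enzyme mass for a 1:N (w/w) enzyme:substrate ratio."""
    return protein_ug / substrate_per_enzyme


# Hypothetical sample: 50 µg protein dissolved in 30 µL of 8 M urea.
step1 = diluent_volume_ul(30, 8, 6)             # NH4HCO3 added to reach 6 M
step2 = diluent_volume_ul(30 + step1, 6, 0.75)  # Tris-HCl added to reach 0.75 M
print(f"Add {step1:.0f} µL, then {step2:.0f} µL")
print(f"Trypsin/LysC: {enzyme_ug(50, 25):.1f} µg; trypsin: {enzyme_ug(50, 50):.1f} µg")
```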

2.2.8.1.1 Cross-linking the protein complexes

For each crosslinking experiment, about 40–50 µg of purified protein complex at a concentration of 0.15 mg/mL was used. Prior to crosslinking, the excess 3× FLAG peptide used during the elution step was removed using Zeba Spin desalting gel filtration columns (7K MWCO; Thermo Fisher Scientific), and the complex was exchanged into a buffer comprising 25 mM HEPES pH 7.5 and 75 mM NaCl. The purified samples were then prepared for XL-MS/MS essentially as described elsewhere [148], with some modifications. Briefly, for disuccinimidyl suberate (DSS) crosslinking, H12- and D12-DSS (1:1 ratio, 25 mM stock solution in anhydrous dimethylformamide; Creative Molecules) were added to a final concentration of 1 mM and incubated for 30 min at 37 °C with constant mixing at 750 rpm. The excess DSS was then quenched with 100 mM NH4HCO3 (50 mM final concentration) and further incubated at 37 °C for 20 min. For adipic acid dihydrazide (ADH) crosslinking, H8/D8-ADH (1:1 ratio, 100 mg/mL stock solution in 20 mM HEPES pH 7.4; Creative Molecules) and 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM) (144 mg/mL stock solution in 20 mM HEPES pH 7.4; Sigma Aldrich) were added to final concentrations of 8.3 mg/mL and 12 mg/mL, respectively. The sample was then incubated for 1.5 h at 37 °C with constant mixing. The excess reagent was then removed using Zeba Spin desalting gel filtration columns (7K MWCO; Thermo Fisher Scientific), exchanging the sample into 20 mM HEPES pH 7.5, 150 mM NaCl. For crosslinking with bis(sulfosuccinimidyl) suberate (BS3 D0/D12; Creative Molecules), samples were incubated with 25 mM BS3, dissolved in water, in the cold room for 10 min and then at room temperature for 1.5 h. The reaction was quenched with 50 mM NH4HCO3 for 20 min at 25 °C. Finally, samples were dried in a SpeedVac centrifuge and subjected to digestion and clean-up for mass spectrometry (MS) analysis.

2.2.8.1.2 Size exclusion chromatography for crosslinked peptide enrichment

For size exclusion chromatography (SEC) fractionation, the dried desalted peptides were resuspended in 150 µL of SEC mobile phase (acetonitrile:water:trifluoroacetic acid, 70:30:0.1 (v/v/v)) and loaded onto a Superdex Peptide HR 10/30 column connected to a BioLogic DuoFlow™ system (Bio-Rad) with a BioFrac™ fraction collector (Bio-Rad). A flow rate of 0.5 mL/min was used and the separation was monitored by UV absorption at 215, 225, 254, and 280 nm using a BioLogic QuadTec™ UV/Vis detector (Bio-Rad). Fractions were collected as 2-min windows over 1.25 column volumes (30 mL). Based on the UV absorption traces, fractions of interest for MS (retention volumes 12–17 mL) were dried in a vacuum centrifuge.

2.2.8.1.3 Peptide labeling with iTRAQ reagents

The iTRAQ labeling of peptide samples derived from knock down and mock samples was performed using the iTRAQ Reagent-4plex Multiplex Kit (Applied Biosystems) according to the manufacturer’s protocol. Prior to iTRAQ labeling, 100 µg total protein pellet was resuspended in digestion buffer (0.05% w/v sodium dodecyl sulfate (SDS)) to a final concentration of 1 mg/mL (total protein measured by bicinchoninic acid assay (Sigma Aldrich)). Equal aliquots (100 μg) from each lysate were denatured with 0.02% denaturant solution from the kit, reduced with 1 mM TCEP for 1 h at 60 °C, and cysteine-blocked using cysteine blocking reagent for 10 min at room temperature. Samples were then digested with trypsin (1:40 w/w) overnight at 37 °C in a sealed tube. In order to maximize the labeling efficiency, the volume of the sample was kept below 50 µL. Post-digestion, peptides were labeled with the respective isobaric tags (114 for knock down and 117 for mock samples) for 1 h at room temperature, and the samples were then mixed.

2.2.8.1.4 Reversed Phase HPLC (rpHPLC)

Labeled peptides were then fractionated using a Zorbax 300 Extend-C18 column (9.4 × 250 mm, 300 Å, 5 µm) running on an HPLC system (GBC) at a flow rate of 1 mL min−1. Retained peptides were eluted using Buffer A (5 mM ammonium formate (pH 10) in 2% (v/v) acetonitrile) and Buffer B (5 mM ammonium formate (pH 10) in 90% (v/v) acetonitrile). The following chromatographic gradients were applied: 0–7 min 100% Buffer A; 7–45 min 92% Buffer A and 8% Buffer B; 45–49 min 73% Buffer A and 27% Buffer B; 49–65 min 69% Buffer A and 31% Buffer B; 65–72 min 73% Buffer A and 27% Buffer B; 72–75 min 100% Buffer A. Chromatograms were recorded at 214 nm. The sample was fractionated into 70 fractions, and selected non-contiguous fractions were then combined into 6 samples; these were lyophilized.
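One common way of combining non-contiguous fractions is round-robin concatenation, in which every sixth fraction is assigned to the same pool. The sketch below illustrates this scheme for 70 fractions and 6 pools. It is an assumption made for illustration only: the text above does not specify which fractions were selected or how they were combined, and the function name is hypothetical.

```python
def concatenate_fractions(n_fractions=70, n_pools=6):
    """Assign fractions 1..n to pools in round-robin order."""
    pools = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        # Fractions 1, 7, 13, ... go to pool 1; 2, 8, 14, ... to pool 2; etc.
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


for i, pool in enumerate(concatenate_fractions(), start=1):
    print(f"Pool {i}: {pool}")
```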

2.2.8.1.5 Acetylated peptide enrichment

The PTMScan Acetyl-Lysine Motif [AC-K] kit (Cell Signaling) was used for the immunoaffinity enrichment of Lys-acetylated peptides, based on the manufacturer’s protocol. Prior to enrichment, the beads (to which are bound a set of antibodies that each recognize acetyllysine in a different sequence context) were washed three times with 1.5 mL of immunoaffinity purification (IAP) buffer (50 mM MOPS pH 7.2, 10 mM sodium phosphate, 50 mM NaCl). The iTRAQ labeled and dried desalted peptides were resuspended in 1.4 mL IAP buffer, and the pH was checked by spotting 2 µL of sample on pH indicator paper (pH should be close to neutral). The solution was cleared at 10000 g for 5 min at 4 °C. The soluble peptides were incubated with 10 µL of antibody beads for 2 h at 4 °C with end-over-end rotation. Following enrichment, anti-Kac antibody beads were washed three times in 1.5 mL chilled MQW grade water. Peptides were eluted from antibody beads with 50 μL TFA (0.15% in MQW) and desalted using homemade C18 stage tips.

2.2.8.1.6 Sample preparation for absolute protein quantification

After sucrose gradient fractionation, 10% of each fraction containing NuRD complex was run on SDS-PAGE along with 30 ng of monomeric BSA and stained with SYPRO Ruby. The concentrations of the complexes were calculated using ImageJ software. Synthetic 15N-labeled peptides were then mixed with the samples so that each injection contained 100 fmol of each synthetic peptide and about 20–50 ng of the complex, depending on the size of the complex. Samples were then dried down in a vacuum centrifuge. Dried samples were dissolved in a solution comprising 8 M urea in 50 mM Tris-HCl (pH 7.5). Digestion and peptide purification were carried out based on the in-solution digestion protocol described above. Samples were reduced with 50 mM TCEP in water for 30 min at 37 °C, and alkylated using 100 mM iodoacetamide at room temperature in the dark for 20 min. Prior to proteolytic digestion, samples were diluted to 6 M urea with 0.2 M NH4HCO3. Endo-LysC/trypsin mix was added to give a 1:25 enzyme:substrate ratio and the mixture was incubated for 3–4 h at 37 °C. Samples were again diluted, to <1 M urea, with 50 mM NH4HCO3 and trypsin was added at a 1:50 ratio (w/w); the mixture was then incubated at 37 °C for 16 h. Digested peptides were first cleaned up using stage tips and 0.1% (v/v) formic acid, and were then eluted with 50% ACN/0.1% FA. Finally, eluted peptides were dried in a vacuum centrifuge.
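With a known amount of heavy (15N-labeled) synthetic peptide spiked into each injection, the amount of the corresponding endogenous (light) peptide can be estimated from the ratio of peak areas. The following is a minimal sketch of that calculation, assuming equal ionisation and recovery for the light and heavy forms; the function name and the peak areas are hypothetical.

```python
def endogenous_fmol(light_area, heavy_area, spiked_fmol=100.0):
    """Estimate endogenous peptide amount from the light/heavy area ratio."""
    if heavy_area <= 0:
        raise ValueError("heavy peak area must be positive")
    return spiked_fmol * light_area / heavy_area


# Hypothetical peak areas: the light peptide gives twice the heavy signal.
print(f"{endogenous_fmol(2.0e6, 1.0e6):.0f} fmol")
```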

2.2.9 LC-MS/MS

LC-MS/MS analysis was carried out on an Easy nLC-1000 UHPLC (Thermo Fisher Scientific) system connected to a Q Exactive Classic or Q Exactive Plus mass spectrometer equipped with a standard nanoelectrospray source (Thermo Fisher Scientific). Digested peptides (~1 µg) were resuspended in loading buffer (3% (v/v) acetonitrile and 1% (v/v) formic acid (FA) in dH2O) and injected onto an in-house packed column over 2 min with a flow rate of 250 nL/min; the column was 40–50 cm × 75 μm inner diameter and contained 1.9 µm C18AQ particles (Dr Maisch GmbH). Peptides were separated at a flow rate of 200 nL/min using a linear gradient of 5–30% buffer B over 120 min. Solvent A consisted of 0.1% (v/v) formic acid, and solvent B consisted of 80% acetonitrile, 0.1% formic acid, 20% dH2O. The end-to-end run time was 150 min, including sample loading and column equilibration times.

2.2.9.1 Shotgun proteomics method

For data-dependent acquisition runs, full-scan MS1 was operated as follows: the mass range was 300–1750 m/z at a resolution of 70,000 at 200 m/z, and a 100 ms injection time was used with an automatic gain control (AGC) target of 3 × 10⁶. The top 20 most intense precursor ions were subjected to Orbitrap mass analysis for further fragmentation via high energy collision dissociation (HCD) activation (R = 35,000 at 200 m/z; 1 × 10⁶ AGC; 120 ms injection time; 30 normalized collision energy; 2 m/z isolation window; 8.3 × 10⁵ intensity threshold; minimum charge state of +2; dynamic exclusion of 60 s). Singly charged precursor ions and precursors of unknown charge states were excluded from fragmentation. A collision energy of 28 was set for iTRAQ MS/MS.

2.2.9.2 SWATH mass spectrometry method

For data-independent acquisition (DIA) or Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH-MS) runs, after each full-scan MS1 (R = 140,000 at 200–1600 m/z; 3 × 10⁶ AGC; 120 ms injection time), 16 × 25 m/z windows in the 425–825 m/z range were sequentially isolated and subjected to MS/MS (R = 17,500 at 200 m/z; 3 × 10⁶ AGC; 60 ms injection time; 30% normalized collision energy). After the first eight 25 m/z isolations and MS/MS runs, and before the last eight 25 m/z isolations, a full-scan MS1 was performed. The placements of the 25 m/z isolation windows were optimized in Skyline [149] to result in an inclusion list starting at 437.9490 m/z with increments of 25.0114 m/z.
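The inclusion list described above can be regenerated from its starting value and increment, as sketched below. The values are treated here as window centres, which is an assumption (the text does not state whether they are centres or edges), and the function name is hypothetical.

```python
def swath_window_centers(start=437.9490, step=25.0114, n_windows=16):
    """Return the m/z values of the inclusion list, rounded to 4 decimals."""
    return [round(start + i * step, 4) for i in range(n_windows)]


centers = swath_window_centers()
for mz in centers:
    # With a 25 m/z isolation width, each window spans centre +/- 12.5 m/z.
    print(f"{mz:.4f}  ({mz - 12.5:.4f} to {mz + 12.5:.4f})")
```

Under this assumption the first window starts at about 425.4 m/z and the last ends at about 825.6 m/z, which matches the stated 425–825 m/z range.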

2.2.10 Data Analysis

2.2.10.1 Converting RAW files to Mascot Generic Format Files (MGF files)

RAW files were converted into MGF file format using the MSConvert tool [150] for MS/MS searches in Mascot software. The filter option was set to Peak Picking and MS levels were set to 1 to 2, while the remaining parameters were left at their default values.

2.2.10.2 XL-MS/MS data analysis using pLink software

In order to identify cross-linked peptides, MS data were analyzed using the pLink2 software, version 2.3.1 [151]. The ini file used for the data search was configured as follows. Carbamidomethylation of cysteine was set as a fixed modification, and both oxidation of methionine and N-terminal acetylation of proteins were set as variable modifications; the FDR was set to 0.05, and the precursor and peptide tolerances were set to ±15 ppm. For database configuration, a search database containing the sequences of all subunits of the NuRD complex and of frequently observed contaminants was created. Visualization of identified protein-protein cross-links was done using xiNET [152] and Xlink Analyzer [153].
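A mass tolerance expressed in ppm scales with the m/z value being matched. The sketch below shows how a ±15 ppm window can be evaluated for a single measurement; it is a generic illustration rather than a description of how pLink2 applies its tolerances internally, and the function names and m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=15.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# At m/z 1000, a 15 ppm window corresponds to +/- 0.015 m/z.
print(within_tolerance(1000.010, 1000.000))  # about 10 ppm error
print(within_tolerance(1000.020, 1000.000))  # about 20 ppm error
```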

2.2.10.3 Annotating tandem mass spectra of the crosslinked peptides

The pLabel accessory tool of pFind Studio was used to annotate spectra/peak labeling of cross-linked peptides for visual inspection and publication [154]. Alpha peptide and beta peptide y- and b-ions were labelled.

2.2.10.4 Quantitative proteomics data analysis

2.2.10.4.1 MaxQuant

The MaxQuant software package works together with the search engine Andromeda [155], which is provided with MaxQuant [156]. Prior to data analysis with MaxQuant, the human proteome FASTA file was downloaded from UniProt (proteome ID: UP000005640) and configured as the protein database. For iBAQ data analysis, standard parameters were used with the following additional options: match between runs, label-free quantification (LFQ), and iBAQ. Carbamidomethylation of cysteine was specified as a fixed modification and oxidation of methionine was set as a variable modification. The tolerances for precursor and fragment ions were set at ±20 ppm and ±0.02 Da, respectively. For digestion, trypsin was set as the protease with two missed cleavages permitted. The false discovery rate for peptide, protein, and site identification was set to 1%. From the MaxQuant output, the ProteinGroups txt file was used for further processing and the Peptides txt file was used for quality checking of the data.
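An iBAQ value is the summed peptide intensity of a protein divided by its number of theoretically observable peptides, so iBAQ values of different subunits can be compared to estimate relative stoichiometry. The sketch below illustrates this, with values normalised to a chosen reference subunit. The function names and all numbers are hypothetical and are not data from this thesis.

```python
def ibaq(summed_intensity, n_observable_peptides):
    """iBAQ: summed peptide intensity per theoretically observable peptide."""
    return summed_intensity / n_observable_peptides


def relative_stoichiometry(ibaq_values, reference):
    """Normalise iBAQ values so that the reference subunit equals 1."""
    ref = ibaq_values[reference]
    return {name: value / ref for name, value in ibaq_values.items()}


# Hypothetical intensities and peptide counts for three subunits.
values = {
    "subunit_A": ibaq(8.0e9, 40),
    "subunit_B": ibaq(6.0e9, 20),
    "subunit_C": ibaq(2.0e9, 10),
}
print(relative_stoichiometry(values, reference="subunit_C"))
```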

2.2.10.4.2 SWATH-MS data using Skyline software tool

All DIA data were processed using Skyline v3.5.0.9319. Skyline is an open-source software tool for targeted proteomics method creation and quantitative data analysis. It is widely used for building Selected Reaction Monitoring (SRM) / Multiple Reaction Monitoring (MRM) and Full-Scan (MS1 and MS/MS) quantitative methods and for analyzing the resulting data [149]. Skyline also supports a variety of full-scan chromatography-based quantitative proteomics approaches, such as MS1 filtering, targeted MS/MS, and data-independent acquisition (DIA). Reference spectral libraries were built in Skyline from magellan storage file format (msf) files using the BiblioSpec algorithm [157]. Precursor and product ion extracted ion chromatograms were generated using extraction windows that were two-fold the full-width at half maximum for both MS1 and MS2 filtering. Ion match tolerance was set to 0.055 m/z. For MS1 filtering, the first three isotopic peaks with charges +2, +3, and +4 were included, while for MS2, b- and y-type fragment ions with charges +1, +2, and +3 were considered. To ensure correct peak identification and assignment, the following criteria had to be met: (i) co-elution of light and heavy peptides; (ii) dot product values of >0.95 between peptide precursor ion isotope distribution intensities and theoretical intensities for both the heavy and light peptides; (iii) dot product values of >0.85 between library spectrum intensities and current observed intensities; (iv) relative dot product values of >0.95 between the observed heavy and light ion intensities; (v) matching peak shape for precursor and product ions of both the heavy and light peptides. Finally, peptides were quantified using the area under the curve of the M, M+1 and M+2 peaks for MS1 and a manually curated set of y-ions for MS2. Curation was based on: (i) linearity as determined from the standard curve, and (ii) reproducibility and consistent observation between technical and biological replicates.
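The dot product criteria above compare two intensity vectors, for example observed and library fragment intensities. The sketch below computes a plain normalised dot product (cosine similarity) as an illustration of the idea. Skyline's own dot product scores may be calculated differently, so this should not be read as a reimplementation; the function name and intensity values are hypothetical.

```python
import math


def normalized_dot_product(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical fragment intensities: observed vs library spectrum.
observed = [100.0, 80.0, 40.0, 10.0]
library = [95.0, 85.0, 35.0, 12.0]
score = normalized_dot_product(observed, library)
print(f"dot product = {score:.3f}; passes 0.85 threshold: {score > 0.85}")
```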

Chapter 3: The stoichiometry and composition of the NuRD complex

Chapter 3. Comparison of the composition and interaction partners of the Nucleosome Remodeling and Deacetylase (NuRD) complex in cancer cells

Mehdi Sharifi Tabar, Joel P. Mackay¹, Jason K. K. Low¹
¹School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia

¹Correspondence should be addressed to JPM: [email protected], +61 2 9351 3906; or JKKL: [email protected], +61 2 9351 3907

Abbreviations: SWI/SNF, SWItch/Sucrose Non-Fermentable; INO80, DNA helicase INO80; ISWI, Imitation SWI; CHD, Chromodomain helicase DNA binding; NuRD, Nucleosome remodelling and deacetylase complex; HEK, Human embryonic kidney; FLAG, epitope DYKDDDDK; HA, Human influenza haemagglutinin epitope YPYDVPDY; PBS, phosphate buffered saline; PEI, Polyethyleneimine; DTT, dithiothreitol; HRP, horseradish peroxidase; RRL, Rabbit reticulocyte lysate; IVT, in vitro translation; GST, glutathione S-transferase; iBAQ, intensity-based absolute quantification; MS, mass spectrometry; LC-MS/MS, liquid chromatography tandem mass spectrometry

3.1 Abstract

The nucleosome remodeling and deacetylase complex (NuRD) is a widely conserved regulator of gene expression. Determination of the subunit composition of the complex and identification of its binding partners are important steps towards understanding its architecture and function. The question of how these properties of the complex vary across different cell types has not been addressed in detail to date. Here, we set up a two-step purification protocol coupled to LC-MS/MS to assess NuRD composition and interaction partners in three different cancer cell lines, using label-free intensity-based absolute quantification (iBAQ). Our data indicate that the stoichiometry of the NuRD complex is preserved across our three different cancer cell lines. In addition, our interactome data suggest ZNF219 and SLC25A5 as possible interaction partners of the complex. To corroborate this latter finding, in vitro and cell-based pulldown experiments were carried out. These experiments indicated that ZNF219 can interact with RBBP4, GATAD2A/B, and CHD4, whereas SLC25A5 might interact with MTA2 and GATAD2A.

Key words: NuRD complex, stoichiometry, protein-protein interactions, co-immunoprecipitations, iBAQ, ZNF219, SLC25A5

3.2 Introduction

In complex organisms, there are four major classes of chromatin remodeling complexes that have ATP-dependent chromatin remodeling activity: (i) SWI/SNF, (ii) INO80, (iii) ISWI and (iv) CHD families [20, 158-160]. Several members of the CHD family (CHD3, 4 and 5 in mammals) are found associated with the nucleosome remodeling and deacetylase (NuRD) complex. In mammals, the NuRD complex is ~1 MDa in size and comprises at least six to seven proteins, which generally exist as two or more paralogues: CHD4 (and its paralogues CHD3 and CHD5), MTA1 (plus MTA2 and MTA3), HDAC1 (and HDAC2), GATAD2A (and GATAD2B), MBD3 (and MBD2), RBBP4 (and RBBP7) and CDK2AP1 [82, 100, 128, 161-163]. NuRD has been implicated in gene transcription and DNA repair and harbours both histone deacetylase and chromatin remodeling activities (for reviews see [164]). Overall, NuRD has been most commonly assigned as a repressor of gene expression [165], although it has also been linked to gene activation in some cases [58, 71]. However, the biochemical mechanisms underlying these activities are not well understood. One recent study showed that the conditional knock out of CHD4-NuRD in granule neurons of mice led to a significant decrease in H2A.Z occupancy at the promoters of a subset of genes and the concomitant up-regulation of these genes [166]. These data hint that repression of gene transcription by NuRD might in part involve histone variant exchange. Moreover, at specific developmental stages or in disease states, it is possible that NuRD can recruit other enzymes such as the lysine demethylase LSD1 to exert its function [90, 167].

Our biochemical understanding of the NuRD complex has been limited by difficulties with producing the intact complex recombinantly. Even parameters such as the subunit stoichiometry of the complex have remained unclear, despite several studies that have addressed this question [168, 169]. The mechanism(s) by which NuRD is recruited to sites of activity are also only beginning to be appreciated. To broaden our understanding of NuRD complex structure and function, we have used mass spectrometry (MS) to assess the subunit stoichiometry of the complex in three cancer cell lines representing three different tissues: the embryonic stem-cell-like NTERA-2 (hereafter referred to as NT2) cell line, the PC3 prostate cancer cell line, and the MCF7 breast cancer cell line [67, 170]. We have also used these data to identify proteins that associate with the complex. Our data indicate that the stoichiometry of the NuRD complex is preserved across our three different cancer cell lines and is largely in agreement with data available in the literature. Interaction mapping experiments demonstrate that ZNF219 and SLC25A5 can interact with the complex.

3.3 Results

3.3.1 NuRD subunit expression across several cell lines

Compositional variation can arise from differences in the expression level of the subunits of a complex. We therefore first examined the expression profile of NuRD subunits using RNA-Seq data from the Human Protein Atlas (HPA) consortium [57, 113, 166, 171-173]. RNA-Seq data are represented as Transcripts Per Million (TPM), a metric that attempts to normalize transcript abundance for gene length and sequencing depth. Expression profiling revealed that, although the canonical paralogues of NuRD are all expressed in NT2, PC3, and MCF7 cells (Fig 3-1), the level of expression for several paralogues can vary widely across cell types. For example, MTA3 is expressed at a ~5- or ~10-fold higher level in NT2 cells than in MCF7 and PC3 cells, respectively. Likewise, RBBP4 expression is five times higher in NT2 cells than in MCF7 cells, whereas RBBP7 expression varies by less than 20% over the three cell lines (Fig 3-1).
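As a rough illustration of the TPM metric described above, the following sketch normalizes read counts first for transcript length and then for the total signal in the sample. The gene names are borrowed from the text, but the counts and lengths are invented for illustration and are not Human Protein Atlas values; the function name is likewise only an assumption made for this example.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to Transcripts Per Million (TPM).

    counts: dict gene -> mapped read count (hypothetical values)
    lengths_kb: dict gene -> transcript length in kilobases (hypothetical values)
    """
    # Step 1: reads per kilobase corrects for transcript length.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Step 2: scaling by the per-sample total corrects for sequencing depth.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


# Invented example values, not taken from the Human Protein Atlas.
example_counts = {"MTA1": 1200, "MTA2": 3000, "MTA3": 450}
example_lengths_kb = {"MTA1": 3.0, "MTA2": 2.5, "MTA3": 4.5}
example_tpm = tpm(example_counts, example_lengths_kb)
```

Because TPM values within one sample always sum to one million, fold differences between paralogues in a given cell line can be read directly from the ratios of their TPM values.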

The profile also shows significant differences between paralogues in a given cell type. For example, MTA1, MTA2, and MTA3 expression are within two-fold of each other in NT2 cells, whereas there is a three-fold and four-fold difference between MTA2 and MTA1, and MTA2 and MTA3, respectively, in MCF7 cells (Fig 3-1).

In all three cell lines, CHD4 is the dominant CHD-family protein (Fig 3-1); its transcript levels are 2 to 18 times higher than those for CHD3, depending on the cell type, and CHD5 was not detected above the threshold of 1 TPM. Similarly, GATAD2B and MBD2 are expressed at significantly lower levels than GATAD2A and MBD3, respectively. It should be noted that, because the RBBP4/7 and HDAC1/2 subunits are also found in other complexes (e.g., the SIN3A, CoREST and PRC2 complexes [65, 174-176]), the transcript abundance data reflect the sum of the contributions from all complexes in which these proteins are found. Other proteins, such as MTA1/2/3 and GATAD2A/B, are predominantly associated with NuRD.

Fig 3-1. mRNA expression pattern of NuRD subunits in MCF7, NT2, PC3, HeLa, U2OS and HEK293 cells. Transcript abundance was extracted from RNA-Seq data analyzed by the Human Protein Atlas consortium [177]. The stacked bar graph compares the level of expression of paralogues in a given cell line as well as the level of expression of paralogues across cell types. Based on transcript levels, RBBP7, MTA2, GATAD2A, MBD3, and CHD4 seem to be the dominant paralogues in most cell types, whereas HDAC1 and HDAC2 quantities are quite similar.

3.3.2 Affinity purification followed by sucrose gradient centrifugation allows isolation of specific species

To investigate differences in NuRD composition between different cell types, we used a two-step purification protocol to allow isolation of the complex from cultured mammalian cells [82, 100, 128, 161-163]. A two-step approach reduces issues such as the presence of an excess of the subunit that is used as the purification ‘handle’ (RBBP4/7 in our case, through its interaction with FOG1 [138, 178, 179]). We first carried out a FOG1-affinity purification step and then used sucrose gradient ultracentrifugation to improve the purity of the isolated NuRD complex. Figure 3-2A shows SDS-PAGE analysis of purified NuRD from three cancer cell lines (NT2, PC3, and MCF7) following the first-step FOG1-affinity purification. Both similarities and differences (including significant differences in the amount of CHD4 observed) are apparent. Figure 3-2B shows the second-step sucrose gradient-based purification of NuRD derived from each cell line. Earlier fractions (fractions 4–19) show excess GST-FOG1 ‘bait’ protein and excess RBBP4/7 ‘handle’; these appear to be eliminated in the later fractions, as intended.

Following sucrose gradient ultracentrifugation, two fractions from each cell line (fractions 24 and 28) were selected for LC-MS/MS analysis as being representative of NuRD-type species.

Fig 3-2. SDS-PAGE showing the NuRD complex purified from NT2, PC3 and MCF7 cells. (A) SYPRO-Ruby stained SDS-PAGE showing affinity purification of NuRD from each cell type, based on the interaction of resin-bound GST-FOG1 with RBBP4/7. (B) SYPRO-Ruby stained SDS-PAGE gels showing fractionation of affinity-purified NuRD species on a 5–35% sucrose gradient. Earlier fractions (4–19) contain excess GST-FOG1 protein. Fractions 24 and 28, indicated with black arrows, were selected for stoichiometry and interactome analysis. Proteins were identified by mass spectrometry.

3.3.3 NuRD complex stoichiometry is conserved across NT2, PC3, and MCF7 cells

Following LC-MS/MS and data quality analysis (described in Materials and Methods), we used the intensity-based absolute quantification (iBAQ) method to determine the stoichiometry of the complex [1]. In iBAQ, the sum of intensities of all tryptic peptides observed in the mass spectrometer for each protein is divided by the number of theoretically observable peptides. Because paralogues of the NuRD complex subunits have high sequence similarity (e.g., RBBP4 and RBBP7 are 92% identical), it was difficult to reliably distinguish the levels of each paralogue using iBAQ quantification. Thus, to estimate the stoichiometry of the NuRD components, we first treated each set of paralogues as a single group. We then used the MTA subunit as our reference point, setting its stoichiometry to two units per NuRD assembly based on the X-ray crystal structure of the 2:2 MTA1-HDAC1 complex [112]. Fig 3-3 and Supplementary Data 1 show the results of this analysis. For each replicate, we have opted to show the stoichiometry values averaged from fractions 24 and 28; fractions 24 and 28 agree closely for most NuRD subunits, with the exceptions of CHD and CDK2AP. More of these two proteins were found in the heavier-sedimenting fraction 28, as expected (refer to Supplementary Data 1 for this breakdown). In most cases, there was close agreement between the cell lines. For example, the MTA:HDAC ratio is between 2:2.1 and 2:2.3 in all cases. This suggests that the published crystal structure recapitulates the HDAC and MTA architecture in the NuRD complex in solution in diverse cell types. For the MBD subunit, the corresponding values are 1.5, 1.4 and 1.2, and for GATAD2 values of 0.9, 1 and 1.2 are observed (in MCF7, NT2 and PC3 cells, respectively). Taken together, these data point towards an MTA:HDAC:MBD:GATAD2 stoichiometry of 2:2:1:1.
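The normalization described above can be sketched in a few lines. This is only an illustrative outline under stated assumptions, not the analysis code used for this work: the function names are hypothetical, and the intensity values are invented rather than taken from Supplementary Data 1.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ value: summed peptide intensity divided by the number of theoretical peptides."""
    return sum(peptide_intensities) / n_theoretical_peptides


def stoichiometry(group_ibaq, reference="MTA", reference_copies=2):
    """Scale paralogue-group iBAQ values so that the reference group equals reference_copies."""
    ref = group_ibaq[reference]
    return {name: reference_copies * value / ref for name, value in group_ibaq.items()}


# Hypothetical iBAQ values, already summed over the paralogues within each group
# (e.g. MTA = MTA1 + MTA2 + MTA3); the numbers are invented for illustration.
example_groups = {"MTA": 4.0e8, "HDAC": 4.4e8, "MBD": 2.8e8, "GATAD2": 2.0e8}
example_ratios = stoichiometry(example_groups)
```

Grouping paralogues before normalization sidesteps the problem of peptides shared between near-identical paralogues, at the cost of losing paralogue-level resolution.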

RBBP was detected at much higher levels, with RBBP:MTA ratios of 7.7:2, 6.6:2 and 9.3:2 observed for MCF7, NT2 and PC3 cells, respectively. More striking, however, was the large range of values; for example, RBBP quantities of ~5–11 were measured in MCF7 cells (Fig 3-3A, Supplementary Data 1). A similarly broad distribution of values for RBBP has been observed in several previous iBAQ measurements of NuRD stoichiometry [100, 168, 180, 181].

In contrast, both CHD and CDK2AP subunits gave values indicating less than a single subunit per NuRD assembly (0.07, 0.2 and 0.2 for CHD and 0, 0.4 and 0.3 for CDK2AP). We also noted that the quantities of CHD and CDK2AP appear to be highly correlated (r² = 0.83; Fig 3-3B); higher quantities of CHD are associated with higher quantities of CDK2AP.

Fig 3-3. Stoichiometry of subunits of the NuRD complex in each cell line. In all panels, stoichiometry of subunits is shown relative to MTA, which is set to two copies per NuDe/NuRD assembly. (A) Calculated stoichiometry for the NuDe complex from fraction 24. (B) Calculated stoichiometry for the NuRD complex from fraction 28. Each point is one experimental replicate. The bar graph shows the median stoichiometry value derived from the three experimental replicates. (C) Change in number of subunits per complex between NuDe and NuRD fractions. Each point is one experimental replicate pair (fractions 24 and 28) from the same sucrose gradient. (D) Graphs showing the correlation of subunit number for selected pairs of subunits. Each data point is an independent measurement; data from all sucrose-gradient fractions are included.

3.3.4 ZNF219 and SLC25A5 can interact with NuRD subunits

Several interaction partners have been reported for the NuRD complex, including a subclass of the family of ZnF transcription factors that contains FOG1 [178], ZBTB7A [168] and Sall1 [182]. These proteins share a common N-terminal RBBP-binding motif [138]. Other regulators such as Zic2 [66, 183], PHF6 [184], ZMYND8 [57], and the lysine demethylase LSD1 [90, 167] have also been shown to associate with NuRD. In some cases, these proteins most likely represent bridges between NuRD and specific genomic sites, whereas in other cases they might expand the functional diversity of distinct NuRD complexes in different biological contexts. However, despite the fact that the NuRD complex is strongly linked to cancer onset and progression, its interaction partners have not been studied in cancer cells. From our LC-MS/MS data, we identified 82 possible NuRD-interacting proteins (Fig 3-4; see also Appendix D).

To choose among the candidate proteins for further biochemical analysis, we further refined the enriched proteins by considering a combination of the following parameters: (i) the number of unique peptides observed (≥ 3, except for COASY); (ii) the score for a candidate protein in the Contaminant Repository for Affinity Purification (CRAPome) database (proteins with scores >50 were rejected) [185]; and (iii) available interactome information in the BioGRID database (candidate proteins were searched for interaction partners — especially NuRD subunits or other known interactors of NuRD, [186]). We selected eight candidate NuRD-interacting proteins, namely ZNF219, SLC25A5, LANCL1, PAICS, DDB1, COASY, DCAF11 and KPNA4, for further analysis.
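The two numerical criteria above lend themselves to a simple programmatic filter, sketched below. This is a hedged illustration only: the record layout, the function name and all values are hypothetical, the placeholder candidate names are not real proteins from the dataset, and the BioGRID check is left as a manual step because it is a qualitative judgement.

```python
def passes_filters(candidate, min_unique_peptides=3, max_crapome_score=50,
                   peptide_exceptions=("COASY",)):
    """Apply the unique-peptide and CRAPome criteria to one candidate record."""
    # Criterion (i): at least three unique peptides, with COASY as the stated exception.
    enough_peptides = (candidate["unique_peptides"] >= min_unique_peptides
                       or candidate["name"] in peptide_exceptions)
    # Criterion (ii): candidates with CRAPome scores above 50 are rejected.
    not_common_contaminant = candidate["crapome_score"] <= max_crapome_score
    return enough_peptides and not_common_contaminant


# Hypothetical records; every number here is invented for illustration only.
example_candidates = [
    {"name": "CANDIDATE_A", "unique_peptides": 5, "crapome_score": 2},
    {"name": "COASY", "unique_peptides": 2, "crapome_score": 10},
    {"name": "CANDIDATE_B", "unique_peptides": 8, "crapome_score": 120},
    {"name": "CANDIDATE_C", "unique_peptides": 1, "crapome_score": 0},
]
shortlisted = [c["name"] for c in example_candidates if passes_filters(c)]
```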

To corroborate our LC-MS/MS data, we carried out pulldown experiments. Full-length candidate genes (FLAG-tagged) were co-expressed with HA-tagged NuRD components (MTA2, GATAD2A, CHD4, MBD3 (as MBD3cc; see Methods), and RBBP4, along with tagless HDAC1) in HEK293 cells, and anti-FLAG beads were used to capture the candidate proteins from the cell lysate. For six of the candidate proteins (LANCL1, PAICS, DDB1, COASY, DCAF11 and KPNA4), no interactions were observed with any NuRD subunit (data not shown). On the other hand, ZNF219 robustly pulled down HDAC1, MTA2, CHD4 and RBBP4, and much weaker bands were detected for MBD3cc and GATAD2A (Fig 3-5A). Similarly, SLC25A5 showed strong interactions with MTA2, HDAC1 and CHD4, and weaker bands were observed for GATAD2A, MBD3cc and RBBP4 (Fig 3-5B).

Because multiple interactions were observed for both ZNF219 and SLC25A5, we reasoned that a subset of these apparent interactions might actually reflect indirect ‘bridging’ interactions that are facilitated by endogenous NuRD components. We have observed this bridging effect previously in NuRD pulldowns carried out from mammalian cell lysates [187]. Indeed, the presence of HDAC1 in all of the pulldowns that used ZNF219 as the bait (despite it not being transfected in most cases) and the fact that all tested NuRD subunits show some level of positive interaction might indicate that endogenous NuRD complex is being pulled down in these experiments and that indirect interactions could be at play.

To address this possibility, we repeated our pulldown experiments by co-expressing each pair of proteins in an in vitro transcription-translation system made from rabbit reticulocyte lysate (RRL); we have shown previously that this lysate contains negligible amounts of NuRD subunits, which should reduce the possibility of false-positive interactions [187]. Using the RRL system, clear interactions were observed between ZNF219 and both GATAD2A and RBBP4 (Fig 3-5C), whereas no interaction was observed with MTA2, HDAC1, MBD3 or CHD4. Similarly, SLC25A5 only showed interactions with MTA2 and GATAD2A (Fig 3-5D).

Fig 3-4. Complete list of possible interacting partners of the NuRD complex. Heatmap showing the filtered list of possible NuRD interactors detected by LC-MS/MS. The proteins shown are those that are present in at least two biological replicates of a particular cell type. Cell type and experimental replicate number are indicated above each column. Averaged iBAQ-adjusted intensity values were divided by the summed intensities for MTA subunits and log₂-transformed. Core NuRD subunits are presented at the top, followed by the potential interactors that were further tested, and other proteins are listed in alphabetical order.

3.3.5 ZNF219 contacts NuRD via its four N-terminal zinc fingers

As ZNF219 has previously been shown to act as a gene co-regulator [188, 189], we chose to define in more detail its interaction with NuRD subunits. We used deletion mapping to narrow down the regions of each protein that were responsible for making these interactions. ZNF219 was divided into three fragments: (i) ZNF219N, which contained ZnFs 1–4; (ii) ZNF219M, which contained ZnFs 5–8; and (iii) ZNF219C, which contained ZnF9 (Fig 3-6A). At the same time, we expressed GATAD2B as two fragments. We also tested three fragments of CHD4, to address the possibility that the large size of this protein (210 kDa) prevented its correct folding in the RRL experiment shown in Fig 3-5C. RBBP4 was expressed as a full-length protein, given its known single-domain structure [139, 190]. Each of the ZNF219 fragments was then tested for its ability to interact with the NuRD subunit fragments.

In the case of CHD4, the interaction mapping carried out with ZNF219 fragments as baits pointed most consistently to an interaction between the N-terminal end of ZNF219 and the C-terminal fragment of CHD4 (Fig 3-6B). Similar results were obtained in reciprocal experiments with CHD4 fragments used as baits (Supplementary Fig 3-2). A weak interaction was also observed between ZNF219N and CHD4M in one replicate (Supplementary Fig 3-2), perhaps suggesting that this region might play a minor role in the ZNF219-CHD4 interaction. The interaction between ZNF219 and GATAD2B also appeared to localize to ZNF219N, which was pulled down by the C-terminal half of GATAD2B (and much more weakly by the N-terminal half of GATAD2B) (Fig 3-6C). Similarly, ZNF219N was also responsible for mediating the interaction with full-length RBBP4 (Fig 3-6C).

Fig 3-5. Pulldowns show that ZNF219 and SLC25A5 can interact with NuRD subunits. (A) Western blots probing the interaction of ZNF219 with NuRD components, following co-expression of the indicated proteins in HEK293 cells and FLAG immunoprecipitation (IP). ZNF219 showed clear interactions with MTA2, HDAC1, CHD4 and RBBP4, and weaker bands were observed for MBD3cc and GATAD2A. The use of MBD3cc is described in Methods. Note that HDAC1 appears in all pulldowns, indicating that endogenous NuRD components are being pulled down along with the transfected subunits. (B) The same pulldown experiment as (A) but probing the interactions of SLC25A5 with NuRD components instead. SLC25A5 showed clear interactions with MTA2, HDAC1 and CHD4, and weaker bands were observed for GATAD2A, MBD3cc and RBBP4. (C) The same pulldown experiment as (A) but carried out using co-transcription and co-translation in rabbit reticulocyte lysates (RRL). In this case, only GATAD2A and RBBP4 showed a robust interaction with ZNF219. (D) The same experiment as (B) but in the RRL system; only MTA2 and GATAD2A display an interaction with SLC25A5. In all panels, dashed boxes indicate the expected position of prey proteins for a positive interaction.

Fig 3-6. Deletion constructs show ZNF219 contacts NuRD via its four N-terminal zinc fingers. (A) Domain structures of ZNF219, GATAD2B and CHD4. Boxed regions indicate domains for which 3D structures are known or are predicted to be ordered, and lines indicate regions predicted to be disordered. Deletion constructs used for interaction mapping are indicated below each protein. HMG = HMG-box domain. PHD and CD refer to PHD domains and chromodomains, respectively. CC = coiled coil. ZnF = GATA-type zinc finger. C2H2 = classical zinc finger. (B) Western blots showing the results of pulldowns with the indicated proteins. Fragments of FLAG-ZNF219 were co-transcribed and co-translated with fragments of HA-CHD4 in the RRL system and were pulled down using FLAG beads. The gel was probed using both HA and FLAG antibodies. (C) Western blots showing the results of pulldowns with the indicated proteins. Fragments of HA- or FLAG-ZNF219 were co-transcribed and co-translated with either full-length HA-RBBP4 or fragments of FLAG-GATAD2B. The gel was probed only with HA antibody to avoid overlap with the FLAG-tagged bait proteins. In all panels, dashed boxes indicate the expected position of prey proteins for a positive interaction.

3.6 Discussion

Published data suggest that the composition of most protein complexes is constant across many cell types [57, 60, 112, 113, 166, 171-173, 191-194]. We suggest that the NuRD complex has a composition that can vary in a context-dependent manner, not necessarily in architecture but rather in paralogue distribution. Although the canonical subunit composition of the complex (RBBP-MTA-HDAC-MBD-GATAD2-CHD) is generally conserved across the three cell types, and most paralogues were found in each case, we did observe some differences. Most prominently, MBD2 was not observed in NuRD from NT2 cells, despite the fact that MBD2 transcript levels are higher in NT2 cells than in MCF7 or PC3 cells. This suggests that MBD3-NuRD is the predominant NuRD complex in NT2 cells.

MBD2- or MBD3-exclusive NuRD complexes have been previously described and shown to have distinct roles [134, 136, 195]. For example, it has been shown that Mbd3 knock-out mice die during early embryogenesis, whereas Mbd2 knock-out mice are viable and fertile, indicating that these two proteins are not functionally redundant [134]. Knockdown of MBD3 but not MBD2 dramatically increases the efficiency of the reprogramming of somatic cells to pluripotent stem cells [60]. A similar observation has been made for the GATAD2 subunit: depletion of GATAD2A but not GATAD2B leads to highly efficient induced pluripotent stem cell formation [40, 88, 196-200]. In addition, GATAD2A-NuRD is also selectively recruited by ZMYND8 to DNA double-stranded breaks [57]. CHD3-, CHD4- and CHD5-specific NuRD complexes have also been implicated in mouse cortical development [201]. Paralogue interchanges of these types will most likely be revealed to be a widespread aspect of NuRD activity.

3.4 Materials and methods

3.4.1 Constructs

All genes used in this study were human, unless otherwise stated, and were cloned from a cDNA pool into FLAG-pcDNA3.1, HA-pcDNA3.1 or tagless-pcDNA3.1 expression vectors. Full-length constructs were generated for mouse MTA2 (UniProt ID: Q9R190), HDAC1 (UniProt ID: Q13547), mouse GATAD2A (UniProt ID: Q8CHY6), CHD4 (UniProt ID: Q14839), RBBP4 (UniProt ID: Q09028), SLC25A5 (UniProt ID: P05141), ZNF219 (UniProt ID: Q9P2Y4), LANCL1 (UniProt ID: O43813), KPNA4 (UniProt ID: O00629), PAICS (UniProt ID: P22234), COASY (UniProt ID: Q13057), DCAF11 (UniProt ID: Q8TEB1), DDB1 (UniProt ID: Q16531), and MBD3cc, which consists of murine MBD3 (UniProt ID: Q9Z2D8) fused to residues 133–174 of mouse GATAD2A. This region of GATAD2A has been previously shown to form a coiled-coil interaction with MBD3 [132]; the fusion construct aims to stabilize MBD3. We also generated a series of shorter constructs in the same vector: GATAD2BN (1–277), GATAD2BC (339–633), CHD4N (1–355), CHD4M (355–1230), CHD4C (1230–1912), ZNF219N (1–273), ZNF219M (256–618), ZNF219C (600–722).

3.4.2 Cell culture

The human NTERA2 clone D1 EC stem cells (NT2) were cultured in DMEM/F-12 medium, whereas PC3 and MCF7 cell lines were grown in RPMI 1640. All media were supplemented with 10% (v/v) fetal bovine serum, 100 units/mL penicillin and 100 μg/mL streptomycin. PC3 and MCF7 cells were passaged every 5–6 days, whereas NT2 cells were passaged every 3–4 days.

3.4.3 Nuclear extract isolation and NuRD complex affinity purification

Nuclear extract preparation and FOG1-affinity pulldowns were performed essentially as described previously [82, 100, 128, 161-163]. Affinity purification was performed in two steps. Step 1: Escherichia coli BL21 (DE3) cells (500 µL) expressing GST-FOG1 (1–45) were lysed via sonication in GST binding buffer (50 mM Tris, 150 mM NaCl, 0.1% (v/v) mercaptoethanol, 0.5 mM PMSF, 0.1 mg/mL lysozyme, 10 µg/mL DNase I, pH 7.5) and clarified via centrifugation (16,000 g, 20 min, 4 °C). The cleared supernatant and 300 µL of glutathione-Sepharose 4B beads (GE Healthcare) were incubated for 2 h at 4 °C. The beads carrying GST-FOG1 were then washed three times in GST wash buffer (50 mM Tris, 500 mM NaCl, 1% (v/v) Triton X-100, 1 mM DTT, pH 7.5) and then in NuRD binding buffer (50 mM HEPES-KOH, 150 mM NaCl, 1% (v/v) Triton X-100, 1 mM DTT, 1× complete protease inhibitor (Roche), pH 7.4). Step 2: Nuclear extracts were prepared by incubating 2 × 10⁸ cells with 500 µL of hypotonic lysis buffer (10 mM HEPES-KOH, 1.5 mM MgCl2, 10 mM KCl, 1 mM DTT, complete protease inhibitor, pH 7.9) for 20 min at 4 °C. IGEPAL CA-630 was then added [final concentration, 0.6% (v/v)], and the cells were further incubated for 10 min. The mixture was vortexed for 10 s and then centrifuged (3,300 g, 5 min). The cytoplasmic supernatant was discarded, and the nuclear pellet was gently washed once with lysis buffer (supplemented with 0.6% (v/v) IGEPAL CA-630). The washed nuclear pellet was re-suspended in NuRD binding buffer, lysed by sonication, and incubated on ice for 30 min to allow the chromatin to precipitate. The nuclear extract was then clarified via centrifugation (16,000 g, 20 min, 4 °C), and the cleared supernatant was incubated with the above FOG1 affinity resin overnight at 4 °C.
Post incubation, the resin was washed three times with wash buffer 1 (50 mM HEPES-KOH, 500 mM NaCl, 1% (v/v) Triton X-100, 1 mM DTT, pH 7.4) and then 3 times with NuRD wash buffer 2 (50 mM HEPES-KOH, 150 mM NaCl, 0.1% (v/v) Triton X-100, 1 mM DTT, pH 7.4). Finally, the complex was eluted with elution buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM DTT, and 50 mM glutathione, pH 8).

3.4.4 Sucrose density gradient ultracentrifugation

Sucrose density gradients were prepared essentially as described in [202]. To prepare 5–35% (w/v) sucrose density gradients (in 50 mM HEPES-KOH pH 8.2, 150 mM NaCl), 6 mL of filtered 5% sucrose buffer was transferred into a 12-mL ultracentrifugation tube (Beckman Coulter) using a syringe attached to a Pasteur pipette. Then, using the same syringe, 6 mL of 35% sucrose buffer was added very gently to the bottom of the tube, so as not to introduce any bubbles, and the tube was sealed with Parafilm-M. The tube was slowly tilted (> 60 s) onto its side (completely horizontal) and left untouched for 4 h at room temperature. The tube was then slowly tilted back to the vertical position and left standing at 4 °C for at least 1 h before use. The NuRD complex sample (200 µL) was then layered on top of the gradient and centrifuged (186,000 g, 4 °C, 18 h). The gradients were fractionated as 200-µL aliquots.

3.4.5 Sample preparation and nano-flow liquid chromatography-tandem mass spectrometry

Two different sucrose gradient fractions from each biological replicate were selected for enzymatic digestion (fractions 24 and 28). The selected fractions were dried down in a vacuum centrifuge, then dissolved in 100 µL of 8 M urea (in 50 mM NH4HCO3) to a final concentration of 0.2–1 mg/mL. Next, samples were reduced using 5 µL of 50 mM TCEP at 37 °C for 30 min, then alkylated for 20 min in the dark at RT with 5 µL of 50 mM iodoacetamide. Endo-LysC/Trypsin mix was then added at a 1:25 enzyme:substrate (wt:wt) ratio and the mixture was incubated at 37 °C for 4 h.

Samples were diluted with 50 mM NH4HCO3 to 0.75–1 M urea; trypsin was then added at a 1:50 ratio (wt:wt) and the mixture incubated at 37 °C for 16 h. Digested peptides were acidified and desalted using in-house packed C18 stagetips. LC-MS/MS analysis was carried out on an Easy nLC-1000 UHPLC (Proxeon) system connected to a Q-Exactive Plus mass spectrometer equipped with a standard nanoelectrospray source (Thermo Fisher Scientific). Peptides (~1.5 µg) were dissolved in loading buffer (3% (v/v) acetonitrile and 1% (v/v) formic acid (FA) in double-distilled water), picked up by the autosampler and injected onto a 40 cm × 75 μm inner diameter column packed in-house with 1.9-µm C18AQ particles. Peptides were separated at a flow rate of 200 nL/min using a linear gradient of 5–30% buffer B over 120 min. Solvent A consisted of 0.1% (v/v) formic acid, and solvent B consisted of 80% (v/v) acetonitrile, 0.1% (v/v) formic acid. The end-to-end run time was 150 min, including sample loading and column equilibration times.
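The urea dilution that precedes the overnight trypsin digestion follows the usual C1·V1 = C2·V2 relationship. The short sketch below estimates how much 50 mM NH4HCO3 would need to be added; it assumes, purely for illustration, that the sample is still approximately 100 µL at 8 M urea and ignores the small volumes contributed by the reduction, alkylation and enzyme additions. The function name is hypothetical.

```python
def diluent_volume_ul(start_volume_ul, start_conc_m, target_conc_m):
    """Volume of urea-free buffer to add so that C1 * V1 = C2 * V2 holds."""
    final_volume_ul = start_volume_ul * start_conc_m / target_conc_m
    return final_volume_ul - start_volume_ul


# Assumption: ~100 µL of sample at 8 M urea before dilution.
to_1_molar = diluent_volume_ul(100, 8.0, 1.0)       # buffer needed to reach 1 M urea
to_0_75_molar = diluent_volume_ul(100, 8.0, 0.75)   # buffer needed to reach 0.75 M urea
```

Under these assumptions, reaching the stated 0.75–1 M window corresponds to adding roughly 0.7–1 mL of buffer per sample.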

The Q-Exactive Plus mass spectrometer was set to data-dependent acquisition (DDA) mode. In the DDA run, each full MS1 scan was acquired over a mass range of m/z 300–1750 at a resolution of 70,000 (at m/z 200) with a 100-ms injection time. The top 20 most intense precursor ions were selected for fragmentation in the Orbitrap via higher-energy collisional dissociation (R = 35,000 at m/z 200; 1 × 10⁶ AGC; 120 ms injection time; 30 normalized collision energy; 2 m/z isolation window; 8.3 × 10⁵ intensity threshold; minimum charge state of +2; dynamic exclusion of 60 s).

3.4.6 Analysis of MS/MS data

Raw data were analyzed by MaxQuant (version 1.5.6.5) [156] using standard settings with the additional ‘match between runs’ option, LFQ and iBAQ selected. Methionine oxidation (M) and carbamidomethyl cysteine (C) were selected as variable and fixed modifications, respectively. The quality of the tryptic peptides was first analysed using the peptide text file created in MaxQuant. The majority of the observed peptides (~97%) were 7–30 amino acids in length, suggesting complete digestion of the samples (Supplementary Fig 3-1). MaxQuant analysis identified 281 proteins at a 1% false discovery rate (Supplementary Data 1). Of these 281 proteins, 46 were filtered out; these comprised 9 protein duplicates, 34 contaminant proteins (e.g., keratin) and 3 false positive hits (i.e., decoy hits) (Supplementary Data 1). At this stage, fraction 28 of PC3 replicate 1 and fraction 24 of MCF7 replicate 2 were also excluded from further analysis, as some canonical subunits of NuRD were not detected (Supplementary Data 1).

To determine the possible interactors, the list of 235 proteins was further filtered: proteins that were present, based on averaged iBAQ-adjusted intensity values (of the two fractions tested per replicate), in at least 2 of the 3 biological replicates for any cell type were shortlisted. This filter yielded a total of 36 proteins. Of these 36 proteins, 12 were NuRD complex paralogues and 24 were possible new interactors (Supplementary Data 1). For NuRD stoichiometry determination, the iBAQ-adjusted intensity values for MTA1, -2, and -3 were summed together (Supplementary Data 1) and intensity values for known NuRD complex subunits (including all paralogues and CDK2AP1) were divided by this summed MTA value. The final values were then multiplied by 2 because there are likely to be 2 MTA molecules per NuRD complex based on published crystal structures [112, 171]. All raw mass spectrometry data and search results have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [203].
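The replicate-presence filter described above can be expressed compactly. The sketch below is an illustration under stated assumptions rather than the script used for this analysis: the nested-dictionary layout, the function name, the placeholder protein names and the intensity values are all hypothetical.

```python
def shortlist(presence, min_replicates=2):
    """Keep proteins detected in >= min_replicates replicates of at least one cell type.

    presence: dict protein -> dict cell_type -> list of per-replicate averaged
    iBAQ-adjusted intensities (0 or None means the protein was not detected).
    """
    kept = []
    for protein, by_cell_type in presence.items():
        for values in by_cell_type.values():
            detected = sum(1 for v in values if v)
            if detected >= min_replicates:
                kept.append(protein)
                break  # one qualifying cell type is enough
    return kept


# Invented intensities for two placeholder proteins across three replicates per cell type.
example_presence = {
    "PROTEIN_A": {"NT2": [1.2e6, 0, 9.0e5], "PC3": [0, 0, 0], "MCF7": [0, 3.0e5, 0]},
    "PROTEIN_B": {"NT2": [0, 0, 4.0e5], "PC3": [2.0e5, 0, 0], "MCF7": [0, 0, 0]},
}
example_shortlist = shortlist(example_presence)
```

In this toy input, PROTEIN_B is detected three times overall but never twice within a single cell type, so it is excluded; this is the behaviour implied by applying the threshold per cell type.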

3.4.7 Co-immunoprecipitation analysis in HEK293 cells and rabbit reticulocyte lysate

Suspension HEK Expi293F™ cells (Thermo Fisher Scientific) were used for interaction analysis as detailed elsewhere [163, 187]. In brief, 4 µg plasmid DNA and 8 µg PEI were mixed and incubated in 200 µL sterile PBS at room temperature for 30 min, and then added to 2 mL of cells (2 × 10⁶ per mL). The transfected cells were collected after 65 h, and then lysed by sonication in 0.5 mL of lysis buffer (50 mM Tris pH 7.9, 150 mM NaCl, 1 mM PMSF, 1× complete protease inhibitor (Roche), 0.2 mM DTT, 1% (v/v) Triton X-100). Cell lysates were cleared by centrifugation at 20,000 g at 4 °C for 20 min and 5% of cleared lysate was used as input to check expression. Cleared lysates were mixed with 20 µL anti-FLAG Sepharose 4B beads (Biotool, Houston, TX, USA; pre-equilibrated with 50 mM Tris-HCl, 150 mM NaCl, 0.1% (v/v) Triton X-100) and rotated end-over-end for 2 h at 4 °C. Next, the beads were washed five times with 1 mL of washing buffer (50 mM Tris pH 7.5, 150 mM NaCl, 0.5% (v/v) Igepal CA-630), and the bound proteins were eluted in three lots of 20 µL of elution buffer (10 mM HEPES pH 7.5, 150 mM NaCl, and 150 µg/mL 3× FLAG peptide).

For cell-free expression, the TNT Quick Coupled Transcription/Translation System (Promega, Fitchburg, WI, USA) was used. Two plasmids were co-transcribed and co-translated in the same reaction mixture according to the manufacturer’s protocol. Immunopurification was performed as described above for HEK293 cells, and 5% of the input was used to check expression.

For Western blot analysis, the gel-separated proteins were blotted onto PVDF membranes and probed with antibodies from Cell Signaling Technology (Danvers, MA, USA): anti-FLAG-HRP (2044S, 1:20,000), anti-HA-HRP (2999S, 1:20,000), anti-HDAC1 (5356S, 1:20,000).

Acknowledgments

This work was supported by the National Health and Medical Research Council Grants APP1012161, 1063301 and 571099; Australian Research Council Grant DE120102857. We thank Mario Torrado for help with immunopurification.

Chapter 4: Understanding the assembly and stoichiometry of MMH and MMHG complexes

Chapter 4. Understanding the assembly and stoichiometry of MMH and MMHG complexes

4.1 Introduction

The NuRD complex uniquely connects chromatin remodeling to DNA methylation and histone modifications. It brings together a methylcytosine-binding domain protein (MBD2/3), histone deacetylases (HDAC1/2), and a large nucleosome remodeler (CHD3/4) [61, 132]. It has previously been shown that HDAC1/2 and RBBP4/7 are recruited by MTA1-3 to form a subcomplex that we named the MHR (MTA-HDAC-RBBP) complex [163, 204] (Fig 4-1). The remaining subunits, MBD, GATAD2 and CHD, which we collectively term mGC for simplicity (Fig 4-1), can be sequentially added to the core MHR complex to build up the whole NuRD complex, which is then capable of binding and remodeling DNA. We note, however, that it is not currently clear whether mGC can form a truly independent complex.

However, the details of these inter-subunit interactions and the stoichiometry of the complex are not well known. We therefore asked the question “How does the mGC module interact with the MHR module?” To address this question, we recombinantly expressed two subcomplexes in suspension Expi293F™ cells: one composed of MBD3, the N-terminal half of the MTA2 protein (referred to as MTA2N hereafter) and HDAC1 (hereinafter referred to as the MMH complex), and another containing full-length MTA2, MBD3, HDAC1 and GATAD2A (hereinafter referred to as the MMHG complex). We then investigated the stoichiometry of the subunits and the topology of the complexes using mass spectrometry (MS)-based proteomics techniques. Absolute protein quantification, using heavy isotope-labeled peptide standards followed by data-independent acquisition mass spectrometry (DIA-MS; the term SWATH-MS is used interchangeably here), was performed to determine the stoichiometry. Crosslinking mass spectrometry (XL-MS/MS) was used to assist the modeling of single particle electron microscopy (SPEM) data.

Fig 4-1. Schematic representation of the MHR, MMH, and mGC subcomplexes. The NuRD complex is composed of six different proteins, each of which has several paralogs. The MTA-HDAC-RBBP complex (MHR) forms the core NuRD complex and the other subunits (MBD, GATAD2 and CHD) are recruited to this core. It has been shown by us and others that MTA and HDAC make a dimer of dimers, and RBBP4 is recruited by the C-terminal part of MTA [163]. However, we do not know exactly how many RBBP4 molecules can interact with the MTA C-terminal half; this will be addressed in Chapter 5. Question marks indicate that it is still unclear whether the MBD-MTA-HDAC (MMH) complex can form and whether the mGC module exists. The ELM2-SANT domains in the MTA protein dimerize through the ELM2 domain and then wrap around the HDAC protein. The BAH domain is positioned to present nucleosomal substrate to the NuRD complex.

4.1.1 What are DIA-MS and SWATH-MS?

In proteomics studies, the MS can be set up to scan and analyse the peptides in different ways, depending on the goal of the study. There are two main modes of data acquisition: data-dependent acquisition (DDA) [205] and data-independent acquisition (DIA) [206]. In the DDA method, peptides are ionized into precursor ions and are guided into the MS. Upon entering the mass spectrometer, these precursor ions are scanned in the mass analyzer. In the next step, selected precursor ions are passed to the collision chamber to generate fragment ions (product ions) for analysis in a second mass analyzer. The selection of precursor ions is performed based on predefined criteria in the method and is usually based on ion abundance (the 15 to 20 most abundant ions are usually selected). Because the set of ions selected in this way can vary from run to run, this method suffers from a lack of reproducibility. This issue is exacerbated in the case of low-abundance peptides in complex mixtures.

To overcome this problem, the concept of data-independent acquisition was proposed by John Yates’ group in 2004 [207]. They proposed a modified method for tandem mass spectrometers that sequentially isolates and fragments 10 m/z windows [208]. In contrast to DDA, every detectable ion is fragmented, not just the more abundant species. Sequential Window Acquisition of all Theoretical fragment ion spectra (SWATH-MS) is a more recently developed implementation that combines data-independent acquisition (DIA) with targeted analysis [206].

The SWATH-MS method was first developed by Ruedi Aebersold’s group and has become popular as it combines the advantages of DDA and selected reaction monitoring (SRM). SRM is the gold standard approach for targeted quantitative proteomics. It is performed on a triple-quadrupole MS, in which a precursor ion of a particular mass is selected in the first mass analyzer and a product ion of that precursor is then selected in the second mass analyzer for detection. The SWATH-MS approach circumvents the low reproducibility of the DDA method because all peptides within a defined mass-to-charge (m/z) window are subjected to fragmentation, and the mass spectrometer scans the full m/z range [209].
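
The difference between the two acquisition modes can be illustrated with a toy sketch in Python. It only models which precursors are sent for fragmentation; the function names, the 100 m/z window width and the precursor list are hypothetical choices for illustration and do not describe any instrument method used in this work.

```python
def dda_select(precursors, top_n):
    """DDA-style selection: keep only the top_n most intense precursors."""
    ranked = sorted(precursors, key=lambda p: p[1], reverse=True)
    return sorted(mz for mz, _ in ranked[:top_n])


def dia_windows(mz_start, mz_end, width):
    """Return contiguous (low, high) isolation windows covering [mz_start, mz_end)."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def dia_assign(precursors, windows):
    """DIA-style selection: every precursor inside a window is fragmented."""
    return {w: [mz for mz, _ in precursors if w[0] <= mz < w[1]] for w in windows}


# Hypothetical (m/z, intensity) pairs, for illustration only.
precursors = [(402.2, 9e5), (455.7, 2e4), (512.3, 7e5), (618.9, 5e3), (733.4, 3e5)]
selected = dda_select(precursors, top_n=3)      # low-intensity ions are skipped
windows = dia_windows(400.0, 800.0, 100.0)      # four 100 m/z windows
assigned = dia_assign(precursors, windows)      # every ion lands in a window
```

In this toy case the DDA-style selection drops the two weakest precursors, whereas the window-based scheme fragments all five, which is the property that makes DIA more reproducible for low-abundance peptides.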

4.1.2 Why do we need to perform absolute protein quantification?

In many quantitative proteomics studies, the quantities of the detected peptides are not determined in absolute terms (for example, as the number of femtomoles of a peptide present in a particular sample). Instead, relative quantification reports a dimensionless protein ratio that describes the change in abundance of the protein between two or more samples. However, relative quantification has some limitations, including the fact that the peak intensity of a peptide can vary across samples due to both the gradual deterioration of the nano-spray ESI needle and the reduced binding capacity of the LC column [210]. Absolute protein abundances are typically expressed in units such as copy number per cell, picomoles of protein per volume of body fluid, and protein weight per tissue or cell extract weight [211]. To determine the stoichiometry or copy number of subunits of the NuRD complex, we required an absolute quantification method. The use of a known amount of isotopically labeled heavy peptides is the most accurate way to quantify protein abundance [212, 213]. In this method, heavy isotope-labelled peptide standards are spiked into the biological samples prior to proteolytic digestion. These internal standards have different masses from the peptides derived from the samples, and this difference can be resolved in a high-resolution mass spectrometer. The abundance of each sample-derived (light) peptide is then calculated from the ratio of its signal to that of the corresponding heavy standard, whose amount is known.
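
The arithmetic behind spike-in quantification is simple and can be written down as a short Python sketch. It assumes that the light and heavy peptides respond identically in the mass spectrometer; the function name and the peak areas below are hypothetical.

```python
def absolute_amount_fmol(light_area, heavy_area, heavy_spiked_fmol):
    """Estimate the amount of the light (sample-derived) peptide in fmol.

    Assumes the light and heavy forms ionise and fragment equally, so the
    ratio of their peak areas equals the ratio of their molar amounts.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return heavy_spiked_fmol * light_area / heavy_area


# Hypothetical peak areas and spike-in amount, for illustration only.
amount = absolute_amount_fmol(light_area=3.0e6, heavy_area=1.5e6, heavy_spiked_fmol=50.0)
```

Here a light peak twice the area of a 50 fmol heavy standard corresponds to 100 fmol of the endogenous peptide.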

4.1.3 What is XL-MS/MS and what can it tell us?

XL-MS/MS is the process of chemically joining two or more protein molecules by a covalent bond, digesting the treated sample with a protease and then analyzing the resulting peptide mixture by mass spectrometry to identify the crosslinked peptides [214]. In the past few years, XL-MS/MS of large protein complexes has become a valuable method combining proteomics and structural biology and it has recently been successfully applied for the elucidation of the structure and architecture of large multiprotein complexes such as SWI/SNF and INO80 [215, 216]. Proteins that are close enough in space can be crosslinked by a crosslinker (XL), and therefore XLs connecting two subunits provide information on the physical proximity of these subunits. In addition, the distance restraints provided by the length of the crosslinker and sidechains can be used for interpreting EM envelopes of the complexes [217].

In the current study focused on NuRD, we used the bis(sulfosuccinimidyl) suberate (BS3) H12/D12 crosslinker (an isotopically coded mixture of light (H12) and heavy (D12) forms), which is a homobifunctional, amine-to-amine crosslinker that is non-cleavable (Fig 4-2). The light and heavy molecules differ in that the heavy form carries 12 deuterium atoms in place of the 12 hydrogen atoms of the light form. BS3 contains an amine-reactive N-hydroxysulfosuccinimide (sulfo-NHS) ester at each end of an 8-carbon spacer arm with a length of 11.4 Å. The theoretical distance between the target groups is restricted by the length of this linker joining the reactive ends of the crosslinker, and this can be extrapolated to a maximum distance restraint between the two crosslinked residues. The NHS esters react with primary amines at pH 7-9 and form stable amide bonds. For use in structure determination, the crosslinks detected in the mass spectrometer are translated into maximal distances between the two crosslinked amino acids, which helps when modeling structures of multi-subunit protein complexes [217].

Fig 4-2. Structure of the BS3 crosslinker. The distance constraint between the Cα atoms of two residues is calculated by summing the length of the XL spacer arm and the side chains of the residues. The lengths of the BS3 spacer arm and of each lysine side chain are 11.4 Å and 6 Å, respectively. A distance tolerance of 6 Å is also added to the theoretical maximum length because XL-MS/MS studies on proteins with known structure consistently report XLs that exceed 24 Å [218]. A distance constraint of ≤30 Å between alpha carbons of lysine residues has been reported as an appropriate distance for BS3 and DSS crosslinkers [218].
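
The distance restraint described in the caption can be reproduced with a small Python sketch. The three lengths come from the caption; the function names and the example coordinates are hypothetical, and checking a crosslink against Cα coordinates in this way is only one simple use of such a restraint.

```python
import math

SPACER_BS3 = 11.4     # Å, BS3 spacer arm length (from the caption)
LYS_SIDE_CHAIN = 6.0  # Å, length of one lysine side chain (from the caption)
TOLERANCE = 6.0       # Å, empirical tolerance (from the caption)


def max_ca_distance(spacer=SPACER_BS3, side_chain=LYS_SIDE_CHAIN, tolerance=TOLERANCE):
    """Upper bound on the Calpha-Calpha distance for a Lys-Lys crosslink."""
    # Spacer arm plus two lysine side chains, plus the tolerance.
    return spacer + 2 * side_chain + tolerance


def satisfies_restraint(ca_a, ca_b, limit=30.0):
    """True if two Calpha coordinates (x, y, z in Å) lie within the limit."""
    return math.dist(ca_a, ca_b) <= limit


# Hypothetical coordinates, for illustration only.
bound = max_ca_distance()
close_pair = satisfies_restraint((0.0, 0.0, 0.0), (0.0, 0.0, 25.0))
far_pair = satisfies_restraint((0.0, 0.0, 0.0), (20.0, 20.0, 20.0))
```

The computed bound of 29.4 Å is consistent with the ≤30 Å cutoff quoted above; the first pair (25 Å apart) satisfies the restraint, whereas the second (about 34.6 Å apart) does not.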

Thus far, we do not clearly know whether one or two MBD molecules exist in the NuRD complex and whether MBD can make a stable complex with the MTA-HDAC (MH) core complex. Moreover, it is not clear whether the mGC complex can form a truly independent complex and if the mGC module can be sequentially added to the core MHR complex and build up the whole NuRD complex. The goal of the work in this Chapter is to begin to address this gap in our knowledge.

4.1.4 Single particle electron microscopy (SPEM)

In the early 1970s, electron tomography (ET) was used to image biological specimens in three dimensions [219]. The basis of electron tomography is a tomogram, which is obtained by collecting different views of the specimen. However, the radiation used in ET can limit the number of images collected because of the damage it causes to the specimen. Nevertheless, this principle has been extended to become the foundation for almost all three-dimensional electron microscopies and single particle analysis [220]. Advances in SPEM and computational methods circumvented the limitation of ET by combining thousands of images of the same macromolecule (e.g., a protein) in random orientations. Over the past decade, SPEM techniques have helped scientists to solve the structures of very large protein complexes, even of megadalton size, including many structures that could not be determined using crystallography or other structural biology methods [221].

4.2 Results

4.2.1 Experimental design

Figure 4-3 shows the experimental pipeline used for protein expression, purification and downstream analysis. Different parameters, such as the number of EXPI293F™ cells needed for transfection to obtain enough complex for each experiment, had to be accounted for in order to manage time, consumables, and costs. Initially, a trial overexpression was conducted with 2 × 10⁶ EXPI293F™ cells to check whether the sub-complexes were formed recombinantly and to optimize the purification conditions. Next, large-scale protein expression and purification experiments were planned, considering that 40-50% of the sample was lost during each concentration/desalting and sucrose fractionation step. We estimated that for XL-MS/MS we would need ~20-100 mg/mL of purified complex, for DIA-MS ~0.03-0.1 mg/mL, and for negative-stain single particle electron microscopy (SPEM) experiments 0.5-1 mg/mL.
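
The effect of repeated losses on the amount of starting material can be estimated with a back-of-the-envelope Python sketch. It assumes that the same fractional loss applies independently at every step; the function name, the target amount and the choice of two steps are hypothetical and are not the planning figures used in this work.

```python
def required_input(target_amount, loss_per_step, n_steps):
    """Starting amount needed to end up with `target_amount` after n_steps.

    Assumes an identical, independent fractional loss at every step.
    """
    if not 0.0 <= loss_per_step < 1.0:
        raise ValueError("loss_per_step must be in [0, 1)")
    retained = (1.0 - loss_per_step) ** n_steps
    return target_amount / retained


# Hypothetical target of 10 units after two steps, for illustration only.
low_estimate = required_input(10.0, 0.40, 2)   # 40% loss per step
high_estimate = required_input(10.0, 0.50, 2)  # 50% loss per step
```

Under these assumptions, two steps that each lose 40-50% of the sample require roughly three to four times the target amount as input.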

Fig 4-3. Schematic representation of the steps used to prepare samples for XL-MS/MS, DIA-MS, and SPEM experiments. (A) Protein complexes were overexpressed in EXPI293F™ cells, affinity purified, and concentrated using Amicon filters. (B) For DIA-MS (stoichiometry determination) and negative-stain EM (structural analysis), samples were subjected to a second purification step. (C) For XL-MS/MS experiments, protein material from Step A was used directly because the excess FLAG peptide was eliminated during concentration.

4.2.2 MMH and MMHG complexes were successfully expressed and purified from EXPI293F™ cells

To produce the MMH complex, we used the MBD3cc construct, which consists of full-length FLAG-MBD3 fused to the first coiled-coil domain (CR1) of GATAD2A (Fig 4-4). The CR1 region forms a coiled coil with the C-terminal region of MBD3 and makes MBD3 more soluble and stable [132]. Constructs encoding either tagless full-length HDAC1 or the HA-tagged N-terminal half of MTA2 (residues 1-429) were cloned into the mammalian expression vector pcDNA3 by Gibson assembly [147]. A tagless version of HDAC1 was used because we had previously established that the presence of a tag can affect HDAC1 deacetylase activity and may also perturb its interaction with other proteins [187].

To produce the MMHG complex, we used full-length constructs for FLAG-MTA2, HA-MBD3 (not MBD3cc), HDAC1 and HA-GATAD2A. In this case, an MBD3 construct without the GATAD2A coiled-coil fusion was used because the GATAD2A protein is present, and we expected it to interact with MBD3 through the coiled coil.

Fig 4-4. Constructs used to express MMH and MMHG complexes. The MTA2N construct is used because it lacks the C-terminal half responsible for binding to RBBP4/7 and avoids pulling down endogenous RBBP4/7. The MBD3cc construct contains two different coiled-coil (cc) domains. The second cc (GATAD2A CC1) was fused to this protein to make it soluble and folded as shown previously [132]. Full length HDAC1 and GATAD2A were used. Dark colours indicate domains for which structures are known; pale colours indicate domains for which reliable homology models can be built; white indicates regions predicted to be ordered but with unknown structures; and lines indicate regions predicted to be disordered (using PONDR).

4.2.3 High salt washing buffer seems to be ideal to get a pure complex

Trial experiments to optimize transfection and protein purification were performed using the MMH complex. About 2 × 10⁶ EXPI293F™ cells were seeded in a 12-well plate and transfected with a mixture of pCDNA3.1 plasmids encoding FLAG-MBD3cc, HA-MTA2N, and tagless HDAC1. The plasmids were used at equimolar ratios, with the mass of each adjusted for its length. Transfected cells were collected after 72 h and immunopurification was performed using 20 µL of anti-FLAG Sepharose 4B beads (hereafter referred to as FLAG beads) to capture FLAG-MBD3cc as bait and, consequently, the other subunits as prey. Elutions were run on SDS-PAGE to examine the purity of the complex (Fig 4-5A). The results indicated that considerable amounts of other proteins co-purified with the complex. To address this issue, we used washing buffers containing different concentrations of NaCl (e.g., 150 mM, 500 mM, and 1000 mM) (Fig 4-5B). Washing buffer containing 500 mM NaCl gave the best result, as the majority of the contaminant proteins were washed off without disrupting the MMH complex (Fig 4-5B). Despite the stringent washing conditions, we could still observe a band at around 66 kDa, corresponding to a heat shock protein (HSP71, confirmed by mass spectrometry) that co-purified with the complexes. This heat shock protein was removed by subjecting samples to a second purification step, suggesting that it most likely co-purifies with the complex by attaching nonspecifically to the FLAG beads. A similar protocol was used successfully for the production of MMHG.
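
Taking plasmid length into account when mixing DNA at the same molar ratio amounts to a simple proportionality, sketched below in Python. The sketch assumes that the mass of double-stranded DNA is proportional to its length in base pairs; the function name, the plasmid lengths and the total DNA mass are hypothetical and are not the values used in the trial experiments.

```python
def equimolar_masses_ug(lengths_bp, total_ug):
    """Split `total_ug` of DNA so that every plasmid is present in equal moles.

    Because mass is proportional to (moles x length), equal moles means that
    each plasmid's share of the total mass is proportional to its length.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_ug * length / total_length for name, length in lengths_bp.items()}


# Hypothetical plasmid lengths (bp) and total DNA mass, for illustration only.
lengths = {"plasmid_A": 6000, "plasmid_B": 6000, "plasmid_C": 8000}
masses = equimolar_masses_ug(lengths, total_ug=4.0)
```

With these made-up lengths the longest plasmid receives the largest share of the DNA mass (1.6 µg versus 1.2 µg each for the other two), so that all three are present in the same number of copies.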

Fig 4-5. MMH complex expression and purity check on a Sypro-Ruby stained SDS-PAGE. (A) Sypro-Ruby stained SDS-PAGE showing the MMH complex following overexpression in EXPI293F™ cells. FLAG beads were incubated with cell lysate and then washed with 150 mM NaCl, 0.5% IGEPAL, and 50 mM Tris-HCl pH 8. Elution from the FLAG beads was carried out with FLAG peptide. (B) Sypro-Ruby stained SDS-PAGE showing the MMH complex washed with different NaCl concentrations. HSP71 is a heat shock protein that is produced in cells in response to a rise in temperature. M refers to the Mark12 protein ladder, E1-E3 refer to elutions 1 to 3, and LW refers to the last wash. The proteins were identified by mass spectrometry.

4.2.4 Large scale overexpression and purification of MMH and MMHG complexes

After optimizing the expression and purification of MMH, large-scale experiments were set up to purify sufficient sample for SPEM, XL-MS/MS, and DIA-MS studies. For both MMH and MMHG subcomplexes, we transfected 2 × ~250 mL of EXPI293F™ cells (2 × 10⁶ cells/mL) as indicated in Table 4-1. Protein expression was confirmed by western blotting using FLAG-, HA- and HDAC1-specific antibodies (Fig 4-6A). If protein expression was not visible in the western blot, a small-scale pulldown was performed using FLAG beads to enrich for the complex and confirm expression. For purification of the whole samples, 1.25 mL of FLAG beads per 250 mL of transfected cells was used to capture and purify the complexes. The complexes were eluted from the FLAG beads using 5 mL of 3× FLAG peptide and, upon elution, were further concentrated to 0.5 mL using 4 mL Amicon Centrifugal Filters in a swinging bucket centrifuge at low speed (500 g at 4 °C) (Fig 4-6B).

Table 4-1: Amount of plasmid used to transfect 100 mL EXPI293F™ cells

Plasmid        Quantity (µg)   Complex
FLAG-MBD3cc    27              MMH
HA-MTA2N       61              MMH
HDAC1          68              MMH
HA-MBD3        42              MMHG
FLAG-MTA2      50              MMHG
HDAC1          48              MMHG
HA-GATAD2A     55              MMHG

Fig 4-6. Expression and purification of MMH and MMHG complexes. (A) A western blotted SDS-PAGE (using HRP-conjugated anti-FLAG, anti-HA, and anti-HDAC1 antibodies) probing the expression of overexpressed subunits in two experimental replicates (Rep1 and Rep2). The protein marker used in western blotting was WesternC standard. (B) Sypro-Ruby stained SDS-PAGE showing FLAG-affinity purified complexes. Endogenous CHD4 was co-purified with MMHG complex, as it was previously established that CHD4 interacts with GATAD2 [162]. Both MTA2 and GATAD2A are ~70 kDa so the corresponding bands can overlap with each other in the gel of the MMHG complex. Our MS data confirmed the presence of both proteins in the complex.

The western blot (Fig 4-6A) shows bands of varying intensity, which reflects differences in both antibody quality and protein expression level. Figure 4-6B shows the FLAG immunopurification results, which clearly demonstrate the copurification of CHD4 with the MMHG complex. CHD4 does not copurify with the MMH complex, in which no GATAD2 protein is present.

Complex samples purified by FLAG pulldown still contained a few contaminants, and a second purification step was deemed necessary. The purpose of running a sucrose density gradient (called here a normal sucrose gradient, NSG) is to remove free subunits, handle protein (the bait protein used to capture other proteins from a crude cell lysate), and excess 3× FLAG peptide, any of which can bias the stoichiometry determination or block the LC column when the sample is run on the LC-MS/MS. GraFix (Gradient Fixation) is another sucrose density gradient purification method, commonly used to stabilize macromolecular complexes for SPEM analysis [222]. As explained in detail in Chapter 2, glutaraldehyde is added to the gradient during its formation, resulting in a gradient of glutaraldehyde alongside the sucrose. In this method, macromolecules undergo mild, intramolecular chemical crosslinking while being purified by density gradient ultracentrifugation. FLAG-purified MMH and MMHG complexes were subjected to both normal and GraFix sucrose density gradient purification.

Gradients containing 2-25% and 5-35% sucrose were used to fractionate the MMH and MMHG complexes, respectively. During the preparation of GraFix gradients, 0.5% glutaraldehyde was added to the highest-concentration sucrose buffer in order to produce a concentration gradient of crosslinker along the gradient column. Figures 4-7 and 4-8 show the fractionated complexes on SDS-PAGE. Fractions highlighted with a dashed red box (Fig 4-7A and 4-8A) were further processed for stoichiometry analysis. Every 2 or 3 adjacent fractions from the GraFix gradients were pooled to make different sets of samples for SPEM analysis. For the MMH complex, fractions 24, 25 and 26 were pooled and labeled MMH-1, and fractions 27, 28 and 29 were pooled as a separate set and labeled MMH-2 (Fig 4-7B). For the MMHG complex, fractions 34 and 35 were combined (MMHG-1) and fractions 36, 37, and 38 were combined (MMHG-2, Fig 4-8B).

Fig 4-7. Fractionation of MMH complex on normal and GraFix sucrose density gradients (2-25%). (A) Sypro-Ruby stained SDS-PAGE showing different fractions of normal sucrose gradient containing the MMH complex from two experimental replicates. Fractions 18 and 20 were selected for stoichiometry analysis. (B) Sypro-Ruby stained SDS-PAGE showing different fractions of MMH-1 from the GraFix sucrose gradient. Highlighted fractions were combined and used for SPEM analysis. The later fractions (31-45) corresponded to higher molecular weight subcomplexes and some unknown complexes. BSA was loaded as a standard for quantification purposes.

Fig 4-8. Fractionation of the MMHG complex on normal and GraFix sucrose density gradients (5-35%). (A) Sypro-Ruby stained SDS-PAGE showing different fractions from a normal sucrose gradient separating MMHG species. Fractions 20-26 contain the majority of recombinantly expressed species. Fractions 22 and 24 were selected for stoichiometry analysis. (B) Sypro-Ruby stained SDS-PAGE showing different fractions of a GraFix sucrose gradient; the selected fractions were used for SPEM analysis. BSA was loaded as a standard for quantification purposes.

4.2.5 Proteotypic peptide selection

In targeted proteomics, the acquisition of high-quality data depends on the selection of a set of optimal peptides for all proteins of interest. Selecting a peptide that can provide the greatest signal-to-noise ratio in a particular assay matrix is important. The Skyline software package makes it easy to build a method that can measure a broad range of peptides in search of the best peptide in a complex mixture. Results from these initial measurements and from initial shotgun proteomics experiments can then be used to choose the best peptides. The determination of the best peptides for quantification of the NuRD complex subunits, and the library setup, were carried out previously by Dr Jason Low in the Mackay laboratory. The selection of reliable and representative proteotypic peptides (PTPs) involved the following criteria: (i) reproducible and reliable detection of the peptide across different MS runs and biological replicates; (ii) absence of missed cleavages; (iii) absence of known post-translational modifications (PTMs) (ascertained by querying protein databases such as PhosphoELM and UniProt and large-scale proteomic datasets that had not been incorporated into UniProt at the time of experimental design) or potential predicted PTM sites; (iv) hydrophobicity scores (very hydrophobic peptides could present solubility issues and have aberrant chromatographic behavior); (v) avoidance where possible of residues that can be difficult to incorporate into synthetic peptides (C, F, I, L, V, W, Y); (vi) avoidance of extremely long or short peptides (9-10 residues is ideal); (vii) where possible, avoidance of the following residues: methionine (unpredictable oxidation); cysteine (unpredictable oxidation, although this can be mitigated by reduction and alkylation of the samples); proline (unusual elution profile due to isomerization and a predominant CID fragmentation pattern); and N-terminal glutamine and glutamate (can undergo spontaneous modification to pyroglutamate); (viii) of less importance and where possible, selection of peptides with precursor ions <1000 m/z (this is due to the instrument used for data acquisition).
Using these criteria, Dr Low selected 2-3 PTPs that were unique for each target protein and, where paralogues are very similar (e.g., HDAC1/2 and RBBP4/7), an additional 1-3 ‘pseudo PTPs’ shared between the paralogues were also chosen.
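
Several of the sequence-based criteria above, namely (ii), (v), (vi) and (vii), lend themselves to a simple automated pre-screen, sketched here in Python. This is not the procedure used by Dr Low: the function name, the length limits and the simplified tryptic rule (cleavage after K or R unless followed by P) are assumptions made for illustration, and criteria that need experimental data or databases (i, iii, iv, viii) are not covered.

```python
HARD_TO_SYNTHESISE = set("CFILVWY")  # residues listed under criterion (v)


def flag_peptide(seq, min_len=7, max_len=20):
    """Return a list of warnings for a candidate tryptic peptide sequence.

    The length limits are hypothetical defaults, not values from the text.
    """
    flags = []
    # (ii) internal K/R not followed by P suggests a missed tryptic cleavage.
    for i, aa in enumerate(seq[:-1]):
        if aa in "KR" and seq[i + 1] != "P":
            flags.append("missed cleavage")
            break
    # (v) residues that can be difficult to incorporate into synthetic peptides.
    hard = sorted(set(seq) & HARD_TO_SYNTHESISE)
    if hard:
        flags.append("hard-to-synthesise residues: " + "".join(hard))
    # (vi) extremely short or long peptides.
    if not min_len <= len(seq) <= max_len:
        flags.append("length outside range")
    # (vii) residues prone to modification or unusual behaviour.
    for aa, reason in (("M", "methionine oxidation"),
                       ("C", "cysteine oxidation"),
                       ("P", "proline")):
        if aa in seq:
            flags.append(reason)
    if seq[0] in "QE":
        flags.append("N-terminal Q/E (pyroglutamate)")
    return flags


flags_example = flag_peptide("VWDPDNPLTDR")
flags_poor = flag_peptide("QAKMR")
```

Because most criteria apply only "where possible", the sketch returns warnings and does not reject peptides outright; a peptide that raises a few flags may still be the best available choice for a given protein.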

4.2.6 Stoichiometry analysis

The concentration of the purified complexes within each normal sucrose fraction was calculated using ImageJ software, based on the BSA standard. Then, 10 ng and 15 ng of the MMH and MMHG complexes, respectively, in combination with known quantities of the ¹³C/¹⁵N-labelled internal standard peptides (up to three per protein), were digested and desalted using stage tips, and then subjected to DIA-MS analysis. The Q-Exactive mass spectrometer was operated in data-independent acquisition (DIA) mode. Raw data were collected, quality checked, and analysed using Skyline software version 4.1.0.11796. Firstly, peptides with an isotope dot product (idotp) value of less than 0.9 were omitted from the dataset. The idotp value, calculated by Skyline, is derived from the first MS stage (MS1) and compares the theoretical isotope distribution of a precursor peptide (the naturally abundant isotope peaks M, M+1, M+2) with the experimentally observed distribution. Thus, the idotp value is a measure of confidence that a detected peak corresponds to the peptide of interest. Secondly, the peak area for each light peptide was refined by removing any extra shoulder in the selected peak areas. Finally, transitions with a retention time different from that of the heavy peptides were filtered out during data processing. We manually assessed the quality of the spectra arising from all peptides. As expected, all the transitions showed very high quality peaks. The elution profiles of the precursor and product transitions matched closely between light and heavy peptides. Figure 4-9 shows an example of all transitions and retention times derived from light and heavy versions of the VWDPDNPLTDR peptide (from MTA2) at the level of MS1 and MS2.
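
The idea behind the idotp filter can be illustrated with a plain normalised dot product in Python. This is a conceptual sketch only: Skyline's exact scoring formula may differ from the cosine-style similarity used here, and the function names and isotope intensities are hypothetical.

```python
import math


def isotope_dot_product(expected, observed):
    """Cosine-style similarity between expected and observed isotope peaks.

    Both inputs list intensities for M, M+1, M+2, ... in the same order.
    A value near 1 means the observed pattern matches the expected one.
    """
    if len(expected) != len(observed):
        raise ValueError("isotope patterns must have the same length")
    dot = sum(e * o for e, o in zip(expected, observed))
    norm = (math.sqrt(sum(e * e for e in expected))
            * math.sqrt(sum(o * o for o in observed)))
    return dot / norm if norm > 0 else 0.0


def passes_idotp(expected, observed, threshold=0.9):
    """Apply the 0.9 cutoff described in the text."""
    return isotope_dot_product(expected, observed) >= threshold


# Hypothetical isotope patterns, for illustration only.
theoretical = [0.55, 0.30, 0.15]
clean_peak = [5.4e5, 3.1e5, 1.4e5]       # close to the expected pattern
interfered_peak = [1.0e5, 1.0e5, 9.0e5]  # M+2 dominated by an interfering ion
keep_clean = passes_idotp(theoretical, clean_peak)
keep_interfered = passes_idotp(theoretical, interfered_peak)
```

In this toy case the clean peak scores close to 1 and is kept, whereas the peak with an interfering signal at M+2 falls well below the cutoff and would be omitted.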

Fig 4-9. Quality check of the peak areas at MS1 and MS2 levels. Graphs of ion counts for MS1 and MS2 spectra. The VWDPDNPLTDR peptide from the MTA2 protein is used as an example. (A) and (B) show the precursor peaks from the light and heavy labeled peptides, respectively. Skyline calculates the peak area for both light and heavy peptides and gives a ratio, which is used to quantify the light peptide. (C) and (D) show peaks for fragment ions (b ions) derived from the precursors of the light and heavy labeled peptides.

4.2.6.1 The MMH complex showed an unexpected 2:2:2 stoichiometry

Our ability to purify a stable complex comprising MTA2N, HDAC1 and MBD3cc demonstrates that MBD3 makes a direct interaction with the MTA-HDAC1 core complex, consistent with our published finding that MBD3 can interact with MTA1 [187]. Given emerging data from our own lab on intact NuRD, and with consideration of all data from the literature, our expectation was that one molecule of MBD3 is present in the intact NuRD complex, with MBD, GATAD2 and CHD each being present in one copy and the 2:2 HDAC:MTA complex binding four molecules of RBBP.

To analyse the data, we used the MTA subunit as our reference point to normalize the processed data (MTA should be present in two copies per NuRD assembly, based on the X-ray crystal structure of the 2:2 MTA-HDAC complex [204]). Unexpectedly, we saw a 2:2.6:2.2 ratio for MTA2:HDAC1:MBD3 across two biological replicates (each comprising three technical replicates), which suggested the presence of two molecules of MBD3 in the MMH complex (Fig 4-10). As depicted in Figure 4-1, and as suggested by XL-MS/MS data from our lab and other groups, the BAH domains in the MTA proteins may provide a binding surface for the MBD proteins. We speculated that, in the absence of other NuRD components, MBD3 occupies both binding sites on the MTA dimer, leading to a symmetric conformation.
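
The step from absolute amounts to subunit ratios can be sketched in Python as follows. The sketch averages technical replicates for each protein and then scales the result so that the reference protein equals two copies; the function name and the fmol values are hypothetical and are deliberately not the measured data.

```python
from statistics import mean


def stoichiometry(replicates, reference="MTA2", reference_copies=2.0):
    """Average technical replicates, then scale so the reference equals
    `reference_copies`.

    `replicates` maps protein name -> list of absolute amounts (e.g. fmol).
    """
    means = {protein: mean(values) for protein, values in replicates.items()}
    scale = reference_copies / means[reference]
    return {protein: round(value * scale, 1) for protein, value in means.items()}


# Hypothetical fmol values from three technical replicates, for illustration only.
one_biological_replicate = {
    "MTA2": [10.0, 10.0, 10.0],
    "HDAC1": [12.0, 12.0, 12.0],
    "MBD3": [6.0, 5.0, 7.0],
}
ratios = stoichiometry(one_biological_replicate)
```

With these made-up numbers the output is 2 : 2.4 : 1.2 for MTA2 : HDAC1 : MBD3; the same calculation applied to each biological replicate gives the bars of the kind plotted in the stoichiometry figures.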

Fig 4-10. Stoichiometry of the MMH complex. (A) Each bar represents the number of subunits of each protein, relative to MTA2 being set to 2. Two biological replicates are shown (Rep1 and Rep2). Each bar represents the mean of three technical replicates. (B) In addition to DIA-MS data, we also used MMH pull down assay data to model the MMH complex.

4.2.6.2 Co-immunopurification followed by western blotting confirmed two copies of MBD in the MMH complex

To biochemically corroborate the unexpected 2:2:2 stoichiometry, combinations of tagless HDAC1, HA-MTA2N and FLAG-MBD3cc, with or without tagless MBD3, were co-expressed in EXPI293F™ cells. An anti-FLAG pulldown followed by a western blot (Fig 4-11) demonstrated that tagless MBD3 (which migrates separately from FLAG-MBD3cc) co-purifies with FLAG-MBD3cc, tagless HDAC1 and HA-MTA2N, consistent with the idea that two copies of MBD3 exist in the MMH complex.

Fig 4-11. The MMH complex contains more than one MBD protein. Western blotted SDS-PAGE gels (using HRP-conjugated anti-FLAG, anti-HA, anti-MBD3, and anti-HDAC1 antibodies) probing the interaction of MBD3 with MMH components, following co-expression of the indicated proteins in EXPI293F™ cells and FLAG immunoprecipitation (IP). Samples are 1% of input samples or 10% of bead-retained proteins following FLAG IP. (A) Western blot in which an MBD3-specific antibody was used to detect the second MBD3. The tagless MBD3 was co-purified with the complex, consistent with the idea that FLAG-MBD3 can bind to one copy of MTA2 in the complex and tagless MBD3 can bind to the other. (B) Same gel as (A) but probed with anti-FLAG, anti-HA, and anti-HDAC1 antibodies.

4.2.6.3 The GATAD2 subunit may prevent binding of a second MBD subunit in the MMHG complex

In parallel work, our laboratory had demonstrated that the MBD3:MTA:HDAC ratio in the NuDe complex (a NuRD complex that lacks CHD4 but contains GATAD2) is 1:2:2. We therefore hypothesized that it is the presence of the GATAD2 subunit that inhibits binding of a second MBD subunit in the NuDe and NuRD complexes. The stoichiometry of the MMHG complex was determined by co-expressing FLAG-MTA2, HDAC1, HA-MBD3 and HA-GATAD2A. The MMHG complex was purified using the full-length FLAG-MTA2 protein as bait. DIA-MS analysis of two biological replicates gave subunit ratios of 2:2.5:1.3:0.8 for MTA:HDAC:MBD3:GATAD2 (Fig 4-12).

Fig 4-12. Stoichiometry of the MMHG complex. (A) Each bar represents the number of copies of each protein, relative to MTA2 set to 2. Two biological replicates are shown (Rep1 and Rep2). Each bar represents the mean of three technical replicates. (B) Schematic model for MMHG based on the DIA-MS data.
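The normalisation behind these ratios (scaling every subunit so that the reference subunit MTA2 equals 2) can be sketched as follows. This is a minimal illustration only, not the analysis pipeline used in this work: the function name and the input amounts are hypothetical, and the amounts were chosen purely so that the output matches the reported 2:2.5:1.3:0.8 ratio.

```python
def normalise_to_reference(amounts, reference, reference_copies=2.0):
    """Scale subunit amounts so that the reference subunit equals reference_copies."""
    ref_amount = amounts[reference]
    if ref_amount <= 0:
        raise ValueError("reference subunit amount must be positive")
    scale = reference_copies / ref_amount
    return {name: round(value * scale, 2) for name, value in amounts.items()}


# Hypothetical absolute amounts (fmol); illustrative values, not measured data.
example_fmol = {"MTA2": 40.0, "HDAC1": 50.0, "MBD3": 26.0, "GATAD2A": 16.0}
mmhg_ratios = normalise_to_reference(example_fmol, "MTA2")
print(mmhg_ratios)
```

Fixing the reference at two copies means the remaining values read directly as copies per MTA dimer.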

4.2.7 Crosslinking mass-spectrometry data analysis

We performed XL-MS/MS experiments to map interactions within the MMH complex and to identify domains that lie in close proximity. The BS3 crosslinker was used to crosslink both the MMH and MMHG complexes (Fig 4-13). Briefly, the MMH complex was overexpressed in and purified from EXPI293 cells, then ~40 µg of FLAG-purified complex was incubated with BS3 for 30 min before the crosslinker was quenched (see materials and methods). Of note, MMH crosslinking with DSS failed (data not shown), so we switched to BS3 and used only a 30-min incubation, as we did previously for the MR complex (161). Because the 30-min incubation yielded a substantial number of crosslinked peptides, we did not carry out any further optimization. Next, crosslinked complexes were digested with trypsin and Lys-C proteases, and the resulting peptide mixtures were fractionated by size exclusion chromatography (SEC) to enrich for larger peptides, which are more likely to be crosslinked. The crosslinked peptides were then analysed using a Q-Exactive LC-MS/MS instrument.

Fig 4-13. BS3 cross-linked complexes. (A) Sypro-Ruby stained SDS-PAGE showing a band over 200 kDa after crosslinking the MMH complex. (B) Sypro-Ruby stained SDS-PAGE showing a band over 200 kDa after crosslinking the MMHG complex. Five percent of the crosslinked material was loaded onto each gel. Lane M is the Mark12 ladder. Red lines schematically indicate the BS3 crosslinks between subunits.

The pLink software package [223] was used for crosslinking mass spectrometry data analysis. Spectra and peptide pairs were identified under 5% FDR control. XL-MS/MS data provide information about cross-linked, mono-linked, loop-linked, and regular peptides. Mono-linked (dead-end) peptides arise when only one functional group of the crosslinker reacts with the protein because the other group is hydrolyzed during sample processing. Loop-linked peptides, in contrast, arise when both functional groups of the crosslinker react with lysine residues that lie in the same peptide after enzymatic digestion. Inter XLs connect peptides from different proteins, providing information about subunit proximity in a complex, whereas intra XLs connect lysine residues on two different peptides from the same protein.

XL-MS/MS analysis gave the same results for both the MMH and MMHG complexes, so henceforth we discuss only the MMH XL-MS/MS data. Table 2 summarises the XL-MS/MS data for the MMH complex in terms of spectra and peptides. Only ~25% of the identified peptides were BS3 cross-linked peptides (inter- and intra-XLs), while the largest group (~46%) were regular peptides. About 20% and 7% of the peptides were mono-linked and loop-linked, respectively. Loop-linked and mono-linked peptides cannot provide any distance information between proteins of the complex, but can yield important information regarding which lysines are exposed and available for crosslinking. One reason for the dominance of regular peptides is that, even with enrichment strategies such as size exclusion chromatography, XL peptides are of relatively low abundance. To increase the percentage of crosslinked peptides, different enrichment methods (e.g., cation exchange) as well as the use of crosslinkers that target acidic residues (e.g., ADH) are recommended. After selecting crosslinked peptides with scores ≤ 1 × 10⁻⁴, we obtained 360 intra- and 79 inter-XL peptides. XLs assigned to endogenous proteins (e.g., HDAC2, MBD2, and RBBP4, which were pulled down along with the recombinant complex from EXPI293 cells) that involve the same crosslinked residues as the recombinantly expressed proteins were treated as redundant and eliminated from the data set. For example, QSQIQKEATAQK (90)-GKPDLNTALPVR (172) can be assigned either to Lys90 in recombinantly expressed MBD3_Mouse and Lys172 in P66A_Human, or to Lys90 in endogenously expressed MBD2_Human and Lys172 in P66A_Human. After these filtering steps, a dataset of unique XLs comprising 22 inter-XLs and 90 intra-XLs was obtained (Appendix E and F).
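The two filtering steps described above (a score cut-off followed by removal of redundant XLs) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pLink output format or the procedure actually scripted for this work: the record layout, the alias table that maps an endogenous paralogue onto the recombinant subunit, and all scores are hypothetical, and the assumption that paralogue residue numbers coincide holds only for cases such as the Lys90 example given in the text.

```python
from typing import NamedTuple


class Crosslink(NamedTuple):
    protein_a: str
    residue_a: int
    protein_b: str
    residue_b: int
    score: float  # assumed convention: lower score means a more confident match


# Hypothetical alias table: endogenous paralogue -> recombinant subunit.
PARALOGUE_ALIAS = {"MBD2": "MBD3"}


def canonical(protein, alias=PARALOGUE_ALIAS):
    """Map an endogenous paralogue name onto the recombinant subunit name."""
    return alias.get(protein, protein)


def site_key(xl):
    """Order-independent key describing the two linked sites."""
    site_a = (canonical(xl.protein_a), xl.residue_a)
    site_b = (canonical(xl.protein_b), xl.residue_b)
    return tuple(sorted([site_a, site_b]))


def filter_unique(crosslinks, max_score=1e-4):
    """Apply the score cut-off, then keep the best-scoring XL per site pair."""
    best = {}
    for xl in crosslinks:
        if xl.score > max_score:
            continue
        key = site_key(xl)
        if key not in best or xl.score < best[key].score:
            best[key] = xl
    return list(best.values())


def is_inter(xl):
    """True when the two linked sites belong to different subunits."""
    return canonical(xl.protein_a) != canonical(xl.protein_b)


# Hypothetical candidate list; the scores are invented for illustration.
candidates = [
    Crosslink("MBD3", 90, "GATAD2A", 172, 2e-6),
    Crosslink("GATAD2A", 172, "MBD2", 90, 5e-6),  # same sites via the paralogue
    Crosslink("MBD3", 46, "MBD3", 68, 1e-5),
    Crosslink("MTA2", 32, "HDAC1", 89, 3e-3),     # fails the score cut-off
]
unique_xls = filter_unique(candidates)
inter_xls = [xl for xl in unique_xls if is_inter(xl)]
intra_xls = [xl for xl in unique_xls if not is_inter(xl)]
print(len(inter_xls), len(intra_xls))
```

Using an order-independent site key means that a crosslink reported as A-B in one spectrum and B-A in another collapses to a single unique XL.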

Table 2. Percentage of XL types

Type            Spectra    Peptides
Cross-linked    12.9%      25.9%
Loop-linked     4.8%       7.3%
Mono-linked     13.9%      20.2%
Regular         68.4%      46.5%

Next, to check the overall distribution of the XLs in the complex, the set of inter- and intra-XLs was visualized using the xiNET tool (Fig 4-14). As can be seen in Figure 4-14, the pattern of XLs indicates that the MBD domain of MBD3 lies adjacent to the MTA BAH, ELM and SANT domains and that GATAD2 shares XLs only with MBD3. The data also indicate that the N-terminal portion of HDAC1 contacts the N-terminal portion of MTA2. The number of detected intra-XLs was markedly higher in MBD3 than in the other subunits.

Fig 4-14. Node-link diagram showing inter and intra XLs. Inter XLs obtained for the MMH complex are shown with black lines. MBD3cc is a stabilized form of MBD3(1-291), which is fused with the interacting coiled-coil region of GATAD2A (residues 133–174).

The mass spectra corresponding to each XL were individually validated using the pLabel tool [223] to ensure that peptide sequences of the XLs match the corresponding spectra. Crosslinks with at least four fragment ions on both the alpha (α) and beta (β) chain were retained for modelling (in XL-MS/MS studies, the longer chain in the crosslinked peptide is annotated as the α chain and the shorter one is named the β chain) [224]. It is notable that the number of XLs observed for each subunit is very different, which might be due to the secondary structure surrounding potential XL sites [173] or the number of crosslinkable residues. Figure 4-15 shows a fragmentation spectrum for a representative XL identified in this study.

Fig 4-15. Fragmentation spectrum of a representative crosslinked peptide identified in this study. Alpha peptide y- and b-ions are labelled in green and pink, respectively, while the beta peptide y- and b-ions are labelled in red and blue, respectively. The X axis is mass to charge ratio (m/z) and Y axis represents relative intensity.
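The retention criterion used during spectrum validation (at least four fragment ions on each of the α and β chains, with the longer peptide annotated as α) can be written as a short check. This is a sketch for illustration only: validation in this work was done manually in pLabel, the peptide sequences and ion counts below are hypothetical, and the tie-breaking rule for peptides of equal length is an assumption.

```python
def assign_chains(peptide_1, peptide_2):
    """Return (alpha, beta): the longer peptide is alpha.

    Assumption: peptides of equal length keep their input order.
    """
    if len(peptide_2) > len(peptide_1):
        return peptide_2, peptide_1
    return peptide_1, peptide_2


def passes_fragment_filter(alpha_ion_count, beta_ion_count, min_ions=4):
    """Keep a crosslink only if both chains have at least min_ions fragment ions."""
    return alpha_ion_count >= min_ions and beta_ion_count >= min_ions


# Hypothetical peptides and fragment-ion counts.
alpha, beta = assign_chains("AKDEFR", "LGSKTVAAEYR")
print(alpha, beta)
print(passes_fragment_filter(6, 4), passes_fragment_filter(9, 3))
```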

4.2.8 Modeling the MMH complex with PyMOL

Chemical crosslinking is a complementary technique that enables structural modeling of large protein complexes by providing evidence for their structural topology. We used XL-MS/MS to determine the interaction interfaces of the MMH components. To this end, unique inter and intra XLs were selected and mapped onto the available crystal structures using the PyMOL software package (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.). The reference structures were obtained from the Protein Data Bank (PDB) and used for MMH modeling. Crystal structures are available for some fragments of the NuRD subunits, as follows: the MTA1ELM-SANT-HDAC1 dimer (PDB:4BKX), MTA1-RBBP4 (PDB:5FXY), MTA1-RBBP4 (PDB:4PBY), the MBD domain of MBD3 (PDB:2MB7), and the MBD-GATAD2 coiled coil (PDB:2L2L). Crosslinks were assessed against a distance threshold of ≤ 30 Å for BS3. Nine of the 22 inter XLs lie within sequences for which structures are available; these were therefore used in the modeling (Table 4).

Table 4. Inter XLs used to model structures in the MMH complex.

No  Type   Protein 1  Protein 2  Residue 1  Residue 2  Status     Linker  Distance (Å)  ID
1   Inter  MBD3       GATAD2A    68         149        Satisfied  BS3     11            B1
2   Inter  MBD3       GATAD2A    46         146        Satisfied  BS3     16            B2
3   Inter  GATAD2A    MBD3       149        46         Satisfied  BS3     22            B3
4   Inter  MTA2       MBD3       32         46         Satisfied  BS3     24            B4
5   Inter  HDAC1      MBD3       74         46         Satisfied  BS3     28            B5
6   Inter  MBD3       GATAD2A    46         172        Satisfied  BS3     29            B6
7   Inter  MTA2       MBD3       41         46         Satisfied  BS3     11            B7
8   Inter  HDAC1      MBD3       89         46         Satisfied  BS3     25            B8
9   Inter  HDAC1      MTA2       89         32         Satisfied  BS3     23            B9

Using the measurement function in PyMOL, we assessed which inter and intra XLs satisfy or violate the distance restraints. All nine usable XLs satisfied the expected distance threshold (Fig 4-16). One XL can be mapped onto GATAD2-HDAC (B1), three onto MBD-GATAD2 (B2, B3, and B6), two onto MTA-MBD (B4 and B7), two onto MBD-HDAC (B5 and B8), and one onto MTA-HDAC (B9).
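The satisfied/violated classification amounts to comparing each measured distance with the 30 Å BS3 threshold. The sketch below applies that comparison to the distances listed in Table 4; it is an illustration rather than the PyMOL procedure itself, the helper names are hypothetical, and the coordinates in the final example are invented solely to show a violated restraint.

```python
import math

BS3_MAX_DISTANCE = 30.0  # Å, threshold stated in the text


def distance(coord_a, coord_b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(coord_a, coord_b)


def classify(distance_angstrom, threshold=BS3_MAX_DISTANCE):
    """Label a crosslink as satisfied or violated against the threshold."""
    return "Satisfied" if distance_angstrom <= threshold else "Violated"


# Distances (Å) for crosslinks B1-B9 as listed in Table 4.
table_4_distances = {
    "B1": 11, "B2": 16, "B3": 22, "B4": 24, "B5": 28,
    "B6": 29, "B7": 11, "B8": 25, "B9": 23,
}
status = {xl_id: classify(d) for xl_id, d in table_4_distances.items()}
print(status)

# Hypothetical coordinates, used only to show a violated restraint.
far_apart = distance((0.0, 0.0, 0.0), (20.0, 20.0, 20.0))
print(round(far_apart, 1), classify(far_apart))
```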

Fig 4-16. Model of the MMH complex using our crosslinking data and available structures of NuRD subcomplexes. Observed XLs that can be mapped onto existing structural data are shown. The structures used in our modeling were MTA1ELM-SANT-HDAC1 (PDB:4BKX), the MBD domain of MBD3 (PDB:2MB7), the MBD-GATAD2 coiled-coil (PDB:2L2L), and the MTA1 BAH domain (modelled on the structure of the Sir3 BAH domain; PDB:2FVU). One XL is shown that can be mapped onto GATAD2-HDAC (B1), three onto MBD-GATAD2 (B2, B3, and B6), two onto MTA-MBD (B4 and B7), two onto MBD-HDAC (B5 and B8), and one onto MTA-HDAC (B9).

4.2.9 Modeling of the MMH complex using SPEM and XL-MS/MS data

We next used SPEM and XL-MS/MS data to extend our understanding of how the MMH subunits are spatially arranged in the complex. The MMH complex model was generated by fitting the available crystallographic structures into our EM envelope (generated by Dr. Ana Silva in the Mackay laboratory) using the “fit in map” function of the Chimera software [225]. Crystal structures of the MTA-HDAC complex, together with two copies each of the MBD of MBD3 and the MBD3-GATAD2A coiled-coil (which is part of the MBD3cc construct), were fitted into the EM map. Figure 4-17 shows the model that best fitted the XL-MS/MS and EM data. The two copies of MBD3cc are arranged symmetrically, each positioned adjacent to one of the two MTA BAH domains. XL analysis was performed using Xlink Analyzer, a Chimera plugin [153, 225], with the XL distance threshold again set to 30 Å. Of the nine inter XLs, only one [HDAC (K74)-MBD3 (K46)] could not be satisfied in the model; this violated XL is shown in red. The rest were satisfied (Fig 4-17).

Fig 4-17. Structural analysis of the MMH complex using SPEM and XL data. Model of the MMH complex showing all inter and intra XLs, whether satisfied or violated in the model. Unsatisfied XLs are shown in red; satisfied XLs are shown in blue. The final 3D envelope for MMH was refined to 29 Å resolution. Subunit colours are as in Figure 4-1.

The structures used in our modeling were MTA1ELM-SANT-HDAC1 (PDB:4BKX), the MBD domain of MBD3 (PDB:2MB7), the MBD-GATAD2 coiled-coil (PDB:2L2L), and the MTA1 BAH domain (modelled on the structure of the Sir3 BAH domain; PDB:2FVU). SPEM data were analysed by Dr. Ana Silva.

4.3 Discussion

In Chapter 3, we demonstrated that only one MBD3 exists in the NuRD complex. In addition, DIA-MS based absolute quantification of the NuRD complex, carried out concurrently in the Mackay lab, also demonstrated the presence of one copy of MBD3 in the NuRD complex (unpublished data). These findings are consistent with previous work, which reported one MBD3 per NuRD complex [181]. In contrast, three iBAQ studies conducted by others determined different ratios for MBD3. One determined 1:3 for MBD:MTA [169], and the other two determined a 2:2 ratio [57, 168]. These latter studies used MBD3 as a handle for NuRD purification, which is likely to lead to overestimation of the stoichiometry of MBD3. Taken together, the stoichiometry of MBD3 has been relatively unclear, and in this Chapter I try to address this ambiguity.

Where is the binding surface of MBD3 in the NuRD complex? Our crosslinking mass spectrometry data and previous studies suggest that MBD3 interacts with the NuRD complex via the N-terminal part of the MTA1 protein, which contains the BAH domain. Since there are two BAH domains in the MTA-HDAC dimer-of-dimers, it is likely that in the MMH complex, where other subunits are absent, MBD3 can access the second BAH domain, leading to two copies of MBD3 in the MMH complex. This raises the possibility that introduction of other subunits blocks the second binding site.

I tested this hypothesis by adding the GATAD2A subunit into the MMH complex, creating MMHG. Gnanapragasam and colleagues have shown that MBD2 and GATAD2A make a heterodimeric coiled-coil [132]. Results indicate that introduction of GATAD2A into the MMH complex not only leads to the formation of a stable complex but also brings endogenous CHD4 (present in Expi293F™ cells) into the complex. We demonstrated that addition of GATAD2A and CHD4 to the complex directly competes away the binding of a second copy of MBD from the MTA-HDAC core, changing the stoichiometry to one copy and consequently introducing asymmetry into the complex.

The data also suggest that contacts between the MTA BAH domain and the MBD3 MBD domain are a key part of the way in which the histone deacetylase and DNA translocase modules of the NuRD complex come together.

Chapter 5: The MTA1 subunit can recruit 2 copies of RBBP4

Chapter 5. The MTA1 subunit can recruit 2 copies of RBBP4

5.1 Introduction

MTA1-3 and RBBP4/7 are constitutive subunits of the NuRD complex and most likely play key roles in the interaction between NuRD and the nucleosome (Fig 5-1) [138, 139, 171]. RBBP4/7 forms a 7-bladed β-propeller structure (Fig 5-2A,D) [226]. The MTA family of proteins contain several distinct domains. The N-terminal region contains four domains (all residues in parentheses refer to human MTA1): (i) BAH (1-164); (ii) ELM2 (165-283); (iii) SANT (284-335); and (iv) ZnF (387-417) (Fig 5-2B). Previous work, as described in the preceding Chapters, has shown that the ELM2 and SANT domains work together to interact with HDAC1/2 proteins to form the core of the NuRD complex [112]. In contrast, elements of the C-terminal half of the MTA protein appear to be disordered, and secondary structure predictions suggest several helical segments but no identifiable domains. Homology model predictions suggest that these disordered regions can undergo a significant conformational change upon binding RBBP4/7 [163]. Further, the Mackay lab has shown that MTA1(656-686) can interact with RBBP4 through a short motif (678-KRAARR-683, also known as R2) and that this interaction is essential for the integration of RBBP4 into the NuRD complex (Fig 5-2B) [139].

However, as shown in Chapter 3, the stoichiometry of the NuRD complex suggests that each NuRD complex contains two MTA subunits and at least four RBBP subunits, implying that there must be additional RBBP binding sites within the NuRD complex. This Chapter explores this hypothesis. Using a range of biochemical and biophysical techniques, we demonstrated that MTA1 (and by extension, MTA2 as well) has a second RBBP binding site, and we determined the stoichiometry of MTA:RBBP subunits using DIA-MS. We also carried out SPEM analysis on recombinantly produced MR complexes and used XL-MS/MS to model the MTA-RBBP (MR) complex. The data presented in this Chapter build on our understanding of NuRD complex structure and help to explain the biochemical basis for the activity of this complex.

Fig 5-1. Schematic representation of the MR subcomplex within the NuRD complex. The MR complex is shown with bright colours, while the other subunits are faded out. The question mark indicates that we still do not know exactly how many molecules of RBBP4 exist in the NuRD complex. Work in previous Chapters and in the literature has estimated the stoichiometry of RBBP to be ~2–4 per MTA.

Fig 5-2. Schematics and three-dimensional structures of RBBP4 and MTA1. Labelled boxes indicate domains for which structures are known and lines indicate regions predicted to be disordered. (A) RBBP4 contains seven WD40 repeats, which fold into a seven-bladed β-propeller fold. (B) Schematic of MTA1; the N-terminal part that was not used in this study is faded out, whereas the C-terminal part of MTA1 used in this study contains the R1 and R2 regions, which are shown as colored boxes. (C) The MTA1Cmut protein, in which the R2 motif is mutated to AAAAAA. (D) Three-dimensional structure of MTA1R2-RBBP4 (675-686; PDB 4PBY).

5.2 Results

5.2.1 Multiple alignment of a portion of MTA1 protein across eukaryotes shows two possible binding sites.

To identify additional RBBP binding sites within MTA1, we performed sequence alignment using the CLC sequence viewer (CLC Bio, a QIAGEN Company, Aarhus, Denmark) and secondary structure prediction using PORTER [227]. Figure 5-3 shows the sequence alignment between several diverse species (human, mouse, ostrich, bee, and fly). These bioinformatics data predict that a region of ~60 amino acids (residues 483–538) may form four α-helices (labelled 1 to 4). Among these, α-helix 2 resembles a sequence (α-helix 5) that has previously been shown to mediate an interaction with RBBPs [139]. Given that α-helix 2 contains the highly conserved AAR motif, which is followed by a PYxPI loop (highlighted with a yellow box), it seems likely that this helix could bind a second molecule of RBBP4/7.
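The motif pattern described above (an AAR motif followed shortly afterwards by a PYxPI loop, where x is any residue) can be searched for with a simple regular-expression scan. This sketch is illustrative only and was not part of the analysis: the function name, the maximum gap allowed between the two motifs, and the toy sequence are all assumptions, and the toy sequence is not the MTA1 sequence.

```python
import re


def find_candidate_sites(sequence, max_gap=20):
    """Find AAR motifs followed within max_gap residues by a PYxPI loop.

    Returns 1-based (AAR start, PYxPI start) positions.
    max_gap is an assumed parameter, not a value taken from the alignment.
    """
    hits = []
    for helix in re.finditer(r"AAR", sequence):
        # Window long enough to contain a 5-residue loop starting max_gap away.
        window = sequence[helix.end(): helix.end() + max_gap + 5]
        loop = re.search(r"PY.PI", window)
        if loop:
            hits.append((helix.start() + 1, helix.end() + loop.start() + 1))
    return hits


# Toy sequence invented for illustration; NOT the MTA1 sequence.
toy_sequence = "GSKRAARRLEEAPYSPIGGAARKK"
sites = find_candidate_sites(toy_sequence)
print(sites)
```

In the toy sequence the second AAR has no downstream PYxPI loop, so only the first occurrence is reported.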

Fig 5-3. Multiple sequence alignment of MTA1 (483-715) from a range of eukaryotes. Rectangles indicate the R1 and R2 regions in the sequences of MTA1 orthologues from Homo sapiens (Q13330), Mus musculus (Q8K4B0), Struthio camelus australis (A0A093HEK0), Habropoda laboriosa (A0A0L7R1K5), and Drosophila melanogaster (Q9VNF6). The yellow highlighted motif appears to be conserved immediately after α-helices 2 and 5. Black rectangles indicate predicted α-helices (α-helices 1-4), and the red rectangle indicates an α-helix (α-helix 5) for which a crystal structure is available.

5.2.2 Biochemical analysis to confirm sequence alignment results

To explore the possibility of this second RBBP binding site, we first carried out trial co-expressions and purifications of FLAG-tagged human MTA1C (residues 449–715), HA-tagged full-length human RBBP4 and a mutant version of MTA1C (MTA1Cmut; residues 678-KRAARR-683 mutated to AAAAAA). All co-expressions of MTA1C with RBBP4 (MR), and of MTA1Cmut with RBBP4 (MmutR), were performed in EXPI293F™ cells (see Methods for details).

After the expression and purification had been set up, Dr. Mario Torrado in the Mackay lab used pulldown assays to investigate whether the C-terminal half of MTA1 can indeed bind to more than one RBBP, as suggested by the sequence alignment results.

Combinations of FLAG-RBBP4, HA-RBBP4, and tagless-MTA1C or tagless-MTA1Cmut were co-expressed in EXPI293F™ cells. Pulldowns were performed using anti-FLAG Sepharose beads to capture RBBP4 as the handle/bait protein. Western blot analysis (Fig 5-4A lane 4, and B) showed that HA-RBBP4 was consistently pulled down when tagless-MTA1C was co-expressed, indicating that this portion of MTA1 is able to bind two RBBP4 molecules simultaneously. In contrast, using MTA1Cmut instead of MTA1C abolished the ability of MTA1 to bind the second RBBP4 (Fig 5-4A lane 5, and C). These pulldowns strongly support the hypothesis that there is a second RBBP binding site in the MTA1 sequence.

Fig 5-4. The C-terminal half of MTA1 can bind two molecules of RBBP4. (A) Pulldown analysis showing that MTA1C is able to bind two molecules of RBBP4. Cleared cell lysates (Input) of EXPI293F™ cells expressing the proteins indicated on top, as well as anti-FLAG immunoprecipitates (anti-FLAG pulldown), were analysed by Western Blot using anti-FLAG, anti-HA or anti-MTA1 antibodies to detect the proteins indicated on the left. (B) and (C) Schematic representation of the pulldowns and western blot results for MTA1C-RBBP4 and MTA1Cmut-RBBP4 complexes, respectively. This experiment was performed by Dr. Mario Torrado in the Mackay lab.

5.2.3 Large scale expression and purification of MTA:RBBP subcomplexes

Knowing that MTA1C is able to bind two molecules of RBBP4, I sought to perform stoichiometry and structural studies. Because a large-scale pulldown experiment requires a large quantity of beads, we first ran a small-scale expression check to confirm expression of the subunits. About 5 × 10⁵ transfected cells were lysed and examined by immunoblotting using HA- and FLAG-specific antibodies (Fig 5-5 A). Following a FLAG pulldown, a fairly pure complex with few endogenous NuRD components could be obtained (Fig 5-5 B). For DIA-MS and XL-MS/MS experiments, ~2.5 × 10⁸ cells were transfected with either FLAG-MTA1C and HA-RBBP4 or FLAG-MTA1Cmut and HA-RBBP4. For negative-stain SPEM analysis, recombinant complexes were produced using 1 × 10⁹ transfected cells; in both cases, cells were collected after 72 h. All experiments were done at least in duplicate.

Fig 5-5. Expression and purification of MR and MmutR complexes. (A) Immunoblotting of EXPI293F™ cell lysates from cells co-transfected with FLAG-MTA1C and HA-RBBP4 or FLAG-MTA1Cmut and HA-RBBP4. (B) SYPRO Ruby stained gel showing MR and MmutR complexes purified from transfected cells. FLAG-MTA1C or FLAG-MTA1Cmut immobilized on beads was used to pull down HA-RBBP4. HSP71, a heat shock protein (~70 kDa) that was identified by LC-MS/MS, copurifies with the RBBP and MTA proteins and is commonly seen in other NuRD subcomplex purifications.

As mentioned in previous Chapters, to eliminate excess ‘handle’ protein and improve protein crosslinking efficiency, and thereby allow for more accurate stoichiometry determination, sucrose density gradient ultracentrifugation was used as a second purification step. For SPEM analysis, the GraFix (Gradient Fixation) sucrose density gradient ultracentrifugation protocol was used. GraFix aims to stabilize macromolecular complexes for SPEM analysis [222] by introducing low concentrations of a covalent crosslinker such as glutaraldehyde into the sucrose gradient. For each sucrose density gradient, ~400 µg of purified complex was used. Figures 5-6 and 5-7 show the resulting sucrose gradient fractionations. For the normal sucrose gradients, broad elution profiles were observed. Comparing the two subcomplexes, MR appears to elute slightly later than MmutR; the majority of the MR complex elutes in fractions 14-16 (Fig 5-6A), whereas the majority of the MmutR complex elutes in fractions 12-14 (Fig 5-6B). In contrast to the normal sucrose gradients, in the GraFix sucrose gradient the elution peak was very narrow and the MR subcomplex eluted as a single peak at ~120 kDa. After fraction 18 in normal gradients and after fraction 16 in GraFix, some bands are detected above 70 kDa (Fig 5-7), especially for MmutR, which appears more concentrated; these are likely to be endogenous NuRD subunits copurifying with the recombinantly expressed complexes. Note that we did not run GraFix on MmutR because we sought to perform SPEM only on the MR complex.

Fig 5-6. SDS-PAGE analysis of sucrose density gradient (2-25%) fractions of the MR and MmutR complexes purified from two different experimental replicates. (A) After FLAG purification, complexes were subjected to sucrose density ultracentrifugation and fractionated into 200 µL fractions. Then 10% of each fraction was run on SDS-PAGE and stained with SYPRO-Ruby stain. Fractions boxed in red with dotted lines were taken for DIA-MS experiments. (B) SDS-PAGE showing exactly the same fractions for the MmutR complex. BSA of known concentration was used for quantification of the complex using ImageJ software. Fractions boxed in red with dotted lines were taken for DIA-MS experiments.

Fig 5-7. GraFix fractions of the MR complex on SDS-PAGE. SYPRO-Ruby stained SDS-PAGE showing selected fractions of a 2-25% sucrose plus 1% glutaraldehyde GraFix gradient. In comparison to non-GraFix samples, the elution peak was narrow and only one predominant species was detected, at around 120 kDa.

5.2.4 Stoichiometry analysis

With purified complexes in hand, we used absolute protein quantification by mass spectrometry (DIA-MS) to corroborate our conclusion that the C-terminal half of MTA1 contains two RBBP binding sites. For both the MR and MmutR complexes, we chose two fractions, 10 and 14, from the normal sucrose gradients. For each experimental replicate, 5–10 ng of complex was mixed with 100 fmol of ¹³C,¹⁵N-labeled peptide standards, digested with trypsin and cleaned up using C18 tips. The peptides were then analysed on a Q-Exactive LC-MS/MS instrument (see Methods for details).

The resulting data were processed in the Skyline software package (version 4.1.0.18169) as discussed in the previous Chapter. Light-to-heavy peptide ratios were calculated for at least two peptides from each protein. For data normalization, we used the MTA1 subunit as our reference point, setting its stoichiometry to two units per NuRD assembly based on the X-ray crystal structure of the 2:2 MTA1-HDAC1 complex.
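The conversion from light-to-heavy peptide ratios to a molar ratio between subunits can be sketched as follows. This is a simplified illustration rather than the Skyline workflow: it assumes that each heavy peptide standard was present at 100 fmol, that the amount of a protein is the mean light-to-heavy ratio of its peptides multiplied by that amount, and the peptide ratios shown are hypothetical values chosen to give a 1:2 outcome.

```python
from statistics import mean


def protein_amount_fmol(light_to_heavy_ratios, spiked_standard_fmol=100.0):
    """Estimate a protein amount from the light/heavy ratios of its peptides."""
    if len(light_to_heavy_ratios) < 2:
        raise ValueError("at least two peptides per protein are required")
    return mean(light_to_heavy_ratios) * spiked_standard_fmol


def copies_per_reference(amounts_fmol, reference="MTA1"):
    """Express every subunit as copies per copy of the reference subunit."""
    ref_amount = amounts_fmol[reference]
    return {name: amount / ref_amount for name, amount in amounts_fmol.items()}


# Hypothetical light/heavy ratios for two peptides per protein.
peptide_ratios = {"MTA1": [0.50, 0.52], "RBBP4": [1.00, 1.04]}
amounts = {name: protein_amount_fmol(r) for name, r in peptide_ratios.items()}
per_mta1 = copies_per_reference(amounts)
print(per_mta1)
```

Multiplying these values by two gives the convention used in the text, in which MTA1 is set to two copies per assembly.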

Our stoichiometry analyses revealed a 1:1 ratio for both fraction 10 and fraction 14 of the MmutR complex (Fig 5-8A,B). In contrast, for the MR complex, different stoichiometries were determined depending on the fraction analysed. For fraction 10 we obtained an unexpected 1:1 molar ratio of MTA1:RBBP4 (Fig 5-8A,C), whereas a 1:2 ratio was obtained for the MR complex collected from fraction 14, suggesting that fraction 14 is the best representative of the complex, as it contains more of the recombinantly expressed complex than fraction 10 (Fig 5-9A,B). Of note, because the sucrose gradient separates protein complexes based on their size, it is reasonable to propose that the MR complex in the later fraction (fraction 14) contains two molecules of RBBP4 and one molecule of MTA1.

Overall, these data suggest that MTA1 contains two RBBP binding sites in its C-terminal region. The observation of two different stoichiometries in different fractions of the wild-type complex suggests that the RBBP4-MTA1 interaction is relatively labile. Consistent with this idea, previous work described RBBP4 and RBBP7 as the most dynamic subunits of the NuRD complex [168].

Fig 5-8. Stoichiometry of the MmutR and MR complexes. (A) The bar graph represents the mean of ratios for each experimental replicate (n=2). The three dots on each bar represent technical replicates (n=3). The MmutR complex collected from both fractions 10 and 14 shows 1:1 stoichiometry. Similarly, the MR complex collected from fraction 10 shows 1:1 stoichiometry, whereas the MR complex collected from fraction 14 shows 1:2 stoichiometry. Both technical and experimental replicates showed very low variation. (B) Schematic representation of the MmutR complex. (C) Schematic representation of the MR complex.

5.2.5 Crosslinking mass-spectrometry data analysis

As described in the previous Chapter, XL-MS/MS data are useful, in combination with crystal structures, for modeling the low-resolution SPEM data collected from the recombinantly expressed complexes in this study. To this end, XL-MS/MS was performed on the MR complex to provide restraints to aid in interpreting SPEM data recorded for the same complex. Two crosslinkers were used in this study: disuccinimidyl suberate (DSS) and adipic acid dihydrazide (ADH). DSS is the non-water-soluble analog of BS3; it specifically crosslinks primary amines (i.e., lysine residues and protein N-termini) and has found widespread use in XL-MS/MS studies [148]. However, as some protein regions are refractory to amine-specific crosslinkers, the use of an orthogonal crosslinker such as ADH can increase the number of crosslinked residues. ADH crosslinks carboxylic acids (i.e., glutamate and aspartate residues and the protein C-terminus) in the presence of the coupling reagent 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM) [148]. Because DMTMM is used in ADH crosslinking, a side-reaction can also produce “zero-length” crosslinks, formed by direct condensation between a primary amine and a carboxylic acid. These crosslinks will be referred to as “ZLXL” crosslinks.

We treated the MR subcomplex with ADH and/or DSS and examined the reaction mixture by SDS-PAGE (Fig 5-9). The predominant crosslinked band runs at ~120 kDa. This apparent mass is consistent with an MTA:RBBP stoichiometry of 1:2 and indicates that the crosslinking reactions were largely successful and did not result in large aggregates. The mixture was digested with trypsin and then, to maximize our chances of identifying crosslinked peptides by MS, the digested peptide mixture was fractionated by size exclusion chromatography (SEC) to enrich for crosslinked peptides [148]. Several SEC fractions were analyzed on the mass spectrometer and crosslinked peptides were identified using the pLink software [223].
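A rough mass estimate illustrates why a band at ~120 kDa points to a 1:2 rather than a 1:1 complex. The calculation below is a back-of-the-envelope sketch, not a measurement: it assumes an average residue mass of 110 Da, takes the length of MTA1C from the construct boundaries given earlier (residues 449–715), assumes a length of 425 residues for full-length human RBBP4 (a value not stated in this text), and ignores the affinity tags and the crosslinker.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # common rough average; an approximation

MTA1C_RESIDUES = 715 - 449 + 1  # construct boundaries from the text (267 residues)
RBBP4_RESIDUES = 425            # assumed length of full-length human RBBP4


def approx_mass_kda(n_residues):
    """Approximate protein mass in kDa from its residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def complex_mass_kda(n_mta1c, n_rbbp4):
    """Approximate mass of an MTA1C:RBBP4 complex with the given copy numbers."""
    return (n_mta1c * approx_mass_kda(MTA1C_RESIDUES)
            + n_rbbp4 * approx_mass_kda(RBBP4_RESIDUES))


mass_1_to_1 = complex_mass_kda(1, 1)
mass_1_to_2 = complex_mass_kda(1, 2)
print(f"1:1 complex ~{mass_1_to_1:.0f} kDa, 1:2 complex ~{mass_1_to_2:.0f} kDa")
```

Under these assumptions the 1:2 complex comes out close to the observed ~120 kDa, whereas a 1:1 complex would be expected well below 100 kDa.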

Fig 5-9. Sypro-stained SDS-PAGE showing crosslinked RBBP4-MTA1C after treatment with DSS and ADH. The major band observed for the crosslinked complex runs at a molecular weight consistent with the formation of a 2:1 (RBBP4:MTA1) complex. The lower bands are most likely degraded material or un-crosslinked proteins.

Data analysis for the DSS-crosslinked sample showed that only 17% (192) of identified peptides were crosslinked peptides (inter- and intra-XLs), while the majority of peptides (60%) were regular peptides. In addition, 16% and 7% of the peptides were mono-linked and loop-linked peptides, respectively (Table 5-1). In contrast, only ~3% of the analyzed peptides were crosslinked in the ADH dataset, perhaps indicating a lower efficiency for this crosslinker compared with DSS.

Table 5-1. pLink DSS XL-MS/MS data analysis report

Type            Spectra    Peptides
Cross-linked    7.2%       17.2%
Loop-linked     3.3%       7.4%
Mono-linked     12%        16.1%
Regular         77.4%      59.5%

Crosslinked peptides with scores ≤ 1 × 10⁻⁴ were considered as candidates for analysis. These peptides were further filtered by removing duplicate or redundant XLs. Taking data from both crosslinkers into account, 94 unique XLs were selected for further investigation. As shown in Table 5-2, of the 94 XLs, 59 were intra- and 35 were inter-XLs. Of the 35 inter-XLs, 29 were from the DSS dataset, while 2 and 4 inter-XLs were from the ADH and ZLXL datasets, respectively. Of the 59 intra-XLs, 48 were from the DSS dataset, whereas 7 and 4 were from the ADH and ZLXL datasets, respectively. These data suggest that DSS is more efficient than ADH for crosslinking the MR complex.

Table 5-2. All unique XLs detected using two different crosslinking reagents

Type     Inter   Intra   Total
DSS      29      48      77
ADH      2       7       9
ZLXL     4       4       8
Total    35      59      94
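The tallies in Table 5-2 can be cross-checked with a few lines of Python. The sketch below is illustrative only: the counts are those reported in the table, whereas the function name and the "share of all unique XLs" figure are introduced here for convenience and are not part of the original analysis.

```python
# Unique crosslink counts as reported in Table 5-2: (inter, intra) per dataset.
counts = {
    "DSS": (29, 48),
    "ADH": (2, 7),
    "ZLXL": (4, 4),
}


def summarize(counts):
    """Return per-dataset totals, column totals, the grand total, and each
    dataset's share of all unique XLs (an illustrative extra, not in the table)."""
    row_totals = {name: inter + intra for name, (inter, intra) in counts.items()}
    inter_total = sum(inter for inter, _ in counts.values())
    intra_total = sum(intra for _, intra in counts.values())
    grand_total = inter_total + intra_total
    shares = {name: total / grand_total for name, total in row_totals.items()}
    return row_totals, inter_total, intra_total, grand_total, shares


row_totals, inter_total, intra_total, grand_total, shares = summarize(counts)
for name in counts:
    print(f"{name}: {row_totals[name]} XLs ({shares[name]:.0%} of all unique XLs)")
print(f"inter = {inter_total}, intra = {intra_total}, total = {grand_total}")
```

Expressed this way, the dominance of the DSS dataset among the unique XLs is immediately apparent.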

Next, to check the overall distribution of the XLs in the complex, the refined inter- and intra-XLs were visualized using the xiNET tool (Fig 5-10) [228]. The majority of XLs are located between the C-terminal region of MTA1 and the whole sequence of RBBP4. In the case of RBBP4, the majority of XLs are located in either the N-terminal or C-terminal thirds of the protein, with few in the middle section. A significant number of intra-XLs were mapped to MTA1, while only a few intra-XLs were detected for RBBP4, consistent with previous findings reporting a relative lack of XLs for proteins that are rich in β-sheet [173].

Fig 5-10. Node-link diagram showing inter-XLs. Inter-XLs are shown with black lines. XL-MS/MS data support our biochemical data for the interaction of the C-terminal half of MTA1 with RBBP4. Data were visualized using xiNET software.


Finally, we inspected the spectra of the XLs to ensure that the peptide sequences match the corresponding spectra (see Fig 5-11 for an example). The spectra were manually visualized and verified using the pLabel tool, part of the pLink 2 software package. Crosslinks with at least four fragment ions on each of the alpha (α) and beta (β) chains were retained for modelling (in XL-MS/MS studies, the longer chain in the crosslinked peptide is annotated as the α chain and the shorter one as the β chain) [224].

Fig 5-11. Fragmentation spectrum of a representative crosslinked peptide identified in this study. Alpha peptide y- and b-ions are labelled in yellow and green, respectively, while the beta peptide y- and b-ions are labelled in russet and cyan, respectively. The X axis is the mass-to-charge ratio (m/z) and the Y axis represents relative intensity.

5.2.6 Structural evaluation of the crosslinks in Pymol

To analyse the crosslink data further, the XLs were visualized on the two available crystal structures: one of RBBP4 bound to a fragment of MTA1 encompassing the R1 region (PDB: 4PBY [139]) and one of RBBP4 bound to an R2 peptide (PDB: 5FXY [171]). Using the measurement function in Pymol, we measured the distances spanned by both inter- and intra-XLs and assessed whether the XLs were consistent with the crystal structures. Crosslinks were assessed against the following approximate distance restraints: ADH ≤ 26 Å, DSS ≤ 30 Å, and ZLXL ≤ 14 Å. Of the 35 inter-XLs and 59 intra-XLs, only 9 and 17 XLs, respectively, can be mapped to the known structures; the remaining XLs fall outside the regions with known structure. These 9 inter-XLs (7 DSS, 1 ADH, and 1 ZLXL) and 17 intra-XLs (8 DSS, 7 ADH, and 2 ZLXL) were listed as usable XLs (Table 5-3).

In Pymol, of the 26 usable XLs only 3 (D1, D2, D3) did not satisfy the expected distance threshold (Fig 5-12A,B). The violated D2 and D3 XLs were further investigated in our SPEM model of the MTA1:RBBP4 1:2 complex because these XLs may occur between MTA and the other RBBP4 molecule in the complex (the second molecule is not available in one single PDB file) (Fig 5-12A).

Table 5-3. Usable Inter- and Intra XLs (9 inter XLs and 17 intra XLs) related to the MR complex.

No  Type   Protein 1  Protein 2  Residue 1  Residue 2  Status     Linker  Distance (Å)  ID
1   Intra  MTA1       MTA1       488        511        Satisfied  ADH     6             A1
2   Intra  MTA1       MTA1       488        518        Satisfied  ADH     6             A2
3   Intra  MTA1       MTA1       518        511        Satisfied  ADH     13            A3
4   Inter  MTA1       RBBP4      518        104        Satisfied  ADH     22            A4
5   Intra  RBBP4      RBBP4      104        60         Satisfied  ADH     14            A5
6   Intra  RBBP4      RBBP4      160        215        Satisfied  ADH     18            A6
7   Intra  RBBP4      RBBP4      166        124        Satisfied  ADH     11            A7
8   Intra  RBBP4      RBBP4      5          20         Satisfied  ADH     19            A8
9   Intra  MTA1       MTA1       509        532        Violated   DSS     33            D1
10  Inter  MTA1       RBBP4      532        264        Violated   DSS     35            D2
11  Inter  MTA1       RBBP4      532        22         Violated   DSS     40            D3
12  Intra  MTA1       MTA1       509        477        Satisfied  DSS     26            D4
13  Inter  MTA1       RBBP4      509        102        Satisfied  DSS     24            D5
14  Intra  MTA1       MTA1       477        532        Satisfied  DSS     16            D6
15  Inter  MTA1       RBBP4      509        26         Satisfied  DSS     7             D7
16  Intra  RBBP4      RBBP4      162        153        Satisfied  DSS     16            D8
17  Intra  RBBP4      RBBP4      212        156        Satisfied  DSS     10            D9
18  Intra  RBBP4      RBBP4      156        215        Satisfied  DSS     13            D10
19  Intra  RBBP4      RBBP4      264        215        Satisfied  DSS     28            D11
20  Intra  RBBP4      RBBP4      160        120        Satisfied  DSS     10            D12
21  Inter  MTA1       RBBP4      686        317        Satisfied  DSS     9             D13
22  Inter  MTA1       RBBP4      686        22         Satisfied  DSS     21            D14
23  Inter  MTA1       RBBP4      686        4          Satisfied  DSS     13            D15
24  Inter  MTA1       RBBP4      509        104        Satisfied  ZLXL    12            Z1
25  Intra  RBBP4      RBBP4      148        215        Satisfied  ZLXL    8             Z2
26  Intra  MTA1       MTA1       511        686        Satisfied  ZLXL    14            -

A, D, and Z refer to the ADH, DSS, and ZLXL crosslinks, respectively.
This crosslink is not applicable in Pymol but in Chimera, and was satisfied in the model.
The maximum expected distances for DSS, ADH, and ZLXL are 30 Å, 24 Å, and 14 Å, respectively.
Violated in the EM model as well.
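The satisfied/violated assignment in Table 5-3 amounts to comparing each measured distance with the maximum span of the corresponding crosslinker. The following minimal sketch repeats that comparison on the tabulated distances. The thresholds are those quoted in the text (the table footnote quotes 24 Å for ADH, which yields the same classification), and the label "row26" is a placeholder introduced here for the one crosslink without an ID in the table.

```python
# Maximum crosslinker spans (Å) as quoted in the text.
MAX_DISTANCE = {"ADH": 26, "DSS": 30, "ZLXL": 14}

# (ID, type, linker, measured distance in Å), transcribed from Table 5-3.
# "row26" is a placeholder label; the table gives no ID for that crosslink.
CROSSLINKS = [
    ("A1", "Intra", "ADH", 6), ("A2", "Intra", "ADH", 6),
    ("A3", "Intra", "ADH", 13), ("A4", "Inter", "ADH", 22),
    ("A5", "Intra", "ADH", 14), ("A6", "Intra", "ADH", 18),
    ("A7", "Intra", "ADH", 11), ("A8", "Intra", "ADH", 19),
    ("D1", "Intra", "DSS", 33), ("D2", "Inter", "DSS", 35),
    ("D3", "Inter", "DSS", 40), ("D4", "Intra", "DSS", 26),
    ("D5", "Inter", "DSS", 24), ("D6", "Intra", "DSS", 16),
    ("D7", "Inter", "DSS", 7), ("D8", "Intra", "DSS", 16),
    ("D9", "Intra", "DSS", 10), ("D10", "Intra", "DSS", 13),
    ("D11", "Intra", "DSS", 28), ("D12", "Intra", "DSS", 10),
    ("D13", "Inter", "DSS", 9), ("D14", "Inter", "DSS", 21),
    ("D15", "Inter", "DSS", 13), ("Z1", "Inter", "ZLXL", 12),
    ("Z2", "Intra", "ZLXL", 8), ("row26", "Intra", "ZLXL", 14),
]


def violated(crosslinks, limits):
    """Return the IDs of crosslinks whose distance exceeds the linker's limit."""
    return [xl_id for xl_id, _kind, linker, dist in crosslinks
            if dist > limits[linker]]


print("Violated:", violated(CROSSLINKS, MAX_DISTANCE))
```

Only the three DSS crosslinks D1, D2, and D3 exceed their limit, in agreement with the Status column of the table.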

Fig 5-12. Topology of the MR complex based on our crosslinking data and the available crystal structures of RBBP4 and MTA1. (A) X-ray crystal structure of MTA1-RBBP (PDB: 5FXY). Violated XLs are shown with light orange lines. (B) X-ray crystal structure of MTA1-RBBP (PDB: 4PBY). Inter-subunit XLs are shown with yellow lines and intra-XLs are shown with white lines. DSS, ADH, and ZLXL crosslinks are indicated as D, A, and Z, respectively.

5.2.7 Modeling of the MR complex using SPEM data


A model for the MR complex was generated by fitting the available crystal structures (4PBY and 5FXY) into our EM envelope using the “fit in map” function of the Chimera software. To visualize the XLs in the EM envelope containing the crystal structures and to analyze how well the model agrees with the XLs, we used Xlink Analyzer, a plugin package for Chimera [153, 225]. As expected, the 23 XLs that were satisfied in the individual crystal structures of MTA-RBBP (4PBY and 5FXY) were also satisfied here. More importantly, two additional XLs, D1 and D2, were satisfied in the current model. In this model D2 connects MTA1-R2 with the RBBP subunit that binds R1. The only XL that was not satisfied in our model is D3. It is possible that dynamics within the complex might allow this XL to be satisfied in another conformation (Fig 5-13).

Fig 5-13. Structural analysis of the RBBP4:MTA1C complex using SPEM and XL-MS/MS data. Conformation of the MR complex subunits showing all inter- and intra-XLs that are satisfied or violated in the model. Unsatisfied XLs are shown in red; satisfied XLs are shown in blue. The red XL is formed between MTA1 (532K) and RBBP4 (22K).


5.3 Discussion

Here we reveal the stoichiometry of the MTA1-RBBP4 complex using biochemical and proteomics approaches. We report that the stoichiometry of the MR complex can vary depending on which fraction of the sucrose gradient is analyzed. Why does this happen? One explanation is that binding of RBBP4 is relatively weak, so that 1:1 and 1:2 complexes exist in equilibrium. Different species of recombinant MR, containing either one or two RBBP4 molecules, can therefore be generated.

In contrast, only a single (1:1) complex was observed in the MmutR complex experiment, confirming the importance of the R1/R2 sites for RBBP binding. It is notable that any MTA1Cmut (with no RBBP4) would be removed in earlier fractions of the sucrose gradient, and therefore not observed, because of its low molecular weight in comparison with the larger complexes.

Based on our findings, the presence of four RBBP4 subunits in the NuRD complex is likely: each MTA1 subunit can recruit two copies of RBBP4, and the 2:2 crystal structure of HDAC1 bound to the N-terminal fragment of MTA1 [112] predicts two MTA subunits per NuRD complex.

Although RBBP4 and RBBP7 are found in several other protein complexes involved in gene regulation, including SIN3 [176], NURF [140], PRC2 [141], and CAF1 [229], MTA1 is not found in these complexes, meaning that MR is a unique feature of the NuRD complex. Furthermore, in the complexes that have been characterized structurally, only a single RBBP subunit is present. What are the implications of this observation: why does NuRD enlist four RBBP subunits but the other complexes only one? Given that RBBPs are known to bind to histones H3 and H4, it is possible that NuRD uses this strategy to increase its affinity for nucleosomes. This could arise, for example, from two RBBPs binding to the two copies of histone H3 in a single nucleosome. Alternatively, it is possible that the physiologically relevant remodelling substrate of NuRD is a dinucleosome, and therefore that each of the two 1:2 MR units might contact a separate nucleosome [163]. This hypothesis remains to be tested. In addition to the abovementioned scenarios, binding of NuRD to other histone variants (e.g., H2A.Z) might be functionally relevant, as high-throughput ChIP-Seq for NuRD subunits (e.g., MBD3 and CHD4) shows localization of the NuRD complex at genomic regions that are enriched with the H2A.Z histone variant. These data suggest the possibility of MR interacting with the N-terminal tail of H2A.Z via an as yet unknown mechanism [166]. Another question raised by this work is whether RBBP7 is functionally distinct from RBBP4 in any way. The surface of RBBP4 that mediates the interaction with MTA1 is identical to that of RBBP7, suggesting that this interaction is not a means by which different NuRD isoforms are selected. In this context, it is worth noting that the R1 and R2 sequences in MTA2 and MTA3 are not identical to those in MTA1, and it is therefore possible that RBBP4/RBBP7 selectivity in NuRD might be mediated by MTA1 vs MTA2 vs MTA3 NuRD complexes. Even if this were true, however, the histone H3 binding surfaces of RBBP4 and RBBP7 are essentially identical, and so any functional distinction between RBBP4- and RBBP7-NuRD complexes must arise through other mechanisms that are yet to be understood.


Chapter 6. Discussion

The NuRD complex is perhaps the least understood of the major remodellers present in mammals. Although over the past two decades there have been numerous studies investigating the function of the NuRD during development and in disease states using knock out models and deep sequencing, our knowledge about the structure, assembly and molecular function of the complex has remained sparse.

The first step towards gaining more insight into the structure and function of the complex is to investigate the stoichiometry and assembly of the complex. Dissecting the complex into smaller and stable subcomplexes is required to achieve this.

A next step would be the investigation of combinatorial assembly of NuRD paralogues into distinct complexes. The co-regulation of paralogous proteins might be a common characteristic of large multi-subunit complexes to regulate their function in different cell types and at different times. For instance, it has been demonstrated that GATAD2A-NuRD but not GATAD2B-NuRD can enable deterministic reprogramming without affecting the proliferation of somatic cells. Combinatorial assembly of the mSWI/SNF or BAF complex subunit paralogues is known to give rise to several distinct complexes, namely npBAF, nBAF, and PBAF. The npBAF complex is functional in neural stem cells and nBAF in post-mitotic neurons [230, 231].

Another important step to further delineate the function and specificity of the complex is to determine the interacting partners of the NuRD complex. The translocase subunit of the ATP-dependent chromatin remodelling complexes is the shared property of all remodellers. However, each remodeller has its own specific protein partners (e.g., transcription factors and chromatin binding proteins) that give them unique properties and specificity.

Before this work, we had little understanding of the stoichiometry and interacting partners of the NuRD complex. It was not known how the histone deacetylase and chromatin remodeling subcomplexes are brought together to form the full NuRD complex, nor how these subcomplexes cooperate to modify chromatin and gene expression. We also did not know whether the canonical subunits and paralogues of the complex are preserved across cell types. In this chapter I will discuss our findings in this area.


6.1 High level expression of a subunit does not guarantee its integration into the NuRD complex.

In Chapter 3 it was shown that one distinct NuRD complex (e.g., MBD2-NuRD) can be more dominant in MCF7 and PC3 cell types compared to the NT2 cell line. This compositional variation can arise from differences in the expression level of the subunits at mRNA and/or protein level [1, 172]. We therefore first examined the expression profile of the NuRD subunits in NT2 cells using RNA-Seq data from the Human Protein Atlas (HPA) consortium (Fig 6-1).

Fig 6-1. mRNA expression pattern of NuRD subunits in NT2 cells. RNA-Seq data are represented as Transcripts Per Million. Except for GATAD2B, CHD3, and MBD2, which show low expression, the rest show moderate or high expression.

In addition, we used absolute mRNA and protein abundance data from mouse embryonic stem cells (mESCs) (Fig 6-2A,B) [1]. NT2 cells closely resemble pluripotent stem cells, and it is therefore reasonable to speculate that the expression pattern of NuRD subunits in NT2 cells is similar to that of ESCs. The comparison of absolute mRNA and protein data (Fig 6-2C) indicates that there is a relatively direct correlation between mRNA and protein abundance for the NuRD subunits. However, this does not hold for all subunits; for example, at the mRNA level HDAC1 is 2.5× less abundant than HDAC2, but at the protein level HDAC1 is 10× more abundant than HDAC2. This type of pattern can arise because of differences in mRNA half-life, translation rate or turnover of a particular protein. The comparison of the subunit copy numbers (Fig 6-2B) with their enrichment pattern in the NuRD complex (Fig 6-3) suggests that there is not a strong correlation between the copy number of a protein and its enrichment. For example, as depicted in Fig 6-2B, MTA2 is 15× more abundant than MTA1 in mouse ES cells, but our data indicate that the distribution of MTA1 and MTA2 in the NT2 NuRD complex is very similar (Fig 6-3). Likewise, RBBP7 (40000 molecules/cell) is ~3× more abundant than RBBP4 (14000 molecules/cell), but its propensity to be incorporated into the NuRD complex appears relatively weak. In other words, it appears that RBBP4-NuRD is more abundant than RBBP7-NuRD in NT2 cells. This could be due to either a high propensity for RBBP7 to be involved in other complexes such as HAT1 or a higher propensity for RBBP4 to be part of the NuRD complex, effectively competing RBBP7 away from the complex.


Fig 6-2. NuRD subunit transcripts and protein copy numbers per cell in mESCs. (A) mRNA copy numbers of the NuRD subunits. (B) Protein copy numbers of the NuRD subunits per cell. Protein copy numbers are divided by 100. (C) Correlation of mRNA and protein product concentrations of the NuRD subunits. The Y axis is log10 transformed. R-squared is a statistical measure of how close the data are to the fitted regression line.
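For readers unfamiliar with the R-squared statistic mentioned in the legend of Fig 6-2, the sketch below shows one way it can be computed for a least-squares line fitted to log10-transformed protein copy numbers. It is an illustration only: the copy numbers are hypothetical placeholders rather than values from this thesis or from [1], the function name is introduced here, and leaving the mRNA axis untransformed is an assumption.

```python
import math


def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical copy numbers per cell, for illustration only.
mrna_copies = [5, 12, 20, 35, 60]
protein_copies = [2_000, 9_000, 15_000, 50_000, 90_000]

# Only the protein (Y) axis is log10 transformed here, as in the figure legend.
log_protein = [math.log10(p) for p in protein_copies]
print(f"R^2 = {r_squared(mrna_copies, log_protein):.2f}")
```

A value close to 1 indicates that the points lie close to the fitted line, whereas subunits such as HDAC1 and HDAC2, whose protein abundance departs from the trend, lower the value.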


Fig 6-3. Heatmap representing the log2 intensity of the iBAQ values for NuRD subunits in MCF7, NT2 and PC3 cells. The coloring is based on relative iBAQ intensities for each protein. CHD3 and CDK2AP2 are present in noticeably lower amounts in MCF7 than in PC3 and NT2. RBBP4 is the most enriched protein.

6.2 Composition and stoichiometry of the NuRD complex

All mammalian NuRD stoichiometry analyses done to date have been performed in HeLa-derived cell lines [57, 168, 169], together with a single study in mouse ES cells [161]. We assessed NuRD stoichiometry across several cell types to begin to understand whether the stoichiometry of the complex is conserved between cell types. Our iBAQ data indicate that the stoichiometry of the NuRD complex is preserved overall across our three chosen cancer cell lines, which reflect three different human tissues. In all three cell types, the consensus stoichiometry observed from multiple measurements suggests that: (i) MTA and HDAC subunits are in a 1:1 ratio; and (ii) GATAD2 and MBD are also in a 1:1 ratio, with half as much of each of these two subunits as of MTA and HDAC. Our observations are consistent with the X-ray crystal structure of a 2:2 HDAC1-MTA1 complex [112] and the NMR structure of a 1:1 GATAD2A-MBD2 complex [132]. We propose that the overall stoichiometry of the MTA-HDAC-MBD-GATAD2 part of the NuRD complex is 2:2:1:1.

Several other iBAQ studies of the NuRD complex (Fig 6-4) have been published [57, 168, 169, 181]. One [181] also uses multi-angle laser light-scattering (MALLS) data to directly measure the solution molecular weight of the complex. This latter study draws the same conclusion as we do regarding the stoichiometry of these subunits. In contrast, the other iBAQ studies report a range of stoichiometries for these four subunits. It is possible that variations in purification protocol – including the question of which subunit is used as the purification ‘handle’ – contribute to different observed estimates of the stoichiometry. This interpretation is further supported by the values we measure for RBBP stoichiometry. Values for the current study are presented as the mean across all replicates (n = 9) of all three cell types (refer to Chapter 3 Fig 3-3A).

The amount of the RBBP subunit observed varies significantly from sample to sample in our measurements, ranging from ~5.5 to 11 RBBPs per NuRD complex. In contrast, the variation observed for the other subunits is much smaller. Other studies report 3–4 RBBP subunits per complex (after normalizing the data to reflect two MTA subunits per complex). A value of four RBBPs is in best agreement with published structural and biochemical data, which indicate that two RBBP binding sites exist on each MTA molecule [163, 171, 190]. It is possible that the excess of RBBP that we observe is a consequence of using this protein as the purification handle in our first purification step and that the sucrose gradient does not completely remove additional, weakly associated RBBP. One way to address this uncertainty would be to use our DIA-MS method to measure the stoichiometry of a NuRD complex purified using another subunit as the handle. This would require the construction of a cell line in which one or more NuRD subunits have an affinity tag introduced into the endogenous locus by CRISPR/Cas9 engineering.
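The normalization referred to above (scaling the data so that the complex contains two MTA subunits) can be sketched as follows. This is a minimal illustration under stated assumptions: the iBAQ intensities are hypothetical placeholders, paralogue intensities are assumed to have already been summed into one value per subunit family, and the function name is introduced here rather than taken from any software used in this work.

```python
def normalize_to_reference(ibaq, reference, copies=2.0):
    """Scale iBAQ intensities so that the reference subunit family is present
    at `copies` per complex; all other families are scaled by the same factor."""
    scale = copies / ibaq[reference]
    return {subunit: value * scale for subunit, value in ibaq.items()}


# Hypothetical summed iBAQ intensities per subunit family (arbitrary units).
ibaq = {
    "MTA": 4.0e8,
    "HDAC": 4.2e8,
    "MBD": 1.9e8,
    "GATAD2": 2.1e8,
    "RBBP": 1.3e9,
}

stoich = normalize_to_reference(ibaq, "MTA")
for subunit, copies in stoich.items():
    print(f"{subunit}: {copies:.1f} copies per complex")
```

Because every subunit is divided by the same reference intensity, any systematic over-representation of a subunit (for example, of the one used as the purification handle) carries through directly into its apparent copy number.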

Our data also show a sub-stoichiometric amount of the CHD subunit, which is most likely due to the peripheral nature of this protein in the complex [162], which might make it prone to dissociation during purification. Zhang et al (2016), who also employed multi-step purification protocols, likewise reported sub-stoichiometric quantities of CHD. In contrast, the three other studies [57, 168, 169] used single-step affinity purifications, and consistently reported one CHD subunit per NuRD. A single copy of CHD is most likely, although it is worth noting that it is still unclear whether the NuDe complex has a cellular function. If it did, this might account for our observations.

Fig 6-4. Comparison of NuRD complex subunit stoichiometry from different studies. Summary of stoichiometries deduced from all published iBAQ studies of NuRD, together with the current data. Previous studies used MBD3 as the purification handle whereas RBBP was used in the current study.

CDK2AP1 also has very low abundance in our samples and in the purified Drosophila NuRD complex, whereas it is present at 1–2 copies per NuRD complex in the HeLa-purified complexes. We note that, across all studies (including between our three cell lines), CDK2AP1 abundance tracks closely with CHD abundance (Fig 6-5), suggesting that these two proteins might directly interact. Thus, our low stoichiometry for CHD4 would explain why we observe little CDK2AP1 in our complexes. Taking all of these data into account, we suggest an overall stoichiometry for the NuRD complex of 4:2:2:1:1:1:1 (RBBP:MTA:HDAC:MBD:GATAD2:CHD:CDK2AP1).

Fig 6-5. Scatter plot showing the correlation of CDK2AP and CHD. Quantity of CDK2AP plotted against the quantity of CHD. Each data point is an independent measurement; data from all sucrose-gradient fractions are included. The values are strongly correlated (r² = 0.88).

6.3 NuRD complexes with different paralogue compositions are likely to have distinct functions

Published data suggest that the composition of most protein complexes is constant across many cell types [172, 192, 194]. We suggest that the NuRD complex has a composition that can vary in a context-dependent manner. However, the changes are not in overall architecture but rather in paralogue distribution. Although most paralogues were found in each of our three cell types, several differences were observed. Most prominently, MBD2 was not observed in NuRD from NT2 cells, despite the fact that MBD2 transcript levels are higher in NT2 cells than in MCF7 or PC3 cells (Fig 6-3). This suggests that MBD3-NuRD is the predominant NuRD complex in NT2 cells and it is quite possible that this selectivity has functional consequences.


MBD2- or MBD3-exclusive NuRD complexes have been previously described and shown to have distinct roles in other cell types [134, 136, 195]. For example, it has been shown that Mbd3 knock-out mice die during early embryogenesis, whereas Mbd2 knock-out mice are viable and fertile, a strong indication that these two proteins are not functionally redundant [134]. Furthermore, knockdown of MBD3 but not MBD2 dramatically increases the efficiency of the reprogramming of somatic cells to pluripotent stem cells [60], although it should be noted that these findings are currently disputed [60]. A similar observation has been made for the GATAD2 subunit: depletion of GATAD2A but not GATAD2B leads to highly efficient induced pluripotent stem cell formation [200]. In addition, GATAD2A-NuRD is also selectively recruited (over GATAD2B NuRD) by ZMYND8 to DNA double-stranded breaks [57]. CHD3-, CHD4- and CHD5-specific NuRD complexes have also been implicated in mouse cortical development [201]. Paralogue interchanges of these types will most likely be revealed to be a widespread aspect of NuRD activity.

6.4 NuRD may contact its substrate through multiple interfaces

An important step in delineating the mechanisms of action of the NuRD complex is to identify the means by which the complex searches for a substrate (e.g., nucleosomes). It has been demonstrated that remodellers such as SWI/SNF [55, 209], Chd1 [56], and INO80 [54] make multiple simultaneous interactions with histones and/or DNA during the remodelling reaction that they catalyze. Thus it is reasonable to suggest that the NuRD complex might contact the nucleosome substrate through multiple interfaces. The BAH domain of the MTA subunit is present in several transcriptional co-regulators, and has been shown to act as a histone recognition domain (Kuo et al., 2012; Du et al., 2012). In addition, the two PHD domains in the CHD4 subunit are also important for connection of NuRD with the nucleosome through H3K4 and methylated or acetylated H3K9 [123, 232]. In addition to histone binding domains, several domains in NuRD have demonstrated DNA-binding properties. It has been demonstrated that the N-terminal part of CHD4 and its chromodomains are essential for connection of NuRD to DNA [128]. It is unlikely that many or all of these direct interactions with DNA will be specific, especially given that the complex most likely acts on diverse DNA sequences across the genome. Therefore, specificity will most likely be introduced through the interaction with site-specific transcriptional coregulatory proteins such as FOG-1, and several other zinc-finger containing transcription factors (discussed in Chapter 1) that target NuRD to particular loci. In line with these ideas, our data show that NuRD can recruit site-specific proteins (e.g., ZNF219 and PWWP2A) to bind distinct genomic regions.

6.5 Interacting partners of the complex

It has been proposed that interactions of chromatin remodelling complexes with a range of transcription factors and chromatin regulating proteins can assist the NuRD complex in binding to nucleosomes or in differentially detecting histone modifications and histone variants. In addition, it has been reported that the NuRD complex can interact with a range of proteins and deacetylate them, and consequently regulate their stability and/or function [100, 101]. Thus understanding the interactome of the complex will help us understand its specificity and targets.

In Chapter 3, we investigated the interactome of the NuRD complex using affinity purification coupled to MS. We identified 83 proteins that were present in at least two of three biological replicates. Among the interacting proteins there are a number of transcription factors and chromatin binding proteins. Of these proteins, some, for example HIF1A [233] and PWWP2A [74, 198], have already been reported as interacting partners of the NuRD complex. Most recently, it has been demonstrated that PWWP2A, a transcriptional regulator [74, 198], can compete directly with MBD3 for binding to MTA, meaning that it forms a complex with a NuRD subcomplex comprising MTA, HDAC and RBBP subunits.
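The reproducibility criterion used above (presence in at least two of three biological replicates) can be expressed as a simple counting filter. The sketch below is illustrative only: the protein identifiers are placeholders, the function name is introduced here, and the actual filtering in Chapter 3 may have involved additional criteria.

```python
from collections import Counter


def reproducible_hits(replicates, min_replicates=2):
    """Return proteins identified in at least `min_replicates` replicates."""
    # set() ensures a protein is counted at most once per replicate.
    counts = Counter(protein for rep in replicates for protein in set(rep))
    return sorted(p for p, n in counts.items() if n >= min_replicates)


# Placeholder identification lists for three biological replicates.
replicates = [
    ["protein_A", "protein_B", "protein_C"],
    ["protein_A", "protein_B", "protein_D"],
    ["protein_A", "protein_E", "protein_D"],
]

print(reproducible_hits(replicates))
```

Proteins seen in only a single replicate (here protein_C and protein_E) are discarded, which reduces the contribution of sporadic, non-specific binders.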

Because MBD3 connects these proteins to GATAD2 and CHD, the PWWP2A interaction effectively generates a new variant NuRD complex that lacks chromatin remodelling activity. Genome-wide analysis reveals that PWWP2A preferentially binds to the promoter regions of highly transcribed genes that contain H2A.Z nucleosomes [198]; H2A.Z is a variant of H2A [234]. Strikingly, in addition to PWWP2A, we also observed the enrichment of H2A.Z in our NuRD purification. Most recently, two papers demonstrated that chromatin binding proteins such as PWWP2A compete with MBD3 for binding to MTA1, consequently defining a new variant without remodelling activity. This variant targets acetylation sites on H3K27 and H2A.Z [74, 198].

The proteins listed in Chapter 3 could be direct or indirect interactors of the complex, or they could have been pulled down through a non-specific interaction with the beads. Therefore, biochemical assays are needed to further delineate the interactome data. As explained in Chapter 3, we found that ZNF219 and SLC25A5 are capable of direct interaction with the NuRD complex, at least in in vitro pulldown assays.

6.5.1 ZNF219 may target NuRD to specific loci

ZNF219 is a moderately expressed and poorly characterized protein containing nine classical zinc-finger (ZnF) domains. It has been shown to repress target genes during the differentiation of pluripotent stem cells [188, 189], although no mechanism has been proposed for this repressive activity.

Our data suggest that ZNF219 might recruit the NuRD complex to target gene promoters or enhancers. ZNF219 has also been observed to be enriched following immunoprecipitation of the NuRD complex from neural stem cells that had been derived from pluripotent stem cells [235]. We found ZNF219 to be enriched in all three NT2 cell samples but not at all in the other two cell types.

This observation is consistent with the fact that ZNF219 transcript levels are several times higher in the NT2 cells, and also that NT2 cells closely resemble pluripotent stem cells and can readily be differentiated into neuronal cells [236]. Furthermore, consistent with our data, ZNF219 has also been identified as a NuRD-interacting protein in several affinity-purification-mass-spectrometry studies [237-239].

Our data showed that the N-terminal portion of ZNF219 (residues 1–273) mediates interactions with RBBP4, GATAD2B and the C-terminal portion of CHD4 (CHD4C). Although ZNF219 resembles FOG1 in that it contains nine ZnF domains, it lacks the conserved Lys-Lys-Arg motif used by FOG1 to bind to RBBP4/7 [138], indicating that it recognizes RBBP4 through a distinct mechanism. CHD4C contains two domains that are predicted to be ordered but for which the structures are unknown. We have previously shown that one of these domains can bind GATAD2B [187], perhaps suggesting that ZNF219 makes a concerted interaction with these two regions that are already co-localized. Further biochemical and structural work will be required to determine the molecular basis for this interaction.

6.5.2 NuRD subcomplexes can be found in the cytoplasm and/or cytosol

Very recently, several labs have reported the presence of gene regulating complexes such as COMPASS [240], PRC2 [241], and BAP1 [191] in the cytoplasm. In some cases, these cytoplasmic complexes have been shown to have functional roles. For example, the presence of the cytoplasmic SET1/COMPASS complex was shown to be essential for triple-negative breast cancer pathogenesis. Similarly, Zhang et al. reported that catalytically active NuRD subcomplexes can be found in the cytosol [181].

Potentially, an HDAC-containing NuRD subcomplex could act to deacetylate non- histone proteins. For example, MTA/HDAC containing complexes have been demonstrated to deacetylate a range of transcription factors including p53 [101], ATF4 [100], and HIF1α [242]. SLC25A5 is known predominantly as a mitochondrial inner membrane protein that belongs to the solute carrier family of proteins and functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm [243].

However, SLC25A5 has also been reported to be present in both the nucleus [244] and the cytoplasm [243]. It is notable that SLC25A5 has 24 acetylation sites [180] and it is possible that cytoplasmic NuRD (or a subcomplex) modulates the acetylation level of SLC25A5. Finally we note that genetic studies on patients with intellectual disabilities have identified GATAD2 and SLC25A5 as possible co-actors in this condition [245]; it is possible that the direct interaction between GATAD2 and SLC25A5, as observed in our pulldown assays, could have a role to play in this disease through a yet unknown mechanism.

6.5.3 RSRC2 as a possible NuRD partner


The RSRC2 gene is located on chromosome 12 and encodes an arginine/serine-rich coiled-coil protein. RSRC2 is an uncharacterized 50-kDa protein that is well conserved across species from fish to human (Fig 6-6). The presence of a coiled-coil domain in its sequence makes it tempting to speculate that this protein may interact with NuRD subunits such as GATAD2A/B. Further work would be needed to characterize the molecular basis of the interaction of RSRC2 with the NuRD complex. Interestingly, an AP-MS study searching for chromatin-associated proteins listed RSRC2 as a significant interactor of histones H3 and H2B, consistent with a role for this protein in chromatin regulation [183].

Fig 6-6. Multiple sequence alignment and disordered region prediction. (A) Protein sequences of human, mouse, and fish RSRC2 have been aligned (Q7L4I2). Red residues are identical between all three species. Secondary structure and domain predictions indicate several helical segments in the C-terminal half of the protein sequence. (B) Ordered and disordered regions of the protein have been predicted using the PONDR online tool. The C-terminal part with helical regions is likely to be ordered.

6.6 The N-terminal part of MTA may bring the DNA translocase moiety into the deacetylase core

The heterodimer complex formed between MTA and HDAC (MH) creates a platform on which other subunits assemble to build up the full NuRD complex. In Chapter 4, we used a combination of absolute quantification (DIA-MS) and XL-MS/MS to further delineate the architecture and topology of the complex by studying several subcomplexes. Our XL-MS/MS data for MMH suggest that MBD3 can interact with the MH complex via the BAH domain of MTA2. Since there are two BAH domains in the MH complex, it is likely that MBD3 can interact with both BAH domains.

Consistent with the observation that there are two BAH domains, absolute protein quantification using DIA-MS indicates a stoichiometry of 2:2:2 for the MBD3:MTA2:HDAC1 complex (Fig 6-7A). Although it is consistent with the presence of two BAH domains, the presence of two MBD3 subunits is surprising given that our stoichiometry data for the native NuRD complex show one MBD3 per complex. This observation could potentially be due to one of several reasons: (i) overexpression of the subunits in the MMH complex described herein, which may lead to a non-native bias towards a 2:2:2 stoichiometry; (ii) the use of MBD3 as the bait protein to purify MMH, which could lead to an excess of MBD3; or (iii) the presence of two BAH domains in the MTA component of the complex.

We hypothesized that GATAD2A and CHD4, which are present in the whole complex but not in MMH, may prevent the binding of the second MBD3 molecule to the MH complex by blocking the second BAH domain. We tested this hypothesis in Chapter 4 by adding GATAD2A to the MMH complex. Our results indicate that introduction of GATAD2A into the MMH complex not only leads to the formation of a stable complex (Fig 6-7B) but also reduces the number of MBD3 copies: DIA-MS analysis of the MMHG complex showed a stoichiometry of 2:2.5:1.3:0.8 for MTA:HDAC:MBD3:GATAD2 (relatively close to 2:2:1:1), consistent with the idea that GATAD2A and/or CHD4 might directly compete away the binding of a second copy of MBD3 to the MH core.
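
As a rough illustration of what "relatively close to 2:2:1:1" means, the short Python sketch below compares the measured MMHG ratios quoted above with two integer models by summing the absolute deviations. The choice of metric and the second candidate model (2:2:2:1, in which both MBD3 copies would be retained) are assumptions made here for illustration only; they are not part of the analysis in Chapter 4.

```python
# Measured MMHG stoichiometry quoted in the text (MTA:HDAC:MBD3:GATAD2),
# expressed relative to two copies of MTA.
measured = {"MTA": 2.0, "HDAC": 2.5, "MBD3": 1.3, "GATAD2": 0.8}

# Candidate integer models. The second is a hypothetical alternative
# (two MBD3 copies retained) included only for comparison.
candidates = {
    "2:2:1:1": {"MTA": 2, "HDAC": 2, "MBD3": 1, "GATAD2": 1},
    "2:2:2:1": {"MTA": 2, "HDAC": 2, "MBD3": 2, "GATAD2": 1},
}


def total_abs_deviation(observed, model):
    """Sum of absolute differences between observed and model copy numbers."""
    return sum(abs(observed[subunit] - model[subunit]) for subunit in observed)


# Score each candidate and pick the one with the smallest total deviation.
scores = {name: total_abs_deviation(measured, model)
          for name, model in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: total deviation = {score:.2f}")
print("closest model:", best)
```

Under this simple metric the 2:2:1:1 model deviates less from the measured values than the two-MBD3 alternative, in line with the interpretation given above.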

Fig 6-7. Schematic representation of the stoichiometry of MBD3 in MMH and in NuRD. (A) Two copies of MBD3 are observed per MMH complex, consistent with MBD3 binding the BAH domain of each MTA1 subunit. (B) When GATAD2A was co-expressed with the MMH complex, only one MBD3 was observed per complex. These data suggest that GATAD2A or CHD4 inhibits the binding of a second MBD3 to the complex.

Our data also suggest that contacts between the MTA BAH domain and the MBD domain of MBD3 are a key part of the connection between the histone deacetylase and DNA translocase units: the BAH domain of MTA1 appears to provide a platform for the interaction of the histone deacetylase core with the translocase core within the NuRD complex. A further question is whether any other protein, in particular cell types or at particular stages of development, can compete away MBD2/3 and create a variant of NuRD with a new function.

6.7 RBBP4/7 appears to have a dynamic interaction with the MTA protein

Our stoichiometry calculation for the MR complex yielded different values in different sucrose gradient fractions (Section 5.2.4). However, in the case of the MmutR complex, the stoichiometry remained constant, regardless of the fraction. As noted above, variable stoichiometry has been observed in numerous studies of NuRD stoichiometry, and it is not currently clear whether this variability is functionally relevant or is an artefact of the experiments themselves.

However, in the MmutR complex, because MTA1Cmut has just one binding site, the generation of two species is unlikely. Nonetheless, if free MTA1Cmut (with no RBBP4 bound) is generated, it will be removed in earlier fractions because of its very small size in comparison with the larger complexes. Based on the crystal structure of the MH core complex, MTA forms a homodimer, meaning that it can bring a total of four RBBP4 molecules into the complex.

RBBP4, but not MTA1, is found in several other protein complexes, including SIN3 [176], NURF [140], PRC2 [141], and CAF1 [229], meaning that the MR complex is a unique feature of the NuRD complex. It is tempting to ask whether the formation of the MR subcomplex gives the NuRD complex any specificity or, in other words, what the consequence of this interaction would be. Why does NuRD need four molecules of RBBP4? It is possible that NuRD uses this strategy to increase its affinity toward nucleosomes. It is also possible that the substrate of NuRD is a dinucleosome in which each MR unit might contact a separate nucleosome, allowing the remodeling module or deacetylating module to act on a two-nucleosome unit (Fig 6-8) [163].

6.8 The MR complex gives specificity to NuRD to bind histone H3 but not H4

Previous studies reported that RBBP4 was able to interact with the C-terminal half of MTA1/2. Alqarni et al. solved the structure of the MTA1(670–690) region in complex with full-length RBBP4 [190]. They reported that a short α-helix and a hydrophobic cluster bind to RBBP4 in a distinct pocket.

In the Mackay lab, Alqarni et al. also compared the affinities of the R2 region (656–686) and H4 for RBBP4 using isothermal titration calorimetry (ITC). They observed that H4 and MTA1 bound in the same groove of RBBP4. Based on the similarity in affinity, they suggested that there may be competition between MTA1 and histone H4 for binding to RBBP4. The interaction of the MR complex with H4 therefore does not seem possible, because the extensive interface between MTA and RBBP4 can block the binding site for H4.
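
The competition argument can be made concrete with the standard expression for two ligands competing for a single site at equilibrium. The Python sketch below is illustrative only: the function name, the concentrations and the dissociation constants are hypothetical placeholders, not values measured by Alqarni et al., and free (unbound) ligand concentrations are assumed.

```python
def fractional_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Fraction of a single binding site occupied by ligand A when ligand B
    competes for the same site (equilibrium, free ligand concentrations)."""
    a = conc_a / kd_a  # concentration of A in units of its Kd
    b = conc_b / kd_b  # concentration of B in units of its Kd
    return a / (1.0 + a + b)


# Hypothetical example: two ligands with the same Kd and the same free
# concentration (arbitrary but identical units for concentration and Kd).
equal_case = fractional_occupancy(conc_a=1.0, kd_a=1.0, conc_b=1.0, kd_b=1.0)

# Hypothetical example: ligand A present at a ten-fold higher concentration.
excess_case = fractional_occupancy(conc_a=10.0, kd_a=1.0, conc_b=1.0, kd_b=1.0)

print(f"equal concentrations: {equal_case:.3f}")
print(f"ten-fold excess of A: {excess_case:.3f}")
```

With equal affinities and equal free concentrations the two ligands share the site equally; in this simple model, raising the concentration of either ligand shifts occupancy in its favour.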

In contrast, the interaction of the MR complex with H3 is likely, because in RBBP4 the H3-binding surface lies on the ‘top’ of the WD40 domain, which is not blocked by the RBBP4-MTA interaction. This suggests that RBBP4 most likely serves to contact histone H3 when NuRD contacts chromatin (Fig 6-8). Consistent with this hypothesis, Millard et al. demonstrated that both RBBP4 alone and the MR complex show an identical binding pattern to H3 peptides [171]. In contrast, the MR complex did not bind to H4 peptides, whereas free RBBP4 did.

Fig 6-8. Schematic representation of the recruitment of the deacetylase core of the NuRD complex to chromatin. Subunits are colored according to the schematic representation in Fig 1-3. Interaction of the RBBP4/7 proteins with the H3 tail could recruit the complex to chromatin on the same or adjacent nucleosomes. Other histone tails, such as those of H2A.Z or H4, would be available for deacetylation by HDAC1/2.

6.9 Concluding remarks

Research described in this thesis has found that NuRD core components are conserved across a number of cancer cell types and have a stoichiometry of 4:2:2:1:1:1 (RBBP4:MTA:HDAC:GATAD2:MBD:CHD). MBD2-NuRD appears to be present in MCF7 and PC3 cells but absent in NT2 cells, suggesting a more important role for MBD2-NuRD in MCF7 and PC3 cells than in NT2 cells. This thesis also provides biochemical evidence for interaction of NuRD with specific transcription factors in specific cell types and perhaps hints at a role for a NuRD subcomplex in the deacetylation of the mitochondrial membrane protein SLC25A5. Our data also reveal that GATAD2 connects the MHR module of NuRD to the CHD remodeller and also modulates the interaction of the MBD subunit with the MTA-HDAC unit. Our proteomics, biochemical and structural data indicate that four molecules of RBBP4 are present in a NuRD complex, which may enhance the avidity of the complex for its substrate. The data presented in this thesis build on our understanding of NuRD complex structure and move us closer towards a better understanding of the mechanisms of action of the complex.

Following from the findings presented here, several avenues of investigation suggest themselves. An investigation of the interactome of the NuRD complex across normal and diseased tissues might provide insight into how changes in NuRD recruitment are linked to disease states. Characterizing the composition and interactome of cytosolic NuRD complexes or subcomplexes would perhaps shed light on the role of such complexes at sites remote from chromatin. Biochemical and genome-wide analyses may provide insights into how the HDAC can alter the substrate specificity of the complex. Further efforts could provide insights into how HDAC and CHD cooperate to remodel chromatin structure.

References

1. Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. & Selbach, M. (2011) Global quantification of mammalian gene expression control, Nature. 473, 337-42.
2. Komili, S. & Silver, P. A. (2008) Coupling and coordination in gene expression processes: a systems biology view, Nat Rev Genet. 9, 38-48.
3. Ahmadi Rastegar, D., Sharifi Tabar, M., Alikhani, M., Parsamatin, P., Sahraneshin Samani, F., Sabbaghian, M., Sadighi Gilani, M. A., Mohammad Ahadi, A., Mohseni Meybodi, A., Piryaei, A., Ansari-Pour, N., Gourabi, H., Baharvand, H. & Salekdeh, G. H. (2015) Isoform-Level Gene Expression Profiles of Human Y Azoospermia Factor Genes and Their Paralogs in the Testicular Tissue of Non-Obstructive Azoospermia Patients, J Proteome Res. 14, 3595-605.
4. Gerosa, L., Kochanowski, K., Heinemann, M. & Sauer, U. (2013) Dissecting specific and global transcriptional regulation of bacterial gene expression, Mol Syst Biol. 9, 658.
5. Burnside, K. & Rajagopal, L. (2012) Regulation of prokaryotic gene expression by eukaryotic-like enzymes, Curr Opin Microbiol. 15, 125-31.
6. Ishihama, A. (2012) Prokaryotic genome regulation: a revolutionary paradigm, Proc Jpn Acad Ser B Phys Biol Sci. 88, 485-508.
7. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. (2017) Dynamic RNA Modifications in Gene Expression Regulation, Cell. 169, 1187-1200.
8. Collart, M. A. (2016) The Ccr4-Not complex is a key regulator of eukaryotic gene expression, Wiley Interdiscip Rev RNA. 7, 438-54.
9. Chen, K., Zhao, B. S. & He, C. (2016) Nucleic Acid Modifications in Regulation of Gene Expression, Cell Chem Biol. 23, 74-85.
10. Jangravi, Z., Tabar, M. S., Mirzaei, M., Parsamatin, P., Vakilian, H., Alikhani, M., Shabani, M., Haynes, P. A., Goodchild, A. K., Gourabi, H., Baharvand, H. & Salekdeh, G. H. (2015) Two Splice Variants of Y Chromosome-Located Lysine-Specific Demethylase 5D Have Distinct Function in Prostate Cancer Cell Line (DU-145), J Proteome Res. 14, 3492-502.
11. Voss, T. C. & Hager, G. L. (2014) Dynamic regulation of transcriptional states by chromatin and transcription factors, Nat Rev Genet. 15, 69-81.
12. Gibney, E. R. & Nolan, C. M. (2010) Epigenetics and gene expression, Heredity (Edinb). 105, 4-13.
13. Willyard, C. (2017) An epigenetics gold rush: new controls for gene expression, Nature. 542, 406-408.
14. Soutourina, J. (2018) Transcription regulation by the Mediator complex, Nat Rev Mol Cell Biol. 19, 262-274.
15. Clapier, C. R. & Cairns, B. R. (2009) The biology of chromatin remodeling complexes, Annu Rev Biochem. 78, 273-304.
16. Poss, Z. C., Ebmeier, C. C. & Taatjes, D. J. (2013) The Mediator complex and transcription regulation, Crit Rev Biochem Mol Biol. 48, 575-608.
17. Cavalli-Sforza, L. L. (2005) The Diversity Project: past, present and future, Nat Rev Genet. 6, 333-40.
18. Rando, O. J. & Chang, H. Y. (2009) Genome-wide views of chromatin structure, Annu Rev Biochem. 78, 245-71.
19. Venkatesh, S. & Workman, J. L. (2015) Histone exchange, chromatin structure and the regulation of transcription, Nat Rev Mol Cell Biol. 16, 178-89.
20. Clapier, C. R., Iwasa, J., Cairns, B. R. & Peterson, C. L. (2017) Mechanisms of action and regulation of ATP-dependent chromatin-remodelling complexes, Nat Rev Mol Cell Biol. 18, 407-422.
21. Krasnov, A. N., Mazina, M. Y., Nikolenko, J. V. & Vorobyeva, N. E. (2016) On the way of revealing coactivator complexes cross-talk during transcriptional activation, Cell Biosci. 6, 15.
22. Thatcher, T. H. & Gorovsky, M. A. (1994) Phylogenetic analysis of the core histones H2A, H2B, H3, and H4, Nucleic Acids Res. 22, 174-9.
23. Marino-Ramirez, L., Kann, M. G., Shoemaker, B. A. & Landsman, D. (2005) Histone structure and nucleosome stability, Expert Rev Proteomics. 2, 719-29.
24. Strahl, B. D. & Allis, C. D. (2000) The language of covalent histone modifications, Nature. 403, 41-5.
25. Munshi, A., Shafi, G., Aliya, N. & Jyothy, A. (2009) Histone modifications dictate specific biological readouts, J Genet Genomics. 36, 75-88.
26. Nakanishi, S., Lee, J. S., Gardner, K. E., Gardner, J. M., Takahashi, Y. H., Chandrasekharan, M. B., Sun, Z. W., Osley, M. A., Strahl, B. D., Jaspersen, S. L. & Shilatifard, A. (2009) Histone H2BK123 monoubiquitination is the critical determinant for H3K4 and H3K79 trimethylation by COMPASS and Dot1, J Cell Biol. 186, 371-7.
27. Allfrey, V. G., Faulkner, R. & Mirsky, A. E. (1964) Acetylation and Methylation of Histones and Their Possible Role in the Regulation of Rna Synthesis, Proc Natl Acad Sci U S A. 51, 786-94.
28. Chahal, S. S., Matthews, H. R. & Bradbury, E. M. (1980) Acetylation of histone H4 and its role in chromatin structure and function, Nature. 287, 76-9.
29. Lombardi, P. M., Cole, K. E., Dowling, D. P. & Christianson, D. W. (2011) Structure, mechanism, and inhibition of histone deacetylases and related metalloenzymes, Curr Opin Struct Biol. 21, 735-43.
30. Haberland, M., Montgomery, R. L. & Olson, E. N. (2009) The many roles of histone deacetylases in development and physiology: implications for disease and therapy, Nat Rev Genet. 10, 32-42.
31. Seto, E. & Yoshida, M. (2014) Erasers of histone acetylation: the histone deacetylase enzymes, Cold Spring Harb Perspect Biol. 6, a018713.
32. Legube, G. & Trouche, D. (2003) Regulating histone acetyltransferases and deacetylases, EMBO Rep. 4, 944-7.
33. Marmorstein, R. & Roth, S. Y. (2001) Histone acetyltransferases: function, structure, and catalysis, Curr Opin Genet Dev. 11, 155-61.
34. Friedmann, D. R. & Marmorstein, R. (2013) Structure and mechanism of non-histone protein acetyltransferase enzymes, FEBS J. 280, 5570-81.
35. Yuan, H. & Marmorstein, R. (2013) Histone acetyltransferases: Rising ancient counterparts to protein kinases, Biopolymers. 99, 98-111.
36. Fujisawa, T. & Filippakopoulos, P. (2017) Functions of bromodomain-containing proteins and their roles in homeostasis and cancer, Nat Rev Mol Cell Biol. 18, 246-262.
37. Ferreira, H., Flaus, A. & Owen-Hughes, T. (2007) Histone modifications influence the action of Snf2 family remodelling enzymes by different mechanisms, J Mol Biol. 374, 563-79.
38. Chandy, M., Gutierrez, J. L., Prochasson, P. & Workman, J. L. (2006) SWI/SNF displaces SAGA-acetylated nucleosomes, Eukaryot Cell. 5, 1738-47.
39. Ladurner, A. G., Inouye, C., Jain, R. & Tjian, R. (2003) Bromodomains mediate an acetyl-histone encoded antisilencing function at heterochromatin boundaries, Mol Cell. 11, 365-76.
40. Wu, M., Hayward, D., Kalin, J. H., Song, Y., Schwabe, J. W. & Cole, P. A. (2018) Lysine-14 acetylation of histone H3 in chromatin confers resistance to the deacetylase and demethylase activities of an epigenetic silencing complex, Elife. 7.
41. Narlikar, G. J., Sundaramoorthy, R. & Owen-Hughes, T. (2013) Mechanisms and functions of ATP-dependent chromatin-remodeling enzymes, Cell. 154, 490-503.
42. Sims, J. K. & Wade, P. A. (2011) SnapShot: Chromatin remodeling: CHD, Cell. 144, 626-626 e1.
43. Filippakopoulos, P. & Knapp, S. (2012) The bromodomain interaction module, FEBS Lett. 586, 2692-704.
44. Dirscherl, S. S. & Krebs, J. E. (2004) Functional diversity of ISWI complexes, Biochem Cell Biol. 82, 482-9.
45. Conaway, R. C. & Conaway, J. W. (2009) The INO80 chromatin remodeling complex in transcription, replication and repair, Trends Biochem Sci. 34, 71-7.
46. Zhang, B., Chambers, K. J., Faller, D. V. & Wang, S. (2007) Reprogramming of the SWI/SNF complex for co-activation or co-repression in prohibitin-mediated estrogen receptor regulation, Oncogene. 26, 7153-7.
47. Brahma, S., Udugama, M. I., Kim, J., Hada, A., Bhardwaj, S. K., Hailu, S. G., Lee, T. H. & Bartholomew, B. (2017) INO80 exchanges H2A.Z for H2A by translocating on DNA proximal to histone dimers, Nat Commun. 8, 15616.
48. Pointner, J., Persson, J., Prasad, P., Norman-Axelsson, U., Stralfors, A., Khorosjutina, O., Krietenstein, N., Svensson, J. P., Ekwall, K. & Korber, P. (2012) CHD1 remodelers regulate nucleosome spacing in vitro and align nucleosomal arrays over gene coding regions in S. pombe, EMBO J. 31, 4388-403.
49. Lieleg, C., Ketterer, P., Nuebler, J., Ludwigsen, J., Gerland, U., Dietz, H., Mueller-Planitz, F. & Korber, P. (2015) Nucleosome spacing generated by ISWI and CHD1 remodelers is constant regardless of nucleosome density, Mol Cell Biol. 35, 1588-605.
50. Kelso, T. W. R., Porter, D. K., Amaral, M. L., Shokhirev, M. N., Benner, C. & Hargreaves, D. C. (2017) Chromatin accessibility underlies synthetic lethality of SWI/SNF subunits in ARID1A-mutant cancers, Elife. 6.
51. Hauk, G., McKnight, J. N., Nodelman, I. M. & Bowman, G. D. (2010) The chromodomains of the Chd1 chromatin remodeler regulate DNA access to the ATPase motor, Mol Cell. 39, 711-23.
52. Yan, L., Wang, L., Tian, Y., Xia, X. & Chen, Z. (2016) Structure and regulation of the chromatin remodeller ISWI, Nature. 540, 466-469.
53. Xia, X., Liu, X., Li, T., Fang, X. & Chen, Z. (2016) Structure of chromatin remodeler Swi2/Snf2 in the resting state, Nat Struct Mol Biol. 23, 722-9.
54. Eustermann, S., Schall, K., Kostrewa, D., Lakomek, K., Strauss, M., Moldt, M. & Hopfner, K. P. (2018) Structural basis for ATP-dependent chromatin remodelling by the INO80 complex, Nature. 556, 386-390.
55. Liu, X., Li, M., Xia, X., Li, X. & Chen, Z. (2017) Mechanism of chromatin remodelling revealed by the Snf2-nucleosome structure, Nature. 544, 440-445.
56. Farnung, L., Vos, S. M., Wigge, C. & Cramer, P. (2017) Nucleosome-Chd1 structure and implications for chromatin remodelling, Nature. 550, 539-542.
57. Spruijt, C. G., Luijsterburg, M. S., Menafra, R., Lindeboom, R. G., Jansen, P. W., Edupuganti, R. R., Baltissen, M. P., Wiegant, W. W., Voelker-Albert, M. C., Matarese, F., Mensinga, A., Poser, I., Vos, H. R., Stunnenberg, H. G., van Attikum, H. & Vermeulen, M. (2016) ZMYND8 Co-localizes with NuRD on Target Genes and Regulates Poly(ADP-Ribose)-Dependent Recruitment of GATAD2A/NuRD to Sites of DNA Damage, Cell Rep. 17, 783-798.
58. Miccio, A. & Blobel, G. A. (2010) Role of the GATA-1/FOG-1/NuRD pathway in the expression of human beta-like globin genes, Mol Cell Biol. 30, 3460-70.
59. Hendrich, B., Guy, J., Ramsahoye, B., Wilson, V. A. & Bird, A. (2001) Closely related proteins MBD2 and MBD3 play distinctive but interacting roles in mouse development, Genes Dev. 15, 710-23.
60. Rais, Y., Zviran, A., Geula, S., Gafni, O., Chomsky, E., Viukov, S., Mansour, A. A., Caspi, I., Krupalnik, V., Zerbib, M., Maza, I., Mor, N., Baran, D., Weinberger, L., Jaitin, D. A., Lara-Astiaso, D., Blecher-Gonen, R., Shipony, Z., Mukamel, Z., Hagai, T., Gilad, S., Amann-Zalcenstein, D., Tanay, A., Amit, I., Novershtern, N. & Hanna, J. H. (2013) Deterministic direct reprogramming of somatic cells to pluripotency, Nature. 502, 65-70.
61. Tong, J. K., Hassig, C. A., Schnitzler, G. R., Kingston, R. E. & Schreiber, S. L. (1998) Chromatin deacetylation by an ATP-dependent nucleosome remodelling complex, Nature. 395, 917-21.
62. Zhang, Y., LeRoy, G., Seelig, H. P., Lane, W. S. & Reinberg, D. (1998) The dermatomyositis-specific autoantigen Mi2 is a component of a complex containing histone deacetylase and nucleosome remodeling activities, Cell. 95, 279-89.
63. Zhang, Y., Iratni, R., Erdjument-Bromage, H., Tempst, P. & Reinberg, D. (1997) Histone deacetylases and SAP18, a novel polypeptide, are components of a human Sin3 complex, Cell. 89, 357-64.
64. Kuzmichev, A., Zhang, Y., Erdjument-Bromage, H., Tempst, P. & Reinberg, D. (2002) Role of the Sin3-histone deacetylase complex in growth regulation by the candidate tumor suppressor p33(ING1), Mol Cell Biol. 22, 835-48.
65. You, A., Tong, J. K., Grozinger, C. M. & Schreiber, S. L. (2001) CoREST is an integral component of the CoREST-human histone deacetylase complex, Proc Natl Acad Sci U S A. 98, 1454-8.
66. Luo, Z., Gao, X., Lin, C., Smith, E. R., Marshall, S. A., Swanson, S. K., Florens, L., Washburn, M. P. & Shilatifard, A. (2015) Zic2 is an enhancer-binding factor required for embryonic stem cell specification, Mol Cell. 57, 685-694.
67. Gharechahi, J., Pakzad, M., Mirshavaladi, S., Sharifitabar, M., Baharvand, H. & Salekdeh, G. H. (2014) The effect of Rho-associated kinase inhibition on the proteome pattern of dissociated human embryonic stem cells, Mol Biosyst. 10, 640-52.
68. Rincon-Arano, H., Halow, J., Delrow, J. J., Parkhurst, S. M. & Groudine, M. (2012) UpSET recruits HDAC complexes and restricts chromatin accessibility and acetylation at promoter regions, Cell. 151, 1214-28.
69. Ee, L. S., McCannell, K. N., Tang, Y., Fernandes, N., Hardy, W. R., Green, M. R., Chu, F. & Fazzio, T. G. (2017) An Embryonic Stem Cell-Specific NuRD Complex Functions through Interaction with WDR5, Stem Cell Reports. 8, 1488-1496.
70. Koipally, J., Renold, A., Kim, J. & Georgopoulos, K. (1999) Repression by Ikaros and Aiolos is mediated through histone deacetylase complexes, EMBO J. 18, 3090-100.
71. Gregory, G. D., Miccio, A., Bersenev, A., Wang, Y., Hong, W., Zhang, Z., Poncz, M., Tong, W. & Blobel, G. A. (2010) FOG1 requires NuRD to promote hematopoiesis and maintain lineage fidelity within the megakaryocytic-erythroid compartment, Blood. 115, 2156-66.
72. Schultz, D. C., Friedman, J. R. & Rauscher, F. J., 3rd (2001) Targeting histone deacetylase complexes via KRAB-zinc finger proteins: the PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2alpha subunit of NuRD, Genes Dev. 15, 428-43.
73. Saksouk, N., Barth, T. K., Ziegler-Birling, C., Olova, N., Nowak, A., Rey, E., Mateos-Langerak, J., Urbach, S., Reik, W., Torres-Padilla, M. E., Imhof, A., Dejardin, J. & Simboeck, E. (2014) Redundant mechanisms to form silent chromatin at pericentromeric regions rely on BEND3 and DNA methylation, Mol Cell. 56, 580-94.
74. Zhang, T., Wei, G., Millard, C. J., Fischer, R., Konietzny, R., Kessler, B. M., Schwabe, J. W. R. & Brockdorff, N. (2018) A variant NuRD complex containing PWWP2A/B excludes MBD2/3 to regulate transcription at active genes, Nat Commun. 9, 3798.
75. Wade, P. A., Jones, P. L., Vermaak, D. & Wolffe, A. P. (1998) A multiple subunit Mi-2 histone deacetylase from Xenopus laevis cofractionates with an associated Snf2 superfamily ATPase, Curr Biol. 8, 843-6.
76. Williams, C. J., Naito, T., Arco, P. G., Seavitt, J. R., Cashman, S. M., De Souza, B., Qi, X., Keables, P., Von Andrian, U. H. & Georgopoulos, K. (2004) The chromatin remodeler Mi-2beta is required for CD4 expression and T cell development, Immunity. 20, 719-33.
77. Naito, T., Gomez-Del Arco, P., Williams, C. J. & Georgopoulos, K. (2007) Antagonistic interactions between Ikaros and the chromatin remodeler Mi-2beta determine silencer activity and Cd4 gene expression, Immunity. 27, 723-34.
78. Stevens, T. J., Lando, D., Basu, S., Atkinson, L. P., Cao, Y., Lee, S. F., Leeb, M., Wohlfahrt, K. J., Boucher, W., O'Shaughnessy-Kirwan, A., Cramard, J., Faure, A. J., Ralser, M., Blanco, E., Morey, L., Sanso, M., Palayret, M. G. S., Lehner, B., Di Croce, L., Wutz, A., Hendrich, B., Klenerman, D. & Laue, E. D. (2017) 3D structures of individual mammalian genomes studied by single-cell Hi-C, Nature. 544, 59-64.
79. Miller, A., Ralser, M., Kloet, S. L., Loos, R., Nishinakamura, R., Bertone, P., Vermeulen, M. & Hendrich, B. (2016) Sall4 controls differentiation of pluripotent cells independently of the Nucleosome Remodelling and Deacetylation (NuRD) complex, Development. 143, 3074-84.
80. Bornelov, S., Reynolds, N., Xenophontos, M., Gharbi, S., Johnstone, E., Floyd, R., Ralser, M., Signolet, J., Loos, R., Dietmann, S., Bertone, P. & Hendrich, B. (2018) The Nucleosome Remodeling and Deacetylation Complex Modulates Chromatin Structure at Sites of Active Transcription to Fine-Tune Gene Expression, Mol Cell. 71, 56-72 e4.
81. Takahashi, K. & Yamanaka, S. (2006) Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors, Cell. 126, 663-76.
82. Esfandiari, F., Ashtiani, M. K., Sharifi-Tabar, M., Saber, M., Daemi, H., Ghanian, M. H., Shahverdi, A. & Baharvand, H. (2017) Microparticle-Mediated Delivery of BMP4 for Generation of Meiosis-Competent Germ Cells from Embryonic Stem Cells, Macromol Biosci. 17.
83. Hanna, J. H., Saha, K. & Jaenisch, R. (2010) Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues, Cell. 143, 508-25.
84. Miller, A. & Hendrich, B. (2018) Chromatin Remodelling Proteins and Cell Fate Decisions in Mammalian Preimplantation Development, Adv Anat Embryol Cell Biol. 229, 3-14.
85. Hu, G. & Wade, P. A. (2012) NuRD and pluripotency: a complex balancing act, Cell Stem Cell. 10, 497-503.
86. Knock, E., Pereira, J., Lombard, P. D., Dimond, A., Leaford, D., Livesey, F. J. & Hendrich, B. (2015) The methyl binding domain 3/nucleosome remodelling and deacetylase complex regulates neural cell fate determination and terminal differentiation in the cerebral cortex, Neural Dev. 10, 13.
87. Zhu, D., Fang, J., Li, Y. & Zhang, J. (2009) Mbd3, a component of NuRD/Mi-2 complex, helps maintain pluripotency of mouse embryonic stem cells by repressing trophectoderm differentiation, PLoS One. 4, e7684.
88. Mor, N., Rais, Y., Peles, S., Sheban, D., Aguilera-Castrejon, A., Zviran, A., Elinger, D., Viukov, S., Geula, S., Krupalnik, V., Zerbib, M., Chomsky, E., Lasman, L., Shani, T., Bayerl, J., Gafni, O., Hanna, S., Buenrostro, J., Hagai, T., Masika, H., Bergman, Y., Greenleaf, W. J., Esteban, M. A., Levin, Y., Massarwa, R., Merbl, Y., Novershtern, N. & Hanna, J. H. (2018) Neutralizing Gatad2a-Chd4-Mbd3 Axis within the NuRD Complex Facilitates Deterministic Induction of Naive Pluripotency, bioRxiv.
89. Lai, A. Y. & Wade, P. A. (2011) Cancer biology and NuRD: a multifaceted chromatin remodelling complex, Nat Rev Cancer. 11, 588-96.
90. Wang, Y., Zhang, H., Chen, Y., Sun, Y., Yang, F., Yu, W., Liang, J., Sun, L., Yang, X., Shi, L., Li, R., Li, Y., Zhang, Y., Li, Q., Yi, X. & Shang, Y. (2009) LSD1 is a subunit of the NuRD complex and targets the metastasis programs in breast cancer, Cell. 138, 660-72.
91. Yamada, T., Yang, Y., Hemberg, M., Yoshida, T., Cho, H. Y., Murphy, J. P., Fioravante, D., Regehr, W. G., Gygi, S. P., Georgopoulos, K. & Bonni, A. (2014) Promoter decommissioning by the NuRD chromatin remodeling complex triggers synaptic connectivity in the mammalian brain, Neuron. 83, 122-34.
92. Cai, Y., Geutjes, E. J., de Lint, K., Roepman, P., Bruurs, L., Yu, L. R., Wang, W., van Blijswijk, J., Mohammad, H., de Rink, I., Bernards, R. & Baylin, S. B. (2014) The NuRD complex cooperates with DNMTs to maintain silencing of key colorectal tumor suppressor genes, Oncogene. 33, 2157-68.
93. Li, D. Q., Pakala, S. B., Nair, S. S., Eswaran, J. & Kumar, R. (2012) Metastasis-associated protein 1/nucleosome remodeling and histone deacetylase complex in cancer, Cancer Res. 72, 387-94.
94. Kaur, E., Gupta, S. & Dutt, S. (2014) Clinical implications of MTA proteins in human cancer, Cancer Metastasis Rev. 33, 1017-24.
95. Postmus, P. E. & Green, M. R. (1999) Overview of MTA in the treatment of non-small cell lung cancer, Semin Oncol. 26, 31-6.
96. Sen, N., Gui, B. & Kumar, R. (2014) Role of MTA1 in cancer progression and metastasis, Cancer Metastasis Rev. 33, 879-89.
97. Kumar, R. (2003) Another tie that binds the MTA family to breast cancer, Cell. 113, 142-3.
98. Sen, N., Gui, B. & Kumar, R. (2014) Physiological functions of MTA family of proteins, Cancer Metastasis Rev. 33, 869-77.
99. Fujita, N., Jaye, D. L., Kajita, M., Geigerman, C., Moreno, C. S. & Wade, P. A. (2003) MTA3, a Mi-2/NuRD complex subunit, regulates an invasive growth pathway in breast cancer, Cell. 113, 207-19.
100. Zeng, H., Zhang, J. M., Du, Y., Wang, J., Ren, Y., Li, M., Li, H., Cai, Z., Chu, Q. & Yang, C. (2016) Crosstalk between ATF4 and MTA1/HDAC1 promotes osteosarcoma progression, Oncotarget. 7, 7329-42.
101. Luo, J., Su, F., Chen, D., Shiloh, A. & Gu, W. (2000) Deacetylation of p53 modulates its effect on cell growth and apoptosis, Nature. 408, 377-81.
102. Yoo, Y. G., Kong, G. & Lee, M. O. (2006) Metastasis-associated protein 1 enhances stability of hypoxia-inducible factor-1alpha protein by recruiting histone deacetylase 1, EMBO J. 25, 1231-41.
103. Zhao, S., Choi, M., Overton, J. D., Bellone, S., Roque, D. M., Cocco, E., Guzzo, F., English, D. P., Varughese, J., Gasparrini, S., Bortolomai, I., Buza, N., Hui, P., Abu-Khalaf, M., Ravaggi, A., Bignotti, E., Bandiera, E., Romani, C., Todeschini, P., Tassi, R., Zanotti, L., Carrara, L., Pecorelli, S., Silasi, D. A., Ratner, E., Azodi, M., Schwartz, P. E., Rutherford, T. J., Stiegler, A. L., Mane, S., Boggon, T. J., Schlessinger, J., Lifton, R. P. & Santin, A. D. (2013) Landscape of somatic single-nucleotide and copy-number mutations in uterine serous carcinoma, Proc Natl Acad Sci U S A. 110, 2916-21.
104. Kim, M. S., Chung, N. G., Kang, M. R., Yoo, N. J. & Lee, S. H. (2011) Genetic and expressional alterations of CHD genes in gastric and colorectal cancers, Histopathology. 58, 660-8.
105. Sperlazza, J., Rahmani, M., Beckta, J., Aust, M., Hawkins, E., Wang, S. Z., Zu Zhu, S., Podder, S., Dumur, C., Archer, K., Grant, S. & Ginder, G. D. (2015) Depletion of the chromatin remodeler CHD4 sensitizes AML blasts to genotoxic agents and reduces tumor formation, Blood. 126, 1462-72.
106. Futreal, P. A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N. & Stratton, M. R. (2004) A census of human cancer genes, Nat Rev Cancer. 4, 177-83.
107. Burd, C. J., Kinyamu, H. K., Miller, F. W. & Archer, T. K. (2008) UV radiation regulates Mi-2 through protein translation and stability, J Biol Chem. 283, 34976-82.
108. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) Basic local alignment search tool, J Mol Biol. 215, 403-10.
109. Manavathi, B., Singh, K. & Kumar, R. (2007) MTA family of coregulators in nuclear receptor biology and pathology, Nucl Recept Signal. 5, e010.
110. Humphrey, G. W., Wang, Y., Russanova, V. R., Hirai, T., Qin, J., Nakatani, Y. & Howard, B. H. (2001) Stable histone deacetylase complexes distinguished by the presence of SANT domain proteins CoREST/kiaa0071 and Mta-L1, J Biol Chem. 276, 6817-24.
111. Armache, K. J., Garlick, J. D., Canzio, D., Narlikar, G. J. & Kingston, R. E. (2011) Structural basis of silencing: Sir3 BAH domain in complex with a nucleosome at 3.0 A resolution, Science. 334, 977-82.
112. Millard, C. J., Watson, P. J., Celardo, I., Gordiyenko, Y., Cowley, S. M., Robinson, C. V., Fairall, L. & Schwabe, J. W. (2013) Class I HDACs share a common mechanism of regulation by inositol phosphates, Mol Cell. 51, 57-67.
113. Watson, P. J., Millard, C. J., Riley, A. M., Robertson, N. S., Wright, L. C., Godage, H. Y., Cowley, S. M., Jamieson, A. G., Potter, B. V. & Schwabe, J. W. (2016) Insights into the activation mechanism of class I HDAC complexes by inositol phosphates, Nat Commun. 7, 11262.
114. Hayakawa, T. & Nakayama, J. (2011) Physiological roles of class I HDAC complex and histone demethylase, J Biomed Biotechnol. 2011, 129383.
115. Ordentlich, P., Downes, M., Xie, W., Genin, A., Spinner, N. B. & Evans, R. M. (1999) Unique forms of human and mouse nuclear receptor SMRT, Proc Natl Acad Sci U S A. 96, 2639-44.
116. Colon-Diaz, M., Baez-Vega, P., Garcia, M., Ruiz, A., Monteiro, J. B., Fourquet, J., Bayona, M., Alvarez-Garriga, C., Achille, A., Seto, E. & Flores, I. (2012) HDAC1 and HDAC2 are differentially expressed in endometriosis, Reprod Sci. 19, 483-92.
117. Ma, P. & Schultz, R. M. (2016) HDAC1 and HDAC2 in mouse oocytes and preimplantation embryos: Specificity versus compensation, Cell Death Differ. 23, 1119-27.
118. Schroeder, F. A., Gilbert, T. M., Feng, N., Taillon, B. D., Volkow, N. D., Innis, R. B., Hooker, J. M. & Lipska, B. K. (2017) Expression of HDAC2 but Not HDAC1 Transcript Is Reduced in Dorsolateral Prefrontal Cortex of Patients with Schizophrenia, ACS Chem Neurosci. 8, 662-668.
119. Flaus, A., Martin, D. M., Barton, G. J. & Owen-Hughes, T. (2006) Identification of multiple distinct Snf2 subfamilies with conserved structural motifs, Nucleic Acids Res. 34, 2887-905.
120. Murawska, M. & Brehm, A. (2011) CHD chromatin remodelers and the transcription cycle, Transcription. 2, 244-53.
121. Tyagi, M., Imam, N., Verma, K. & Patel, A. K. (2016) Chromatin remodelers: We are the drivers!!, Nucleus. 7, 388-404.
122. Musselman, C. A., Ramirez, J., Sims, J. K., Mansfield, R. E., Oliver, S. S., Denu, J. M., Mackay, J. P., Wade, P. A., Hagman, J. & Kutateladze, T. G. (2012) Bivalent recognition of nucleosomes by the tandem PHD fingers of the CHD4 ATPase is required for CHD4-mediated repression, Proc Natl Acad Sci U S A. 109, 787-92.
123. Mansfield, R. E., Musselman, C. A., Kwan, A. H., Oliver, S. S., Garske, A. L., Davrazou, F., Denu, J. M., Kutateladze, T. G. & Mackay, J. P. (2011) Plant homeodomain (PHD) fingers of CHD4 are histone H3-binding modules with preference for unmodified H3K4 and methylated H3K9, J Biol Chem. 286, 11779-91.
124. Ramirez, J., Dege, C., Kutateladze, T. G. & Hagman, J. (2012) MBD2 and multiple domains of CHD4 are required for transcriptional repression by Mi-2/NuRD complexes, Mol Cell Biol. 32, 5078-88. 125. Watson, A. A., Mahajan, P., Mertens, H. D., Deery, M. J., Zhang, W., Pham, P., Du, X., Bartke, T., Zhang, W., Edlich, C., Berridge, G., Chen, Y., Burgess-Brown, N. A., Kouzarides, T., Wiechens, N., Owen-Hughes, T., Svergun, D. I., Gileadi, O. & Laue, E. D. (2012) The PHD and chromo domains regulate the ATPase activity of the human chromatin remodeler CHD4, J Mol Biol. 422, 3-17. 126. Uhlen, M., Hallstrom, B. M., Lindskog, C., Mardinoglu, A., Ponten, F. & Nielsen, J. (2016) Transcriptomics resources of human tissues and organs, Mol Syst Biol. 12, 862. 127. Thompson, P. M., Gotoh, T., Kok, M., White, P. S. & Brodeur, G. M. (2003) CHD5, a new member of the chromodomain gene family, is preferentially expressed in the nervous system, Oncogene. 22, 1002-11. 128. Silva, A. P., Ryan, D. P., Galanty, Y., Low, J. K., Vandevenne, M., Jackson, S. P. & Mackay, J. P. (2016) The N-terminal Region of Chromodomain Helicase DNA-binding Protein 4 (CHD4) Is Essential for Activity and Contains a High Mobility Group (HMG) Box-like- domain That Can Bind Poly(ADP-ribose), J Biol Chem. 291, 924-38. 129. Brackertz, M., Boeke, J., Zhang, R. & Renkawitz, R. (2002) Two highly related p66 proteins comprise a new family of potent transcriptional repressors interacting with MBD2 and MBD3, J Biol Chem. 277, 40958-66. 130. Brackertz, M., Gong, Z., Leers, J. & Renkawitz, R. (2006) p66alpha and p66beta of the Mi-2/NuRD complex mediate MBD2 and histone interaction, Nucleic Acids Res. 34, 397-406. 131. Feng, Q., Cao, R., Xia, L., Erdjument-Bromage, H., Tempst, P. & Zhang, Y. (2002) Identification and functional characterization of the p66/p68 components of the MeCP1 complex, Mol Cell Biol. 22, 536-46. 132. Gnanapragasam, M. N., Scarsdale, J. N., Amaya, M. L., Webb, H. D., Desai, M. A., Walavalkar, N. 
M., Wang, S. Z., Zu Zhu, S., Ginder, G. D. & Williams, D. C., Jr. (2011) p66Alpha-MBD2 coiled-coil interaction and recruitment of Mi-2 are critical for globin gene silencing by the MBD2-NuRD complex, Proc Natl Acad Sci U S A. 108, 7487-92. 133. Cramer, J. M., Scarsdale, J. N., Walavalkar, N. M., Buchwald, W. A., Ginder, G. D. & Williams, D. C., Jr. (2014) Probing the dynamic distribution of bound states for methylcytosine-binding domains on DNA, J Biol Chem. 289, 1294-302. 134. Le Guezennec, X., Vermeulen, M., Brinkman, A. B., Hoeijmakers, W. A., Cohen, A., Lasonder, E. & Stunnenberg, H. G. (2006) MBD2/NuRD and MBD3/NuRD, two distinct complexes with different biochemical and functional properties, Mol Cell Biol. 26, 843-51. 135. Menafra, R. & Stunnenberg, H. G. (2014) MBD2 and MBD3: elusive functions and mechanisms, Front Genet. 5, 428. 136. Yildirim, O., Li, R., Hung, J. H., Chen, P. B., Dong, X., Ee, L. S., Weng, Z., Rando, O. J. & Fazzio, T. G. (2011) Mbd3/NURD complex regulates expression of 5- hydroxymethylcytosine marked genes in embryonic stem cells, Cell. 147, 1498-510. 137. Qian, Y. W., Wang, Y. C., Hollingsworth, R. E., Jr., Jones, D., Ling, N. & Lee, E. Y. (1993) A retinoblastoma-binding protein related to a negative regulator of Ras in yeast, Nature. 364, 648-52. 138. Lejon, S., Thong, S. Y., Murthy, A., AlQarni, S., Murzina, N. V., Blobel, G. A., Laue, E. D. & Mackay, J. P. (2011) Insights into association of the NuRD complex with FOG-1 from the crystal structure of an RbAp48.FOG-1 complex, J Biol Chem. 286, 1196-203. 139. Alqarni, S. S., Murthy, A., Zhang, W., Przewloka, M. R., Silva, A. P., Watson, A. A., Lejon, S., Pei, X. Y., Smits, A. H., Kloet, S. L., Wang, H., Shepherd, N. E., Stokes, P. H., Blobel, G. A., Vermeulen, M., Glover, D. M., Mackay, J. P. & Laue, E. D. (2014) Insight into

134

the architecture of the NuRD complex: structure of the RbAp48-MTA1 subcomplex, J Biol Chem. 289, 21844-55. 140. Barak, O., Lazzaro, M. A., Lane, W. S., Speicher, D. W., Picketts, D. J. & Shiekhattar, R. (2003) Isolation of human NURF: a regulator of Engrailed gene expression, EMBO J. 22, 6089-100. 141. Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P. & Reinberg, D. (2002) Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein, Genes Dev. 16, 2893-905. 142. Kaufman, P. D. (1996) Nucleosome assembly: the CAF and the HAT, Curr Opin Cell Biol. 8, 369-73. 143. Verreault, A., Kaufman, P. D., Kobayashi, R. & Stillman, B. (1998) Nucleosomal DNA regulates the core-histone-binding subunit of the human Hat1 acetyltransferase, Curr Biol. 8, 96-108. 144. Daigo, Y., Suzuki, K., Maruyama, O., Miyoshi, Y., Yasuda, T., Kabuto, T., Imaoka, S., Fujiwara, T., Takahashi, E., Fujino, M. A. & Nakamura, Y. (1997) Isolation, mapping and mutation analysis of a human cDNA homologous to the doc-1 gene of the Chinese hamster, a candidate tumor suppressor for oral cancer, Genes Cancer. 20, 204-7. 145. Ertekin, A., Aramini, J. M., Rossi, P., Leonard, P. G., Janjua, H., Xiao, R., Maglaqui, M., Lee, H. W., Prestegard, J. H. & Montelione, G. T. (2012) Human cyclin-dependent kinase 2- associated protein 1 (CDK2AP1) is dimeric in its disulfide-reduced state, with natively disordered N-terminal region, J Biol Chem. 287, 16541-9. 146. Mohd-Sarip, A., Teeuwssen, M., Bot, A. G., De Herdt, M. J., Willems, S. M., Baatenburg de Jong, R. J., Looijenga, L. H. J., Zatreanu, D., Bezstarosti, K., van Riet, J., Oole, E., van Ijcken, W. F. J., van de Werken, H. J. G., Demmers, J. A., Fodde, R. & Verrijzer, C. P. (2017) DOC1-Dependent Recruitment of NURD Reveals Antagonism with SWI/SNF during Epithelial-Mesenchymal Transition in Oral Cancer Cells, Cell Rep. 20, 61-75. 147. Gibson, D. G., Young, L., Chuang, R. Y., Venter, J. 
C., Hutchison, C. A., 3rd & Smith, H. O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases, Nat Methods. 6, 343-5. 148. Leitner, A., Joachimiak, L. A., Unverdorben, P., Walzthoeni, T., Frydman, J., Forster, F. & Aebersold, R. (2014) Chemical cross-linking/mass spectrometry targeting acidic residues in proteins and protein complexes, Proc Natl Acad Sci U S A. 111, 9455-60. 149. MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M., Finney, G. L., Frewen, B., Kern, R., Tabb, D. L., Liebler, D. C. & MacCoss, M. J. (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments, Bioinformatics. 26, 966-8. 150. Chambers, M. C., Maclean, B., Burke, R., Amodei, D., Ruderman, D. L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J., Hoff, K., Kessner, D., Tasman, N., Shulman, N., Frewen, B., Baker, T. A., Brusniak, M. Y., Paulse, C., Creasy, D., Flashner, L., Kani, K., Moulding, C., Seymour, S. L., Nuwaysir, L. M., Lefebvre, B., Kuhlmann, F., Roark, J., Rainer, P., Detlev, S., Hemenway, T., Huhmer, A., Langridge, J., Connolly, B., Chadick, T., Holly, K., Eckels, J., Deutsch, E. W., Moritz, R. L., Katz, J. E., Agus, D. B., MacCoss, M., Tabb, D. L. & Mallick, P. (2012) A cross-platform toolkit for mass spectrometry and proteomics, Nat Biotechnol. 30, 918-20. 151. Fan, S. B., Meng, J. M., Lu, S., Zhang, K., Yang, H., Chi, H., Sun, R. X., Dong, M. Q. & He, S. M. (2015) Using pLink to Analyze Cross-Linked Peptides, Curr Protoc Bioinformatics. 49, 8 21 1-19. 152. Chen, Z. A., Jawhari, A., Fischer, L., Buchen, C., Tahir, S., Kamenski, T., Rasmussen, M., Lariviere, L., Bukowski-Wills, J. C., Nilges, M., Cramer, P. & Rappsilber, J. (2010)

135

Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry, EMBO J. 29, 717-26. 153. Kosinski, J., von Appen, A., Ori, A., Karius, K., Muller, C. W. & Beck, M. (2015) Xlink Analyzer: software for analysis and visualization of cross-linking data in the context of three- dimensional structures, J Struct Biol. 189, 177-83. 154. Li, D., Fu, Y., Sun, R., Ling, C. X., Wei, Y., Zhou, H., Zeng, R., Yang, Q., He, S. & Gao, W. (2005) pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry, Bioinformatics. 21, 3049-50. 155. Cox, J., Neuhauser, N., Michalski, A., Scheltema, R. A., Olsen, J. V. & Mann, M. (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment, J Proteome Res. 10, 1794-805. 156. Cox, J. & Mann, M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification, Nat Biotechnol. 26, 1367-72. 157. Frewen, B. & MacCoss, M. J. (2007) Using BiblioSpec for creating and searching tandem MS peptide libraries, Curr Protoc Bioinformatics. Chapter 13, Unit 13 7. 158. Saha, A., Wittmeyer, J. & Cairns, B. R. (2006) Mechanisms for nucleosome movement by ATP-dependent chromatin remodeling complexes, Results Probl Cell Differ. 41, 127-48. 159. Johnson, C. N., Adkins, N. L. & Georgel, P. (2005) Chromatin remodeling complexes: ATP-dependent machines in action, Biochem Cell Biol. 83, 405-17. 160. Sarnowska, E., Gratkowska, D. M., Sacharowski, S. P., Cwiek, P., Tohge, T., Fernie, A. R., Siedlecki, J. A., Koncz, C. & Sarnowski, T. J. (2016) The Role of SWI/SNF Chromatin Remodeling Complexes in Hormone Crosstalk, Trends Plant Sci. 21, 594-608. 161. Bode, D., Yu, L., Tate, P., Pardo, M. & Choudhary, J. (2016) Characterization of Two Distinct Nucleosome Remodeling and Deacetylase (NuRD) Complex Assemblies in Embryonic Stem Cells, Mol Cell Proteomics. 15, 878-91. 162. 
Low, J. K., Webb, S. R., Silva, A. P., Saathoff, H., Ryan, D. P., Torrado, M., Brofelth, M., Parker, B. L., Shepherd, N. E. & Mackay, J. P. (2016) CHD4 Is a Peripheral Component of the Nucleosome Remodeling and Deacetylase Complex, J Biol Chem. 291, 15853-66. 163. Schmidberger, J. W., Sharifi Tabar, M., Torrado, M., Silva, A. P., Landsberg, M. J., Brillault, L., AlQarni, S., Zeng, Y. C., Parker, B. L., Low, J. K. & Mackay, J. P. (2016) The MTA1 subunit of the nucleosome remodeling and deacetylase complex can recruit two copies of RBBP4/7, Protein Sci. 25, 1472-82. 164. Torchy, M. P., Hamiche, A. & Klaholz, B. P. (2015) Structure and function insights into the NuRD chromatin remodeling complex, Cell Mol Life Sci. 72, 2491-507. 165. Ramirez, J. & Hagman, J. (2009) The Mi-2/NuRD complex: a critical epigenetic regulator of hematopoietic development, differentiation and cancer, Epigenetics. 4, 532-6. 166. Yang, Y., Yamada, T., Hill, K. K., Hemberg, M., Reddy, N. C., Cho, H. Y., Guthrie, A. N., Oldenborg, A., Heiney, S. A., Ohmae, S., Medina, J. F., Holy, T. E. & Bonni, A. (2016) Chromatin remodeling inactivates activity genes and regulates neural coding, Science. 353, 300-305. 167. Whyte, W. A., Bilodeau, S., Orlando, D. A., Hoke, H. A., Frampton, G. M., Foster, C. T., Cowley, S. M. & Young, R. A. (2012) Enhancer decommissioning by LSD1 during embryonic stem cell differentiation, Nature. 482, 221-5. 168. Kloet, S. L., Baymaz, H. I., Makowski, M., Groenewold, V., Jansen, P. W., Berendsen, M., Niazi, H., Kops, G. J. & Vermeulen, M. (2015) Towards elucidating the stability, dynamics and architecture of the nucleosome remodeling and deacetylase complex by using quantitative interaction proteomics, FEBS J. 282, 1774-85.

136

169. Smits, A. H., Jansen, P. W., Poser, I., Hyman, A. A. & Vermeulen, M. (2013) Stoichiometry of chromatin-associated protein complexes revealed by label-free quantitative mass spectrometry-based proteomics, Nucleic Acids Res. 41, e28. 170. Sharifi Tabar, M., Mackay, J. P. & Low, J. K. K. (2019) The stoichiometry and interactome of the Nucleosome Remodeling and Deacetylase (NuRD) complex are conserved across multiple cell lines, FEBS J. 171. Millard, C. J., Varma, N., Saleh, A., Morris, K., Watson, P. J., Bottrill, A. R., Fairall, L., Smith, C. J. & Schwabe, J. W. (2016) The structure of the core NuRD repression complex provides insights into its interaction with chromatin, Elife. 5, e13941. 172. Ori, A., Iskar, M., Buczak, K., Kastritis, P., Parca, L., Andres-Pons, A., Singer, S., Bork, P. & Beck, M. (2016) Spatiotemporal variation of mammalian protein complex stoichiometries, Genome Biol. 17, 47. 173. Schneider, M., Belsom, A., Rappsilber, J. & Brock, O. (2016) Blind testing of cross- linking/mass spectrometry hybrid methods in CASP11, Proteins. 84 Suppl 1, 152-63. 174. Clark, M. D., Marcum, R., Graveline, R., Chan, C. W., Xie, T., Chen, Z., Ding, Y., Zhang, Y., Mondragon, A., David, G. & Radhakrishnan, I. (2015) Structural insights into the assembly of the histone deacetylase-associated Sin3L/Rpd3L corepressor complex, Proc Natl Acad Sci U S A. 112, E3669-78. 175. Margueron, R. & Reinberg, D. (2011) The Polycomb complex PRC2 and its mark in life, Nature. 469, 343-9. 176. Silverstein, R. A. & Ekwall, K. (2005) Sin3: a flexible regulator of global gene expression and genome stability, Curr Genet. 47, 1-17. 177. Uhlen, M., Fagerberg, L., Hallstrom, B. M., Lindskog, C., Oksvold, P., Mardinoglu, A., Sivertsson, A., Kampf, C., Sjostedt, E., Asplund, A., Olsson, I., Edlund, K., Lundberg, E., Navani, S., Szigyarto, C. A., Odeberg, J., Djureinovic, D., Takanen, J. O., Hober, S., Alm, T., Edqvist, P. 
H., Berling, H., Tegel, H., Mulder, J., Rockberg, J., Nilsson, P., Schwenk, J. M., Hamsten, M., von Feilitzen, K., Forsberg, M., Persson, L., Johansson, F., Zwahlen, M., von Heijne, G., Nielsen, J. & Ponten, F. (2015) Proteomics. Tissue-based map of the human proteome, Science. 347, 1260419. 178. Hong, W., Nakazawa, M., Chen, Y. Y., Kori, R., Vakoc, C. R., Rakowski, C. & Blobel, G. A. (2005) FOG-1 recruits the NuRD repressor complex to mediate transcriptional repression by GATA-1, EMBO J. 24, 2367-78. 179. Saathoff, H., Brofelth, M., Trinh, A., Parker, B. L., Ryan, D. P., Low, J. K., Webb, S. R., Silva, A. P., Mackay, J. P. & Shepherd, N. E. (2015) A peptide affinity reagent for isolating an intact and catalytically active multi-protein complex from mammalian cells, Bioorg Med Chem. 23, 960-5. 180. Shen, Z., Wang, B., Luo, J., Jiang, K., Zhang, H., Mustonen, H., Puolakkainen, P., Zhu, J., Ye, Y. & Wang, S. (2016) Global-scale profiling of differential expressed lysine acetylated proteins in colorectal cancer tumors and paired liver metastases, J Proteomics. 142, 24-32. 181. Zhang, W., Aubert, A., Gomez de Segura, J. M., Karuppasamy, M., Basu, S., Murthy, A. S., Diamante, A., Drury, T. A., Balmer, J., Cramard, J., Watson, A. A., Lando, D., Lee, S. F., Palayret, M., Kloet, S. L., Smits, A. H., Deery, M. J., Vermeulen, M., Hendrich, B., Klenerman, D., Schaffitzel, C., Berger, I. & Laue, E. D. (2016) The Nucleosome Remodeling and Deacetylase Complex NuRD Is Built from Preformed Catalytically Active Sub-modules, J Mol Biol. 428, 2931-42. 182. Basta, J. M., Robbins, L., Denner, D. R., Kolar, G. R. & Rauchman, M. (2017) A Sall1- NuRD interaction regulates multipotent nephron progenitors and is required for loop of Henle formation, Development. 144, 3080-3094.

137

183. Lambert, J. P., Tucholska, M., Go, C., Knight, J. D. & Gingras, A. C. (2015) Proximity biotinylation and affinity purification are complementary approaches for the interactome mapping of chromatin-associated protein complexes, J Proteomics. 118, 81-94. 184. Liu, Z., Li, F., Zhang, B., Li, S., Wu, J. & Shi, Y. (2015) Structural basis of plant homeodomain finger 6 (PHF6) recognition by the retinoblastoma binding protein 4 (RBBP4) component of the nucleosome remodeling and deacetylase (NuRD) complex, J Biol Chem. 290, 6630-8. 185. Mellacheruvu, D., Wright, Z., Couzens, A. L., Lambert, J. P., St-Denis, N. A., Li, T., Miteva, Y. V., Hauri, S., Sardiu, M. E., Low, T. Y., Halim, V. A., Bagshaw, R. D., Hubner, N. C., Al-Hakim, A., Bouchard, A., Faubert, D., Fermin, D., Dunham, W. H., Goudreault, M., Lin, Z. Y., Badillo, B. G., Pawson, T., Durocher, D., Coulombe, B., Aebersold, R., Superti- Furga, G., Colinge, J., Heck, A. J., Choi, H., Gstaiger, M., Mohammed, S., Cristea, I. M., Bennett, K. L., Washburn, M. P., Raught, B., Ewing, R. M., Gingras, A. C. & Nesvizhskii, A. I. (2013) The CRAPome: a contaminant repository for affinity purification-mass spectrometry data, Nat Methods. 10, 730-6. 186. Chatr-Aryamontri, A., Oughtred, R., Boucher, L., Rust, J., Chang, C., Kolas, N. K., O'Donnell, L., Oster, S., Theesfeld, C., Sellam, A., Stark, C., Breitkreutz, B. J., Dolinski, K. & Tyers, M. (2017) The BioGRID interaction database: 2017 update, Nucleic Acids Res. 45, D369-D379. 187. Torrado, M., Low, J. K. K., Silva, A. P. G., Schmidberger, J. W., Sana, M., Sharifi Tabar, M., Isilak, M. E., Winning, C. S., Kwong, C., Bedward, M. J., Sperlazza, M. J., Williams, D. C., Jr., Shepherd, N. E. & Mackay, J. P. (2017) Refinement of the subunit interaction network within the nucleosome remodelling and deacetylase (NuRD) complex, FEBS J. 284, 4216- 4232. 188. Sakai, T., Hino, K., Wada, S. & Maeda, H. 
(2003) Identification of the DNA binding specificity of the human ZNF219 protein and its function as a transcriptional repressor, DNA Res. 10, 155-65. 189. Banach-Orlowska, M., Pilecka, I., Torun, A., Pyrzynska, B. & Miaczynska, M. (2009) Functional characterization of the interactions between endosomal adaptor protein APPL1 and the NuRD co-repressor complex, Biochem J. 423, 389-400. 190. Alqarni, S. S. M., Murthy, A., Zhang, W., Przewloka, M. R., Silva, A. P. G., Watson, A. A., Lejon, S., Pei, X. Y., Smits, A. H., Kloet, S. L., Wang, H. X., Shepherd, N. E., Stokes, P. H., Blobel, G. A., Vermeulen, M., Glover, D. M., Mackay, J. P. & Laue, E. D. (2014) Insight into the Architecture of the NuRD Complex: structure of the RbAp48-MTA1 subcomplex, Journal of Biological Chemistry. 289, 21844-21855. 191. Bononi, A., Giorgi, C., Patergnani, S., Larson, D., Verbruggen, K., Tanji, M., Pellegrini, L., Signorato, V., Olivetto, F., Pastorino, S., Nasu, M., Napolitano, A., Gaudino, G., Morris, P., Sakamoto, G., Ferris, L. K., Danese, A., Raimondi, A., Tacchetti, C., Kuchay, S., Pass, H. I., Affar, E. B., Yang, H., Pinton, P. & Carbone, M. (2017) BAP1 regulates IP3R3-mediated Ca(2+) flux to mitochondria suppressing cell transformation, Nature. 546, 549-553. 192. Ori, A., Banterle, N., Iskar, M., Andres-Pons, A., Escher, C., Khanh Bui, H., Sparks, L., Solis-Mezarino, V., Rinner, O., Bork, P., Lemke, E. A. & Beck, M. (2013) Cell type-specific nuclear pores: a case in point for context-dependent stoichiometry of molecular machines, Mol Syst Biol. 9, 648. 193. Sandoval, P. C., Slentz, D. H., Pisitkun, T., Saeed, F., Hoffert, J. D. & Knepper, M. A. (2013) Proteome-wide measurement of protein half-lives and translation rates in vasopressin- sensitive collecting duct cells, J Am Soc Nephrol. 24, 1793-805. 194. Lessard, J., Wu, J. I., Ranish, J. A., Wan, M., Winslow, M. M., Staahl, B. T., Wu, H., Aebersold, R., Graef, I. A. & Crabtree, G. R. 
(2007) An essential switch in subunit composition of a chromatin remodeling complex during neural development, Neuron. 55, 201-15.

138

195. Dos Santos, R. L., Tosti, L., Radzisheuskaya, A., Caballero, I. M., Kaji, K., Hendrich, B. & Silva, J. C. R. (2014) MBD3/NuRD Facilitates Induction of Pluripotency in a Context- Dependent Manner, Cell Stem Cell. 15, 392. 196. Ayala, R., Willhoft, O., Aramayo, R. J., Wilkinson, M., McCormack, E. A., Ocloo, L., Wigley, D. B. & Zhang, X. (2018) Structure and regulation of the human INO80-nucleosome complex, Nature. 556, 391-395. 197. He, S., Tong, X., Han, M., Hu, H. & Dai, F. (2018) Genome-Wide Identification and Characterization of WD40 Protein Genes in the Silkworm, Bombyx mori, Int J Mol Sci. 19. 198. Link, S., Spitzer, R. M. M., Sana, M., Torrado, M., Völker-Albert, M. C., Keilhauer, E. C., Burgold, T., Pünzeler, S., Low, J. K. K., Lindström, I., Nist, A., Regnard, C., Stiewe, T., Hendrich, B., Imhof, A., Mann, M., Mackay, J. P., Bartkuhn, M. & Hake, S. B. (2018) PWWP2A binds distinct chromatin moieties and interacts with an MTA1-specific core NuRD complex, Nature Communications. 9, 4300. 199. Liu, K., Xu, C., Lei, M., Yang, A., Loppnau, P., Hughes, T. R. & Min, J. (2018) Structural basis for the ability of MBD domains to bind methyl-CG and TG sites in DNA, J Biol Chem. 293, 7344-7354. 200. Mor, N., Rais, Y., Sheban, D., Peles, S., Aguilera-Castrejon, A., Zviran, A., Elinger, D., Viukov, S., Geula, S., Krupalnik, V., Zerbib, M., Chomsky, E., Lasman, L., Shani, T., Bayerl, J., Gafni, O., Hanna, S., Buenrostro, J. D., Hagai, T., Masika, H., Vainorius, G., Bergman, Y., Greenleaf, W. J., Esteban, M. A., Elling, U., Levin, Y., Massarwa, R., Merbl, Y., Novershtern, N. & Hanna, J. H. (2018) Neutralizing Gatad2a-Chd4-Mbd3/NuRD Complex Facilitates Deterministic Induction of Naive Pluripotency, Cell Stem Cell. 23, 412-+. 201. Nitarska, J., Smith, J. G., Sherlock, W. T., Hillege, M. M., Nott, A., Barshop, W. D., Vashisht, A. A., Wohlschlegel, J. A., Mitter, R. & Riccio, A. 
(2016) A Functional Switch of NuRD Chromatin Remodeling Complex Subunits Regulates Mouse Cortical Development, Cell Rep. 17, 1683-1698. 202. Stone, A. B. (1974) A simplified method for preparing sucrose gradients, Biochem J. 137, 117-8. 203. Vizcaino, J. A., Cote, R. G., Csordas, A., Dianes, J. A., Fabregat, A., Foster, J. M., Griss, J., Alpi, E., Birim, M., Contell, J., O'Kelly, G., Schoenegger, A., Ovelleiro, D., Perez-Riverol, Y., Reisinger, F., Rios, D., Wang, R. & Hermjakob, H. (2013) The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013, Nucleic Acids Res. 41, D1063-9. 204. Millard, C. J., Fairall, L. & Schwabe, J. W. (2014) Towards an understanding of the structure and function of MTA1, Cancer Metastasis Rev. 33, 857-67. 205. Zhou, P., Xia, J., Zhou, Y. J., Wan, J., Li, L., Bao, J., Shi, Y. J. & Bu, H. (2015) Proportions of acetyl-histone-positive hepatocytes indicate the functional status and prognosis of cirrhotic patients, World J Gastroenterol. 21, 6665-74. 206. Gillet, L. C., Navarro, P., Tate, S., Rost, H., Selevsek, N., Reiter, L., Bonner, R. & Aebersold, R. (2012) Targeted data extraction of the MS/MS spectra generated by data- independent acquisition: a new concept for consistent and accurate proteome analysis, Mol Cell Proteomics. 11, O111 016717. 207. Venable, J. D., Dong, M. Q., Wohlschlegel, J., Dillin, A. & Yates, J. R. (2004) Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra, Nat Methods. 1, 39-45. 208. Martins-de-Souza, D., Faca, V. M. & Gozzo, F. C. (2017) DIA is not a new mass spectrometry acquisition method, Proteomics. 17. 209. Collins, B. C., Hunter, C. L., Liu, Y., Schilling, B., Rosenberger, G., Bader, S. L., Chan, D. W., Gibson, B. W., Gingras, A. C., Held, J. M., Hirayama-Kurogi, M., Hou, G., Krisp, C., Larsen, B., Lin, L., Liu, S., Molloy, M. P., Moritz, R. L., Ohtsuki, S., Schlapbach, R., Selevsek, N., Thomas, S. N., Tzeng, S. C., Zhang, H. 
& Aebersold, R. (2017) Multi-laboratory

139

assessment of reproducibility, qualitative and quantitative performance of SWATH-mass spectrometry, Nat Commun. 8, 291. 210. Bantscheff, M., Lemeer, S., Savitski, M. M. & Kuster, B. (2012) Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present, Anal Bioanal Chem. 404, 939-65. 211. Ludwig, C., Claassen, M., Schmidt, A. & Aebersold, R. (2012) Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry, Mol Cell Proteomics. 11, M111 013987. 212. Picotti, P. & Aebersold, R. (2012) Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions, Nat Methods. 9, 555-66. 213. Hsu, J. L. & Chen, S. H. (2016) Stable isotope dimethyl labelling for quantitative proteomics and beyond, Philos Trans A Math Phys Eng Sci. 374. 214. Holding, A. N. (2015) XL-MS: Protein cross-linking coupled with mass spectrometry, Methods. 89, 54-63. 215. Chen, K., Wu, K., Gormley, M., Ertel, A., Wang, J., Zhang, W., Zhou, J., Disante, G., Li, Z., Rui, H., Quong, A. A., McMahon, S. B., Deng, H., Lisanti, M. P., Wang, C. & Pestell, R. G. (2013) Acetylation of the cell-fate factor dachshund determines p53 binding and signaling modules in breast cancer, Oncotarget. 4, 923-35. 216. Kadoch, C. & Crabtree, G. R. (2015) Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics, Sci Adv. 1, e1500447. 217. Schneider, M., Belsom, A. & Rappsilber, J. (2018) Protein Tertiary Structure by Crosslinking/Mass Spectrometry, Trends Biochem Sci. 43, 157-169. 218. Merkley, E. D., Rysavy, S., Kahraman, A., Hafen, R. P., Daggett, V. & Adkins, J. N. (2014) Distance restraints from crosslinking mass spectrometry: mining a molecular dynamics simulation database to evaluate lysine-lysine distances, Protein Sci. 23, 747-59. 219. Lucic, V., Forster, F. & Baumeister, W. 
(2005) Structural studies by electron tomography: from cells to molecules, Annu Rev Biochem. 74, 833-65. 220. De Rosier, D. J. & Klug, A. (1968) Reconstruction of three dimensional structures from electron micrographs, Nature. 217, 130-4. 221. Lyumkis, D. (2019) Challenges and opportunities in cryo-EM single-particle analysis, J Biol Chem. 294, 5181-5197. 222. Stark, H. (2010) GraFix: stabilization of fragile macromolecular complexes for single particle cryo-EM, Methods Enzymol. 481, 109-26. 223. Yang, B., Wu, Y. J., Zhu, M., Fan, S. B., Lin, J., Zhang, K., Li, S., Chi, H., Li, Y. X., Chen, H. F., Luo, S. K., Ding, Y. H., Wang, L. H., Hao, Z., Xiu, L. Y., Chen, S., Ye, K., He, S. M. & Dong, M. Q. (2012) Identification of cross-linked peptides from complex samples, Nat Methods. 9, 904-6. 224. Schilling, B., Row, R. H., Gibson, B. W., Guo, X. & Young, M. M. (2003) MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides, J Am Soc Mass Spectrom. 14, 834-50. 225. Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004) UCSF Chimera--a visualization system for exploratory research and analysis, J Comput Chem. 25, 1605-12. 226. Xu, C. & Min, J. (2011) Structure and function of WD40 domain proteins, Protein Cell. 2, 202-14. 227. Pollastri, G. & McLysaght, A. (2005) Porter: a new, accurate server for protein secondary structure prediction, Bioinformatics. 21, 1719-20. 228. Combe, C. W., Fischer, L. & Rappsilber, J. (2015) xiNET: cross-link network maps with residue resolution, Mol Cell Proteomics. 14, 1137-47.

140

229. Verreault, A., Kaufman, P. D., Kobayashi, R. & Stillman, B. (1996) Nucleosome assembly by a complex of CAF-1 and acetylated histones H3/H4, Cell. 87, 95-104. 230. Wu, J. I., Lessard, J. & Crabtree, G. R. (2009) Understanding the words of chromatin regulation, Cell. 136, 200-6. 231. Staahl, B. T., Tang, J., Wu, W., Sun, A., Gitler, A. D., Yoo, A. S. & Crabtree, G. R. (2013) Kinetic analysis of npBAF to nBAF switching reveals exchange of SS18 with CREST and integration with neural developmental pathways, J Neurosci. 33, 10348-61. 232. Musselman, C. A., Mansfield, R. E., Garske, A. L., Davrazou, F., Kwan, A. H., Oliver, S. S., O'Leary, H., Denu, J. M., Mackay, J. P. & Kutateladze, T. G. (2009) Binding of the CHD4 PHD2 finger to histone H3 is modulated by covalent modifications, Biochem J. 423, 179-87. 233. Wang, K., Chen, Y., Ferguson, S. D. & Leach, R. E. (2013) MTA1 and MTA3 Regulate HIF1a Expression in Hypoxia-Treated Human Trophoblast Cell Line HTR8/Svneo, Med J Obstet Gynecol. 1. 234. Punzeler, S., Link, S., Wagner, G., Keilhauer, E. C., Kronbeck, N., Spitzer, R. M., Leidescher, S., Markaki, Y., Mentele, E., Regnard, C., Schneider, K., Takahashi, D., Kusakabe, M., Vardabasso, C., Zink, L. M., Straub, T., Bernstein, E., Harata, M., Leonhardt, H., Mann, M., Rupp, R. A. & Hake, S. B. (2017) Multivalent binding of PWWP2A to H2A.Z regulates mitosis and neural crest differentiation, EMBO J. 36, 2263-2279. 235. Doerr, J. M. (2014) Protein-protein interactions in human pluripotent stem cell- derived neural stem cells and their neuronal progeny, PhD Thesis, University of Bonn and Hertie Foundation, Germany. 236. Vakilian, H., Mirzaei, M., Sharifi Tabar, M., Pooyan, P., Habibi Rezaee, L., Parker, L., Haynes, P. A., Gourabi, H., Baharvand, H. & Salekdeh, G. H. (2015) DDX3Y, a Male-Specific Region of Y Chromosome Gene, May Modulate Neuronal Differentiation, J Proteome Res. 14, 3474-83. 237. Hein, M. Y., Hubner, N. C., Poser, I., Cox, J., Nagaraj, N., Toyoda, Y., Gak, I. 
A., Weisswange, I., Mansfeld, J., Buchholz, F., Hyman, A. A. & Mann, M. (2015) A human interactome in three quantitative dimensions organized by stoichiometries and abundances, Cell. 163, 712-23. 238. Menafra, R., Brinkman, A. B., Matarese, F., Franci, G., Bartels, S. J., Nguyen, L., Shimbo, T., Wade, P. A., Hubner, N. C. & Stunnenberg, H. G. (2014) Genome-wide binding of MBD2 reveals strong preference for highly methylated loci, PLoS One. 9, e99603. 239. Huttlin, E. L., Bruckner, R. J., Paulo, J. A., Cannon, J. R., Ting, L., Baltier, K., Colby, G., Gebreab, F., Gygi, M. P., Parzen, H., Szpyt, J., Tam, S., Zarraga, G., Pontano-Vaites, L., Swarup, S., White, A. E., Schweppe, D. K., Rad, R., Erickson, B. K., Obar, R. A., Guruharsha, K. G., Li, K., Artavanis-Tsakonas, S., Gygi, S. P. & Harper, J. W. (2017) Architecture of the human interactome defines protein communities and disease networks, Nature. 545, 505-509. 240. Wang, L., Collings, C. K., Zhao, Z., Cozzolino, K. A., Ma, Q., Liang, K., Marshall, S. A., Sze, C. C., Hashizume, R., Savas, J. N. & Shilatifard, A. (2017) A cytoplasmic COMPASS is necessary for cell survival and triple-negative breast cancer pathogenesis by regulating metabolism, Genes Dev. 31, 2056-2066. 241. Bodega, B., Marasca, F., Ranzani, V., Cherubini, A., Della Valle, F., Neguembor, M. V., Wassef, M., Zippo, A., Lanzuolo, C., Pagani, M. & Orlando, V. (2017) A cytosolic Ezh1 isoform modulates a PRC2-Ezh1 epigenetic adaptive response in postmitotic cells, Nat Struct Mol Biol. 24, 444-452. 242. Butt, N. A., Kumar, A., Dhar, S., Rimando, A. M., Akhtar, I., Hancock, J. C., Lage, J. M., Pound, C. R., Lewin, J. R., Gomez, C. R. & Levenson, A. S. (2017) Targeting MTA1/HIF- 1alpha signaling by pterostilbene in combination with histone deacetylase inhibitor attenuates prostate cancer progression, Cancer Med. 6, 2673-2685.


243. Chevrollier, A., Loiseau, D., Reynier, P. & Stepien, G. (2011) Adenine nucleotide translocase 2 is a key mitochondrial protein in cancer metabolism, Biochim Biophys Acta. 1807, 562-7.
244. de Mateo, S., Castillo, J., Estanyol, J. M., Ballesca, J. L. & Oliva, R. (2011) Proteomic characterization of the human sperm nucleus, Proteomics. 11, 2714-26.
245. Vandewalle, J., Bauters, M., Van Esch, H., Belet, S., Verbeeck, J., Fieremans, N., Holvoet, M., Vento, J., Spreiz, A., Kotzot, D., Haberlandt, E., Rosenfeld, J., Andrieux, J., Delobel, B., Dehouck, M. B., Devriendt, K., Fryns, J. P., Marynen, P., Goldstein, A. & Froyen, G. (2013) The mitochondrial solute carrier SLC25A5 at Xq24 is a novel candidate gene for non-syndromic intellectual disability, Hum Genet. 132, 1177-85.


Appendix A: Gibson assembly primer sequences.

Name                Sequence                                          Tm
LANCL1_fwd_Gibson   GGATCCACTAGAACACATAATAGG ATGGCTCAAAGGGCCTTCC      60
LANCL1_rev_Gibson   TGGATATCTGCAGAATTCTGATCA TCAGAGTTCAAATGCAGGGAAC   59
KPNA4_fwd_Gibson    GGATCCACTAGAACACATAATAGG ATGGCGGACAACGAGAAACTG    61
KPNA4_rev_Gibson    TGGATATCTGCAGAATTCTGATCA TAAAACTGGAACCCTTCTGTTGG  60
PAICS_fwd_Gibson    GGATCCACTAGAACACATAATAGG ACAGCTGAGGTACTGAACATTGG  60
PAICS_rev_Gibson    TGGATATCTGCAGAATTCTGATCA ACATTCTCTGATTTTCTTGTCAG  58
DCAF11_fwd_Gibson   GGATCCACTAGAACACATAATAGG ATGGGATCGCGGAACAGCAG     62
DCAF11_rev_Gibson   TGGATATCTGCAGAATTCTGATCA CTGGGGTGAGGAAAAGGGTG     60
ZNF219_fwd_Gibson   GGATCCACTAGAACACATAATAGG ATGGAGGGCTCACGTCCC       61
ZNF219_rev_Gibson   TGGATATCTGCAGAATTCTGATCA CCGTTCTTGCCCCCCCAG       62
CALML5_fwd_Gibson   GGATCCACTAGAACACATAATAGG ATGGCCGGTGAGCTGACTC      62
CALML5_rev_Gibson   TGGATATCTGCAGAATTCTGATCA CTCCTGGGCGAGCATCCTC      62
SLC25A5_fwd_Gibson  GGATCCACTAGAACACATAATAGG GATGCCGCTGTGTCCTTCG      61
SLC25A5_rev_Gibson  TGGATATCTGCAGAATTCTGATCA TGTACTTCTTGATTTCATCATAC  60
DDB1_fwd_Gibson     GGATCCACTAGAACACATAATAGG ATGTCGTACAACTACGTGGTAAC  58
DDB1_rev_Gibson     TGGATATCTGCAGAATTCTGATCA CTAATGGATCCGAGTTAGCTCC   58
COASY_fwd_Gibson    GGATCCACTAGAACACATAATAGG ATGGCCGTATTCCGGTCGG      62
COASY_rev_Gibson    TGGATATCTGCAGAATTCTGATCA GTCGAGGGCCTGATGAGTCTT    61
HIF1A_fwd_Gibson    GGATCCACTAGAACACATAATAGG AACGACAAGAAAAAGATAAGTTC  60
HIF1A_rev_Gibson    TGGATATCTGCAGAATTCTGATCA AGTTAACTTGATCCAAAGCTCTG  60
KIF20B_fwd_Gibson   GGATCCACTAGAACACATAATAGG CAAGAGGGAGTACCTCGACC     58
KIF20B_rev_Gibson   TGGATATCTGCAGAATTCTGATCA TTTGGCTGTTTTTGTTCGAAGTC  59


Appendix B: qPCR primer sequences. Table 2-5.

Name           Sequence                 Tm (s)
MTA1_fwd_qPCR  AGGCAGACATCACCGACTTG     60
MTA1_rev_qPCR  AGTATCCATGGCGTGGAACA     60
MTA2_fwd_qPCR  TATGTGGGTGGCTGGTAATG     59
MTA2_rev_qPCR  GCCTGGCTGATAGTAATGCC     60
MTA3_fwd_qPCR  ACTTATAATGCTCGCAGATAAGC  60
MTA3_rev_qPCR  ACTTTCCCCTGATATGTGTTG    60
MTA3_fwd_qPCR  ATGCTGGCGCTGAGTACGTC     61
MTA3_rev_qPCR  CTTGAGGCTGTTGTCATACTTC   60


Appendix C: Antibodies used in this study.

Antibodies  Host    Conjugate  Source
MTA1        Rabbit  -          CST (#D40D1)
MTA2        Rabbit  -          abcam (#171073)
MTA3        Rabbit  -          abcam (#176346)
HDAC1       Mouse   HRP        CST (#10E2)
FLAG        Mouse   HRP        Sigma (#A8592)
HA          Mouse   HRP        abcam (#ab173826)
MBD2        Rabbit  HRP        abcam (#ab188474)
MBD3        Rabbit  HRP        abcam (#ab157464)
β-Actin     Mouse   HRP        SC (#47778)


Appendix D. Complete list of possible interacting partners of the NuRD complex.


Appendix E. Inter crosslinked peptides.


Appendix F. Intra crosslinked peptides.
