<<

Structural and Functional Characterization of NuRD Complexes Jesus Gomez de Segura

To cite this version:

Jesus Gomez de Segura. Structural and Functional Characterization of NuRD Complexes. Struc- tural Biology [q-bio.BM]. Université Grenoble Alpes, 2019. English. ￿NNT : 2019GREAV063￿. ￿tel- 03275313￿

HAL Id: tel-03275313 https://tel.archives-ouvertes.fr/tel-03275313 Submitted on 1 Jul 2021

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESIS / THÈSE To obtain the title of / Pour obtenir le grade de DOCTEUR DE LA COMMUNAUTE UNIVERSITE GRENOBLE ALPES Discipline / Spécialité: Biologie Structurale et Nanobiologie Arrêté ministériel: 25 mai 2016

Presented by / Présentée par Jesus Gomez de Segura

Thesis supervisor / Thèse dirigée par Prof Christiane Schaffitzel Co-supervisor / et codirigée par Prof Andrew Mc Carthy

Thesis prepared at / préparée au sein du European Molecular Biology Laboratory (EMBL), Grenoble Outstation dans l'École Doctorale de Chimie et Sciences du Vivant

Structural and functional characterisation of NuRD complexes Caractérisation structurale et fonctionnelle des complexes NuRD

Public defense on / Thèse soutenue publiquement le 16.12.2019 Jury member / Devant le jury composé de:

Prof. Andrea Dessen President of the jury/ Présidente de jury DIRECTRICE DE RECHERCHE, CNRS DELEGATION ALPES,

Dr. Juan Reguera Reviewer / Rapporteur CHARGE DE RECHERCHE HDR, UNIVERSITE AIX-MARSEILLE

Prof. Patrick Schultz Examiner / Examinateur DIRECTEUR DE RECHERCHE, CNRS DELEGATION ALSACE

Prof. Frank Sobott Examiner / Examinateur PROFESSEUR, UNIVERSITE DE LEEDS - GRANDE BRETAGNE ii

Contents

Contents iii

List of Figures v

1 Preface 1

2 Introduction 3

2.1 Regulation of : ...... 4

2.2 The NuRD complex...... 19

2.3 Scope...... 35

3 Electron Microscopy Studies of Human NuRD 36

3.1 Introduction...... 37

3.2 Methods...... 38

3.3 Results...... 43

3.4 Discussion...... 47

4 Electron Microscopy Studies of NuRD 50

4.1 Introduction...... 51

4.2 Methods...... 52

4.3 Results...... 64

4.4 Discussion...... 75

5 Discussion 78

5.1 dNuRD provides a reliable approximation for the study of hNuRD...... 79

5.2 NuRD complexes are inherently heterogeneous and flexible...... 80 iii

5.3 Structure and assembly pathway for NuRD complexes...... 82

5.4 Final remarks...... 84

References 87

Acknowledgements 119

A Appendix 120

B Appendix 121 iv

List of Figures

2.1 Structure of the chromatin...... 5

2.2 DNA methylation...... 8

2.3 Most common and studied histone modifications...... 13

2.4 Coordination of epigenetic marks...... 20

2.5 Catalytic subunits of hNuRD...... 25

2.6 GATAD2 and MTA subunits of hNuRD...... 28

2.7 RbAp and MBD subunits of hNuRD...... 32

2.8 DOC1 is part of the core NuRD complex...... 35

2.9 Proposed modes of NuRD recruitment to DNA for gene silencing...... 37

2.10 Crystallized interaction of hNuRD subunits...... 42

3.1 Purification of hNuRD...... 53

3.2 Negative stain EM of the hNuRD complex...... 54

3.3 First reconstructions of the hNuRD complex...... 55

3.4 CBP affinity purification of CHD4 and GATAD2A...... 57

3.5 Computationally predicted Calmodulin binding sites in GATAD2A...... 58

4.1 Graphical representation of particle sorting according to classification stability...... 75

4.2 Purification and EM processing of the endogenous dNuRD sample...... 78 v

4.3 dNuRD cryo-EM...... 79

4.4 Purification and negative stain EM of dNuRD recombinant subcomplexes...... 82

4.5 3D reconstructions of dNuRD recombinant subcomplexes...... 85

4.6 PMMR Cryo-EM and reconstructions...... 89

4.7 dNuRD is built from preformed subcomplexes...... 94

4.8 Differences between the dNuRD subcomplexes show the localisation of the subunits

composing them...... 96

5.1 Comparison of the dNurd and hNuRD models...... 100

5.2 Nucleosome bound to NuRD models...... 104 vi Nomenclature

ART ADP-ribosiltransferase

ADP Adenosine-5’-diphosphate

ATP Adenosine-5’-triphosphate

BAH Bromo adjacent homology

CaM Calmodulin

CD Chromodomains

CDK2AP1 Cyclin-dependant kinase 2 associated protein 1

CHD1 Chromodomain DNA-binding protein 1

DOC1 Deleted in oral cancer 1

DNA Deoxyribonucleic Acid

DNMT DNA methyltransferase dNuRD Drosophila NuRD

DSF Differential Scanning Fluorometry dsRNA Double stranded RNA

ELM2 Egl-27 and MTA1 homology 2

EM Electron microscopy

FIB Focused Ion Beam flp-FRT flippase – flippase recognition target FSC Fourier Shell Correlation

HKMT Histone methyl transferase

HMG High Mobility Group viii Nomenclature hNuRD Human NuRD iPSC Induced Pluripotent Stem Cells

ISWI Imitation Switch lncRNA Long non-coding RNA kDa Kilodalton m4C N4-methylcytosine m5c 5-methylcytosine m6A N6-methyladenine miRNA micro RNA

MARH mono-ADP-ribose protein hydrolase

MRA Multi-Reference Alignment

MSA Multi-Variate Statistical Analysis

MTA Metastasis associated protein ncRNA non-coding RNA ncRNA Non coding RNA

NS-EM Negative Stain Electron Microscopy

NuRD Nucleosome Remodelling and Deacetylase

PAD Protein arginine deiminase

PIC Pre-Initiation complex

PHD Plant homeodomain

PM p55–MTA-like

PMR p55–MTA-like–Rpd3

PMMR p55–MTA-like–MBD-like–Rpd3

PMMR-G p55–MTA-like–MBD-like–Rpd3-Simjang(GATA)

Pol I RNA polymerase I

Pol II RNA polymerase II

Pol III RNA polymerase III ix

PRC Polycomb repressive complex

PRMT Protein arginine methyltransferase

RbAp Retinoblastoma associated protein

RBBP Retinoblastoma-binding protein

RCT Random Conical Tilt

RNA Ribonucleic Acid

RNAi RNA interference

SANT Swi3, Ada2, NcoR and TFIIIB

SH3 Src homology 3

SHREC Snf2/ repressor complex

SUMO Small Ubiquitin-like Modifier

SWI/SNF SWItch/sucrose non-fermentable

TEM Transmission Electron Microscopy

TET Ten Eleven Translocation

TEV Tobacco Etch Virus

TSA Thermofluor Stability Assay 1 Preface

This doctoral thesis focuses on the structural and functional characterisation of NuRD complexes, poorly understood to date. The introduction of the manuscript provides background on the rich field of epigenetics and the relevance of the NuRD complex as one of the central regulators of the chromatin. The first half of the work revolves around the human NuRD complex, while the second half of the studies were performed with drosophila NuRD complexes. These parts, although interconnected, are written independently following a the typical paper formatting, divided in “introduction”, “methods”, “results” and “discussion”. A concluding chapter summarises the results of the previous chapters and discusses the common conclusions. The work presented here contributed to the publication of a scientific paper and comprises unpublished data which could contribute to a further publication. 2 2 Introduction

Resumé en français

Le concept d'épigénétique a été proposé pour la première fois il y a plus de 75 ans par C.H. Waddington. Cependant, de nombreux aspects de la régulation épigénétique du génome restent, à ce jour, mystérieux. Chez les eucaryotes, l'ADN est emballé étroitement à l'intérieur du noyau cellulaire, formant une structure appelée chromatine. L'unité répétitive de la chromatine, le nucléosome, est composée d'ADN enroulé autour d'un coeur protéique - les histones - et joue un rôle central dans la régulation de la transcription de l'ADN. Le niveau d’empaquetage de l’ADN, les modifications des histones et d’autres modifications des chromosomes qui n’impliquent pas de modification de la séquence de l’ADN peuvent entraîner des modifications héréditaires de l’expression des gènes et du phénotype qu’elles produisent. C'est ce que nous appelons la régulation épigénétique. Le complexe de remodelage des nucléosomes et de désacétylation des histones NuRD est l’un des principaux complexes régulateurs de la chromatine chez la plupart des eucaryotes. Il s'agissait du premier complexe montré à combiner une activité de remodelage des nucléosomes, qui peut déplacer les nucléosomes le long de l'ADN, et une activité histone-désacétylase, qui modifie les queues d'histones. Malgré son importance, on en sait peu sur la structure et le mécanisme du complexe NuRD. Cela est dû à la complexité du complexe, qui présente une composition très hétérogène et flexibilité structurelle. L'objectif de ce travail de thèse est d'obtenir structures à haute résolution de complexes NuRD et d'utiliser ces informations pour expliquer les mécanismes du complexe et son processus d'assemblage. Pour ce faire, nous utilisons des techniques de cryo-microscopie électronique et nous nous concentrons sur les complexes NuRD endogènes humains et de drosophile, ainsi que sur les complexes NuRD recombinants de drosophile. 4 Introduction

2.1 Regulation of transcription: epigenetics

In bacteria and eukaryotic cells the genetic information is stored in one or more double stranded DNA macromolecules (Avery et al., 1944; Hershey & Chase, 1952). This information is transcribed into RNA (Weiss & Nakamoto, 1961; Geidushek et al., 1961; Chamberlin et al., 1963) and then translated into proteins (Schweet & Heintz, 1966). These, in turn, fulfil structural, enzymatic and signalling roles, being the main responsibles of giving shape to the . However, the variability of the DNA cannot be accounted for all the variability observed between individuals. There is another set of heritable characteristics that, independently from the DNA sequence, modulate gene expression and thus the phenotype. These phenomena have been encompassed within the field of “epigenetics”. The term was first proposed in 1942 by C.H. Waddington as “epigenotype” (Waddington, 1942), and nowadays it is used to define a range of stably heritable phenotypical traits induced by modifications of the chromosomes and its level of packaging within the cell without alterations to the DNA sequence (Berger et al., 2009). However, the concept has been widely discussed, and other definitions of epigenetics have focused on the heritability of traits, including phenomena that not necessarily imply the participation of the chromosomes in the process (Holliday, 1994; Deans & Maggert, 2015). This kind of regulation is one of the main means of controlling differential gene expression in different tissues or cell development states. They are crucial in the management of pluripotency and differentiation processes (Chen & Dent, 2014).

2.1.1 DNA organization

The DNA in eukaryotic cells, with the exception of the mitochondrial DNA, is found in the nucleus. Here, thanks to a range of scaffold proteins, each molecule of DNA is packed as a chromosome, a structure that allows to store the long strands of DNA in the reduced space within the nucleus (Figure 2.1a) (Yoshikawa & Yoshikawa, 2002). The basic blocks that will form the scaffold are the core histones, H2A, H2B, H3 and H4. Two copies of each of the core histones form a 2.1 Regulation of transcription: epigenetics 5

Figure 2.1. Structure of the chromatin. a) In the less compacted form of chromatin the DNA double helix is packed around the nucleosomes. In the classic model of chromatin further packaging is performed thanks to additional scaffolding proteins, until 30-nm fibers are formed. More recent studies suggest that the natural organization of the chromatin consists of dynamic 10-nm fibers (Maeshima et al., 2014). The maximum level of DNA packaging is the condensed mitotic chromosome. (Figure based on Maeshima et al., 2014). b) X-ray crystal structure of the nucleosome (Davey et al., 2002). The front and side views of the nucleosome show how DNA is coiled around the histone octamer. Additionally, histone tails can be seen protruding out of the nucleosome, which is important for their interaction with different factors. 6 Introduction core histone octamer (Olins & Olins, 1974; Kornberg, 1974; Thomas & Kornberg, 1975, Woodcock et al., 1976). The DNA helix coils around the positively charged surface of the octamer, reducing the electrostatic repulsion created by the abundant phosphate groups of the DNA chain. 147 base pairs forming 1.7 turns of a left handed superhelix interact with the octamer, creating what is known as a nucleosome core particle (Figure 2.1b) (Davey et al., 2002). These core particles are spaced in the DNA by linker segments of 20 to 80 base pairs. The resulting repetitive unit, consisting of the histone octamer and an average of 200 DNA base pairs, is known as nucleosome (Kornberg, 1974). The nucleosome is the basic unit of the DNA packaging and one of the focal points of epigenetic regulation of transcription.

The combination of the DNA and the scaffolding proteins is called chromatin. The existence of the chromatin is known since the nineteenth century, when W. Flemming described it for first time using optical microscopy, but the precise structure of this complex is still enigmatic. In its simplest form, chromatin is a string of nucleosomes united by their linker DNA segments, resembling a “beads on string” motif (Olins & Olins, 1974). With an approximate wideness of 11 nm, this structure is known as a 10-nm fiber (Figure 2.1a). With the addition of another histone, H1, the chromatin is further packed. The addition of further scaffold proteins will mark the transition from the lightly packed state of the chromatin, known as euchromatin, to a heavily packed protein rich state, known as heterochromatin. These states have been commonly associated with transcriptionally active and inactive chromatin, respectively, but now it is known that heterochromatic regions can be transcribed into RNA, and complete silencing requires additional steps (Braunstein et al., 1996; Värv et al., 2010). Negative Stain Electron Microscopy (NS-EM) studies showed that the 10-nm fibers would fold into solenoids of approximately 30 nm diameter, consequently called 30-nm chromatin fibers (Figure 2.1a) (Finch & Klug, 1976). Several models and variations have been proposed to date to explain the structure of the 30-nm fibers (van Holde & Zlatanova, 2007). According to the most spread model of chromatin packaging these fibers would, with the help of further scaffolding proteins, continue folding hierarchically into fibers of larger diameters (Sedat & Manuelidism, 1978). The maximum level of packaging of the chromatin are the mitotic chromosomes. However, in the last years the existence of the 30-nm fibers in vivo has been a 2.1 Regulation of transcription: epigenetics 7 matter of doubt. Several studies show evidence suggesting it might only be an artefact derived from the conditions of the preparations (far from physiological), chemical fixation and staining. Instead, in the physiological conditions, the 10-nm chromatin fibers would fold in an irregular and dynamic fashion, even at the highest packaging levels, such as the mitotic chromosomes (Nishino et al., 2012; Maeshima et al., 2014).

2.1.2 Epigenetic regulation

Transcription takes place within the structured, dynamic and crowded nuclear environment (Venkatesh & Workman, 2015). Chromatin and its scaffold proteins are not static elements of the nucleus; they are subjected to a variety of modifications and play a central role in the regulation of transcription. Epigenetics target chromatin and its components and thereby adds a new layer of information to the genomic sequence. While other elements of transcription regulation, such as promoters, enhancers or repressors, are directly dependent on the nucleotide sequence, epigenetic modifications target unspecific and repetitive motifs, nucleosomes, for example. Instead of modifying the contents of the genome (the DNA sequence), epigenetics modifies how these contents are interpreted and how the DNA and RNA synthesis machinery of the cell interacts with it. Epigenetic phenomena can also be defined as a set of markers that define an epigenetic code, with special emphasis in the role of the histones as epigenetic markers (Strahl & Allis 2000; Turner 2000 and 2007; Imhof & Becker, 2001; Jenuwein & Allis 2001). However, it is uncertain that the so called “histone code” is a straight forward code. Instead, the epigenetic markers might produce synergistic effects (Fischle et al., 2003; Freitag & Selker, 2005; Henikoff, 2005; Weaver & Bartolomei, 2014). Epigenetic modifications affecting gene expression, their nature and effects have been extensively described in the literature (See Cavalli & Heard, 2019 for a recent example review). 8 Introduction

2.1.2.1 Methylation of DNA

The DNA methylation was the first epigenetic mechanism for transcription regulation to be described (Razin & Riggs, 1980). Consequently, it has been the most extensively studied. The processes behind methylation and demethylation of the DNA are now fairly well understood. However, it is still unclear how DNA methylation performs its regulatory function and how DNA regions are committed to a specific methylation pattern (see Ambrosi et al., 2017).

DNA methylation is well conserved in eukaryotic organisms, with the notable exception of most yeasts, which show no or extremely low methylation (Capuano et al., 2014). For a long time was thought to lack any methylation (Simoson et al., 1986), but recently was shown to be capable of adenine methylation (Greer et al., 2015). Another widely spread laboratory model organism, Drosophila melanogaster, shows very low methylation

Figure 2.2: DNA methylation. Methylation of cysteine residues is the most common DNA methylation in mammals. It happens in the 5C carbon atom of the pyrimidine ring, using S-adenosyl methionine as donor for the methyl group. Demethylation of 5-methyl cytosine can happen spontaneously. Passive demethylation occurs directly in the 5-methyl cytosine, but Ten-eleven translocation (TET) enzymes can create a 5-hydroxymethyl cysteine intermediary to facilitate this process. 5-methyl cytosine can also be de-aminated, transforming the nucleotide in a thymine. 2.1 Regulation of transcription: epigenetics 9 levels (Gowher et al., 2000). DNA methylation is also present in bacteria, but has been significantly less studied in prokaryotes than in eukaryotes (Adhikari & Curtis, 2016). Of the four nucleotides present in the DNA, two can be methylated, cytosine and adenine. Two methylated forms of cytosine and one of adenine are currently known: 5-methylcytosine (m5C) (Figure 2.2) (Doskocil & Sormova, 1965), N4-methylcytosine (m4C) (Ehrlich et al., 1985) and N6-methyladenine (m6A) (Dunn & Smith, 1958). These three modifications are present in bacteria (Ehrlich et al., 1985), while plants contain only m5C and m6A nucleotides (Vanyushin, 2006). The most common methylated nucleotide in mammals is m5C, but m6A can also be observed (Kay et al., 1994) and was recently suggested to be important for epigenetic regulation (Wu et al., 2016).

In mammals, DNA methylation mostly happens in cytosines which are part of CpG dinuclotides. Methylation of cytosines out of CpG dinucleotides can also be observed, especially in stem cells (Haines et al., 2001). DNA areas with a relatively large amount of CpG dinucleotides are known as CpG islands. Highly methylated CpG islands are usually found in non-coding regions of the DNA, and not in promoters of active genes (Bird. 1986). This suggests a correlation between DNA methylation and silencing of chromatin. DNA methyltransferases (DNMTs) are the enzymes that catalyse DNA methylation (Figure 2.2). It is thought that DNMT1 is the responsible enzyme for maintaining DNA methylation after duplication of the cells, because it can recognize hemi-methylated DNA. DNMT3A and DNMT3B catalyse de novo methylation (Goll & Bestor, 2005). How DNA is marked for de novo methylation is unclear, but it is known that H3K9 histone methylation (see section 2.1.2.2) marks the chromatin region for cytosine DNA methylation in Neurospora crassa (Tamaru et al., 2003). Chromatin remodelling proteins (see section 2.1.2.3) and non-coding RNAs (ncRNA) (see section 2.1.2.4) also seem to be important for maintenance and de novo DNA methylation. In contrast, non-methylated CpG islands show resistance to de novo methylation (Bird et al., 1985; Gardiner-Garden & Frommer, 1987). Removal of the methylation marks can happen passively, by the lack of DNMT1 activity. Alternatively, Ten- eleven translocation (TET) proteins can oxidise m5C to facilitate removal of the methyl group by DNA repair mechanisms (Figure 2.2) (He et al., 2011; Bhutani el al., 2011). Methylated cytosines can also be deaminated into thymines. This modification not only removes the cytosine methyl group, but also introduces 10 Introduction mutations (Figure 2.2) (Chahwan et al., 2010). Upon fertilization, the undergoes the removal of most of its cytosine methylation marks, with just a few methylation sites remaining (Wang et al., 2014). Interestingly, demethylation of the paternal genome is done actively, while demethylation of the maternal one is passive (Santos et al., 2002); the reason for these different mechanisms of demethylation is enigmatic. After the blastocyst stage the undergoes de novo methylation until normal methylation levels for different cell lineages are reached (Monk et al., 1987; Kafri et al., 1992).

DNA methylation prevents binding of many transcription factors to the DNA, possibly because methyl groups occupy the mayor groove of the DNA helix (Watt & Molloy, 1988). DNA methylation also promotes binding of other factors, such as proteins belonging to the methyl-CpG-binding domain (MBD) family (Meehan et al., 1989; Bird & Wolffe, 1999). These proteins often recruit chromatin remodelling or repressor complexes, which are thought to be the effectors of chromatin silencing (Jones et al., 1998; Nan et al., 1998). In mammals, gene repression by CpG methylation is involved in many relevant processes. During development, methylation helps to regulate the expression of pluripotency genes (Farthing et al., 2008). These genes allow cells to remain uncommitted to a particular cellular lineage, maintaining them pluripotent, potentially capable of differentiating into several different cell types. Methylation also contributes to the establishment of imprinting (a epigenetic phenomena where one of the alleles for a gen is silenced in a parental specific way, i.e. only the allele from one of the progenitors contributes to the genotype) (Weaver & Bartolomei, 2014). Other relevant process in which methylation takes part is X chromosome inactivation (Beard et al., 1995): mammalian females usually have one of their two X chromosomes inactivated and densely packed into a structure called “Barr body” (Barr & Bertram, 1949). This happens because each healthy individual only requires the genetic load of a single X chromosome, and while males already possess one single copy of this chromosome, females possess two. Inactivation simply adjusts their genetic load to the physiological level. Interestingly, in females with chromosomal aberrations leading to multiple copies of the X chromosome all these copies except one will be inactivated, leading to less severe . Methylation also has protective functions, repressing endogenous and other repetitive elements such as transposons (Jahner er al., 1982; Walsh et al., 1998; Zakrzewski et al., 2017), 2.1 Regulation of transcription: epigenetics 11 and thus helps maintaining genome stability (Bourc’his & Bestor, 2004). Dysregulation of DNA methylation can contribute to development of cancer (Jones, 2002) and other diseases. It has also been speculated that DNA methylation found in DNA of actively transcribed genes could regulate alternative splicing (Lev Maor et al., 2015).

2.1.2.2 Histone modifications

Histones are the basic building blocks of chromatin (Kornberg, 1974; Thomas & Kornberg, 1975). As such, they play a central role in the structure of chromatin and its interactions with other proteins. The four core histones are divided in two parts, a structured globular body and flexible “tails” located at the ends of the histones (Arents et al., 1991; Arents & Moudrianakis, 1995). While the bodies of the histones contribute to the structure of the histone octamer, the tails at the N- terminus of the four histones and at the C-terminus of the H2A histone protrude from the surface of the nucleosome (Figure 2.1b) (Davey et al., 2002). These tails can interact with nucleosomal and linker DNA (Angelov et al., 1988; Angelov et al., 2001; Iwasaki et al., 2013), with adjacent nucleosomes (Dorigo et al., 2004; Kan et al., 2009; Yang & Aira, 2011) and with different proteins (Champagne & Kutateladze, 2009; Wu & Xu, 2012). Histone tails and their modifications have been shown to be important to maintain the nucleosome structure (Nurse et al., 2013) or regulation of DNA transcription (Vettese-Dadey et al., 1996), and they are indispensable elements for chromatin compaction (Moore & Ausió, 1997; Dorigo et al., 2003) and formation of the mitotic chromosomes (de la Barre et al., 2001). In addition, they play an important role in the regulation of nucleosome remodelling (Hamiche et al., 2001; Ludwigsen et al., 2017). Despite generally considered flexible, the rigid structure certain domains in the histone tails adopt is seemingly important for their functions (Banères et al., 1997; Wang et al., 2000). Histones, and particularly their tails, are subjected to a series of covalent post-translational modifications. These modifications can alter and regulate their structure and functions. Several amino acid residues can potentially be modified in each histone tail. Combined with the abundance of nucleosomes along the DNA, this gives rise to a plethora of possible combinations that alter the functionality of chromatin. This 12 Introduction has given rise to the idea of a “histone code” that would determine the transcriptional state of a defined DNA segment (Strahl & Allis 2000; Turner, 2000 and 2007; Imhof & Becker, 2001; Jenuwein & Allis 2001). However, the complexity of the histone modification landscape makes it difficult to establish wether such a code exists or wether the final transcriptional state of a gene is the result of accumulation of positive and negative effects of the histone modifications in a DNA region (Fischle et al., 2003; Henikoff, 2005). Eight main types of histone modifications have been characterized: Acetylation (Allfrey et al., 1964; Grunstein, 1997), methylation (Murray, 1964; Allfrey et al., 1964; Rea et al., 2000), deimination (Bannister et al., 2002; Cuthbert et al., 2004), phosphorilation (Paulson & Taylor, 1982), proline isomerization (Nelson et al., 2006), ADP ribosilation (Jump et al., 1979, Hassa et al., 2006), ubiquitylation (Levinger & Varshavsky, 1982; Shilatifard, 2006) and sumoylation (Shiio & Eisenman, 2003).

Acetylation occurs in the amino groups of the side chains of several lysine residues in the tails of the four core histones (Loidl, 1994). Being a reversible process, the acetylation levels of chromatin are regulated by the equilibrium between histone acetyltransferase (HAT) (Brownell et al., 1996) and histone deacetylase (HDAC) (Taunton et al., 1996) protein families (Figure 2.3a). Histone acetylation has been experimentally associated with transcriptionally active chromatin for a long time (Hebbes et al., 1988). Consequently, HATs and HDACs are recruited by DNA-bound activators and repressors respectively and have been often reported to form part of transcription activator or repressor super-complexes (Vidal & Gaber, 1991; Ogryzko et al., 1998; Ikura et al., 2000). Acetyl groups reduce the positive charge of the histones. Therefore, they are thought to weaken the histone interaction with DNA (Hong et al., 1993), facilitating the association of DNA with transcription factors (Lee et al., 1993). Acetylation marks have also been shown to play a role in the regulation of nucleosome remodelling by different complexes (Hamiche et al., 2001), linking together these two important epigenetic mechanisms.

Among the different histone modifications, methylation is probably the most complex one, being capable of triggering antagonizing effects depending on the methylation pattern of the histones. Several lysines and arginines in the four core histones can be methylated to a different extent on their side chain nitrogens: mono-, di- and tri-methylation for lysines (Figure 2.3b) and mono- and di- 2.1 Regulation of transcription: epigenetics 13

Figure 2.3: Most common and studied histone modifications. a) Acetylation of lysines happens thanks to the HATs. In this reaction an acetyl group is transferred from the acetyl-coenzyme A to the amino group in the side chains of lysines. HDACs catalyse the deacetylation of histones. b) Histone lysines can also have their side chain amino group mono-, di- and trimethylated, with different effects to transcription. Lysine methylation is catalysed by HKMTs, while demethylation happens thanks to demethylases like LSD1 or JMJDs. c) Arginine residues in histones can also be methylated thanks to the PRMTs. Having two amino groups suitable for methylation, arginines can be symmetrically or asymmetrically dimethylated. Demethylation occurs by action of the demethylases, or by deimination by PAD, turning the arginine into citrulline. 14 Introduction methylation for arginines exists (Figure 2.3c) (Clarke, 1993). Depending on which specific amino acid residues is modified the methylation mark will be associated with different processes and transcriptional states. Lysine methylation is catalysed by different histone methyl transferases (HKMT) (Martin & Zhang, 2005). Among the better understood lysine methylation sites, H3K4 (Histone H3, lysine residue 4), H3K36 and H3K79 are associated with transcriptionally active chromatin (Bernstein et al., 2002; van Leeuwen et al., 2002; Smolle & Workman, 2013). H3K79 has the peculiarity of being localized in the body of the histone instead of the tail, and is also involved in DNA repair (Martin & Zhang 2005). H3K36, however, has also been associated with repression of transcription in other contexts (Landry et al., 2003). Methylation at H3K9, H3K27 and H4K20 are instead considered repression marks (Cao & Zhang, 2004; Kapoor-Vazirani et al., 2011; Keniry et al., 2016). H4K20 is also involved in DNA repair (Martin & Zhang 2005). Removal of histone methylation marks is possible thanks to histone demethylases like LSD1 (Shi et al., 2004) and JMJD proteins (Tsukada et al., 2006; Cloos et al., 2006; Whetstine et al., 2006) (Figure 2.3b and c). Arginine methylation is catalysed by a family of nine protein arginine methyltransferases (PRMT) (Fuhrmann & Thompson, 2016). Similarly to lysine methylation, arginine methylation can show opposing effects with respect to transcription regulation: symmetrically dimethylated arginines (cases in which the N atoms in the two equivalent guanidino groups of the arginine are methylated once each) are regarded as activator marks, while those asymmetrically dimethylated (one of the guanidino groups has its N dimethylated while the other shows no methylation) are considered repressor marks (Di Lorenzo & Bedford, 2001). Methyl groups are small non polar groups, and as such they don’t present steric impediments for other molecules to interact with histone tails. Instead, they recruit factors and complexes with different effects on the chromatin, usually coupling methylation with other histone modifications and epigenetic processes such as DNA methylation or nucleosome remodelling (Bird & Wolffe, 1999; Wysocka et al., 2006).

Deimination of arginine residues is a process that antagonizes arginine methylation. It represents the main mechanism by which arginine methylation is made reversible. Deimination is catalysed by protein arginine deiminases (PAD) enzymes, and involves the removal of one of the imine groups. This turns the 2.1 Regulation of transcription: epigenetics 15 arginine into a citrulline and effectively removes methyl groups if present (Figure 2.3c) (Wang et al., 2004). This gives the former arginine residue electrostatic characteristics similar to acetylated lysine, which leads to open chromatin states and gene activation (Fuhrmann & Thompson, 2016). In fact, deimination has also been shown to correlate with chromatin decondensation (Christophorou et al., 2014).

Phosphorylation is a well known post-translational modification and it is involved in the regulation of a plethora of processes (Hunter & Karin 1992; MacKintosh, 1998). Many kinases have been shown to phosphorylate serine and threonine residues of the H3 histone, in particular H3S10 (Sassone-Corsi et al., 1999; Duncan et al., 2006). For a long time of this histone has been known to correlate with actively transcribed genes (Mahadevan et al., 1991), but it also plays a central role in the condensation of chromosomes during mitosis (Wei et al., 1999; Giet & Glover, 2001). However, the precise mechanism of this modification is not known. The modification of the charges in the nucleosome and the specific interactions favoured by the phosphorylation might play a role (Kouzarides, 2007).

Proline isomerization refers to a change between the trans conformation of proline and its cis conformation (Ramachandran & Sasisekharan, 1968; Stewart et al., 1990). This conformational change can happen spontaneously, but several prolyl-cis-trans isomerases can catalyse this process (Gothel & Marahiel, 1999). One of these proteins, Fpr4, has H3 histone as its substrate (Nelson et al., 2006). In the context of epigenetic regulation, proline isomerization by Fpr4 has been shown to regulate the methylation level of H3K36. Specifically the presence of a trans P38 residue in the H3 tail inhibits the methylation of H3K36 by Set2 (Krogan et al., 2003; Schaft et al., 2003), and Fpr4 proline isomerization of P38 is reciprocally inhibited by methylation of H3K36. Moreover, binding of JMJD2A, capable of demethylating H3K36 (Chen et al., 2006), is suspected to be dependant of the cis conformation of P38 (Nelson et al., 2006).

ADP ribosilation happens by addition of mono- or poly-ADP-riboses to the side chains of some amino acids, which is catalysed by the ADP- ribosiltransferases (ART) and reversed by the action of mono-ADP-ribose protein hydrolases (MARH) (Hassa et al., 2006; Hottiger et al., 2010). This modification has also been shown to occur in histone tails (Messner et al., 2010). In the context 16 Introduction of DNA repair (Caldecott, 2008; Pearls et al., 2012), ribosilation has been shown to be able to manipulate the state of chromatin (Ju et al., 2006). Recently more aspects of the transcription regulation by ADP ribosilation of histones have been described, mainly its involvement in decompacting chromatin and recruiting remodeller complexes (Quénet et al., 2009; Messner & Hottiger, 2011).

Ubiquitylation and sumoylation, unlike the other histone modifications, add large polypeptides to the histones. Ubiquitin and the Small Ubiquitin-like Modifier (SUMO) proteins have sequence homology and share a similar structure, but they differ in their surface charges. Ubiquitylation can be observed in all the core histones. Modification of H3 and H4 histones have been associated with response to DNA damage (Wang et al., 2006). Ubiquitylations in H2A and H2B show opposite effects: Ubiquitylation and deubiquitylation of H2B is related to transcription activation and the methylation of H3K4 (Henry et al., 2003; Zhu et al., 2005). H2A is ubiquitilated by the Polycomb repressor complex, and is associated with repression (Wang et al., 2006). Deubiquitylation of H2B leads to chromatin silencing (Emre et al., 2005). Thus, ubiquitylation has also links with other epigenetic mechanisms such as histone methylation and nucleosome remodelling. Sumoylation, observed on H2A, H2B and H4 histones, is involved in transcription repression, and seems to directly antagonize histone ubiquitylation and acetylation (Shiio & Eisenman, 2003; Nathan et al., 2006).

2.1.2.3 Nucleosome remodelling

As discussed above, accessibility of the DNA is a major factor in regulation of transcription. Tight packaging of chromatin has been shown to correlate with gen silencing (Grewal & Gia, 2007). Similarly, other epigenetic modifications often lead to modifications to the chromatin structure and recruitment of nucleosome remodelling complexes (Jones et al., 1998; Nan et al., 1998; Hamiche et al., 2001; Wysocka et al., 2006; Reyes-Turcu & Grewal, 2012). Several nucleosome remodelling complexes have been described to be able to actively modify the structure of the chromatin, using the energy from ATP hydrolysis to facilitate this (Vignali et al., 2000). Apart from the recruitment of additional scaffold proteins to further compact it, two main methods of modifying chromatin structure have been 2.1 Regulation of transcription: epigenetics 17 described: modifications to the structure of the nucleosomes and repositioning them. Both activities are catalysed by different subfamilies of the Snf2 family of ATPases (Ryan & Owen-Hughes, 2011). In addition to the mandatory ATPase catalysing the remodelling activity, a chromatin remodelling complex will typically contain DNA or histone binding subunits, able to recognize epigenetic marks (Xiao et al., 2001; Pamblanco et al., 2001; Karras et al., 2005; Clapier & Cairns, 2009; Nguyen et al., 2013). Occasionally they also contain subunits able to write new epigenetic mark on the chromatin (Cai et al., 2005; Job et al., 2016). Due to these characteristics nucleosome remodelling complexes are a platform for interaction and coordination of different epigenetic mechanisms and therefore play central roles in gene regulation, e.g. stem cell differentiation (Koludrovic et al., 2015) and pluripotency maintenance (Zhao et al., 2017), DNA damage response (Morrison & Shen, 2009), cancer (Kadoch et al., 2013) or ageing (Liu et al., 2012).

The stability of the nucleosome is a relevant factor in the regulation of protein access to the DNA. A stable nucleosome will tightly bind DNA, making it unavailable for proteins and complexes that preferentially bind naked DNA. In fact, the length of nucleosomal and linker DNA were measured for first time thanks to the resistance of nucleosomal DNA to DNAses (Hewish & Burgoyne, 1973). Histone octamers are built from H2A/H2B and H3/H4 dimers thanks to the action of assembly factors and histone chaperones (Akey & Luger, 2003). Inversely, the interactions between dimers can be disrupted and nucleosomes disassembled by the action of nucleosome remodelling complexes at the expense of ATP (Lorch et al., 2006; Böhm et al., 2011). Such instability can reduce the binding of DNA or even promote a complete transition to naked DNA; facilitating the access of transcription factors. Additionally, instability of the nucleosomes can facilitate the exchange of histones (Bruno et al., 2003). This also provides a means to substitute the predominant version of the histones with histone “variants”. These “non-allelic” variants of histones have different biophysical characteristic. Additionally they are subjected to their own set of post-translational modifications and can alter the functions and structure of the nucleosome, adding a layer of regulation possibilities to the already discussed histone modifications. H1, H2A and H3 are particularly prone to this, while no histone variant has been identified for H4 (Kamakaka & Biggins, 2005). Some of these variants are particularly relevant and play distinct roles in the regulation of the chromatin. CENP-A, a 18 Introduction variant of H3 in mammalian cells, is precisely localized to the centromeres (Palmer et al., 1990). It has been shown that this histone is essential for formation of kinetochores and segregation of chromosomes (Amor et al., 2004). Another version of H3, H3.3, localizes to active chromatin (Ahmad & Henikoff, 2002). Many variants have been described for H2A. MacroH2A, a variant three times bigger than the original H2A, is greatly enriched in the inactive X chromosome of female mammals (Constanzi & Pehrson, 1998). H2A.X and its phosphorylation facilitate the repair of DNA double-strand breaks. H2A.Z, a particularly abundant variant of H2A, is associated with activation of gene expression (Larochelle & Gaudreau, 2003). Deletion of this variant produces an uncontrolled spread of heterochromatin and is lethal in most organisms, but it is not so in budding and fission yeast (Fan et al. 2004).

In the Chromodomain Helicase DNA-binding protein 1 (CHD1) (Lusser et al., 2005) and Imitation SWItch (ISWI) (Tsukiyama et al, 1999) Snf2 subfamilies, nucleosome repositioning effects are obtained by ATP-dependent sliding of the nucleosomes. Complexes that include these catalytic subunits function as motors that can reposition the nucleosomes in the chromatin. Linker DNA and H4 histone tails have been shown to promote their activity (Narlikar, 2010). Some remodellers can space the nucleosome in regular intervals; an arrangement which favours gene silencing (Sun et al., 2001). However, the opposite effect has also been observed in cases where members of the SWItch/sucrose non fermentable (SWI/SNF) and ISWI families remove the nucleosomes from gene promoter zones, facilitating the initiation of transcription (Clapier & Cairns, 2009). The mechanism of nucleosome sliding is not understood, partially because of the lack of high resolution structural insights in the active complexes (Leschziner, 2011), but it has been speculated that nucleosome sliding involves additional ATP hydrolysis cycles (Narlikar, 2010). Recent Cryo-EM structures show that unwrapping of the outer DNA turns of the nucleosome is a common feature among different chromatin remodellers (Sundaramoorthy, 2019). It is hypothesised that this reduces the contacts between DNA and histones, facilitating sliding of the nucleosomes. 2.1 Regulation of transcription: epigenetics 19

2.1.2.4 Other epigenetic phenomena

Epigenetics is a fast evolving field. It is constantly subjected to new discoveries, but also to revision and sometimes scepticism.

For instance, one of these controversial cases is RNA interference (RNAi). Many genes encode RNA transcripts that are never translated into proteins, non- coding RNAs (ncRNA). In mammals, long and small ncRNAs are more abundant than the actual protein encoding genes, representing a 42% of the human genes (while protein encoding genes represent a 33% and the remaining 25% are mostly pseudogenes) (GENCODE version 32; The ENCODE Project Consortium, 2019). Functional RNAs such as ribosomal RNA or transfer RNA have been known for a long time, but the vast majority of ncRNAs were thought to be little more than transcriptional “noise”. But then RNAi was discovered as a mechanism to silence genes, mediated by exogenous double stranded RNA (dsRNA), which in nature is present in the genomes of dsRNA viruses (Fire et al., 1998). Since then, a wide range of endogenous ncRNA classes have been described, with a wide range of functions in the cell (Nakagawa & Kageyama, 2014). This has changed the way ncRNA was perceived: about 80% of the human genome encodes some kind of RNA transcript (The ENCODE Project Consortium, 2019). Some of these RNAs function in transcriptional and post-transcriptional regulation of gene expression. Moreover, they play an important role in epigenetic processes such as DNA methylation and heterochromatin assembly (Lee, 2012; Marchese & Huarte, 2014). One well known example of this is the implication of the long non-coding RNA (lncRNA) Xist in the inactivation of the mammalian X chromosome. This lncRNA is expressed in the inactive X chromosome and promotes the formation of heterochromatin by first binding to some specific areas of the chromosome and then spreading until it coats the X chromosome entirely (Brown et al., 1991; Pollex & Heard., 2012). Interestingly, the action of Xist is regulated by other lncRNAs, especially Tsix, the complementary transcript of Xist transcribed from the antisense strand of the gene. This regulation is believed to depend on DNA methylation of the Xist promoter directed by Tsix, implicating another epigenetic phenomena (Lee et al., 1999; Nesterova et al., 2008). Small interfering RNAs, such as micro RNAs (miRNA), also have functions in epigenetic regulation. The first miRNAs shown to influence epigenetic regulation were the miR-29 family RNAs. They are able to 20 Introduction

Figure 2.4: Coordination of epigenetic marks. NcRNA: non coding RNA; MBD: Methyl Binding Domain protein; TF: Transcription Factors; RNA Pol II: RNA polymerase II. The modifications of the transcriptional state of the chromatin are thought to be result of the accumulation of different epigenetic marks over extended zones of DNA. Chromatin remodeling complexes play a central role in the coordination of different epigenetic processes. Positive transcriptional marks are acetylation of histones, large spacing and ejection of nucleosomes, and ultimately open and transcriptionally active euchromatin. Repressive marks and factors, instead, recruit chromatin silencing complexes, leading to hypermethylation of the DNA, deacetylation of histones, repositioning and assembly of nucleosomes and recruitment of further scaffolding proteins, leading to transcriptionally repressed and compact heterochromatin. reactivate silenced tumour suppressor genes in cancer cells by targeting DNMTs (Fabbri et al., 2007). Small interfering RNAs (siRNA) have been shown to be essential for the formation of the centromeric chromatin in fission yeast possibly by influencing histone methylation (Creamer & Partridge, 2011).

2.1.2.5 Ubiquitous and collaborative regulation of gene expression

Epigenetic regulation is a central component of the regulation of gene expression. Every process involving DNA transcription, physiological or pathological, is susceptible of regulation by epigenetic mechanisms. The possibility of establishing transcriptional states that can be inherited by ensuing generations of cells is particularly important for multicellular organisms, in which each cell shares a 2.1 Regulation of transcription: epigenetics 21 common genotype, but display a great diversity of phenotypes. Not surprisingly, epigenetic regulation is the key element required for maintaining the pluripotent state of stem cells and their differentiation into different tissue specific cell lines (Li & Zhao, 2008; Mohamed Ariff et al., 2012; Wutz, 2013; Feng & Chen, 2015; Adam & Fuchs, 2016). Epigenetics also represents a mechanism to produce a lasting response to the environment (Lillycrop et al., 2005, Latham et al., 2012). Longitudinal studies with monozygotic twins offer an exceptional opportunity to observe the evolution of the epigenetic landscape and the influence of the environment on it (Fraga et al., 2005). In some cases the epigenetic traits can be inherited not only by cellular lineages, but also from parents to the offspring (Daxinger & Whitelaw, 2012). This poses a peculiar case of inheritance of acquired traits. Cancer, is closely linked to epigenetic regulation (Jones & Baylin, 2002), and many epigenetic effector and reader proteins are targets for cancer therapy (Tomita et al., 2016, Savino et al., 2017). Studies about ageing (Ben-Avraham et al., 2014; Sen et al., 2016), memory (Puckett & Lubin, 2011; Rayman & Kandel, 2017), obesity (Park et al., 2017) and stress (Denhardt, 2018) demonstrate that epigenetic regulation is a very widespread phenomena, and underpin its importance.

While extensive work has been performed to uncover the processes underlying the various epigenetic phenomena, the mechanisms by which the final effects in gene expression regulation are achieved are unclear. Most epigenetic processes do not represent isolated events, but are connected. They often trigger each other in sequence. Experimental data suggest that this accumulation of several epigenetic marks over large regions of DNA actually promotes and maintains the changes in the transcriptional state of chromatin (Fischle et al., 2003; Freitag & Selker, 2005; Henikoff, 2005; Weaver & Bartolomei, 2014). It manipulates the packaging level as well as facilitates or impedes the interactions of other factors with nucleosomes and DNA (Figure 2.4) (Watt & Molloy, 1988; Fischle et al., 2005). In eukaryotes the transcription of mRNA and many non- coding RNAs is catalysed by RNA polymerase II (Pol II) (Cramer et al., 2001; Sainsbury et al., 2015), which interacts with a wide variety of factors. For example, the initiation of transcription requires the assembly of the pre-initiation complex (PIC), a super-complex composed of more than 50 proteins and a molecular weight of more than 2.5 MDa (Hahn, 2004; Robinson et al., 2016). The need to 22 Introduction accommodate such a huge machinery underscores the importance of the packaging level of chromatin. In this context, protein super-complexes hosting different subunits specialized in recognizing epigenetic motifs and producing further marking and remodelling effects in the chromatin have been shown to play a crucial role as platforms for coordination, amplification and maintenance of the epigenetic mechanisms and chromatin states (Figure 2.4). Consequently, chromatin remodelling complexes have become the focus of numerous and ambitious studies, aiming at structural and biochemical characterization. 2.2 The NuRD Complex 23

2.2 The NuRD Complex

2.2.1 A nucleosome remodelling and histone deacetylase complex

Among the nucleosome remodelling complexes, the NuRD complex (Nucleosome Remodelling and Deacetylase complex, also known as Mi-2) is remarkable because it was the first complex identified that combined catalytic subunits for these two epigenetic processes, nucleosome remodelling and histone deacetylation. In addition, the NuRD complex also includes epigenetic readers, such as MBD and other DNA and histone binding domains. NuRD was first described in 1998, purified almost simultaneously from Xenopus laevis and HeLa cells (Tong et al., 1998; Wade et al., 1998; Xue et al., 1998; Zhang et al., 1998). It is highly conserved in animals and plants, and widely expressed across different tissues (Denslow & Wade, 2007), but not in yeast. However, more recently a very similar complex has been described in fission yeast, the Snf2/histone deacetylase repressor complex (SHREC) (Sugiyama et al., 2007; Job et al., 2016). SHREC includes a Mit1 subunit which is a Snf2 chromatin remodelling ATPase homologue to the nucleosome remodelling subunit in NuRD (Cramer et al., 2014), and a Clr3 subunit, which is a HDAC known to target H3K14 acetylated lysines (Bjerling et al., 2002). Interestingly, SHREC also includes a Clr2, a subunit with a MBD-like domain (Job et al., 2016), further increasing the similarity with the NuRD complex.

The approximate composition of human NuRD (hNURD) has been known since it was first purified from HeLa cells, but recent studies have provide further insights into the stoichiometry of the complex (Le Guenzennec et al., 2006; Smits et al., 2013; Kloet et al., 2014). These studies reveal a complex and dynamic composition. Seven core subunits (those that always appear in stoichiometric amounts in holo NuRD complexes) were identified, in contrast to the six described in previous studies: CHD3/4, HDAC1/2, GATAD2A/B, MTA1/2/3, RbAp46/48, MBD2/3 and DOC1. Together they assemble into a complex of approximately 1 MDa size. For most of the core NuRD subunits at least two paralogues can be present in different copy numbers. Furthermore, the precise stoichiometry is 24 Introduction subject to environmental changes (Kloet et al., 2014) adding to the compositional flexibility described above(Allen et al., 2013; Ori et al., 2013; Torchy et al., 2015).

2.2.1.1 CHD3/4

The Chromodomain helicase DNA-binding protein 1 (CHD1) subfamily is a group of helicases with nucleosome-spacing activity, which are part of the Snf2 family of helicase-related ATPases. Proteins of this family contain a characteristic Snf2 domain consisting of two RecA-like folds and seven conserved helicase-related sequence motifs. They lack the capacity shared by other helicases to separate nucleic acid strands. Instead they produce an ATP-dependent torsional strain to DNA, providing the force to remodel nucleosomes and other DNA-protein complexes (Ryan & Owen-Hughes, 2011).

The nucleosome remodelling activity in hNuRD is provided by the CHD3 or CHD4 subunits. These are members of the CHD ATPase subfamily, also known as Mi-2α and Mi-2β. These proteins of ~230 kDa and ~220 kDa respectively represent the biggest subunit found in hNuRD complexes. Only one copy of either ATPase is present in hNuRD (Kloet et al., 2014), making them mutually exclusive. In most cases CHD4 is found in hNuRD complexes (Low et al., 2016). Recently it has been shown that CHD5 can also be found in distinct hNuRD complexes (Kolla et al., 2015). The signature domain of the Snf2 family, the ATPase domain, is in the middle area of the CHD3/4 sequence (Figure 2.5a) (Woodage et al., 1997). It is composed of an ATP binding domain followed by a helicase C-terminal domain, which are both required for nucleosome remodelling activity. The N-terminal part comprises two well conserved plant homeodomain (PHD) zinc fingers (Figure 2.5b and c) (Mansfield et al., 2011), and two tandem chromodomains (CD) (Figure 2.5d) (Wiggs et al., to be published). The PHD fingers have been shown to bind H3 histones (Musselman et al., 2009a; Mansfield et al., 2011), preferentially H3K9 trimethylated, but not H3K4 methylated histones (Musselman et al., 2009b). The CD domains can associate with DNA (Bouazoune et al., 2002). Both sets of domains are necessary for the catalytic activity of CHD3/4 (Musselman et al., 2012; Watson et al., 2012), and a mechanism were the PHD fingers regulate CD binding to DNA has been proposed (Morra et al., 2012). A recent study described 2.2 The NuRD Complex 25

Figure 2.5: Catalytic subunits of hNuRD. a) Schematic view of the domains in CHD3 and 4- CHD3 and 4 are responsible for the nucleosome remodeling activity of hNuRD thanks to their helicase domains. The PHD fingers (b and c) and the tandem chromodomains (d) are required for the helicase function. The N-terminal HMG box-like domain (e) mediates the interaction with poly(ADP-ribose). f) A SAXS envelope shows the combined structure of the helicase (red) the chromodomains (green) and the PHD fingers (blue) (Morra et al., 2012). g) Schematic view of HDAC1 and 2 showing the catalytic domain. h) The active site of HDAC1/2 is located in a narrow hydrophobic tunnel. i) The catalytic domain of HDAC2 and its zink ion bound to a benzamide inhibitor (ion and inhibitor depicted in grey). (Figure based on Allen et al., 2013; Torchy et al., 2015; Domain data obtained from Uniprot database). 26 Introduction another domain closer to the N-terminus, a High Mobility Group (HMG) box-like domain capable of binding poly(ADP-ribose) (Figure 2.5e) (Silva et al., 2016), suggesting that ribosylation could take part in NuRD regulation. In the C-terminal part, pericentrin-interacting domains have been described, which allow CHD3/4 to interact with centrosomes (Sillibourne et al., 2007). These domains have been suggested to participate in the recruitment of different co-repressors (Kehle et al., 1998; Murawsky et al., 2001; Shimono et al., 2003), and they appear to be required for transcriptional repression by hNuRD (Ramirez et al., 2012). Several phosphorylation sites found in the sequence of CHD3/4 are important for the regulation of its activity (Urquhart et al., 2011). Currently a Cryo-EM structure of CHD4 bound to a nucleosome and is pending publication. The unpublished data can be accessed in the preprint server bioRxiv (Farnung et al., pending publication).

CHD proteins were originally discovered as autoantigens in dermatomyosis disease (Ge et al., 1995; Seeling et al., 1995). Since then many relevant functions of these helicases have been described, as part of NuRD, other complexes, or as independent proteins (O’Shaughnessy & Hendrich, 2012). One of these functions is the DNA damage response. CHD4, has been shown to appear in DNA damage sites, probably recruiting the NuRD complex to them. Moreover, CHD4 is important in the regulation of the repair process. Additionally, depletion of CHD4 increases DNA damage levels (Larsen et al., 2010; Polo et al., 2010; Smeenk et al., 2010; Qi et al., 2016; Sun et al., 2016). However, it is not completely clear whether the loss of the DNA repair function is responsible for this defect or whether impairing of genome integrity maintenance by NuRD is the main reason for the increased damage. CHD3 and CHD4 were shown to either promote (Chudnovsky et al., 2014; D’Alesio et al., 2016) or suppress (Kim et al., 2011; Guillemette et al., 2015) tumor development. Accordingly, CHD4 is proposed as a target for cancer therapy (Nio et al., 2015; Sperlazza et al., 2015). CHD3/4 have also been associated with microtubule stabilization in the bipolar spindle (Yokoyama et al., 2013), adipocyte thermogenesis (Nie et al., 2016) and intellectual disability (Weiss et al., 2016). 2.2 The NuRD Complex 27

2.2.1.2 HDAC1/2

HDACs are a family of lysine deacetylases found in eukaryotic cells. They catalyse the deacetylation Nε-acetylated lysine residues, a reversible modification responsible for regulation of many proteins (Patel et al., 2011). Histone tails are among the substrates of this protein family, thus making HDACs important regulators of chromatin silencing. HDAC1 was the founding member of the family (Taunton et al., 1996), and since then additional members have been discovered and classified into four classes (Yang & Seto, 2008).

HDAC1 and 2 are members of the zinc-dependant class I of HDACs (Yang & Seto, 2008). hNuRD contains two copies of these proteins, each with a molecular weight of approximately 55 kDa (Kloet et al., 2014). They share around 85% of their amino acid sequence and their catalytic domain in the N-terminus is well conserved in all eukaryotes (Figure 2.5g). The C-terminal tail is more variable and can be phosphorylated or sumoylated (de Ruijter et al., 2003; Sengupta & Seto, 2004). The catalytic core consist in a β-sheet of eight strands surrounded by α-helices (Figure 2.5h). The active site forms a narrow channel with the zinc ion necessary for deacetylase activity accommodated in the cleft (Figure 2.5i) (Millard et al., 2013; Lauffer et al., 2013).

Studies in T-Cells suggest that HDAC1 and 2 are responsible for at least 60% of the deacetylation in the nucleus (Dovey et al., 2013). In addition to NuRD, they form part of a number of other repressor complexes like Sin3A, CoREST or NcoR/SMRT (Zhang et al., 1998; Ordentlich et al., 1999; You et al., 2001; Reichert et al., 2012), and they have other functions in transcription regulation and other areas (Reichert et al., 2012; Kelly & Cowley, 2013; Moser et al., 2014). Contrary to the widespread role of deacetylation in chromatin silencing, deletion of HDAC1/2 surprisingly results in down-regulation of a considerable number of genes, including pluripotency related genes (Zupkovitz et al., 2006; Kidder & Palmer, 2012). Furthermore, HDAC1 and 2 often appear localized to active gene promoters, and have been show to correlate with histone acetylation and transcriptional activation in CD4+ cells (Clayton et al., 2006; Wang et al., 2009; Zupkovitz et al., 2006; Kidder & Palmer, 2012), suggesting that transcriptional regulation by these enzymes is a complex and dynamic process. HDAC1 has 28 Introduction

Figure 2.6. GATAD2 and MTA subunits of hNuRD. a) Schematic view of the GATAD2A and B domains. b) Crystal structure of the GATAD2A coiled coil motif, important for its interaction with MBD2/3. c) Schematic view of the domains in MTA1, 2 and 3. d) Crystal structure showing the MTA1 ELM2 and SANT domains, involved in dimerization and other protein-protein interactions. The C-terminal part of the model (colour coded in blue and dark blue) is predicted to be unstructured. e and f) A few α-helixes in the first and second MTA1 SH3 domains have been crystallized in complex with other proteins. (Figure based on Allen et al., 2012; Torchy et al., 2015; Domain data obtained from the Uniprot database). 2.2 The NuRD Complex 29 proven to be an essential factor for development (Montgomery et al., 2007; Zupkovitz et al., 2010). The deletion of HDAC2 has not been shown to be lethal, but it does affect survival rates and creates a range of detrimental phenotypes, such as cardiac abnormalities (Montgomery et al., 2007). Conditional deletions lead to less severe phenotypes, likely due to a reciprocal functional compensation between HDAC1 and 2. Similarly to CHD3/4, HDAC1/2 also play a role in the regulation of the (Wilting et al., 2010; Yamaguchi et al., 2010) and DNA damage repair (Miller et al., 2010), and their contribution to tumorigenesis has motivated numerous of studies (He et al., 2016; Lin et al., 2016; Zhu et al., 2016; Witt et a., 2017). Most of these indicate that HDACs promotes growth, invasion and metastasis of tumours. Consequently they have been usually regarded as therapeutic targets in cancer therapy (Gao et al, 2017; Abdizadeh et al., 2017; Chen et al., 2017; Savino et al., 2017). Other functions of HDAC1 include a roles in splicing (Hnilicová et al., 2011) and in defence against neurotoxicity by amyloids in Alzheimer;s disease (Tao et al., 2017).

2.2.1.3 GATA2D2A/B

GATA-type zinc fingers are specific DNA binding domains. They recognize the [AT]GATA[AG] sequence, but are also able to bind proteins, which has proven to be important for their function (Liew et al., 2005). The binding domain is composed of two irregular antiparallel β-sheets and an α-helix and contains a zinc ion coordinated by four cysteines (Omichinsky et al., 1993). The first described protein displaying these domains was GATA-1 (known at the time as Eryf1), an erythroid- specific involved in erythroid development and α and β-globin expression regulation (Evans & Felsenfeld, 1989). Transcription factors of the GATA family typically contain two tandem zinc finger domains.

GATAD2A and B were discovered in 2001 as components of the hNuRD complex when they co-purified with MBD2 (Feng et al., 2002). Two copies of either paralog seem to be present in the hNuRD complex (Kloet et al., 2014). They are ubiquitously expressed proteins of 66 and 68 kDa respectively, with two conserved domains, CR1 in the N-terminus and CR2 in the C-terminus (Figure 2.6a) (Brackertz et al., 2002; Feng et al., 2002). CR1 is a coiled coil motif important for 30 Introduction the interaction of GATAD2A/B with MBD2/3 (Figure 2.6b) (Brackertz et al., 2006; Gnanapragasam et al., 2011; Walavalkar et al., 2013). This domain is also relevant for the interaction with other components of hNuRD such as CHD4. These interactions and the activity of the complex are enhanced by sumoylation of GATAD2A (lysines 30 and 487) and GATAD2B (lysine 33) (Gong et al., 2006). The CR2 domain is a histone binding domain containing a GATA-type zinc finger, which can bind both DNA and histone tails (Feng et al., 2002; Brackertz et al., 2006).

The activity of GATAD2A/B within the NuRD complex has been described to enhance MBD2/3 mediated repression of transcription. (Brackertz et al., 2006). As other components of NuRD, they have been shown to play important roles in cell differentiation (Vásquez-Doorman & Petersen, 2016) and development. Their non functional mutants result in early unviable (Marino & Nusse, 2007). GATAD2A/B have also been associated with DNA damage response (Spruijt et al., 2016) and cancer (Wang et al., 2017; Xin et al., 2017). GATAD2B deletions are implied to induce a particular syndrome featuring intellectual disability and defective synapse morphology (Willemsem et al., 2013; Tim-Aroon et al., 2017).

2.2.1.4 MTA1/2/3

Metastasis associated proteins 1, 2 and 3 (MTA1/2/3, also called metastasis tumor antigens), along with two isoforms of MTA1 and one of MTA3, form an own family of co-regulator proteins. Two copies of paralogs from this group are found in hNuRD complexes (Kloet et al., 2014). With molecular weighs of 80, 75 and 67 kDa, these proteins compromise four conserved types of domains with a high amount of predicted disordered regions between them and in the N-terminal region (Figure 2.6c). The N-terminal bromo-adjacent homology domain (BAH) is implicated in protein-protein interactions (Callebaut et al., 1999). Proteins containing BAH domains are often involved in chromatin and gene silencing. They have the ability to bind H3 or H4 histone tails (Chambers et al., 2012; Kuo et al., 2012). MTA1/2/3 also comprise an Egl-27 and MTA1 homology 2 domain (ELM2) (Figure 2.6d). Structures indicate that this domain is involved in protein-protein interactions (Millard et al., 2013 and 2014). This domain, unstructured when isolated, adopts a structured conformation to participate in the interaction with 2.2 The NuRD Complex 31

HDAC1. Part of the domain forms a homodimerization surface with four α-helices. Close to the ELM2 domain is the SANT (Swi3, Ada2, NcoR and TFIIIB) domain (Figure 2.6d) (Aasland et al., 1996), which also participates in the interaction with HDAC1 (Millard et al., 2013 and 2014). This domain, composed of three α-helixes, can also be found in other proteins involved in transcription regulation (Boyer et al., 2004; Millard et al., 2013). Finally, MTA proteins contain a GATA-type zinc finger domain, thought to be able to interact with DNA and proteins (Liew et al., 2005). Additionally, MTA1 has two Src homology 3 (SH3) binding domains (Figure 2.6e and f), which can bind other proteins presenting SH3 domains such as some cytoskeletal and signalling proteins (Ren et al., 1993; Desrochers et al., 2017). Crystal structures also show that two regions in the N-terminus of MTA1 (residues between 468-518 and 655-710, both including a common KRAARR sequence) are capable of binding RbAp46/48, another subunit of hNuRD (Alqarni et al., 2014; Millard et al., 2016). MTA1 is susceptible of sumoylation, which regulates its function (Cong et al., 2011).

MTA1, the founding member of the MTA family, was first described as a metastasis associated factor in rat mammary adenocarcinoma (Toh et al., 1994), and has since been shown to be overexpressed in a plethora of carcinomas and metastatic processes (Lai & Wade, 2011; Andisheh-Tadbir et al., 2016; Deng et al., 2017; Dhar et al., 2017). The increased presence of MTA1 correlates with grade, invasiveness and bad prognosis of the tumours (Nicolson et al., 2003). MTA2 shows a similar behaviour, but, interestingly, MTA3 behaves as a direct competitor of MTA1 (Zhang et al., 2006). In breast epithelial cells MTA3 has been shown to be a key up-regulator of the expression of E-cadherins associated with invasive processes through the control of Snail (Fujita et al., 2003a). MTA1 can also be found in co-repressor complexes other than NuRD, for example COREST (Humphrey et al., 2001) or Sin3A (Laherty et al., 1997). It has been shown to influence the chromatin packaging level independently from NuRD (Liu et al., 2015). 32 Introduction

Figure 2.7: RbAp and MBD subunits of hNuRD. a) Schematic view of the WD40 repeats in RbAp46 and 48. RbAp46 (b) and RbAp48 (c) have almost identical structures, shaped into a seven bladed β-propeller. d) Side view of the RbAp48 structure. The N and C-termnal α-helices (red and blue) can be seen packed to the side of the structure. e) Schematic view of the domains in MBD2 and MBD3. f) Crystal structure of the coiled-coil motif in MBD2, important for the interaction with GATAD2A. g) Crystal structure of the MBD domain of MBD2. h) The binding of MBD2 to DNA is mediated by the β-sheet and the L1 loop in the MBD domain. Residues R22, D32, Y34 and R44, inserted in the major groove of the DNA (grey), play a major role in the binding and specificity towards methylated DNA. (Figure based on Allen et al., 2012; Torchy et al., 2015; Domain data obtained from Uniprot database). 2.2 The NuRD Complex 33

2.2.1.5 RbAp48/46

WD40 is a protein fold consisting of four to ten repeats, comprising a β-sheet of four strands typically ending in a tryptophan-aspartate (WD) dipeptide (Smith et al., 1999). Several structures are available for WD40 domains; they show that the β-sheets are commonly arranged in a propeller shape (Pickels et al., 2002; Wu et al., 2003; Hao et al., 2007). This motif specializes in protein binding, often serving as a hub for protein interactions (van Nocker & Ludwig, 2003; Santosh Kumar et al., 2016). While the sequence is not, the structural motif is strongly conserved across eukaryotes. Proteins displaying a WD40 domain are abundant and widely spread (a study in Arabidopsis Thaliana identified 237 WD40-containg proteins divided in 143 families) (Neer et al., 1994; van Nocker & Ludwig, 2003).

Retinoblastoma associated proteins 46 and 48 (RbAp46/48), also known as retinoblastoma-binding proteins 7 and 4 (RBBP7/4), have about 92% of their sequence in common. They are 47 kDa histone chaperones . They are composed of a typical WD40 domain: seven WD40 repeats arranged as a seven bladed β- propeller, with the N-terminal and the C-terminal α-helixes packed against one of its sides (Figure 2.7a, b, c and d) (Murzina et al., 2008; Alqarni et al., 2014). Approximately four to six copies of RbAp46/48 can be found within the complex, but mass spectrometry studies revealed that their binding is sensible to the presence of detergents. Such detergents are commonly used in protein purification, and therefore the amount of RbAp46/48 in hNuRD can vary depending on the method used for isolation of the complexes (Kloet et al., 2014).

The main function of RbAp46/48 is to mediate protein-protein interactions. In addition to NuRD they can be found in many chromatin assembly or remodelling complexes, such as CAF-1 (Verreault et al., 1996), Sin3 (Zhang et al., 1997) or the polycomb repressive complex 2 (PRC2) (Kuzmichev et al., 2004). As part of these complexes RbAp46/48 serves as a recruitment platform for other factors, such as FOG-1 (Lejon et al., 2011), or histones (Murzina et al., 2008). Thanks to their ability to bind H4 histones RbAp46 and 48 have a relevant role in chromatin regulation. They have been associated with development (He et al., 2015), with the cell-cycle and mitotic/meiotic processes (Balboula et al., 2015; Mouysset et al., 2015), and with the regulation of β-catenin (Li & Wang, 2006). They have also 34 Introduction been shown to contribute to tumorigenesis. RbAp46/48 were first described in association with the retinoblastoma tumor suppressor, to which they owe their name (Qian et al., 1993).

2.2.1.6 MBD2/3

MBD proteins play a central role in epigenetic regulation. They can specifically recognize methylated CpG islands. They usually act as a bridge between methylation marks and other factors involved in epigenetic regulation, such as chromatin remodelling complexes (Jones et al., 1998; Nan et al., 1998). Their signature domain, the MBD domain is about 70 amino acids in length. It is composed of a four-stranded antiparallel β-sheet and a C-terminal α-helix followed by a hairpin loop, which together form an α/β sandwich. The structure is completed with two more loops, L1 and L2, that seem to be important for DNA binding (Ohki et al., 2001).

MBD2 and 3 are part of the core of hNuRD. With 43 and 32 kDa respectively, they are among the smallest components of the complex. Mass spectroscopy studies have revealed that they only contribute with one or two copies to the complex, and MBD2 and MBD3 are mutually exclusive (Le Guenzennec et al., 2006; Kloet et al., 2014). In addition to the characteristic MBD domain (Figure 2.7g), MBD2 and 3 also display a conserved coiled-coil motif, implicated in interaction with GATAD2A/B (Figure 2.7f) (Brackertz et al., 2006; Gnanapragasam et al., 2011; Walavalkar et al., 2013). The N-terminal region of MBD2, rich in glycine and arginine amino acids (GR), is also involved in protein- protein interactions (Fujita et al., 2003b).

While MBD2 and 3 share 77% of their sequence, they seem to have opposing functions in transcription (Günther et al., 2013). Interestingly, mammalian MBD3 has lost its ability to specifically bind methylated DNA, and is instead involved in the binding of non-methylated DNA, localizing to active gene promoters (Saito & Ishikawa, 2002; Shimbo et al., 2013). MBD2 instead fulfils its expected role by binding methylated DNA (Figure 2.7h), and it is associated with transcriptional repression (Matsumoto & Toraya, 2008; Cramer et al., 2014), but occasionally also with activation (Fujita et al., 2003b). Yet, controversial recent 2.2 The NuRD Complex 35 studies suggest that both MBD2 and 3 have an inter-dependent relationship with DNA methylation, further complicating the already complex regulatory network of these proteins (Hainer et al., 2016). MBD2/3 play central roles in development and stem cell regulation. MBD3 deletion, but not that of MBD2, is early embryonic lethal (Hendrich et al., 2001). Different isoforms of MBD2 have been shown implicated in opposite roles in cell differentiation: MBD2a, the preferred isoform for NuRD complex formation, has been shown to promote differentiation of cells, while MBD2c facilitates of into induced pluripotent stem cells (iPSC). (Lu et al., 2014). Deletion of MBD2 only involves moderate phenotypes (Hendrich et al., 2001).

2.2.1.7 DOC1

DOC1 (Deleted in oral cancer 1), also known as CDK2AP1 (Cyclin-dependant kinase 2 associated protein 1) has been identified more recently as a hNuRD core subunits (Le Guenzennec et al., 2006; Reddy et al., 2010). Two copies of this protein are present in hNuRD (Kloet et al., 2014).Therefore, this small protein of 12 kDa, as well as its functions and relations with the rest of the complex are not

Figure 2.8: DOC1 is part of the core NuRD complex. a) Schematic view of the domains in DOC1. This small protein is composed of an N-terminal disordered domain and a domain containing 3 α-helixes in the C-terminus. b) Crystal structure of the C-terminal domains of two DOC1 subunits (Ertekin et al., 2012). This domain of DOC1 is capable of homorimerization. 36 Introduction well studied yet. DOC1 is composed by a disordered N-terminal domain, involved in interacting with CDK2AP2 (Buajeeb et al., 2004), and a C-terminal domain composed by two α-helixes, able to assemble DOC1 homodimers (Figure 2.8) (Ertekin et al., 2012). DOC1 was originally described as a tumour suppressor in oral cancer (Todd et al., 1995; Tsuji et al., 1998). Functions of DOC1 in other cancer types have also been shown, normally as a tumour suppressor (Zolochevska & Figueiredo, 2010; Zheng et al., 2011), but occasionally also as promoter of tumorigenesis (Xu et al., 2014). Through its relation with CDK2 and the DNA polymerase-Alpha/primase, DOC1 has been shown to have functions in the cell cycle (Matsuo et al., 2000). As part of NuRD, DOC1 is also an epigenetic regulator of stem cell differentiation, required for gene silencing (Deshpande et al., 2009).

2.2.2 Structural and functional insights into NuRD complexes

It was long assumed that the chromatin states correlated with transcription in a simple binary way: active euchromatin and silent heterochromatin. Furthermore, large parts of the genome were thought to be constitutively silent and regarded as “junk” DNA. However, this paradigm changed with the discovery of transcriptionally active heterochromatin and the rich functions of constitutive heterochromatin and ncRNAs. Similarly, the NuRD complex was initially thought to be a straight forward chromatin silencing effector. This view was supported by the fact that the NuRD complex contains HDAC enzymes and MBD proteins, two of the classical hallmarks of gene silencing. Contrary to this, recent studies have revealed more complex interactions between the NuRD complex and transcription regulation (Hu & Wade, 2012). Surprisingly, NuRD can be localized to promoters of active genes (Shimbo et al., 2013). Currently it is suggested that the NuRD complex is necessary to maintain “adequate” levels of gene repression. Along with other chromatin regulators, NuRD establishes a dynamic equilibrium, described as “fine- tuning”, facilitating a fast positive or negative response to stimuli (Hu & Wade, 2012; Basta & Rauchman, 2016).

The relevance of the NuRD complex in transcription regulation makes it a central component of processes such as stem cell regulation and tumorigenesis 2.2 The NuRD Complex 37

(Lai & Wade, 2011; Laugese & Helin, 2014; Hu & Wade, 2012), but its role is not limited to gene expression. The NuRD complex is also involved in processes such as DNA damage repair (Li & Kumar, 2010; Gursoy-Yuzugullu et al., 2016) or organisation (Yokoyama et al., 2013). Moreover, it plays a role in ageing and stress response, were it has been shown to promote longevity (Pegoraro et al., 2009; Zimmerman & Kim, 2014). NuRD is further involved in host- pathogen interactions. For example, in toxoplasma infection the parasite secretes a factor, TgIST, capable of recruiting NuRD to mediate the repression of the interferon γ-mediated response (Olias et al., 2016). In HIV infection, instead, NuRD plays a beneficial role by repressing the expression of the HIV-1 long terminal repeat, hence affecting virus replication (Cismasiu et al., 2008). Furthermore, the HIV-1 Vpr protein counteracts the function of NuRD by degrading ZIP and sZIP proteins, involved in the recruitment of the complex, with yet to be understood consequences (Maudet et al., 2013). More functions of the NuRD complex include roles in homologous recombination of the chromosomes by interacting with telomeres (Conomos et al., 2014) and establishment of the mammalian circadian clock thorough self-induced and periodic repression of PERIOD proteins transcription (Kim et al., 2014).

Figure 2.9: Proposed modes of NuRD recruitment to DNA for gene silencing. a) hNuRD can be directly recruited to the DNA thanks to different DNA and nucleosome binding domains. MBD2 specifically recognized methylated DNA, recruiting hNuRD to genes with methylated CpG islands. HDAC1/2 deacetylates histone tails (green pentagon) and CHD3/4 perform their remodeling activity, silencing these regions. b) hNuRD can be recruited by the interaction with transcription factors (TF, different transcription factors are represented by different sized shapes), specially thanks to the MTA1/2/3 and MBD3 subunits. These factors direct the silencing activity of NuRD to specific loci. Additionally, HDAC1/2 can interact with acetylated transcription factors and deacetylate them, either activating or deactivating them. 38 Introduction

How the NuRD complex can function in such variety of cellular processes and its exact mechanism of action remain enigmatic to date. It has been proposed that the key to such multiplicity of function lies in the modular and heterogeneous composition of the complex (Bowen et al., 2004; Lai & Wade 2011; Kelly & Cowley, 2013). The analysis of the core subunits reveals different proteins which are products of different genes can replace each other in the hNuRD, and in addition within the same gene different isoforms may also appear, leading to an extremely high degree of compositional heterogeneity (Le Guenzennec et al., 2006; Smits et al., 2013; Kloet et al., 2014). In addition, the NuRD complex interacts with a wide range of transient factors, either ubiquitous or tissue and situation specific, such as Nanog, Ikaros, Helios, LSD1, Sall4, Oct4, BRCA1, FOG1 and many more (see Zhang & Li, 2010 for a review). Compositional variability would allow for different binding partners and recruitment of different NuRD complexes. Extended variability from the different transiently associated factor and tissue specific architectures would further increase the specificity of NuRD targeting to particular DNA loci (Figure 2.9). Different studies have suggested that the different forms of MBD (Kloet et al., 2014), MTA (Xue et al., 1998) or HDAC (Kelly & Cowley, 2013) present in hNuRD are mutually exclusive, supporting this theory.

2.2.2.1 NuRD in stem cell regulation and development

Maintenance of pluripotency in stem cells, differentiation and development are processes primarily regulated by epigenetics. As such, a multitude of examples can be found in the literature of how the NuRD complex is involved in these processes.

The NuRD complex is essential for differentiation and proper lineage commitment in stem cells. MBD3 in particular has been shown to be necessary for embryonic development (Hendrich et al., 2001). When conditional MBD3 knock outs were used to avoid embryonic lethality, the stem cells were shown to be unable to differentiate in most cases. In some contexts these MBD3 ko cells could initiate differentiation, but failed to maintain commitment to specific cell lines and kept expressing pluripotency markers. This suggests a role of NuRD in maintaining proper repression of pluripotency genes. In turn, self-renewal of the embryonic 2.2 The NuRD Complex 39 stem cells was not compromised (Kaji et al., 2006 and 2007). More recent studies support the idea that NuRD dynamically collaborates in a transcriptional equilibrium that can be displaced either towards self-renewal or differentiation. The repressing activity of NuRD results in an overall decrease of the transcription levels of pluripotency associated genes. Opposing this effect, specific transcription factors, for example LIF or Stat3, induce the activation of these genes, maintaining the pluripotent state and, thanks to the subtle differences in the transcription levels, give rise to transcriptionally heterogeneous subpopulations of stem cells. Removal of these factors leaves the NuRD silencing activity unopposed and leads to a definitive silencing of pluripotency genes and lineage commitment. On the contrary, loss of NuRD activity allows uncontrolled transcription of pluripotency genes, hindering differentiation and homogenizing the otherwise transcriptionally heterogeneous stem cell population (Reynolds et al., 2012). While studies focusing on MBD3 clearly show an involvement in stem cell differentiation, other studies with the NuRD subunits CHD4 (Zhao et al., 2017), HDAC1/2 (Jamaladdin et al., 2014) and MTA1 (Liang et al., 2008) have shown opposite results. These subunits were shown to be essential for proper maintenance of pluripotency and self-renewal. It has to be pointed out however, that these proteins are not exclusively found in the NuRD complex. They have a rich network of other interactions which could explain the seemingly contradictory results regarding stem cell regulation. The association of NuRD to stem cells is not restricted to pluripotency maintenance and differentiation. Studies with induced stem cells have shown that hNuRD is required for cell reprogramming in some contexts (dos Santos et al., 2014). However again, other studies performed under different conditions indicate MBD3 as an inhibitor of cell reprogramming. Here, a selective knock down of MBD3 during specific time points increased the efficiency of reprogramming from 1-20% to up 100% (Luo et al., 2013; Rais et al., 2013).

Another interesting example is the role of NuRD in development of the neural system. Distinct NuRD complexes integrating CHD3, CHD4 or CHD5 subunits have distinct functions in the development of the cortex, early proliferation of progenitors and neuronal migration, respectively (Nitarska et al., 2016). This exemplifies the relevance of compositional heterogeneity in the regulation of NuRD activity. More examples of the role of the NuRD complex can be found in the development of (Dege & Hagman, 2014), the heart (Waldron et 40 Introduction al., 2016), the (Denner & Rauchman, 2013) or during haematopoiesis (Li et al., 2009).

2.2.2.2 NuRD in cancer

Epigenetic regulation of DNA transcription is an inherent part of the processes that lead to cancer. Loss of differentiation, which is normally controlled by the silencing of pluripotency related genes, is often correlated with malignancy. Consequently, NuRD is expected to have an extensive role in tumorigenesis and metastatic processes. Most of the subunits of NuRD have been associated with cancer to some degree, either promoting or suppressing tumorigenesis. The role of MTA1/2 (Toh et al., 1994; Nicolson et al., 2003) and HDAC1/2 (He et al., 2016; Lin et al., 2016; Zhu et al., 2016; Witt et al., 2017) in tumour progression is well known and has been extensively studied. On the contrary, MTA3 (Zhang et al., 2006), RbAp46/48 (Chen et al., 2001), MBD3 (Li et al., 2017) and DOC1 (Zolochevska & Figueiredo, 2010) are associated with tumour suppression. CHD4 and GATAD2A are implicated in both tumour progression (Chudnovsky et al., 2014; D’Alesio et al., 2016; Wang et al., 2017) and suppression (Kim et al., 2011; Guillemette et al., 2015; Xin et al., 2017). NuRD partakes of this duality, assuming functions as either a suppressor or a promoter of tumorigenesis. Moreover, NuRD is involved in genome stability and DNA repair (Smeenk et al., 2010; Spruijt et al., 2016). Taken together, NuRD could be involved in tumorigenesis in three different ways: silencing of genes directed by transcription factors, silencing of genes directed by DNA methylation and modification of other proteins (Lai & Wade, 2011).

A good example of NuRD regulation mediated by transcription factors is its interaction with the c-Jun proto-oncoprotein (Aguilera et al., 2011; Liu et al., 2015). The inactive form of this protein is able to recruit the NuRD complex to its target genes, contributing to transcription repression. The activation of c-Jun happens upon phosphorylation of its Ser63 and Ser73 residues, which abolishes the interaction with NuRD. As NuRD occupancy of the c-Jun target genes decreases, c-jun transcription is promoted. This leads to up-regulation of cell proliferation and down-regulation of the p53 tumour suppressor, and an overall increase of tumorigenesis. 2.2 The NuRD Complex 41

NuRD is further involved in the regulation of the p53 tumor suppressor through direct modification of the protein. NuRD is capable to deacetylate p53, leading to its inactivation (Luo et al., 2000). Given the role of p53 in cell-cycle control and , this results in increased tumorigenesis. Consequently, inhibition of the NuRD complex leads to up-regulation of p53 and increased apoptosis in tumour cells (Kai et al., 2010). Interestingly, studies in mouse embryonic fibroblasts showed that, upon depletion of MTA1, the opposite effect regarding p53: Despite of p53 down-regulation, depletion of MTA1 equally led to a cell-cycle arrest mediated by a different, unknown mechanism (Li et al., 2010). This example underscores the importance of distinct tissue-specific functions of the NuRD complex, complicating its characterization.

Due to the numerous examples of NuRD as a tumour growth and metastasis promoter, the entire complex and its subunits are potential targets for cancer therapy (Kai et al., 2010; Nio et al., 2015; Gao et al, 2017). However, due to the seemingly diametrical functions of NuRD in cancer promotion and suppression, a better understanding of the NuRD complex(es) is crucial to develop efficient tissue specific therapies.

2.2.2.3 Limited structural characterization of NuRD complexes

The structure and many of the interactions between the subunits in the full holo- NuRD complex are currently unknown. X-ray structures provided insights in protein-protein interactions within the complex (Figure 2.10). MTA1 appears to be a central subunit, recruiting both HDAC1 and the RbAp subunits. MTA1’s interaction with HDAC1 is mediated by the ELM2 and SANT domains (Figure 2.10a). Moreover, the ELM2 domain also allows homodimerization of HDAC1 (Figure 2.10a) (Millard et al., 2013). The SH3 binding motives of MTA1, located in regions which are predicted to be disordered, can bind one RbAp46/48 subunit each (Figure 10.b and c) (Alqarni et al., 2014; Millard et al., 2016; Schmidberger et al., 2016). These interactions allow the assembly of a subcomplex of MTA1, HDAC1 and RbAp46/48 with a stoichiometry of 2:2:4 (Figure 2.10e). Additionally, the structural basis of the interaction between the coiled-coil domains of MBD2 and GATA2D2A has been described (Figure 2.10d) (Gnanapragasam et al., 2011). 42 Introduction

Figure 2.10: Crystalized interactions of hNuRD subunits. a) Crystal structure of a dimer of MTA1 (blue) and HDAC1 (brown) (Millard et al., 2014). The dimerization is mediated by the ELM2 domain of MTA1. b) Crystal structure of RbAp48 in association with MTA1first SH3 domain (residues 464-515) (Millard et al., 2014). c) Crystal structure of RbAp48 in association with MTA1 second SH3 domain (residues 670-710) (Alqarni et al., 2014). In both cases the interaction of MTA1 with RbAp48 is mediated by an α-helix that is packed between the first blade of the RbAp48 β-propeller and the C and N-terminal α-helices. d) Crystal structure of the coiled-coil domain interaction between GATAD2A and MBD2 (Gnanapragasam et al., 2011). e) The combined information extracted from the crystal structures suggests that MTA1 acts as a hub for dimerization and recruitment of the rest of hNuRD subunits. Together with HDAC1 and RbAp48 it forms a dimeric deacetylase core for the hNuRD complex. GATAD2A and MBD2 interact together thorough their coiled-coil domains. Experimental data suggest that CHD3/4 also interacts with them, while MBD2/3 interacts with the rest of the subunits, but to date none of these associations have been seen in published structures. 2.2 The NuRD Complex 43

GATAD2A/B subunits are also thought to interact with CHD3/4 subunits (Kloet et al., 2014).

It is unclear how the MTA:HDAC:RbAp and MBD:GATA complexes and the CDH and DOC1 subunits interact to form a full holo-hNuRD complex. It has been suggested that an intrinsically disordered domain of MBD2/3 is required for the recruitment of the MTA:HDAC:RbAp subcomplex to GATAD2A/B and CHD3/4 (Desai et al., 2015). However, structural data are lacking to support this.

In addition to the X-ray structures, low-resolution EM structures of NuRD subcomplexes have been published: a dimeric complex of C-terminally truncated MTA1, HDAC1 and RbAp48 (featuring a single RbAp48 per MTA1, due to the truncation of the second SH3 site) resolved to 19Å (Millard et al., 2016). And two RbAp48 proteins bound to small MTA1 fragments with a resolution of 30Å (Schmidberger et al., 2016). Finally an EM structure of 38Å shows a complex of MTA2 and RbAp46 (Brasen et al., 2017). 44 Introduction

2.3 Scope

NuRD plays a critical role of in the regulation of transcription, and therefore in processes such as cell differentiation, tissue development or tumorigenesis. Despite its importance, little is known about it. Moreover, several reported functions of the NuRD complex seem to contradict each other. It is thought that the compositional heterogeneity of the complex is key to the diverse and sometimes antagonizing functions of NuRD complexes. The lack of structural insights is an additional obstacle to understand basic NuRD functions. Elucidating the precise structure of full, functional NuRD complexes would not only provide insights into its interactions with nucleosomes and molecular mechanism, but would also allow to develop pharmaceutical strategies to target NuRD complexes. This would represent an invaluable therapeutic asset, particularly in the treatment of cancer. The aim of this thesis is the functional and structural characterization of the NuRD complex. The complementary use of recombinant protein expression, affinity purification, cross-linking mass spectrometry and electron microscopy techniques provides the means to study the distinct interactions between subunits of the NuRD complex, the architectural principles of its assembly and its structure. The first part of this work focuses on the hNuRD complex for its relevance in human physiology and pathology. The endogenous hNuRD complex was obtained from HeLa cells and used for single particle EM analysis. In parallel, independent subunits were expressed in insect cells to study specific protein subcomplexes and their interactions. The second part of this thesis focuses on the drosophila NuRD complex (dNuRD), a less heterogeneous homologue of the hNuRD, lacking the isoforms of the subunits. Negative stain electron microscopy (NS-EM) techniques are used for systematic analysis of different dNuRD subcomplexes. In parallel, single particle cryo-electron microscopy is used to uncover the structure of endogenous and recombinant dNuRD complexes to the highest possible resolution. 3 Electron Microscopy Studies of Human NuRD

Resumé en français

La première partie de ma thèse vise à démêler la structure du complexe NuRD humain. Pour cela, nous avons utilisé la microscopie électronique en transmission (MET), appuyée par la spectrométrie de masse analytique par nos collaborateurs. Nous savions que NuRD était un complexe particulièrement hétérogène et flexible. Nous avons donc fait des efforts particuliers pour développer des protocoles de purification et de réticulation appropriés afin d’obtenir l’échantillon le plus pur et le plus stable possible.

Nous avons purifié NuRD humain endogène à partir de cellules HeLa en utilisant du MBD3 marqué au GFP comme appât pour le complexe complet. Malgré de faibles rendements, nous avons réussi à purifier suffisamment de complexe NuRD humain pour les études de EM.

Les moyennes de classe 2D et les modèles 3D reconstruits à partir de données NuRD humaines montrent une grande hétérogénéité, ce qui rend difficile l'obtention d'une résolution élevée. En conséquence, nous avons décidé de nous concentrer sur le complexe NuRD de Drosophila. Ce complexe est considérablement moins hétérogène et représente une meilleure cible pour les études de EM.

En parallèle, nous avons étudié les interactions de sous-unités individuelles. Nous avons découvert que GATAD2A est capable de se lier à la calmoduline. Cela indique que le calcium peut aider à réguler l'activité de NuRD via la calmoduline. 46 Electron Microscopy Studies of Human NuRD

3.1 Introduction

This chapter focuses on the work towards uncovering the structure of a functional human NuRD complex. For this we used Transmission Electron Microscopy (TEM), a combination of Negative Stain and Cryo-electron microscopy techniques, supported with additional data from Mass Spectrometry.

Classically, X-ray crystallography has been the method of choice to obtain high resolution models of protein structures. Using diffraction patterns of X-rays passing through protein crystals, this technique can resolve the atomic structure of proteins. However, there are serious restrictions to the use of X-ray crystallography. Crystallization of proteins is unpredictable and does not always succeed. More importantly, samples for purification need to be highly homogeneous, which means that different biologically meaningful conformational states of the protein will be excluded from the study. While discrete conformations may be obtained independently from separate crystallography assays, more subtle variations and intermediates are beyond the reach of the technique. Furthermore, crystallization often requires of mutation or complete removal of protein regions prone to disorder, such as intrinsically disordered domains, and the crystallization conditions may lead to structures of conformations of the studied protein which are not relevant for its function. These restrictions are particularly relevant for macromolecular protein complexes which often rely on conformational and compositional plasticity to fulfil their functions.

Cryo-EM and state of the art image processing, with its own strengths and weaknesses, provides an alternative approach to the study of protein complex structure at high resolution, and hence was chosen for this study. 3.2 Methods 47

3.2 Methods

3.2.1

Mammalian cell cultures: The endogenous human NuRD was produced using a stable HeLa cell line generated by our collaborator Michiel Vermeulen, based on a previously described construct (Kloet et al., 2014). Briefly, HeLa cells were transfected with a N-terminally tagged, inducible MBD3 construct using Lipofectamine (Invitrogen). The construct was created by cloning the tandem tag, comprising a enhanced GPF, two Tobacco Etch Virus (TEV) protease cleavage sites and decahistidine into a pcDNA5 vector, using a flippase – flippase recognition target (flp-FRT) recombination system. Then MBD3, amplified with BamHI and XhoI, was inserted into the cloning site of the vector. The vector also included genes for blasticidin and hygromycin antibiotic resistances. Cell stocks frozen in presence of DMSO were thawed and plated in 100mm diameter plates in DMEM medium supplemented with 10% Fetal Bovine Serum (FBS) and L- glutamine and in presence of penicillin and streptomycin antibiotics. After 16-24 hours of growth confluent plates where lightly treated with trypsine to harvest cells. These where plated again in 150 mm diameter plates with hygromycin and blasticidin antibiotics to select for the desired construct. After three days the plates showing 70-90% confluence where induced during 16 hours using doxycycline in order to produce recombinant GFP-MBD3 in amounts similar to endogenous MBD3 protein levels (Kloet et al., 2014, Baymaz et al., 2014). Cells where collected and cytosolic and nuclear extracts obtained as described before with minimal modifications (Kloet et al., 2014). After 5 minutes exposure to trypsin at room temperature to dislodge them from the plate, pelleted by centrifugation at 500 g during 5 minutes at 4°C and washed in PBS. Following collection the cells were incubated in buffer A [10 mM HEPES at pH 7.9, 10 mM KCl, 1.5 mM MgCl2 and 0.15% NP40, complemented with C protease inhibitor tablets (Roche)] and lysed using a dunce homogeniser with a type B pestle (tight). Cytosolic extracts where separated from the nuclei and the membrane by centrifugation at 48 Electron Microscopy Studies of Human NuRD

3200 g during 15 minutes at 4°C. Nuclei where further incubated during 30 minutes in buffer B (20 mM HEPES at pH 7.9, 420 mM NaCl, 2mM MgCl 2 0.5 mM EDTA, 20% glycerol, 0.1% NP40 and 0.5 mM DTT) to lyse them. Both cytosolic and nuclear extracts were ultracentrifuged at 20000 g for 20 minutes at 4°C and frozen at -80°C for later use.

Insect cell culture: Individual hNuRD subunits were expressed in a Spodoptera frugiperda (Sf 21) (Invitrogen) cell line using the MultiBac baculovirus – insect cell expression system (Berger et al., 2004). The insect cells were transfected with a recombinant bacmid encoding for either wild type GATAD2A and CBP-tagged CHD4 or wild type GATAD2A alone. Sequences encoding these proteins were provided by our collaborators in the Vermeulen group. The first virus generation (V0) was produced in 3 ml cultures in a 6-well plate. This V0 was used to infect 25 ml Sf21 cultures in shaker flasks, producing the V1 generation of viruses. This V1 was finally used for protein production by infection of 400 ml of Sf21 insect cells in Sf-900 medium in 2 litre Erlenmeyer flasks which were incubated at 25°C orbiting at 80 rpm (Corning). The cells were harvested after 72 to 96 hours, after the day of proliferation arrest (DPA). DPA was defined by the plateau of the expression of the the YFP internal expression reporter which is encoded by the baculovirus. YFP levels were determined each 24h to to follow recombinant protein expression. The cells were harvested by centrifugation at 800 x g in a JA8.1000 rotor for 10 min at 4° C. Cells were lysed using short bursts of sonication for a total of 150 seconds at 4°C. Cytosolic extracts were separated by centrifugation at 1700 g for 20 minutes. GATAD2A and CHD4 were co-expressed in insect cells using the MultiBac system (Berger et al., 2004), following the same procedure described above. The baculovirus construct encoding wild type GATAD2A and CBP-tagged CHD4 was provided by our collaborators. 49

3.2.2 Protein biochemistry

Protein purification from mammalian cells: The endogenous NuRD from HeLa cells was purified as previously described (Kloet et al., 2014). After purification via GFP affinity chromatography I eluted the complex by cleavage using Glutathione S-Transferase (GST) tagged TEV protease in TEV cleavage buffer (10 mM HEPES at pH 8, 150 mM NaCl, 1 mM EDTA, 3% glycerol, 0.025% NP40 and 1mM DTT). The TEV protease was eliminated by a GST resin affinity purification step. Subsequently, I concentrated the protein and stabilized the hNuRD complex by Bissulfosuccinimidyl Suberate (BS3) cross-linking. BS3 (Thermo Fisher Scientific) is a lysine-specific cross-linking agent with a linker length of 11.5 Å. The final purification step comprised size exclusion chromatography using an Akta Micro and a Superose 6 column (GE Healthcare Life Sciences). Following this protocol I obtained a sample that showed clearly all bands for the known hNuRD subunits on a Coommassie-stained SDS gel. Subsequent optimization of the incubation times and elution volumes allowed improved protein yields. Despite this improvement, the yields remained low. Purification of 30 ml of nuclear extracts, adjusted to a protein concentration of approximately 6mg/ml, would lead to 600 μL of sample, which would then be concentrated (either by osmotic concentration inside a membrane using highly concentrated polyethylen glycol to extract buffer from the sample, or with 10kDa Amicon concentration 4ml columns) up to 10 times to a final concentration of 0.1-1 mg/ml. The GRAFIX (Gradient Fixation) procedure (Kastner et al., 2008), when applied, consisted in gradient centrifugation in a glycerol gradient in the presence of mild glutaraldehyde crosslinking. About 50-100 μL complex was loaded onto a 4ml centrifuge tube (Beckman 7/16x2–3/8 P.A) containing a gradient of 10-30% v/v glycerol and 0-0.15% glutaraldehyde in the same buffer containing the sample (10 mM HEPES at pH 8, 150 mM NaCl, 1 mM EDTA, 3% glycerol, 0.025% NP40 and 1mM DTT), generated with a Gradient Master device (Biocomp/Wolf laboratories) and centrifuged at 4°C, 34,000 rpm for 18 h in a SW60Ti rotor (Beckman). Centrifuged samples were fractionated into 0.2 ml aliquots using a peristaltic pump (Biorad). The cross-linking activity of 50 Electron Microscopy Studies of Human NuRD glutaraldehyde was quenched with 2 μl of 80 mM glycine pH 7.6 immediately after fractionation.

Protein purification from insect cells; The cytosolic extracts obtained from the insect cells where subjected to affinity chromatography using calmodulin-agarose resin (Sigma-Aldrich). The extracts where incubated in the resin in a calcium containing binding buffer (50 mM TRIS at pH 8, 300 mM NaCl, 1 mM MgCl 2 and 2 mM CaCl2), extensively washed and eluted with three applications of elution buffer containing EGTA (50 mM TRIS at pH 8, 300 mM NaCl, 1 mM MgCl2 and 2 mM EGTA). EGTA was used instead of the more common EDTA chelating agent to avoid depleting the magnesium ions in the buffer, as EGTA shows higher affinity for calcium. Elutions were pulled and concentrated two times with Amikon 10kDa pore size 4ml concentrating columns. were The purification protocol was optimised by increasing the elution time (from 5 minutes to 30 minutes each elution) and increasing the concentration of the EGTA in the elution buffer (along with an increase of magnesium chloride in the buffer, to 3mM each).

After affinity chromatography the co-expressed wild type GATAD2A and CBP tagged CHD4 were subjected to size exclusion chromatography using a 10-30% glycerol gradient generated with a Gradient Master device (Biocomp/Wolf laboratories) in elution buffer without EGTA and with additional 1mM ZnCl2, loaded in a centrifuge tube (Beckman 7/16x2–3/8 P.A) and centrifuged at 4°C, 34,000 rpm for 16 h using a SW60Ti rotor (Beckman). Fractions of 0.2 ml were collected using a peristaltic pump (BioRad). This sample and the pure wild type GATAD2A were also subjected to analytical size exclusion chromatography using an Akta Micro and a Superose 6 column (GE Healthcare Life Sciences). Wild type GATAD2A was also incubated for 2h with 0.025 mg/ml (final concentration) of calmodulin (CaM) (Sigma Aldrich) before loading into the Superose 6 column to address the possible binding between the two molecules. 51

3.2.3 Electron microscopy

Grid preparation: 5 μl of sample was adsorbed onto carbon film for 60 s and blotted by gentle application of blotting paper (Whatman, Grade 1). Afterwards the carbon film was floated into 1% uranyl acetate staining for 30 s and deposited on copper 300 mesh grids (Electron Microscopy Sciences, EMS300-Cu). The graphite carbon film was prepared in house by evaporating carbon over a mica plastic support. We used carbon rods and the thickness of the carbon film (usually between 1 and 50 nm) was controlled by exposure time and visually checked in the microscope, but not directly measured for each preparation.

F20 data collection: In collaboration with Dr. Manikandan Karuppasamy from our lab, I used the F20 microscope at the Institut de Biologie Structurale (IBS, Grenoble GIANT campus) at 80 kV and 40kx magnification (2.85Å/pixel) and collected 250 tilt pairs images (45° tilt) for random conical tilt (RCT) reconstruction (Radermacher, 1988). Moreover, 300 untilted micrographs were collected for refinement of the initial volumes.

Image processing: Particle picking from the 250 micrograph pairs resulted in 39,361 particle pairs. We used the untilted images for iterative 2D Multivariate Statistical Analysis (MSA) and classification resulting in 750 2D class averages. For this, I used the iterative MSA, classification and Multi-Reference-Alignment (MRA) implemented in the IMAGIC-5 software (Van Heel & Keegstra, 1981). 407 of 750 2D class averages were used to start the RCT reconstructions in XMIPP 1 (Marabini et al., 1996). The resulting 407 different volumes were averaged into 25 volumes using ML-tomo in XMIPP 1. We used the most populated 6 volumes as input models for further refinement of the structures with XMIPP 1 and RELION 1.3 (Scheres, 2012) using 65,134 additional untilted particles. The classes showing best results were subjected to further classification and gold standard refinement processes with RELION 1.3. 52 Electron Microscopy Studies of Human NuRD

3.3 Results

3.3.1 Purification and EM analysis of endogenous NuRD

I purified human NuRD complex (hNuRD) from a stable HeLa cell line received from our collaborator Michiel Vermeulen (University of Njimegen). I followed an adapted version of the published protocol (Kloet et al., 2014) and was helped by Alice Aubert from the Berger group (EMBL Grenoble). These HeLa cells carry a GFP-tagged recombinant version of MBD3 which is used as a bait for affinity purification, as described in the methods part. A double TEV cleavage site allow for easy elution from the GFP trap (Figure 3.1a). Initial purifications showed a yield comparable to that previously reported by Dr. Vermeulen (Figure. 3.1b and c). Small adjustments to incubation times and volumes in the protocol allowed considerable improvement of the yields (Figure 3.1d), but these remained considerably low (needing, after optimization, 200 to 400 150mm culture plates for each purification, yielding between 50-200 μl of purified sample at 0.1-1 mg/ml protein concentration). All the proteins known to form the core hNuRD complex were easily detected on the Coommassie stained SDS-PAGE gels with the exception of DOC1 which has a molecular weight of 12kDa. This protein is very small compared to the other subunits of the NuRD complex, which makes its observation in co-purification experiments rather difficult. However, DOC1’s presence in the complex has been previously documented by mass spectroscopy experiments (Kloet et al., 2014). The presence of the hNuRD complex subunits was further confirmed by Western blotting, using specific mouse monoclonal , which were provided by our collaborators in the Laue lab (Figure 3.1e). The presence of DOC1 could not be confirmed due to the lack of monoclonal antibodies against this protein. In order to stabilise the hNuRD complexes and reduce their heterogeneity I performed cross-linking by GRAFIX (Kastner et al., 2008) and by the lysine specific cross-linker BS3. GRAFIX involves gradient centrifugation of the protein in addition to cross-linking with glutaraldehyde, but also results in dilution of low concentration samples. Due to the naturally low levels 3.3 Results 53

Figure 3.1: Purification of hNuRD A) Cartoon of the MBD3 construct used as bait for the purification of full size endogenous hNuRD complex. For affinity purification we used anti GFP fragments immobilised on agarose beads. The MBD3 construct comprises a recombinant MBD3 tagged with N-terminal GFP. A double TEV cleavage site allows removal of the GFP. B) This gel shows hNuRD complex purified in his lab. All subunits except DOC1 are visible. C) Early purifications of hNuRD complex in our lab showed similar quality to that obtained by Dr. Vermeulen. The gel in C was loaded with approximately half the sample volume of the one in B. D) After protocol optimization the sample quality improved. All subunits except DOC1 are visible and different isoforms are easily distinguishable. The volume loaded in this gel was approximately a quarter of that loaded in the gel in B. E) Wester blots showing the purified hNuRD subunits (blots provided by Alice Aubert). MWM (molecular weight marker) units shown in kDa. 54 Electron Microscopy Studies of Human NuRD

Figure 3.2: Negative stain EM of the hNuRD complex. A) A negative stain micrograph of the hNuRD complex, obtained with an F20 microscope (FEI) at 40.000 x magnification. The scale bar is 25 nm long. The micrograph shows similar sized, well defined and abundant particles, albeit heterogeneous. B) The hNuRD 2D class averages corroborate the initial observations. The particles are well defined but heterogeneous. The average size of the particles is 20 nm, but globular and elongated shapes can be observed.

of endogenous hNuRD synthesis cross-linking with a highly concentrated BS3 solution proved to be more effective in this case. To further purify the BS3 cross- linked complexes and to prevent aggregates and the cross-linking of multiple complexes rather than intra-complex a size exclusion chromatography step was added to the BS3 cross-linking protocol. A Superose 6 size exclusion column and the Akta micro system was used for this step. As a result I obtained pure hNuRD complexes in sufficient quantity for EM studies.

3.3.2 First hNuRD models by random conical tilt

In collaboration with Dr. Manikandan Karuppasamy, I collected negative stain tilted micrograph pairs from the hNuRD sample. The micrographs obtained show clear individual particles at good concentrations (Figure 3.2a), but they also show particle heterogeneity, despite our cross-linking efforts aimed at minimising this 3.3 Results 55

Figure 3.3: First 3D reconstructions of the hNuRD complex. A)The most populated six volumes resulting from the XMIPP RCT procedure are shown in different colours. The top left and bottom left volumes appear more defined than the rest and were used for further processing. B and C) The left hand models were further processed in RELION by 3D classification and 3D refinement, to a resolution of 32 and 35 Å respectively. Front and back views are shown for each volume. Next to the3D reconstructions a comparison of their 2D projections and previously obtained reference free 2D class averages is shown. Good agreement between these suggests that the processing is consistent. 56 Electron Microscopy Studies of Human NuRD known characteristic of the complex. Preliminary 2D processing of the particles using IMAGIC-5 (Van Heel & Keegstra, 1981) and RELION 1.3 (Scheres, 2012) confirmed this heterogeneity (Figure 3.2b). I picked tilt pairs from these micrographs and used them to build the first 3D models of hNuRD using the Random Conical Tilt RCT procedure (Radermacher, 1988). This procedure relies on the fact that randomly oriented particles showing the same face in a micrograph will expose different faces when the grid is tilted. Using the untilted images of the particles these can be classified and oriented. The known tilt angle of the corresponding tilted pairs can be used to back project the images into a 3D volume. This back projection (called “back” as opposed to the projection of a view from a 3D volume into a 2D image) is possible thanks to a property of the Fourier transform, which states that the Fourier transform of a 2D projection of a 3D volume will match the central slice of the 3D Fourier transform of said volume which is perpendicular to the projection angle. From the 25 volumes obtained by this procedure I selected the most populated 6 volumes for further processing.

3.3.3 3D classification and refinement

The selected 6 RCT volumes were further processed in RELION (Scheres, 2012) with 65,134 additional particles obtained from untilted micrographs. 3D classification procedures resulted in a set of low resolution 3D models, showing both globular and elongated shapes (Figure 3.3a). The best defined two volumes were further processed. 3D refinement with RELION resulted in 3D reconstructions of 32 and 35 Å respectively (Figure 3.3b) (measured by RELION according to the gold standard refinement criterion, Scheres, 2012), showing some degree of preferential orientation in the angular distribution of the particles, but with no major missing views. 3.3 Results 57

Figure 3.4: CBP affinity purification of CHD4 and GATAD2A. A) Coommassie stained SDS gel of co-expressed CBP tagged CHD4 and wild type GATAD2A after purification by calmodulin affinity chromatography. This initially suggested that these proteins formed a stable complex. B) Anti-CBP western blot revealed signal for both tagged CHD4 and wild type GATAD2A. A third band can be detected at an apparent molecular weight of 150 kDa, which represents an nonspecific interaction of a protein in the lysates, as shown by the control using uninduced insect cells which shows only the contaminant band.. This blot was provided by Dr. Olga Kuzmina in the Berger group. C) Coommassie stained SDS-PAGE and anti-CBP western blot (WB) of wild type GATAD2a purified using caclmodulin resin. This shows that GATAD2A can bind calmmodulin on its own. D) Attempts to further purify the suspected CHD4-GATAD2A complex using glycerol gradient centrifugation revealed that the proteins elute in separate fractions, with CHD4 travelling further in the gradient (the collected fractions are displayed in the gel starting from the bottom of the gradient in the left and progressing to the top in the right) due to its larger molecular weight (highlighted by the guides). MWM (molecular weight marker) units shown in kDa. 58 Electron Microscopy Studies of Human NuRD

3.3.4 Co-expression and purification of GATAD2A and CHD4, CaM-GATAD2A interaction

Co-expression and purification of GATAD2A and CBP-CHD4 was successful, leading to two bands of the expected size after CBP affinity purification (Figure 3.4a). This initially led us to assume a direct interaction between these two proteins. However, complex formation could not be confirmed by size exclusion

Figure 3.5: Computationally predicted Calmodulin binding sites in GATAD2A. The putative Calmodulin binding sites in GATAD2a are shown in red relative to other domains and features in the protein. The intensity of the colour represents the likelihood assigned to each amino acid by the algorithm, with a minimum value of 0 (white) and a maximum value of 9 (intense red). The areas outside of the predicted CaM binding sites and the other domains are shown I black. The first putative binding site located between amino acids 330 and 360 shows a region of considerable length with the maximum possible likelihood of containing a Calmodulin binding domain. The second and third areas, between amino acids 440 and 470 and 500 and 530 scored lower in the prediction software. Both the first and second predicted domains overlap with the CR2 histone binding domain, and the first also contains some phosphorylation sites. Figure built with information from the Calmodulin binding domain predictor tool (http://calcium.uhnres.utoronto.ca/ctdb/ctdb/home.html) and the NCBI (http://www.ncbi.nlm.nih.gov). 3.3 Results 59 chromatography and glycerol gradient centrifugation. Instead, I detected the two proteins in different fractions (Figure 3.4d).

A Western blot using antibodies directed against the CBP-tag revealed that both proteins could bind CaM (Figure 3.4b) since both bands (CHD4-CBP and GATAD2A) were stained in the blot. To exclude that the second band at lower molecular weight is a breakdown product of CHD4-CBP, I next generated an expression construct for GATA2Da alone. Next, I expressed non-tagged human GATAD2A and purified it via a CBP affinity column using the same protocol as for its co-purification with CBP-tagged CHD4. In fact, GATAD2A bound to the CBP affinity column and could be detected in a Western Blot using an antibody against the CBP tag (Figure 3.4c). This suggests that we discovered a previously unknown direct interaction between GATAD2A and CaM. To study this further, I used an online tool to predict CaM-binding sites in protein sequences (http://calcium.uhnres.utoronto.ca/ctdb/ctdb/home.html). In fact, a putative CaM- binding site was detected by the programme within the highly conserved histone- binding domain of GATAD2A (Figure 3.5). The CaM-binding site is directly adjacent to the zinc finger domain of GATAD2A. Thus, it is tempting to speculate that CaM binding to GATAD2A could modulate the histone tail binding of the zinc fingers. 60 Electron Microscopy Studies of Human NuRD

3.4 Discussion

3.4.1 Possible calcium dependent NuRD regulation

Calcium is one of the major regulators within the cell, and CaM is the principal effector of calcium. In the presence of calcium, CaM folds around hydrophobic α- helixes (Babu et al., 1988) and then interacts with other proteins (Gerstein & Krebs, 1998). CaM is present in the cytoplasm and the nucleus and considered a key component of many signalling pathways including muscle contraction, neuronal excitation, cell growth and proliferation (Sorensen et al., 2013; Berchtold & Villalobo, 2014; Hell, 2014). In cardiac cells, HDAC5, a histone deacetylase from the same family as HDAC1 and 2, is recruited to the myogenic genes by MEF2a and represses the growth of muscle cells. This MEF2a-HDAC5 interaction is repressed by CaM via CaM kinase. HDAC5 is exported from the nucleus after being phosphorylated by CaMK. Furthermore, CaM directly interferes with the binding of HDAC5 to MEF2a, leading to cardiac hypertrophy (Berger et al., 2003). Interestingly, GATAD2A contains several phosphorylation sites and a regulation mechanism involving phosphorylation has been shown to allow GATAD2A to recruit HDAC2 to promote chromatin repression (Ding et al., 2017). Furthermore, phosphorylation susceptible serine residues 340 and 343 overlap with a predicted CaM binding site. Similarly, the SWI/SNF (BAF in humans) complex, a chromatin remodelling complex and transcription activator (Hu & Wade, 2012) has been shown to be activated by calcium/calmodulin (Lai et al., 2009). Intriguingly, some of the chromatin remodelling activities of SWI/SNF (BAF) show some degree of antagonism to those described for NuRD. For instance, in embryonic stem cells NuRD and SWI/SNF (BAF) directly compete for the binding sites in gene promoters, and their antagonistic effects contribute to fine tune expression of relevant genes (Yildirim et al., 2012).

The results presented above show a previously unknown interaction between GATAD2A and CaM. Interestingly, some of the described effects of SWI/SNF (BAF) suggest its functions could be antagonistic to the functions of NuRD. The described GATAD2A – CaM interaction now provides a basis for speculating about 3.4 Discussion 61 the effect of calcium in the regulation of the NuRD complex in the nucleus, which would be part of a wider calcium ion-mediated chromatin regulation network. This is an exciting possibility that has been subject of further studies in the Berger Lab and will lead to new insights into the role and the regulation of NuRD activity in the epigenetic regulation landscape.

3.4.2 hNuRD is a heterogeneous complex

The NuRD complex is a naturally heterogeneous entity. This is due to its diversity in subunit isoform composition and its conformational flexibility which may help to adjust its functions to different situations and different cell lineages. Efforts were made to constrain this conformational heterogeneity of the human NuRD complex by using mild chemical cross-linking, but the results show that the purified hNuRD is still a very heterogeneous complex. The 2D class averages and 3D reconstructions shown in figure 3.3 clearly show structural heterogeneity. This significantly complicates the process of obtaining a high-resolution 3D reconstruction of the endogenously purified complex. The low yield of the endogenous hNuRD purification complicate the production of sufficient amounts of highly concentrated hNuRD complex, as required for cryo-EM studies. The lack of previous knowledge of the NuRD complex structure (very few EM structures of this complex have been published, and all at very low resolutions; Millard et al., 2016, Schmidberger et al., 2016, Brasen et al., 2017), combined with the high conformational and compositional heterogeneity of hNuRD, means we lack sufficient information to proceed to high resolution EM studies.

Here, I was able to obtain a number of low resolution negative stain EM volumes of the human NuRD complex. Because of the difficulties posed by the compositional and conformational heterogeneity of the hNuRD complex, we decided to postpone further work with it. Instead we decided to focus future work on the drosophila melanogaster NuRD complex (dNuRD). The reason for this decision is that the dNuRD complex is less heterogeneous than its human counterpart. Each subunit of dNuRD exists in only one isoform. Therefore, we 62 Electron Microscopy Studies of Human NuRD reasoned that dNuRD would provide a better starting point for high resolution structural characterisation of a NuRD complex. 4 Electron Microscopy Studies of Drosophila NuRD

Resumé en français

Le complexe NuRD de drosophile est moins hétérogène que son analogue humain et la production de protéines dans les cellules d'insectes permet d'obtenir de meilleurs rendements. Par conséquent, nous avons décidé que la drosophile NuRD serait une cible appropriée pour les études EM. Dans ce chapitre, je présente nos travaux sur la caractérisation des complexes NuRD de drosophile. Nous avons utilisé une approche combinée purifiant le complexe NuRD endogène à partir de cellules de drosophile et reconstituant les sous-complexes NuRD de drosophile à partir de sous-unités recombinantes individuelles. Cela nous a permis d'identifier un sous-complexe NuRD stable ayant une activité de désacétylation d'histone indépendante, comprenant les sous-unités p55, de MTA-like, de MBD- like et de Rpd3 (PMMR). Nous avons obtenu les premières structures EM de ce complexe. De plus, nous avons obtenu des structures de différents sous- complexes qui, combinées aux données de spectrométrie de masse fournies par nos collaborateurs, nous ont permis de proposer un chemin d’assemblage holo- NuRD: le MTA-like agit comme une protéine d’échafaudage central, recrutant ainsi Rpd3, p55 et MBD-like le complexe PMMR. La sous-unité de remodelage des nucléosomes Mi-2, ainsi que DOC1, est recrutée dans le complexe par Simjang grâce à son interaction connue avec le type MBD. Ces découvertes ont conduit à la publication d'un article scientifique. Cependant, nous n'avons pas été en mesure de produire une structure haute résolution du complexe. Il est clair que le complexe est très flexible, ce qui complique la détermination de sa structure. 64 Electron Microscopy Studies of Drosophila NuRD

4.1 Introduction

Our analysis of human NuRD (hNuRD) revealed great heterogeneity and flexibility within the complex. In order to overcome these problems we chose to first study the drosophila homologue to hNuRD, dNuRD. In contrast to hNuRD, each subunit in dNuRD is encoded by a single gene, limiting the compositional heterogeneity of the complex. These drosophila genes encode for proteins named Mi-2 (CHD3/4 in humans), MTA-like (MTA1/2/3), p66 or simjang (GATAD2A/B), Rpd3 (HDAC1/2), p55 (RbAp46/48), MBD-like (MBD2/3) and CG18292 (DOC1). Additionally, we explored the possibility of producing recombinant dNuRD proteins using the MultiBac insect cell expression system. This greatly simplifies the expression and cell harvesting protocol, while considerably improving the yields over the levels of endogenously expressed material. With more sample available and lower levels of heterogeneity we aim to obtain a high resolution dNuRD structure that will allow us to understand the shape and characteristics of the complex, but also provide a solid base to subsequently tackle the hNuRD complex with higher chances of success. 4.2 Methods 65

4.2 Methods

4.2.1 Cell cultures

Endogenous dNuRD was purified from a stable Drosophila melanogaster S2 Schneider cell line provided by Dr. Ernest Laue. The establishment of this cell line is described in detail in published research, along with the purification protocol we used (Zhang et al., 2016). Briefly, cells were transformed with a plasmid for expression of a recombinant p55 protein tagged at its C-terminal end with a GPF tag followed by a double TEV cleavage site and a decahistidine tag. Cells were grown in ExpressFive SFM medium supplemented with 2mM Glutamine (Thermo Fisher Scientific) at 25°C in 2 L Erlenmeyer flasks orbiting at 80 rpm (Corning). The cultures were induced using 0.1 mM of CuSO4 once they reached a density of 8 million cells/ml. After 18 hours of incubation the cultures were harvested by centrifugation at 800 x g in a JA8.1000 rotor for 10 min at 4° C. Recombinant PMR (p55–MTA-like–Rpd3) and PMMR (p55–MTA-like–MBD-like– Rpd3) complexes were obtained with the baculovirus insect cell expression system MultiBac (Berger et al., 2004) using the protocol we described in Zhang et al., 2016. The Multiplication Module (Fitzgerald et al., 2006) was used to insert the synthetic genes (Genscript) encoding for MTA-like, Rpd3 and p55 into an acceptor plasmid pKL, resulting in the transfer plasmid pKL_PMR. The construct for MTA- like also contained an N-terminal 10xHistidine–3xFLAG 1xTEV tandem tag. The pSPL_MBD-like plasmid was created by inserting the gene encoding for MBD-like tagged with maltose binding protein (MBP) into an pSPL plasmid. The pLox_PMMR plasmid was created by Cre-LoxP plasmid fusion (Fitzgerald et al., 2006) of the previously described pSPL_MBD-like and pKL_PMR plasmids. Transfer plasmids pKL_PMR and pLox_PMMR were transformed into separate DH10EMBacY cell cultures carrying the EMBacY baculoviral genome. Composite baculoviruses were generated following the previously described protocols (Fitzgerald et al., 2006) and used to infect Sf21 insect cells. These cultures were used to express the corresponding protein complexes following the protocol described in chapter 3 for insect cell cultures, and the resulting cell pellets were stored at -80⁰ C for later purification. 66 Electron Microscopy Studies of Drosophila NuRD

Recombinant co-expression of the full dNuRD complex was carried out using the MultiBac baculovirus inset cell system. N-terminally MBP-tagged MBD-like and a CHD4-like expression cassettes were inserted into a pSPL plasmid and combined with the pKL-PMR construct described above (Zhang et al., 2016). A second construct containing Simjang and CG18292 was created inserting the corresponding synthetic genes (Genscript) into the MultiBac plasmid pUDCM. A complete pLox-dNuRD plasmid, encoding for all seven dNuRD subunits, was obtained by Cre-LoxP mediated plasmid fusion of the two plasmids. This final construct was inserted into the EMBacY baculoviral genome, and viruses and insect cell cultures were obtained following the protocol described above for PMR and PMMR complexes (Zhang et al., 2016).

4.2.2 Protein biochemistry

We described the purification protocols used for the different dNuRD complexes in Zhang et al., 2016.

Cell pellets previously obtained for purification of the endogenous dNuRD were resuspended in two volumes of Buffer A (10mM Hepes pH 7.5, 1.5mM MgCl2, 10mM KCl, 0.1% NP-40, and 0.5mM DTT) freshly supplemented with 0.1mM PMSF, 10μ g/ml Leupeptin (Sigma) and 10 μg/ml Pepstatin A (Sigma) and incubated on ice for 20 minutes. The cells were subsequently homogenised using a type B (tight) dounce homogeniser to lyse them. After 30 strokes the lysate was centrifugated at 1700 x g for 15 minutes at 4°C to separate the cytoplasmic extract and the nuclei. The nuclei were then incubated in extraction buffer [20 mM HEPES at pH 7.9, 420 mM NaCl, 2mM MgCl2 0.5 mM EDTA, 20% glycerol, 0.1% NP40, 0.5 mM DTT and freshly supplemented protease inhibitors (0.1mM PMSF, 10 μg/ml Leupeptin, and 10 μg/ml Pepstatin)] to lyse them. Nuclear extracts were collected by ultracentrifugation at 20000 x g for 30 minutes at 4°C and then were incubated in GFP-Trap beads (ChromoTek) during 2h at 4°C in Extraction Buffer. Next, the beads were washed 3 three times with Washing Buffer [20 mM HEPES at pH 7.9, 300 mM NaCl, 2mM MgCl2 0.5 mM EDTA, 20% glycerol, 0.1% NP40, 0.5 mM DTT and freshly supplemented protease inhibitors (0.1mM PMSF, 10 μg/ml Leupeptin, and 10 μg/ml Pepstatin)], and two times in TEV Digestion Buffer 4.2 Methods 67

(50 mM Hepes pH 7.5, 200 mM NaCl, 2% v/v glycerol, and 0.5 mM DTT). The complexes were then eluted by adding 2 μg of purified TEV protease. The eluate was subjected to GRAFIX (Kastner et al., 2008) in the presence of mild cross- linker glutaraldehyde. Briefly, 200μL of complex (70–100 μg) were loaded onto a gradient of 10-30% v/v glycerol and 0-0.15% glutaraldehyde in GRAFIX Buffer (50 mM HEPES pH 7.5, 150 mM NaCl, and 0.5mM NP-40) in a Beckman (7/16x2–3/8 P.A) centrifuge tube. The gradient was made by mixing 10% and 30% glycerol solutions in a Gradient Master (Biocomp/Wolf Laboratories). The tubes were centrifuged at 34,000 rpm for 18h at 4°C in a SW60Ti rotor (Beckman). The centrifuged gradients were fractionated into 0.2 ml aliquots using a peristaltic pump (biorad) and 80 mM glycine at pH 7.6 was immediately added to quench the cross-linking activity of the glutaraldehyde. Finally, the cross-linked complexes were subjected to size exclusion chromatography (SEC) on a Superose 6 Increase 3.2/300 column (GE Healthcare Life Sciences) in SEC Buffer (50mM Hepes pH 7.5, 150mM NaCl, and 2% glycerol).

Insect cell pellets containing the overexpressed recombinant protein complexes were resuspended in two bed volumes of Buffer A [10mM Hepes pH 7.5, 1.5mM MgCl2, 10mM KCl, 0.1% NP-40, 0.5mM DTT and freshly supplemented protease inhibitors (0.1mM PMSF, 10μ g/ml Leupeptin and 10μ g/ml Pepstatin)] and incubated for 20 minutes on ice. Next the cells were lysed by 30 strokes using a type B (tight) dounce homogenizer. Cytoplasmic extracts and nuclei were separated by centrifugation at 1700 x g for 15 minutes at 4°C. Cytoplasmic extracts were ultracentrifugated at 20000 x g for 30 minutes at 4°C before being collected. Flag-tag affinity purification was performed using anti-FLAG M2 affinity gel (Sigma, pre-equilibrated in modified Buffer A’ containing 150 mM NaCl and 10% glycerol). The cytoplasmic extracts were incubated in the affinity gel for 1.5 hours on a rolling wheel at 4°C. After washing with at least 10 bed volumes, the complexes were eluted using a 3xFLAG peptide following manufacturer’s protocols (Sigma). The eluted complexes were subjected to size exclusion chromatography (SEC) on a Superose 6 Increase 3.2/300 column (GE Healthcare Life Sciences) in SEC Buffer (50mM Hepes pH 7.5, 150mM NaCl, and 2% glycerol). Alternatively, the complexes were subjected to GRAFIX (Kastner et al 2008) procedure before size exclusion, following the protocol explained above for the endogenous complex. 68 Electron Microscopy Studies of Drosophila NuRD

In order to optimise the buffer used for purification of the dNuRD/PMMR complex I performed a Proteoplex assay (Chari et al,. 2015). In essence, the Proteoplex is a Thermofluor Stability Assay (TSA) (or Differential Scanning Fluorometry, DSF) (Boinvin et al., 2013) which uses a new thermodynamic framework optimised for complex melting curves (Proteoplex MacroDSF), such as the ones expected in the case of multi-subunit complexes. TSA is a technique widely used by crystallographers to screen for stabilising buffers for single molecule proteins. The proteins are incubated with a dye (typically SYPRO Orange) which emits fluorescence upon binding to the hydrophobic regions of proteins. This signal is measured at increasing temperatures, which gradually destabilises the protein and exposes more hydrophobic areas, and interpreted using simple Boltzmann regression, providing a reading related to the stability of the molecule. Screening with different buffers provides a measurable way to compare protein stability in such buffers. It had been generally thought that applying such an assay to a protein complex would yield convoluted curves with multiple events, rendering their interpretation impossible. However, it was shown that the framework developed for the Proteoplex can provide comprehensive interpretation of such curves (Chari et al,. 2015). In fact, stable complexes tend to show a cooperative unfolding behaviour, which translated into a single transition step melting curve. The Proteoplex assay was performed in the High Throughput Crystallisation Facility at EMBL Grenoble following the specifications of the developers (Chari et al,. 2015). This facility is equipped with a Tecan Genesis station and a TeMo 96 line pipetting system for liquid handling. The screening used a Rubic buffer screen (Newman 2004, Boinvin et al., 2013) and data was interpreted using the Proteoplex MacroDSF software provided by the developers (Chari et al,. 2015).

4.2.3 Electron microscopy

4.2.3.1 Grid preparation

Negative stain grid preparation was performed as follows: 5 μl of sample were adsorbed onto a thin carbon film for 60 seconds followed by 2% uranyl acetate staining for 30 seconds. 4.2 Methods 69

For electron cryo-microscopy grid preparation, 5 μl of sample were adsorbed onto 300 mesh 1.2/1.3 Quantifoil grids (Electron Microscopy Sciences) for 60 seconds. Alternatively the grids were coated with a thin layer of carbon support (prepared by evaporating carbon from a rod over mica plastic support and then transferring the carbon layer to the grids by suspending it in water over the grids and slowly lowering the water level) and floated onto a volume of 40 μl of sample for periods ranging from 5 minutes to 60 minutes in order to obtain different particle concentrations on the grid. Following incubation the grids where plunge frozen into liquid ethane with the help of a Vitrobot Mark IV (FEI).

4.2.3.2 Data collection

Jeol 1200 negative stain EM data collection was performed as follows: 320, 150, and 120 low-dose images of PMMR, PMR, PM (p55–MTA-like), respectively, were recorded using a JEOL 1200 EX II microscope operated at 100 kV and equipped with a 2k × 2k CCD camera (Gatan). The nominal magnification was 40,000 x.

For F20 data collection, endogenous dNuRD negative stain micrographs were collected using a F20 microscope (FEI) housed in the microscopy platform at the IBS (Grenoble GIANT campus), operated in low-dose mode at room temperature at 200 kV and equipped with a 4k × 4k CCD camera (Gatan). The nominal magnification was 40000x. 135 tilt pair micrographs (at 45 and 0 degrees) and 100 non-tilted micrographs were collected.

For T12 data collection, 122 PMMR-G [p55–MTA-like–MBD-like–Rpd3– Simjang(GATA)] negative stain micrographs were collected using a T12 microscope (FEI) operated at 100 kV, room temperature and in low-dose mode, equipped with a 2k × 2k CCD camera (Gatan), with a nominal magnification of 44000x.

Titan Krios (FEI) data collection was performed with dNuRD and PMMR. Cryo-EM micrographs were collected on a Titan Krios microscope located at the EMBL Heidelberg, operated at 300 kV and equipped with a Falcon II direct electron detector (dNuRD) or a K2 Summit direct electron detector (PMMR), with a nominal 70 Electron Microscopy Studies of Drosophila NuRD magnification of 40000x or 37037x respectively. 4296 dNuRD and 3389 PMMR movies were collected.

4.2.3.3 Image processing:

Preprocessing and initial 2D processing: Contrast Transfer Function correction and phase-flipping for the negative stain datasets were done with bctf (Bsoft) (Heymann 2001). 14280 dNuRD, 4445 PMMR-G, 4753 PMMR, 3610 PMR and 7901 PM particles were manually picked using e2boxer.py (EMAN2) (Tang et al., 2007). 2D multi-variance statistical analysis and classification using IMAGIC-5 (Van Heel & Keegstra, 1981) was performed with these particles, with an output of 200 reference-free classes in the case of dNuRD, PMMR-G and PMMR, and 100 reference free classes in the cases of PMR and PM complexes. The particles were subjected to four rounds of Multi-Variate Statistical Analysis (MSA) intercalated with three rounds of Multi-Reference Alignment (MRA).

The cryo-EM movies were motion corrected using Unblur 1.0.2 and Summovie 1.0.2 (Campbell et al., 2012; Grant et al., 2012) and particles were automatically picked using Eman2 e2boxer.py. For dNuRD 184,000 particles were manually picked using e2boxer.py (Tang et al., 2007). For PMMR 196,667 particles were collected using the gaussian autopicker included in the e2boxer.py program. CTF correction and phase-flipping were performed in CTFFIND4 (Rohou & Grigorieff 2015). Further processing was performed in RELION 1.4 (Scheres 2012).

Initial volumes: Different initial volumes for the datasets were obtained as follows: The dNuRD tilt pair micrographs were manually picked using e2RCTboxer tilt pair picker from the Eman2 software (Tang et al., 2007), resulting in 15,418 particle pairs. These were classified in 2D and turned into 300 reference free class averages. 96 of these were selected for Random Conical Tilt (RCT) reconstruction. XMIPP-ML-tomo (Marabini et al 1996) created 6 average 3D volumes from this data. Subsequently, dNuRD initial volumes were manually selected by class size and definition of the volume. Additionally, a hNuRD volume filtered to 60Å was used as initial volume for the endogenous dNuRD processing to have a second reference volume. In the case of PMMR-G a previously obtained PMMR volume was used. Initial volumes for the PM, PMR and PMMR complexes where produced 4.2 Methods 71 independently using EMAN2 (Tang et al., 2007) and SPIDER (Frank et al., 1996). These volumes where compared to discard the effects of model bias and evaluate the consistency of our methods. The script used to build the SPIDER models was custom written by me (appendix 1) and is based on a script previously developed by Jaime Martin-Benito for reconstruction and refinement of helical dihedral structures (Arranz et al., 2012). Briefly, this script generates a initial noise volume from the data provided. Particles are assigned orientations randomly and back projected into a volume. This stochastic procedure will result in a reference free and model bias free initial volume, but at the same time will allow to infer some basic information about the complex. Provided a dataset of accurately centred particles, the result of will be a noisy sphere of the approximate diameter of the complex. In the case of compact, globular proteins with an evenly distributed density the resulting sphere will also show an even distribution of its density. For the opposite case, such as elongated fusiform or otherwise complex shapes with uneven mass distribution, the resulting volume will feature two distinguishable concentric density thresholds, one highly concentrated sphere with a diameter matching the shorter dimension of the particle and one less dense sphere with a diameter comparable to that of the largest dimension of the particle. More complex shapes will result in extra thresholds of this type, and, additionally, a reducing density gradient will be observable as the radius of the volume increases, due to the increasing ratio between surface and angular shift between projections, which leads to higher spread of the densities existing in the particles. These threshold and gradient effects rise from an effect similar to the rotational averaging often used in image processing programs in order to simplify 2D data (for example, SPIDER (Frank et al., 1996) has several commands that involve this procedure). Simplifying, each pixel in a 2D image is classified according to distance to the centre of the image (or its radius) and arithmetically averaged with all the pixels at the same distance. Substitution of each pixel with its corresponding averaged value results in a series of concentric circles that represent a rotational average of the densities in the particle. These circles have the same value across all of their circumference, so the 2D image can be simplified into a unidimensional string of density values with length equal to half the particle diameter, effectively reducing the information to process to half of the square root of the original. The main difference with my volume generating script is that the later doesn’t generate an 72 Electron Microscopy Studies of Drosophila NuRD arithmetic rotational average of the volume, but relies in the randomized back projection of a number of particles, resulting in a statistical type of averaging, which will increase in accuracy directly with the number of particles. While this option was not explored during the current work, application of a 3D arithmetic rotational averaging to the volume and reducing the information to a unidimensional string is technically possible and would allow operating with a data volume of half of the cubic root of the original, opening the possibility of in house customisable simplified processing to complement the well established processing programs which often offer solid results but leave little room for user interference. Once the noisy initial volume is created the script offers the option to iteratively refine the result, effectively becoming a volume reconstruction and refinement tool. During each iteration the starting volume is projected in a set of predefined angles (adjusted for particle diameter and the resolution desired), particles are cross- correlated against these projections and back projected into a volume using the angles provided by the highest cross-correlation match. As the first iteration uses a noise sphere as initial volume the cross-correlation process can’t rely in any real feature of the images to assign reliable values, effectively creating an additional layer of randomness to further avoid possible model bias. Only iterating through this stochastic matching process will some shapes become defined and provide a basis for model reconstruction. Additionally, the script provides the option for independently refining two halves of the dataset in each iteration, allowing gold standard refinement, Fourier Shell Correlation (FSC) for resolution estimation and to accordingly filter the volume to prevent over-fitting.

2D processing: Further 2D image processing was performed using RELION 1.3 (Scheres 2012). In addition to the built in 2D classification and particle sorting RELION features I classified and selected particles based on the stability shown across the RELION classification using a bash script I specifically wrote for this purpose (appendix 2). In essence, this script reads the movements of each particle through the classification iterations, which are stored as metadata in data.star files created by RELION. Each particle is assigned a stability score following an equation designed to weight different aspects useful to evaluate stability: stabilization in a class as the classification process proceeds, length of the periods 4.2 Methods 73 of stability and continuity of periods of stability. The equation is composed of three main parts, which are explained below:

As explained below in detail, X represents the number of the iteration currently analysed, Y represents the number of iterations in which the particle has remained in the same class without breaking continuity and Z the number of interruptions of that continuity without producing a break. In case the stability conditions for each part of the equation are met the script will process it and add the value derived to the total stability scoring of the particle being analysed. The first part of the equation is a simple continuity check of the stability of the particle, and it will be processed as long as the particle is classified into the same class as in the previous iteration. The score assigned to the particle will be four times the square power of X, X being the iteration number analysed. This implies that the relevance of the stability is low in the first few iterations, when the particles still move between poorly defined classes, and rises exponentially as the classification proceeds and the particles become stabilised in higher quality classes. The second part of the equation addresses the length of the stable periods. For each iteration that the particle has stayed in the same class the score (Y) will increase by one and will afterwards be multiplied by the number of the current iteration (X), adding the result multiplied by four to the stability score of the particle. This part of the equation will be processed in each iteration and its stable iteration count will be reset to zero when the continuity of the stable period is broken. When a particle is classified to a class different than it was in the previous iteration the stable period will not be immediately considered broken, but nothing will be added to the particle score, and the iteration count will not increase (represented in the function by -Z). The period will be considered continuous if the particle returns to its previous class in the following iteration, but if it remains in a different class for at least two iterations it will be considered broken and the count will be reset to zero. The third part of the equation adds a minor correction for particles not stabilised in a class, but that move between few similar classes. It will be processed if a particle after moving from one class to other, moves back to the original class in the following iteration. It adds the squared power of the iteration number to the score in case the 74 Electron Microscopy Studies of Drosophila NuRD conditions are satisfied. The ratio of four to one between the two first parts of the equation and the last was arbitrarily determined after empirical testing of the equation. Each part of the equation can be multiplied by different arbitrary factors to change its relative relevance in the final result. For a particle being completely stable across the classification the maximum score can be calculated and graphically depicted (Figure 4.1a) as follows:

In order to simplify the calculations the equation was considered a continuous function, and thus was resolved using integrals. For a discrete function summands would be used. A continuous function can be defined as a discrete function with an infinite number of points, and therefore the calculation using integrals would apply [simplified, this is the same principle behind the definition of integrals using the Riemann integral as a set of Riemann sums (one of the commonly accepted ways of defining and calculating integrals) with a partition size of zero (Hughes-Hallett et al., 2005)]. For the case in hand, this would mean a classification with an infinite number of iterations, also implying that the simplified calculation with derivates becomes more accurate with increasing number of iterations. At the end of the process the results can be visualised (Figure 4.1b) and a second script offers the possibility of applying an arbitrary or a statistical threshold to discard unstable particles and write a new star file with the remaining particles that stably contribute to a discrete class. This method, although experimental, provides a novel way of particle sorting and allows for more efficient use of the information that can be extracted from the well established classification routines of RELION, both 2D and 3D.

Focused classifications were performed with RELION following the workflow proposed by (Bai et al., 2015). The aim of this procedure, similar to masking, is to 4.2 Methods 75

Figure 4.1: Graphical representation of particle sorting according to classification stability. a) The curve represents the maximum score each particle can be granted (y axis) in each iteration (x axis). Dividing the curve in quarters shows the increasing weight of the later iterations in the final score, represented by the area below the curve (calculated by integrating the function and shown in percentages). b) Representation of the sorting results (taken from actual data for the PMMR complex). The x axis shows the final score of the particles and the y axis the cumulative number of particles with each score. This provides a quick visual guide of the general stability of the classification and allows to choose a threshold below which particles can be discarded. isolate an area of interest from a volume to allow for the reconstruction process to be optimised for this area, ignoring densities outside of it. Masking is a simple method of achieving this effect, which multiplies the initial and subsequently reconstructed volumes by a binary mask of ones (areas of interest) and zeros. Usually the edges of this mask are smoothed (assigned decreasing values from 1 to 0) to avoid a harsh transition, which creates sharp edges in the output volume that would otherwise impair the reconstruction and the resolution estimations. This effectively creates a smaller area of data surrounded by blank space. This is effective in isolating the area of interest, but has some drawbacks such as the possibility of the noiseless area and the edges affecting the alignment and the fact that different areas of the experimental images might align to the area of interest, potentially leading to wrong interpretations. Focused classification is in practice an upgraded version of masking. A binary mask with smooth edges is used to calculate both the area of interest and the areas to be discarded in the volume. Using the information from previous rounds of alignment the areas of the volume to be discarded are back-projected into 2D images and subtracted from the experimental images. This creates a new dataset which conserves the signal contributing to the areas of interest in the volume and the background noise across all the image, but lacks the signal for the discarded areas. This fixes some 76 Electron Microscopy Studies of Drosophila NuRD of the major disadvantages of the masking technique, as there is no signal other than the selected area to be aligned and the surrounding noise is conserved. The drawback of this technique is that the subtraction needs to be very accurate in order to avoid artefacts. Small misalignments of the particles relative to the volume reconstructed might lead to faulty subtractions, and therefore this technique will show optimal results when the input volume is already a high-resolution volume. Focussed classification is a particularly useful tool when dealing with discrete conformational and compositional flexibility, and is the basic principle behind extended protocols such as multi-body refinement.

Refinement and post-processing: Volume refinement and post-processing was performed in RELION. This included Gold Standard reconstruction based resolution calculation and filtering as well as high resolution noise substitution techniques (Chen et al., 2013). This last procedure is done to address the overestimation in resolution caused by tight masks around the volumes, which are routinely used to show the best possible resolution estimates. High resolution noise substitution is based in the randomization of the phases of the data signal above a frequency threshold, effectively turning the volume into random noise at the resolution levels involved. It is assumed that any cross-correlation found between half maps treated like this is either spurious or forced by a tight mask driving alignment of otherwise featureless noise. Therefore, the cross-correlation level of the phase randomised volumes is treated as a baseline for the cross- correlation calculation of the original volumes, truncating the FSC curve whenever it becomes statistically irrelevant. This might result in loss of estimated resolution for volumes not resolved to highest possible real resolution, but it is an important step to avoid over-fitting of masks and overestimation of resolution values. 4.2 Methods 77

4.3 Results

4.3.1 Purification and EM analysis of endogenous dNuRD

I expressed the endogenous dNuRD from a stable Drosophila melanogaster S2 Schneider cell line provided by our collaborators in Laue group. This cell expresses a modified p55 protein carrying a GFP tag which I used as bait for affinity purification of the full complex. After incubating the cell extracts with the GFP-trap beads I eluted the complexes by TEV protease cleavage. Expression in insect cells resulted in significantly higher protein yields than those obtained for hNuRD, which allowed me to attempt more demanding purification procedures in terms of protein amount and concentration. The eluted recombinant complexes were therefore subjected to mild cross-linking with glutaraldehyde during gradient centrifugation (GRAFIX) (Kastner et al 2008). I concentrated the gradient fractions containing the complexes and further purified them by a size exclusion chromatography step. Coommassie-stained SDS-PAGE gels showed a good amount of clearly distinguishable dNuRD complex (Figure 4.2a) from a clean peak that eluted at 1.33 ml of retention volume (Figure 4.2b) corresponding to a molecular weight of approximately 650-700 kDa according to the calibration curve of the Superose 6 (SEC-MALLS revealed a 600 kDa molecular weight for the complex, which, considering the shape and other effects making difficult weight calibration in size exclusion columns, is a close result). All the components of dNuRD can be identified in the gel except CG18292, the smallest subunit (12 kDa). However, Mi-2 seems to be mostly lost during the purification, as well as Simjang to some degree. Mass spectrometry analysis of this sample by our collaborators in Laue group confirmed that the complexes consistently comprised a set of MTA-like, RPD3, MBD-like and p55 copies, while Mi-2, Simjang and CG18292 were present in sub-stoichiometric amounts (Zhang et al., 2016). Interestingly, I was able to purify roughly equivalent amounts of dNuRD complexes from the cytosolic and the nuclear extracts, despite the complex being expected to 78 Electron Microscopy Studies of Drosophila NuRD

Figure 4.2: Purification and EM processing of the endogenous dNuRD sample. A) We were able to purify dNuRD from both the nucleus and the cytosol. Clear bands at the expected size for all the components except can be seen, even though Mi-2 is partially lost. In the cytosolic sample some cytosolic contaminant can also be observed, which could be excluded in further purification steps. SEC profile figure was extracted from Zhang et al., 2016. B) The size exclusion chromatography profile shows a clean peak for dNuRD at 1.33 ml retention volume. C) 2D class averages obtained with imagic show well-defined but heterogeneous particles, similarly to what was observed for hNuRD. D) First volumes obtained by random conical tilt. The elongated volumes were the most populated, and therefore used for further processing. E) We obtained a consensus volume of dNuRD with reported 32Å resolution. This volume shows defined but low resolution density that can accommodate the different subunits that build dNuRD, but lacks the detail to accurately fit them. 4.3 Results 79 localise mainly to the nucleus, the cellular compartment where it performs its functions.

In collaboration with Dr. Manikandan Karuppasamy, I first collected a negative stain dataset of endogenous dNuRD. Preliminary processing of the dataset mirrored the initial hNuRD processing. 2D averages showed similar shapes to those observed in the human NuRD complex, albeit slightly smaller and less heterogeneous, while still being quite variable (Figure 4.2c). Familiar elongated and L shapes can be seen among the 2D class averages. I used random conical tilt reconstruction in XMIPP-ML-tomo with particles picked from tilt pair micrographs to obtain a set of initial volumes for dNuRD (Figure 4.2). I selected the most populated volume, an elongated shape, for further processing. Alternatively using one of the elongated hNuRD volumes as starting model for the processing led to similar results. After further refinement in RELION I obtained a volume with an estimated resolution of 32Å (Figure 4.2e). From this point we started the Cryo-EM workflow in order to obtain higher resolution.

We obtained a Cryo-EM dataset of dNuRD with the Titan Krios high end microscope located in the EMBL Heidelberg electron microscopy platform (Figure

Figure 4.3: dNuRD cryo-EM. a) We collected a dNuRD cryo-EM dataset from a Titan Krios microscope. The micrographs showed well distributed particles of the expected size for the complex b) After initial processing and 3D refinement I obtained a consensus volume with a reported resolution of 29Å, similar in size to those previously observed but with a more defined L shape. c) After further cleaning the dataset of the less stably classified particles I was able to obtain a new volume with 24.9Å resolution. 80 Electron Microscopy Studies of Drosophila NuRD

4.3a). We processed the micrograph movies as explained in the methods section above and manually picked a set of 184k particles. After preliminary 2D and 3D classification and refinement in RELION of the complete dataset in a single class we obtained what we called a consensus volume of dNuRD. I further processed this volume by 3D classification in 5 classes and refinement of the better defined and most populated classes. The resulting volume reached 29Å resolution (Figure 4.3b). However the 3D classification had not revealed mayor differences between the different classes or helped significantly improving the quality of the 3D reconstruction. In order to improve the result I decided to further purge the dataset of suboptimal particles. To achieve this, after having already used all RELION classification options to get rid of bad particles, I opted for a personalised filter. I wrote a novel computational procedure to classify the particles according to their stability thorough the 2D or 3D classification procedures. Full details are can be found in methods and appendix 2 sections. Shortly, this algorithm contained in a relatively simple bash script reads RELION outputs and studies the behaviour of individual particles across the iterative process of classification. Particles that stay in the same class for many iterations, specially towards the end of the classification, are deemed more stable and thus more suitable for reconstruction. Applying this method to the dNuRD classification process allowed me to improve the nominal resolution of the final volume from 29Å to 24.9Å. (Figure 4.3c) This proved the newly written method to be an effective tool of particle classification and prompted me to routinely add it to my RELION workflow. However, it is worth noting that it’s relative effectiveness will inevitably decrease as the resolution of the volumes improves, as datasets capable of yield high resolution results are composed of inherently stable particles.

The results I obtained from the endogenous dNuRD datasets remained in a range of low resolution. We suspected that large amounts of conformational and compositional heterogeneity remained in the complex to interfere with obtaining higher levels of definition by single-particle analysis. As a first step to control these issues our collaborators in the Laue group purified a recombinant full NuRD complex from insect cells. However, this complex was proved to be similar to the endogenous dNuRD in its limitations. Once again mass spectrometry analysis revealed that most of the purified complexes were in fact dNuRD subcomplexes, which we subsequently named PMMR after their individual components (p55, 4.3 Results 81

MTA-like, MBD-like and Rpd3). Mi-2, Simjang and CG18292 were again present in sub-stoichiometric amounts, while p55, being the bait used for purification in this case, was present in abundant excess (Zhang et al., 2016). We therefore decided to first characterise the different dNuRD subcomplexes in order to uncover their structure starting from their basic components.

4.3.2 Purification and EM analysis of PMMR-G, PMMR, PMR and PM subcomplexes.

Mass spectrometry studies of our purified dNuRD samples showed that most of the holo-dNuRD complex were in fact subcomplexes comprising p55, MTA-like, MBD-like and Rpd3 subunits. The stoichiometry of these complexes was estimated to be of 2:2:1 for MTA-like, Rpd3 and MBD-like subunits, while p55, the purification bait, was present in excess (Zhang et al., 2016). Following this discovery we speculated that these could be physiologically relevant subcomplexes of NuRD. Working with smaller and less complex protein assemblies could also help us mitigating the adverse effects of the heterogeneity and flexibility, so we decided to recombinantly express and study these subcomplexes. Considering that p55, MTA-like and Rpd3 have connections between them already described in the literature (Alqarni et al., 2014; Millard et al., 2016) we opted for a PMR complex and a PMMR (the full species we had characterised from our purifications) for recombinant expression. The plasmids for this, described in the methods section, were obtained from our collaborators in Laue group.

In collaboration with Alice Aubert I transfected the PMMR construct into Sf21 insect cells to overexpress it using the MultiBac system. Meanwhile our collaborators in Laue group did the same with PMR. The complexes were subjected to affinity purification using a Flag-tag located in the N terminus of MTA-like. After Flag purification they the PMR and PMMR complexes were subjected to size exclusion chromatography purification using a S6 Increase 3.2/300 column. Alternatively, a GRAFIX (Kastner et al 2008) step was added before the size exclusion chromatography step. The yield of the purification was good and we observed bands for all the expected subunits in a Coommassie 82 Electron Microscopy Studies of Drosophila NuRD 4.3 Results 83

Figure 4.4: Purification and negative stain EM of dNuRD recombinant subcomplexes. (page 82) a and b) Gels show good amounts of purified PMMR and PMR, displaying all the expected subunits. Along with PMR, the byproduct of the purification, PM, can be seen. c) Both PMMR and PM showed clean size exclusion chromatography peaks at 1.34 and 1.35 ml retention volumes respectively. This reveals an effective size similar to that observed for dNuRD. d) Negatvie stain datasets were collected for the four subcomplexes studied, PM, PMR, PMMR and PMMR-G. While the three larger complexes show similarly sized particles, PM shows smaller more abundant particles. e) The 2D class averages further confirm the similarity of the subcomplexes and endogenous dNuRD. PM differentiates itself from the group with its smaller size, approximately half of the size of PMR, while dNuRD and PMMR-G appear only marginally bigger than the other complexes. Figures show for PM, PMR and PMMR were extracted from Zhang et al 2016.

stained SDS-PAGE gel (Figure 4.4a and b). Both PMMR and PMR eluted in clean peaks at 1.34 and 1.35 ml retention volumes, respectively (Figure 4.4c). This further convinced us that the species purified from the endogenous dNuRD sample were equivalent to those from the PMMR sample, since the retention volumes match, despite holo-dNuRD supposedly carrying an Mi-2 subunit of considerable size. Interestingly, in the cases were PMMR or PMR was not cross- linked an extra peak containing only p55 and MTA-like subunits appeared in the SEC profile. The interaction between these two subunits is well characterised in the literature (Alqarni et al., 2014). This led us to suspect that this side product subcomplex, which we named PM, could also be physiologically meaningful, and therefore we decided to collect it and further process it in parallel to PMMR and PMR.

I prepared negative stain EM grids of all the subcomplexes and collected small datasets for each using a Jeol 1200 microscope (Figure 4.4d). I manually picked 4753 PMMR, 3610 PMR and 7901 PM particles and subjected them to 2D classification (Figure 4.4.e) in IMAGIC-5 (Van heel & keegstra, 1981). I obtained initial volumes of the complexes using EMAN2 (Tang et al., 2007). Additionally, I created alternative initial volumes using a custom-made spider script (see appendix 1). This script first constructs a noise sphere from the data itself and then iteratively refines it against the particles until a 3D volume is created and stabilised. I choose this stochastic approach to address possible model bias and consistency of the initial volumes obtained. For PMMR. I also used the previously obtained dNuRD model as an additional starting volume, based on our 84 Electron Microscopy Studies of Drosophila NuRD experimental findings indicating that purified dNuRD and PMMR were equivalent species. I further processed these volumes by RELION 3D classification and refinement, and I made an extensive comparative analysis.

4.3.2.1 The PMR volume can accommodate two PM models

The refined models for PMR (Figure 4.5b) and PM (Figure 4.5.c) reached 31 and 26 Å, respectively. PMR shows clearly divided and equivalent upper and lower halves. If these halves are extracted from the full volume and compared to each other they show a cross-correlation value of 0.96 [measured by the fitting and cross correlation measurement tool integrated in (Pettersen et al., 2004)]. Interestingly, the densities close to the point where the connection between halves is made can comfortably fit a structure of a MTA-like and Rpd3 dimer as resolved by X-ray crystallography (4BKX) (Millard et al., 2014). I performed focused classification with RELION in this central section of the volume. While the reported resolution went down to 36 Å due to the post-processing filters designed to mitigate mask related overestimations, the resulting volume closely resembles the published Rpd3–MTA-like structure (Figure 4.5d). This strongly suggest that two MTA-like subunits and two Rpd3 subunits, linked together thanks to the dimerization domain of MTA-like, form a structural core for the PMR complex. We refer to this central part of the volume as its backbone. The p55 subunits would be located at both extremes of the PMR volume, interacting with MTA-like but separated from its dimerization domain due to a set of intrinsically disordered domains. MTA-like possesses two p55 interacting domains according to the literature (Alqarni et al., 2014), so we inferred that, in agreement with mass spectrometry experiments, PMR comprises four copies of p55, i.e. two per MTA- like copy.

The model for PM shows a much smaller density (Figure 4.5.c). This volume, oriented properly, closely resembles each half of the PMR volume. Measured with Chimera the two halves show a cross-correlation coefficient of 0.9. The main observable difference is a missing density in PM, which could represent the Rpd3 missing in this complex. Considering that these volumes were obtained in completely independent refinements, this represents good evidence that PMR is 4.3 Results 85

Figure 4.5: 3D reconstructions of dNuRD recombinant subcomplexes a) PMMR shows an L shape generally matching that observed for dNuRD. b) PMR looks smaller in comparison, but is better defined. It appears built by two symmetric halves that dimerise in the middle of the complex (the backbone). c) PM amounts for less than half of the density of PMR and strikingly resembles the shape of each end of PMR, matching our speculations about its location within the complex (see text). d) Focused classification of the backbone of PMR led to a volume matching the shape of a Rpd3 - MTA-like dimer. To the right an X-Ray crystallography structure for this complex (4BKX) is shown fitted within the PMR volume. e) Processing of PMMR-G led to two different compatible volumes, similar in size to those observed for endogenous and recombinant dNuRD but with some extra density that could account for the Simjang subunit. This could represent different possible conformations of dNuRD. comprised of two copies of PM with the addition of a Rpd3 subunit to each, forming a dimer around the dimerization domain of MTA-like.

4.3.2.2 PMMR shows additional density compared to PMR

The PMMR volume obtained from the negative stain EM dataset shows an elongated shape that resembles that of PMR, although slightly bigger and showing 86 Electron Microscopy Studies of Drosophila NuRD some extra densities (Figure 4.5a). This extra density likely accounts for the additional MBD-like subunit present in this complex, but the resolution level at this point didn’t allow for an accurate localization of the different subunits within the density. Also, conformational changes within the PMR complex upon MBD binding cannot be excluded. This volume didn’t reach the refinement levels of PMR and PM, remaining at 60 Å resolution. However, being the most complete dNuRD complex stably purified, we decided to optimise the PMMR purification for Cryo- EM studies in order to obtain a more accurate picture of its composition and structure.

4.3.2.3 PMMR-G reveals the possible location of Simjang relative to the PMMR complex

The PMMR complex is essentially a PMR complex with an additional MBD-like subunit. Similarly, the PMR complex represents two PM complexes combined with one Rpd3 subunit each and linked together. Reconstructing volumes for each of these complexes is an optimal strategy to pinpoint the location of the representative subunits of each of them, assuming no major rearrangements upon binding of the additional subunit. As indicated above, PMMR was the most complete dNuRD complex that could be purified intact from the cell extracts. In order to keep sequentially constructing the picture of holo-dNuRD we decided to reconstitute the holo-complex by independently purifying and adding the remaining subunits Mi-2, Simjang and CG18292. Simjang (homologue of human GATAD2A) is known to closely interact with MBD thanks to a coil-coiled domain (Gnanapragasam et al., 2011). This led us to hypothesize that Simjang could play a bridging role between NuRD’s histone deacetylase and the nucleosome remodelling activities, contained in Rpd3 and Mi-2 respectively. Thus, we choose Simjang as the first additional subunit to add to PMMR. Simjang was independently purified by Alice Aubert and added to PMMR purified according to the protocols described in the methods section.

I collected a negative stain dataset using a T12 microscope (Figure 4.4d). From 112 micrographs I manually picked 4445 particles, which were used fro 2D classification in IMAGIC-5 (Figure 4.e). A PMMR volume filtered at 60 Å was used 4.3 Results 87 as initial model for 3D classification in RELION. The classification led to two classes compatible with the known structural information for the complex (Figure 5.e). Both classes, with a nominal resolution of 40 Å, show additional density compared to PMMR volumes. However, further work is needed in order to improve the quality of the volumes and allow precise localisation of the different subunits, including Simjang.

4.3.3 Cryo-EM analysis of PMMR subcomplex

In order to obtain high-resolution Cryo-EM data for the PMMR complex I tried to further optimise the purification process of the complex. To reach this end I performed a Proteoplex assay with the PMMR sample (Chari et al., 2015). This assay is aimed at optimising the buffers used for purification, with protein stability as the first priority. Adapted from the Thermofluor stability assay used in crystallography, Protepoplex uses a fluorescent dye to report the stability or thermal denaturation of protein complexes. The complexes are exposed to increasing temperatures while in different buffer conditions. A statistic framework allows easy comparison of the different melting curves in order to find out which buffers allow the complex to maintain stability in harsher conditions. I performed this assay using a Rubic buffer screen (Newman 2004, Boinvin et al., 2013) to test different buffer conditions for the PMMR complex, and after the initial screening I included some custom-made buffers to screen more precise conditions. Regrettably, the buffer conditions highlighted during the assay were already close to the conditions used for purification: pH levels around 7.5 and low salt levels were identified as providing optimal thermal stability to PMMR. Other buffers such as acetate and phosphate buffers with higher levels of salt (250-350 mM) seemed to displace the melting curve indicating greater heat resistance, but on closer examination we found that these curves had two or additional steps, indicating that individual subunits were dissociating from the complex. In summary, the Proteoplex assay did not provide useful information which could have allowed us changing our buffers, but it did validate the ones we were already using. We did minor optimizations to the protocols in order to obtain optimal yields, but we resumed purification of the complexes with largely unchanged methods. 88 Electron Microscopy Studies of Drosophila NuRD

In collaboration with Dr Manikandan Karuppasamy I prepared the PMMR sample for cryo-EM and froze grids for data collection. Using a Titan Krios equipped with a K2 Summit direct electron detector (EMBL Heidelberg) we collected a set of 3,389 micrograph movies (Figure 4.6a). These were motion corrected and averaged using Unblur 1.0.2 and Summovie 1.0.2 (Campbell et al., 2012; Grant et al., 2012) and CTF corrected using CTFFIND4 (Rohou & Grigorieff 2015). I used the gaussian automatic picker provided by the Eman2 program e2boxer.py (Tang et al., 2007) to pick the particles. After manual curation of the dataset I was left with 197k particles, which I further processed in RELION 1.4 (Scheres 2012).

4.3.3.1 2D class averages of PMMR reveal high-resolution secondary structure elements.

I classified the PMMR particles in 200 2D classes and applied the previously explained script for particle sorting based on their stability along the classification. After the second iteration of classification and cleaning of suboptimal particles, the 2D class averages showed an array of different conformations, making it obvious that conformational heterogeneity was still present in the dataset. However, the classes showed well defined features and some of them revealed secondary structure details matching those of published structures (Figure 4.6b). On close examination it could be seen that many of these detailed class averages depicted the structure of a RPD3–MTA-like dimer. Surrounding this well-defined structures diffuse areas of poorly defined signal could be seen. Other 2D class averages showed two separated well defined dots, roughly the estimated size of a p55 dimer, connected by a diffuse stripe of blurry signal. These effects are likely created by signal originated from parts of the complex that become selectively misaligned. A common cause for this is the persistence of heterogeneity and flexibility within the 2D classes. 4.3 Results 89 90 Electron Microscopy Studies of Drosophila NuRD

Figure 4.6: PMMR Cryo-EM and reconstructions (page 89) a) PMMR micrographs from a Titan Krios microscope showed well-distributed particles we recognised as PMMR complexes b) Preliminary 2D classification showed several classes with recognisable secondary structure features resembling the Rpd3 – MTA-like backbone. Below a projection of the 4BKX structure portraying HDAC1 and MTA1 is shown for comparison (Please note: scale is not the same). c) I obtained a 18Å resolution 3D volume for PMMR. The different subunits can be fit inside this volume, however, it doesn’t show the sharp features observed in the 2D class averages. d) Using the angular information from the 3D reconstruction I re-projected masked 2D particles showing only the Rpd3 – MTA-like density. Surprisingly, the resulting volume was not a single rod like shown in the 2D classes, but a superposition of two of them (shown in orange), hinting misplacement of a particle population. e) Using a subgroup of particles realigned using the backbones as reference I was able to perform focused refinement on this part of the complex. The resulting volume is show, front and side, with the 4BKX structure fitted inside. The best fit was obtained by treating the different domains as separated rigid bodies to allow some flexibility between them. f) Classification of the particles in several classes revealed great heterogeneity remaining in the dataset. Most classes still show the general arrangement seen before in negative stain EM and a backbone, but several lose densities in the bottom and the top, showing flexibility in those parts.

4.3.3.2 3D reconstruction of PMMR.

I selected the better-defined PMMR 2D classes and proceeded to 3D processing in RELION with the particles they contained. Initial 3D classification in a single class resulted in a consensus volume reaching 22 Å resolution. I used alignment angles and shifts obtained from this reconstruction to recentre and re-extract the particles. After a second round of classifying and cleaning the dataset I was able to reconstruct and refine a new PMMR volume (Figure 4.6c). However, it was surprising that despite having observed clear secondary structure features in the 2D class averages (indicating high resolution information) none of these were reflected in the volume, which reached only 18 Å resolution. I tried to perform focused classification to obtain better local resolutions, but the volume was not defined enough to achieve precise signal subtraction, so it didn’t deliver relevant improvement. In order to understand the causes behind this resolution loss problem I went back to the 2D classifications and started a thorough analysis of the particles and their destiny in the final volume.

First, I manually selected those 2D classes that showed a fusiform density compatible with the known shape of the Rpd3–MTA-like backbone (Millard et al., 2016). I selected both classes that closely resembled the model and showed 4.3 Results 91 secondary structures and classes that only loosely resembled it but were still compatible with it. I forced these particles into single class 2D classifications to have them align to each other and obtain average images of either the full dataset of selected subsets. Class averages containing better quality particles showed features up to the secondary structures once again. In the case of the lower quality particles the secondary structure features were lost, but the average still showed a recognisable backbone shape. I used the alignment information obtained from these classifications to create tight masks for the two 2D images. My aim creating these masks on the 2D level was to isolate the signal originated from the backbone of the particles (the stable parts showing secondary structure features) from the peripheral parts of the complex (the poorly defined cloud-like densities visible in the 2D class averages). I created the 2D masks, tight binary masks with gaussian drop edges) using Scipion (de la Rosa-Trevin et al., 2016), which offers versatile 2D manipulation tools by a combination of ImageJ image editing program and Spider scripts. I applied these masks directly to the particles by multiplying them by the mask, creating a new dataset focused in the backbone (similarly to how RELION focused classification creates new datasets with subtracted densities). Next, I investigated how these particles had been oriented in the final volume. I used the alignment information from the 3D reconstruction to project the masked particles into a 3D volume without performing a new alignment, with the intention of reconstructing the same volume except the densities erased by the mask. Surprisingly, instead of the expected fusiform density attributed to the backbone I obtained a boomerang shaped volume (Figure 4.6d). Upon inspection it was apparent that a local minimum in the alignment process had allowed for the backbone density of some of the particles to be assigned to a different position, pivoting about 120 degrees from one of its extremes. This led to a subset of particles to be differently oriented from the rest. This was a convincing explanation for the resolution loss observed in the transition from 2D to 3D reconstructions. To corroborate this, I used the masked particles for a new 3D reconstruction, but allowing new alignments and using a heavily filtered model of the backbone structure (4BKX) as initial model. This led to the expected straight backbone shape, showing that the previous boomerang shape was a processing artefact rather than features found in the experimental images. 92 Electron Microscopy Studies of Drosophila NuRD

Knowing about this misalignment problem I reprocessed the PMMR data. I used the information from the 3D classification of the masked backbone particles to give the original particles starting orientations and limited the angular sampling of RELION to avoid them to stray far from their starting point. I performed more exhaustive 3D classification in order to discard more particles and mitigate the flexibility problem. This allowed me to more accurately delimit the density corresponding to the backbone. However, the number of particles in this dataset was reduced to approximately 30,000. This, together with less particles representative of the top views of the complex, resulted in the loss of density for the more flexible distal parts of the complex and limited the final resolution I could achieve. I was able to perform focused refinement on the backbone, which led to a volume closely resembling the Rpd3–MTA-like structure. With a nominal resolution of 16.6 Å, the volume was slightly bigger than the published 4BKX structure. However, treating each protein domain as separated rigid bodies - to allow minimal movement and account for some degree of flexibility - resulted in a very comfortable fit (Figure 4.6e).

I performed one final round of 3D classifications to estimate the magnitude of the conformational heterogeneity present in the complex, both with new alignments and based on fixed projection angles from previous reconstructions. The obtained 3D classes reveal great variability, with apparent conformational changes and loss of density for peripheral domains in several cases (Figure 4.6f). 4.4 Discussion 93

4.4 Discussion

4.4.1 dNuRD complex is assembled from preformed subcomplexes

Mass spectrometry analysis of NuRD samples by our collaborators defined what we know as the core holo-NuRD complex, composed of seven different proteins (Kloet et al., 2014). However, our purifications of endogenous NuRD from Drosophila S2 Schneider cells consistently show a complex comprising only a subset of its constituent proteins (p55, MTA-like, MBD-like, and the deacetylase Rpd3) with a well-defined stoichiometry. The three remaining subunits, Simjang, the ATP-dependent helicase CHD4 homologue Mi-2, and the Drosophila DOC1 homologue CG18292, in contrast, are lost during purification, hinting at a more transient attachment to the PMMR complex. PMMR seemingly interacts with Simjang, Mi-2 and Doc1 punctually to bring together the two observed enzymatic functions of the holo-NuRD complex (histone deacetylation and chromatin remodelling). This interaction likely happens through the coiled-coil association previously described between Simjang (GATAD2A) and MBD-like (MBD2) (Gnanapragasam et al., 2011). This is further suggested by the fact that attempts at finding an association between PMMR and MI-2 have failed, despite previous mass spectrometry analysis placing them together in the core NuRD complex. Our findings have been confirmed by comparison of the endogenously purified material with PMMR reconstituted in vitro from recombinant proteins obtained with the Multibac system. These complexes show a clear composition compatible with structures previously described in the literature: A dimeric complex composed of four copies of p55, two copies of MTA-like and two copies of Rpd3. MBD-like is the only protein to break this apparent symmetry, with a single copy present in our purifications. Moreover, we isolated smaller subcomplexes composed of just two subunits, p55 and Rpd3. Additional biochemical studies by our collaborators in Laue group showed that endogenously produced dNuRD (naturally occurring PMMR) and recombinantly generated PMMR and PMR have functional histone deacetylase activity, but no chromatin remodelling activity, again indicating loss of 94 Electron Microscopy Studies of Drosophila NuRD

Figure 4.7: dNuRD is built from preformed subcomplexes. Our findings suggest that dNuRD is assembled, at least partially, in the cytosol. PM and MR combine to form PMR, which later recruits MBD-like to create PMMR. Thanks to their coiled-coil interaction, MBD and Simjang are speculated to form a bridge between the PMMR and Mi-2 portions of the holo-dNuRD complex, housing the histone deacetylase and nucleosome remodeling activities respectively. Simjang, Mi-2 and CG18292 interact in a yet to be described fashion. This assembly pathway was published by our collaborators and us in Zhang et al., 2016. (Figure extracted from Zhang et al., 2016). the Mi-2 subunit. Considering all these findings together, we hypothesize that the holo-NuRD complex is built from preformed sub-modules (Figure 4.7). PM and MR, both with well characterised interactions, come together to form the catalytically active PMR. MBD-like joins to form PMMR and allow further interaction with Simjang. The addition of Mi-2 and CG18292 to the PMMR-G would complete the core holo-dNuRD complex. The fact that we were able to purify dNuRD complexes from the cytosol means that part or all of this assembly process happens before the complexes are imported into the nucleus where they carry out their known chromatin-related functions. It also hints at possible cytosolic functions. Together with our collaborators we published these findings and the proposed holo-NuRD assembly path in Zhang et al, 2016. 4.4 Discussion 95

4.4.2 dNuRD subcomplexes reveal the location of the subunits within the complex

Despite our efforts and efforts in competing laboratories, a high-resolution structure of NuRD remains elusive due to its intrinsic flexibility. However, for dNuRD we obtained several negative stain and Cryo-EM low and mid resolution structures of the sub-modules that compose the complex: PM, PMR, PMMR and PMMR-G. These models provide the first architectural insights into the complexes: Two PM modules come together to form the known structure of the PMR backbone, a symmetric dimer of Rpd3 and MTA-like (Millard et al., 2014). MTA-like acts as the basic scaffold for the PMR complex, providing both the dimerising surface and the interface for recruitment of the other subunits. The literature and our findings suggest that the four p55 copies in the complex, separated by intrinsically disordered domains in MTA-like, are peripheral subunits and to some degree remain flexible within the PMR complex. The addition of MBD-like defines the PMMR complex. Our PMMR reconstructions, compared to PMR, show an additional density which we identified as the location of MBD-like. Reconstructions of endogenous dNuRD, while completely independent of those of PMMR and PMR, showed a very similar overall structure, in agreement with the loss of Mi-2, Simjang and CG18292. Close comparison of dNuRD and PMR models highlights the likely location of MBD-like, and also reveals some apparent conformational changes in the peripheral regions associated to p55 subunits, underscoring their flexible nature (Figure 8.a). Finally, we obtained different volumes for PMMR-G thanks to the addition of Simjang to the PMMR complex. These volumes reveal different possible conformations. Superposition of PMMR and PMMR-G volumes which resemble each other most show increased density close to the area assigned to MBD-like (Figure 4.8). Indeed, the known interaction between GATAD2A and MBD2 (Gnanapragasam et al., 2011) makes this density a prime candidate for binding of the Simjang subunit. 96 Electron Microscopy Studies of Drosophila NuRD

Figure 4.8: Differences between the dNuRD subcomplexes show the localisation of the subunits composing them a) PMR (left) and endogenous dNuRD (right) volumes show very similar shapes. High resolution structures for all the PMR subunits can be fitted into both volumes, matching what is already know about their structure. Interestingly, the endogenous volume shows a well defined additional density, revealing the likely position of the MBD-like subunit b) Close inspection of superimposed PMR (blue) and endogenous dNuRD (mesh) volumes highlights the difference between them. On the top right the extra density speculated to be MBD-like is shown. On the bottom right a possible conformational difference between the two volumes can be observed, with the lower extreme of PMR appearing rotated about 90 degrees downwards compared to dNuRD. c) Comparison of PMMR (left) and PMMR-G (right) volumes shows a similar overall structure. PMMR-G shows an additional density connected to the area assumed to belong to MBD-like, suggesting that this is the location of Simjang (arrow). 4.4 Discussion 97

4.4.3 Heterogeneity and flexibility

In order to reduce the heterogeneity of the complex we decided to study drosophila NuRD instead of the human counterpart. This allowed to remove any compositional heterogeneity derived from multiple isoforms of the subunits which are present in the human complex. We were also able to identify and separately reconstitute distinct dNuRD subcomplexes with different subunit composition, further lowering the impact of compositional heterogeneity. However, our analysis reveals that considerable conformational heterogeneity is still present within the complexes. Considering the known structural details of the NuRD complex and its heterogeneous roles we conclude that this flexibility is pivotal to the physiological function of the complex. This intrinsic heterogeneity remains the main obstacle to obtain high-resolution structures of the holo-NuRD complex or its subcomplexes using currently available EM and crystallography techniques. 5 Discussion

Resumé en français

Bien qu’il soit l’un des principaux régulateurs épigénétiques du génome, la structure et le mécanisme d’activité de NuRD restent largement inconnus. Dans ce travail de doctorat ont été en mesure de fournir des informations structurelles sur la composition du complexe NuRD. Nous avons identifié un complexe noyau d'histone désacétylase (PMMR) et sa structure Cryo-EM à mi-résolution. De plus, nous avons obtenu des structures à basse résolution de plusieurs sous-complexes NuRD, ce qui nous a permis de déduire la voie d'assemblage du complexe holo-NuRD et a conduit à la publication d'un article scientifique.

Ce dernier chapitre de la thèse est consacré à la discussion de nos résultats et de leurs implications. En prenant en considération les informations disponibles dans la littérature, nous spéculons sur la structure du complexe holo-NuRD et son mode d'interaction avec le nucléosome. Nous discutons de l’importance de la flexibilité et de la modularité du complexe pour sa fonction et du défi qu’ils supposent pour les études structurelles. Nous dédions nos remarques finales aux récents développements techniques et technologiques dans le domaine de la biologie structurale et aux travaux futurs à entreprendre afin d’obtenir une structure à haute résolution du complexe NuRD. 99

NuRD complexes have been a highlight of research into epigenetic regulation of the genome, cell lineage differentiation and cancer for decades, leaving as a result a substantial body of literature. However, the structure of the full holo-NuRD complex and its mode of action remain largely unknown. In this PhD work we provide structural insights into these unanswered questions. By comparing a series of drosophila NuRD subcomplexes we are able identify a functional histone deacetylase core NuRD complex, provide a map of the subunits comprising it and propose an assembly pathway for the holo-NuRD complex. These findings are supported by extensive biochemistry and mass spectrometry work performed by our collaborators which are in agreement with the literature published to date.

5.1 dNuRD provides a reliable approximation for the study of hNuRD

Our preliminary results showcased the high complexity and flexibility of hNuRD. We decided to use dNuRD as a simpler model in order to facilitate the single particle approach. While hNuRD is comprised of several isoforms of most of the subunits, dNuRD shows a single variant of each subunit which represents a significant reduction of the compositional heterogeneity of the NuRD complex. Comparison between human and drosophila NuRD complexes by EM shows remarkable similarities. 2D class averages of comparable resolution obtained from hNuRD and dNuRD show virtually identical classes (Figure 5.1a). Some of the negative stain 3D volumes of both samples also show striking similarities, particularly those hNuRD volumes showing elongated shapes (Figure 5.1b). Looking at these comparisons we can be confident that the compositions and architectures of these complexes are equivalent and that dNuRD can be used as a reliable base to approach future studies of hNuRD. However, hNuRD reconstructions also showed a different set of volumes with less resemblance to the dNuRD models we have seen so far (see chapter 2). These volumes, more bulky and globular in shape, might actually represent holo hNuRD complexes, while the smaller elongated shapes would represent the newly discovered and characterised histone deacetylase core complexes (PMMR). Moreover, other 100 Discussion

Figure 5.1: Comparison of the dNuRD and hNuRD models. a) Comparison of the dNuRD and hNuRD 2D class averages, obtained by negative stain EM. Selected classes from the datasets share almost identical features. b) Side views of dNuRD (left) and hNuRD (right) models. Both models display very similar size, shape and features. 5.1 dNuRD provides a reliable approximation for the study of hNuRD 101 studies have suggested that hNuRD shares the same assembly pathway we proposed for dNuRD, starting from MTA2-RBBP7 modules (Brasen et al., 2017) Further work in the human complexes in order to improve the resolution of these models will allow to prove this hypothesis.

5.2 NuRD complexes are inherently heterogeneous and flexible

Despite all the work invested into solving the structure of NuRD complexes along the years by different researchers, a high resolution structure of the holo or even the core NuRD complex is still elusive. Given its medical importance as the first complex driving differentiation of embryonic stem cells, this is an indication how the inherent properties of the complex complicate the EM approach. Literature shows that chromatin remodelling complexes are typically composed of a reasonably stable core and a seizable number of modular, more peripheral subunits and binding partners (see Lai & Wade, 2011 for a list of examples in the context of cancer). The specific composition of a given complex will determine its specificity and modulate its functions. Additionally, a fair amount of compositional flexibility is required to carry out their functions, which includes specific recognition of chromatin motives, histone tails and cofactors to facilitate chemical modifications and to impose the necessary mechanical forces to displace or even evict nucleosomes from chromatin. NuRD has been often showcased along with Polycomb Repressive Complexes (PRC), as they perform complementary chromatin repressing activities; in fact, several examples are know where NuRD seems to create a facilitating environment for PRC’s activity (Bracken et al., 2019). PRCs are typically composed of four to five core subunits, including p55 (RBBP4/7) NuRD subunits, and several transient binding partners, and PRCs are also known to have diverse conformational states. A 21Å resolution EM structure of the holo-PRC2 complex is available (Ciferri et al., 2012), but most of the high- resolution structural knowledge available comes from X-ray crystallography (for a comprehensive review on the recent developments around PRC2, see Moritz & 102 Discussion

Trivel, 2018). The yeast SWI/SNF complex, one of the best studied chromatin remodelling complexes, was resolved by single particle EM to a resolution of 25Å (Dechassa et al., 2008). All these complexes have in common their relatively large size, their modular nature and a certain degree of conformational flexibility, and are known to perform interconnected repressing or activating activities on chromatin, contributing to an intricate epigenetic regulatory network. A recent review providing insights into the relation of these specific complexes, including NuRD, can be found in Bracken et al., 2019. For a more generalist review on chromatin remodelling complexes and their structural basis see Sundaramoorthy, 2019.

Our findings highlight the enormous flexibility of the NuRD complex and mirror the literature regarding other chromatin regulating complexes (see Sundaramoorthy, 2019 for a recent review with several examples of chromatin remodelers). Both biochemical and structural data by us and others point at distinct NuRD subcomplexes which likely carry independent activities (Low et al., 2016; Brasen et al., 2017). These subcomplexes would come together to assemble the holo- NuRD complex, reuniting its characteristic enzymatic functions of chromatin remodelling and histone deacetylation. Our 2D and 3D analysis of the complex strongly suggest the presence of flexible domains flanking the more stable MTA- Rpd dimer axis, which we named “the backbone” of the PMMR subcomplex. We speculate that this flexibility, together with the characteristic modularity, i.e. subcomplex and subunit isoforms for human NuRD, as well as the ability to assemble a holo-NuRD complex in response to different biochemical stimuli, play a pivotal role in the activation of NuRD and its specificity.

5.3 Structure and assembly pathway for NuRD complexes

The aforementioned heterogeneity and flexibility related complications have in practice impeded the publication of a high-resolution cryo-EM NuRD structure. To date, in stark contrast to the massive amount of biochemical research published 5.3 Structure and assembly pathway for NuRD complexes 103 around the NuRD complex, the structure presented in Millard et al., 2016 remains the most complete published EM structure of NuRD. This structure is based in their previous work around the X-Ray crystallography structure (PDB ID 4BKX) showing HDAC1 in complex with the dimeric ELM2-SANT domain of MTA1. The EM structure was refined to 19Å and shows a MTA1-HDAC1-RBBP4 dimer. Given the experimental difficulties, attempts at theoretical NuRD models can be found in recent literature: The most complete of these models shows a complete picture of the holo-NuRD complex (Figure 5.2a and b); it is based on the known structural details of subunits and domains combined with biochemical knowledge regarding the interactions of the subunits (Torrado et al., 2017). Interestingly, this model closely resembles the experimental models we present in this PhD work for NuRD subcomplexes (Figure 5.2c, d, e and f). The most significant difference between our model and Torrado’s model is the location of the p55(p46/48) subunits around the ends of the Rpd3 (HDAC1) subunits, which we attribute to their flexible link to MTA-like. Similarly to the published theoretical model (Torrado et al., 2017), we show the likely position of a nucleosome bound to the NuRD complex. In the proposed model the nucleosome sits between the MTA-like–Rpd3 dimer and the density which we suggested to be MBD-like This arrangement would allow the Rpd3 deacetylase access to the histone tails and MBD-like subunit access to the DNA. The Mi-2 subunit would be located along the nucleosome edge in the vicinity of MBD-like, thanks to Simjang acting as a bridge between MBD-like and the C- terminal area of Mi-2. Like this, Mi-2 has makes full contact with the DNA strand and is in proximity of the H3 and H4 histone tails, allowing interaction with both for its nucleosome remodelling activity. Thanks to their flexible placement in the MTA- like extremes, the p55 subunits would also be able to relocate themselves in the proximity of the nucleosome and the histone tails, allowing the NuRD complex to clamp into the nucleosome along it’s edge and from its sides.

The series of NuRD subcomplexes we present, resolved to up to 16.6Å, represent a new step in our understanding of the NuRD complex structure and function. We have demonstrated that the PMMR core complex (containing the histone deacetylase activity) and the Mi-2/CHD3-4 unit (the nucleosome remodelling subunit) exist as independent functional entities. The interaction between GATA/Simjang and MBD plays a central role in bridging these two distinct complexes together to form the holo-NuRD complex (Torrado et al., 2017; Mor et 104 Discussion

Figure 5.2: Nucleosome bound NuRD models, comparison between published model and this PhD work. a and b) Holo-NuRD model proposed in Torrado el al., 2017. MDB and GATA are partially modelled after disordered polypeptide chains, while the rest of the volume is based on published structures of domains, subunits or homologues. Front (a) and side (b) views are shown, with a 90 degrees rotation between them. c, d and e) Our experimental PMMR model (my work) shows many similarities to the theoretical model presented by Torrado. Here front (a), side (b) and rear (c) views of drosophila PMMR are shown, displayed at 90 degrees rotation steps, with a fitted nucleosome (PBD entry 1AOI). Atomic models of MTA1-HDAC1 (4BKX) and RBBP4 (4PBY) have been individually fitted inside the model, and the density attributed to MBD-like is highlighted in yellow. Fitting of the subunits is approximate, as the current resolution doesn’t allow for detailed localisation of each subunit. One of the p55 pairs is displaced below the MTA-like – Rpd3 dimer compared to the theoretical model, which we attribute to the flexible link of these subunits to MTA-like. g) The green shape shows the position Mi-2/CHD4 would occupy in the complex. Shape and approximate position of Mi-2 are modelled after unpublished data by Farnung et al. (6RYR, pending publication) 5.3 Structure and assembly pathway for NuRD complexes 105 al., 2018). These findings corroborate suggestions of previous biochemical studies: For instance, it is known that deletion of GATAD2A disrupts the function of NuRD in maintaining cellular lineage commitment, and that this disruption is exploitable for the preparation of induced pluripotent stem cells. However, deletion of other closely related subunits such as MBD proves to be lethal, highlighting the importance of their independent functions and the role of GATA as a scaffolding protein for the NuRD complex (Mor et al., 2018). The fact that we were able to purify significant amounts of assembled PMMR complex from cytosolic extract also opens the way for further speculation concerning possible independent functions of this complex outside of the nucleus. Interestingly, HDAC6 has been shown to function as a tubulin deacetylase, helping regulate microtubule stability in the cytosol (Hubbert et al., 2002; Miyake et al., 2016). Altogether, these findings and the sequentially build subcomplexes allowed us to propose an assembly pathway for the holo-NuRD complex, starting from PM and MR modules and the addition of MBD. GATA bridged the resulting PMMR and the MI-2/CHD and DOC1/ CG18292 subunits (Zhang et al., 2016).

5.4 Final remarks

In this PhD thesis we present novel NuRD complex models, which are more complete than those currently presented in the literature and in agreement with cross-linking mass spectrometry data of our collaborators in Laue group. To date our findings led to the publication of an assembly pathway for holo-NuRD (Zhang et al., 2016). At the same time, a high resolution structure of NuRD remains elusive. We speculate that the inherent flexibility characteristic of the human and drosophila NuRD complexes poses an obstacle for the single particle Cryo-EM characterisation at high resolution. Future work on the complex will have to take this into account and come forward with new sample preparation strategies to tackle these issues, ideally finding a way to restrict the flexibility to few distinct conformations, e.g. by locking it down onto a substrate nucleosome. Multidisciplinary approaches have distinct advantages when approaching such complex problems: the combination of EM, mass spectrometry and computational 106 Discussion modelling has been a worthy tool in the study of NuRD complexes (see above). Further collaboration along these lines is essential for progress. The development of EM techniques and technologies is also bound to be a crucial factor in shaping the filed in the close future. Recent technological developments such as new generation direct electron detectors and further automation of the grid preparation processes are currently being made available to scientists. The integration of GPU acceleration in the EM computational workflow during the recent years supposed a major step in the management of time and resources, allowing for more potent workstations and servers and thus making possible the process of previously unthought volumes of data. Advances in the computational scripts used for data processing also brought major changes to the cryo-EM field. The most relevant exponents of these changes are the newer versions of RELION (Zivanov et al., 2018) and Cryosparc (Punjani et al.,2017) (versions 3 and 2 respectively) proposing new and exciting concepts such as Bayesian polishing for improved beam-induced particle movement correction (Zivanonv et al., 2019) and stochastic gradient descent, a machine learning algorithm that allows for several fold faster processing of EM data (Punjani et al.,2017).

All these technical advances provide crucial advantages for the study of heterogeneous and flexible complexes. First of all, improved acquisition quality allows for reconstruction of volumes using less data; needing less averaging of particles means that less, more similar particles can be used for optimal results. Ultimately, with the improvement of cryo electron tomography techniques, we would ideally be able to obtain high resolution reconstructions from individual particles. It is also worth to mention the recent developments in Focused Ion Bean Milling technology (FIB-Milling) (Giannuzzi et al., 1998; Villa et al., 2013), which allows to obtain thin sections of frozen cells without the mechanical artefacts derived from classical cryo-sectioning. This development aims at, in the future, studying proteins directly in their native environment, inside the cell. The improvements in micrograph acquisition speed and in processing capacity also allow for larger datasets, which in turn allows for more extensive classification and in silico purification of the samples.

These developments, together with the knowledge we have already accumulated and new sample preparation strategies, open the way for new, promising attempts at uncovering the high-resolution structures of the NuRD complexes which would 5.4 Final remarks 107 reveal the interplay of the modular subunits, target specificity and molecular mechanisms. References

Aasland R., Stewart A.F., Gibson T. 1996. The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. Trends Biochem Sci. 21(3):87-8. Abdizadeh T., Kalani M.R., Abnous K., Tayarani-Najaran Z., Khashyarmanesh B.Z., Abdizadeh R., Ghodsi R., Hadizadeh F.. 2017. Design, synthesis and biological evaluation of novel coumarin-based benzamides as potent histone deacetylase inhibitors and anticancer agents. Eur J Med Chem. 132:42-62. Adam R.C. & Fuchs E., 2016. The Yin and Yang of Chromatin Dynamics In Stem Cell Fate Selection. Trends Genet. 32(2):89-100. Aguilera C., Nakagawa K., Sancho R., Chakraborty A., Hendrich B., Behrens A.. Aguilera et al., 2011. c-Jun N-terminal phosphorylation antagonises recruitment of the Mbd3/NuRD repressor complex. Nature, 469(7329):231-5. Adhikari S. & Curtis P.D. 2016. DNA methyltransferases and epigenetic regulation in bacteria. FEMS Microbiol Rev, 40(5):575-91. Ahmad K. & Henikoff S. 2002. The histone variant H3.3 marks active chromatin by replication- independent nucleosome assembly. Mol. Cell 9: 1191-1200. Akey C.W. & Luger K. 2003. Histone chaperones and nucleosome assembly. Curr Opin Struct Biol. 13(1):6-14. Allen H.F., Wade P.A., Kutateladze T.G. 2013. The NuRD architecture. Cell Mol Life Sci. 70(19): 3513–3524. Allfrey V.G., Faulkner R., Mirsky A.E. 1964. Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. proc natl acad sci usa 51:786-794. Andisheh-Tadbir A., Dehghani-Nazhvani A., Ashraf M.J., Khademi B., Mirhadi H., Torabi- Ardekani S. 2016. MTA1 Expression in Benign and Malignant Salivary gland Tumors. Iran J Otorhinolaryngol. ;28(84):51-9. Alqarni S.S., Murthy A., Zhang W., Przewloka M.R., Silva A.P., Watson A.A., Lejon S., Pei X.Y., Smits A.H., Kloet S.L., Wang H., Shepherd N.E., Stokes P.H., Blobel G.A., Vermeulen M., Glover D.M., Mackay J.P., Laue E.D. 2014. Insight into the architecture of the References 109

NuRD complex: Structure of the RbAp48-MTA1 sub-complex. J.Biol.Chem. 289: 21844. Ambrosi C., Manzo M., Baubec T., 2017. Dynamics and context-dependent roles of DNA methylation. J Mol Biol, 429(10): 1459-1475. Amor D.J., Bentley K., Ryan J., Perry J., Wong L., Slater H., Andy Choo K.H. 2004. Human centromere repositioning “in progress”. Proc Natl Acad Sci U S A. 101(17): 6542- 6547. Angelov D., Stefanovsky V.Y., Dimitrov S.I., Russanova V.R., Keskinova E., Pashev IG. 1988. Protein-DNA crosslinking in reconstituted nucleohistone, nuclei and whole cells by picosecond UV laser irradiation. Nucleic Acids Res, 16: 4525-4538. Angelov D., Vitolo J.M., Mutskov V., Dimitrov S., Hayes J.J. 2001. Preferential interaction of the core histone tail domains with linker DNA. Proc. Natl. Acad. Sci. U.S.A. 98: 6599- 6604. Arents G., Burlingame R.W., Wang B.C., Love W.E., Moudrianakis E.N. 1991. The nucleosomal core histone octamer at 3.1 Å resolution: a tripartite protein assembly and a left- handed superhelix. Proc. Natl. Acad. Sci. U.S.A. 88:10148–10152. Arents G., Moudrianakis E.N. 1995. The histone fold: a ubiquitous architectural motif utilized in DNA compaction and protein dimerization. Proc. Natl. Acad. Sci. U.S.A 92:11170– 11174. Arranz R., Coloma R., Chichón F.J., Conesa J.J., Carrascosa J.L., Valpuesta J.M., Ortín J., Martín- Benito J.. 2012. The structure of native influenza virion ribonucleoproteins. Science. 338(6114):1634-7. Avery O.T., Macleod C.M., McCarty M. 1944. Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III. The Journal of Experimental Medicine. 79 (2): 137–58. Babu YS., Bugg C.E., Cook W.J.. 1988. Structure of calmodulin refined at 2.2 A resolution. J Mol Biol. 204(1):191-204. Bai X.C., Rajendra E., Yang G., Shi Y., Scheres S.H. 2015. Sampling the conformational space of the catalytic subunit of human γ-secretase. Elife. 1;4. Balboula A.Z., Stein P., Schultz R.M., Schindler K. 2015. RBBP4 regulates histone deacetylation and bipolar spindle assembly during oocyte maturation in the mouse. Biol Reprod. 92(4):105. Banères J.L., Martin A., Parello J. 1997. The N tails of histones H3 and H4 adopt a highly structured conformation in the nucleosome. J Mol Biol., 273(3):503-8. Bannister A.J., Schneider, R., and Kouzarides, T. 2002. Histone methylation: dynamic or static? Cell 109: 801–806. 110 References

Barr M. L. & Bertram, E G. 1949. A Morphological Distinction between Neurones of the Male and Female, and the Behaviour of the Nucleolar Satellite during Accelerated Nucleoprotein Synthesis. Nature. 163 (4148): 676-7. Basta J. & Rauchman M., 2016. The Nucleosome Remodeling and Deacetylase (NuRD) Complex in Development and Disease Transl Res. 165(1): 36–47. Baymaz H.I., Fournier A., Laget S., Ji Z., Jansen P.W., Smits A.H., Ferry L., Mensinga A., Poser I., Sharrocks A., Defossez P.A., Vermeulen M. 2014. MBD5 and MBD6 interact with the human PR–DUB complex through their methyl-CpG- binding domain. Proteomics, 14(19): 2179-2189. Beard C., Li E., 1995 Jaenisch R, Loss of methylation activates Xist in somatic but not in embryonic cells. Genes Dev, 9(19): 2325-34 Ben-avraham D., Muzumdar R.H., Auzmon G. 2014. Epigenetic genome-wide association methylation in aging and longevity. Epigenomics. 4(5): 503–509. Berchtold M.W. & Villalobo A. 2014. The many faces of calmodulin in cell proliferation, programmed cell death, autophagy, and cancer. Biochim. Biophys Acta, 1843(2):398- 435. Bernstein B.E., Humphrey E.L., Erlich R.L., Schneider R., Bouman P., Liu J.S., Kouzarides T., Schreiber S.L. 2002. Methylation of histone H3 Lys 4 in coding regions of active genes. Proc Natl Acad Sci U S A. 99(13):8695-700. Berger I., Bieniossek C., Schaffitzel C., Hassler M., Santelli E., Richmond T.J. 2003. Direct interaction of Ca2+/calmodulin inhibits histone deacetylase 5 repressor core binding to myocyte enhancer factor 2. J. Biol. Chem., 278: 17625-17635. Berger I., Fitzgerald D.J., Richmond T.J. 2004. Baculovirus expression system for heterologous multiprotein complexes. Nat. Biotechnol., 22: 1583-1587. Berger SL, Kouzarides T, Shiekhattar R, Shilatifard A 2009. "An operational definition of epigenetics". Genes Dev. 23 (7): 781–3. Bhutani N., Burns D.M., Blau H.M. 2011. DNA demethylation dynamics. Cell, 146(6):866-72. Bird A, Taggart M, Frommer M, Miller OJ, Macleod D. 1985. A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell, 40(1): 91-9. Bird A.P. & wolffe A.P. 1999. Methylation induced represion --- Belts, braces and chromatin. Cell, 99(5): 451-4. Bird A.P. 1986. CpG rich islands and the function of DNA methylation. Nature, 321 (6067): 209- 13. Bjerling P., Silverstein R.A., Thon G., Caudy A., Grewal S., Ekwall K. 2002. Functional divergence between histone deacetylases in fission yeast by distinct cellular localization and in vivo specificity. Mol Cell Biol. 22(7):2170-81. References 111

Böhm V., Hieb A.R., Andrews A.J., Gansen A., Rocker A., Tóth K., Luger K., Langowski J. 2011. Nucleosome accessibility governed by the dimer/tetramer interface. Nucleic Acids Res. 39(8):3093-102.

Boivin S., Kozak S., Meijers R. 2013. Optimization of protein purification and characterization using Thermofluor screens. Protein Expr Purif. 91(2):192-206. Bouazoune K., Mitterweger A., Langst G., Imhof A., Akhtar A., Becker P.B., Brehm A. 2002. The dMi-2 chromodomains are DNA binding modules important for ATP-dependent nucleosome mobilization. EMBO J 21:2430–2440. Bourc’his D. & Bestor T.H. 2004. Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature,. 431(7004): 96-9. Bowen N.J., Fujita N., Kajita M., Wade P.A.. 2004. Mi-2/NuRD: multiple complexes for many purposes. Biochim Biophys Acta. 1677(1-3):52-7. Boyer L. A., Latek R. R., Peterson C. L. 2004. The SANT domain: a unique histone-tail-binding module? Nature Reviews Molecular Cell Biology, 5 (2): 158–163. Bracken A.P., Brien G.L., Verrijzer C.P. 2019. Dangerous liaisons: interplay between SWI/SNF, NuRD, and Polycomb in chromatin regulation and cancer. Genes Dev. 33(15-16): 936–959. Brackertz M., Boeke J., Zhang R., Renkawitz R. 2002. Two Highly Related p66 Proteins Comprise a New Family of Potent Transcriptional Repressors Interacting with MBD2 and MBD3. J Biol Chem. 277: 40958-40966. Brackertz M., Gong Z., Leers J., Renkawitz R. 2006. p66alpha and p66beta of the Mi-2/NuRD complex mediate MBD2 and histone interaction. Nucleic Acids Res 34:397–406. Brasen C., Dorosz J., Wiuf A., Boesen T., Mirza O., Gajhede M.. 2017. Expression, purification and characterization of the human MTA2-RBBP7 complex. Biochim Biophys Acta Proteins Proteom. 1865(5):531-538. Braunstein M., Sobel R.E., Allis C.D., Turner B.M., Broach J.R. 1996. Efficient transcriptional silencing in Saccharomyces cerevisiae requieres a heterochromatin histone acetylation pattern. Mol Cell Biol. 16(8): 4349-56. Brown C.J., Ballabio A., Rupert J.L., Lafreniere R.G., Grompe M., Tonlorenzi R., Willard H.F. 1991. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature, 349(6304):38-44. Brownell J.E., Zhou J., Ranalli T., Kobayashi R., Edmondson D.G., Roth S.Y., Allis C.D. 1996. Tetrahymena histone acetyltransferase A: A homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell, 84(6):843-51. Bruno M., Flaus A., Stockdale C., Rencurel C., Ferreira H., Owen-Hughes T. 2003. Histone H2A/H2B dimer exchange by ATP-dependent chromatin remodeling activities. Mol Cell. 12(6):1599-606. 112 References

Buajeeb W., Zhang X., Ohyama H., Han D., Surarit R., Kim Y., Wong D.T. 2004. Interaction of the CDK2-associated protein-1, p12(DOC-1/CDK2AP1), with its homolog, p14(DOC- 1R). Biochem. Biophys. Res. Commun. 315:998-1003. Cai Y., Jin J., Florens L., Swanson S.K., Kusch T., Li B., Workman J.L., Washburn M.P., Conaway R.C., Conaway J.W. 2005. The mammalian YL1 protein is a shared subunit of the TRRAP/TIP60 histone acetyltransferase and SRCAP complexes. J Biol Chem. 280(14):13665-70. Caldecott K.W. 2008. Single-strand break repair and genetic disease. Nat Rev Genet. 9(8):619-31. Callebaut I., Courvalin J.C., Mornon J.P. et al., 1999. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication and transcriptional regulation. FEBS Letters, 446(1): 189-193. Campbell M.G., Cheng A., Brilot A.F., Moeller A., Lyumkis D., Veesler D., Pan J., Harrison S.C., Potter C.S., Carragher B. Grigorieff N. 2012. Movies of ice-embedded particles enhance resolution in electron cryo-microscopy. Structure. 20:1823–1828. Cao R. & Zhang Y., 2004. The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr Opin Genet Dev. 14(2):155-64. Capuano F., Mülleder M., Kok R., Blom H.J., Ralser M. 2014. Cytosine DNA Methylation Is Found in Drosophila melanogaster but Absent in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Other Yeast Species. Anal Chem, 86(8): 3697-3702. Cavalli G., Heard E. 2019 Advances in epigenetics link genetics to the environment and disease. Nature; 571(7766): 489-499. Chamberlin M., Baldwin R. L., Berg, P. 1963. An enzymatically synthesized RNA of alternating base sequence: Physical and chemical characterization. J. Mol. Biol. 20, 334–349. Champagne K.S. & kutateladze T.G. 2009. Structural insight into histone recognition by the ING PHD fingers. Curr Drug Targets., 10(5):432-41. Chen C., Hou X., Wang G., Pan W., Yang X., Zhang Y., Fang H.. 2017. Design, synthesis and biological evaluation of quinoline derivatives as HDAC class I inhibitors. Eur J Med Chem. 133:11-23. Chen G.C., Guan L.S., Yu J.H., Li G.C., Choi Kim H.R., Wang Z.Y.. 2001. Rb-associated protein 46 (RbAp46) inhibits transcriptional transactivation mediated by BRCA1. Biochem Biophys Res Commun. 284(2):507-14. Chen S, McMullan G., Faruqi A.R., Murshudov G.N., Short J.M., Scheres S.H., Henderson R. 2013. High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy. 135:24-35. Chen T. & Dent S.Y. 2014 Chromatin modifiers and remodellers: regulators of . Nat. Rev. Genet., 15(2): 93–106. References 113

Chen Z., Zang J., Whetstine J., Hong X., Davrazou F., Kutatcladzo T.G., Simpson M., Mao Q., Pan C.H., and Dai S. 2006. Structural insights into histone demethylation by JMJD2 family members. Cell. 125: 691–702. Chahwan R., Wontakal S.N., Roa S. 2010. Crosstalk between genetic and epigenetic information through cytosine deamination. Trends Genet., 26(10):443-8. Chambers A.L., Pearl L.H., Oliver A.W., Downs J.A. 2012. The BAH domain of Rsc2 is a histone H3 binding domain. Nucleic Acids Research, 41(19): 9169-9182.

Chari A., Haselbach D., Kirves J.M., Ohmer J., Paknia E., Fischer N., Ganichkin O., Möller V., Frye J.J., Petzold G., Jarvis M., Tietzel M., Grimm C., Peters J.M., Schulman B.A., Tittmann K., Markl J., Fischer U., Stark H. 2015. ProteoPlex: stability optimization of macromolecular complexes by sparse-matrix screening of chemical space. Nat Methods. 12(9):859-65. Christophorou M. A., Castelo-Branco G., Halley-Stott R. P., Oliveira C. S., Loos R., Radzisheuskaya A., Mowen K. A., Bertone P., Silva J. C. R., Zernicka-Goetz M., Nielsen M. L., Gurdon J. B., Kouzarides T.(2014. Citrullination regulates pluripotency and histone H1 binding to chromatin. Nature, 507: 104–108. Chudnovsky Y., Kim D., Zheng S., Whyte W., Bansal M., Bray M.A., Gopai S., Theisen M.A., Bilodeau S., Thiru P., Muffat J., Ylimaz O.H., Mitalipova M., Woolard K., Lee J., Nishimura R., Sakata N., Fine H.A., Carpenter A.E., Silver S.J., Verhaak R.G.W., Califano A., Young R.A., Ligon K.L., Mellinghof I.K., Root D.E., Sabatini D.M., Hahn W.C., Chheda M.G. 2014. ZFHX4 interacts with the NuRD core member CHD4 and regulates the glioblastoma tumor initiating cell state. Cell Rep. 6(2): 313–324. Ciferri C., Lander G.C., Maiolica A., Herzog F., Aebersold R., Nogales E.. 2012. Molecular architecture of human polycomb repressive complex 2. Elife. 1:e00005. Clapier C.R. & Cairns B.R. 2009. The biology of chromatin remodeling complexes. Annu. Rev. Biochem. 78:273–304. Clarke S. 1993 Protein methylation. Current Opinion in Cell Biology, 5(6):977–983. Clayton A.L., Hazzalin C.A., Mahadevan L.C. 2006. Enhanced histone acetylation and transcription: a dynamic perspective. Mol Cell. 23(3):289-9 Cong L 1 , Pakala SB , Ohshiro K , Li DQ , Kumar R 2011. SUMOylation and SUMO-interacting motif (SIM) of metastasis tumor antigen 1 (MTA1) synergistically regulate its transcriptional repressor function. J Biol Chem. 286(51): 43793-43808. Conomos D., Reddel R.R., Pickett H.A.. 2014. NuRD-ZNF827 recruitment to telomeres creates a molecular scaffold for homologous recombination. Nat Struct Mol Biol. 21(9):760-70. Cloos P.A., Christensen J., Agger K., Maiolica A., Rappsilber J., Antal T., Hansen K.H., Helin K. 2006. The putative oncogene GASC1 demethylates tri- and dimethylated lysine 9 on histone H3. Nature. 442(7100):307-11. 114 References

Costanzi C., & Pehrson J.R. 1998. Histone macroH2A1 is concentrated in the inactive X chromosome of female mammals. Nature, 393: 599-601. Creamer K.M. & PartridgeJ.F. 2011. RITS- connecting transcription, RNAi and heterochromatin assembly in Fission Yeast. Wiley Interdiscip Rev RNA. 2(5): 632–646. Cramer K.M., Job G., Shanker S., Neale G.A., Lin Y., Bartholomew B., Partridge J.F. 2014. The Mi-2 Homolog Mit1 Actively Positions Nucleosomes within Heterochromatin To Suppress Transcription. Mol Cell Biol. 34(11): 2046–2061. Cramer P., Bushnell D.A., Kornberg R.D.. 2001. Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution. Science. 292(5523): 1863-76. Cuthbert G.L., Daujat S., Snowden A.W., Erdjument-Bromage H., Hagiwara T., Yamada M., Schneider R., Gregory P.D., Tempst P., Bannister A.J., and Kouzarides T. 2004. Histone deimination antagonizes arginine methylation. Cell. 118: 545–553. D'Alesio C., Punzi S., Cicalese A., Fornasari L., Furia L., Riva L., Carugo A., Curigliano G., Criscitiello C., Pruneri G., Pelicci PG., Faretta M., Bossi D., Lanfrancone L. 2016. RNAi screens identify CHD4 as an essential gene in breast cancer growth. Oncotarget. 7(49):80901-80915. Davey C.A., Sargent D.F., Luger K., Maeder A.W., Richmond T.J. (2002). Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J Mol Biol. 319(5): 1097-113. Daxinger L. & Whitelaw E. 2012. Understanding transgenerational epigenetic inheritance via the in mammals. Nature Reviews Genetics, 13: 153–162. de la Barre A.E., Angelov D., Molla A., Dimitrov S. 2001. The N-terminus of histone H2B, but not that of histone H3 or its phosphorylation, is essential for chromosome condensation. EMBO J. 20:6383–6393. de la Rosa-Trevín J.M., Quintana A., Del Cano L., Zaldívar A., Foche I., Gutiérrez J., Gómez- Blanco J., Burguet-Castell J., Cuenca-Alba J., Abrishami V., Vargas J., Otón J., Sharov G., Vilas J.L., Navas J., Conesa P., Kazemi M., Marabini R., Sorzano C.O., Carazo J.M.. 2016. Scipion: A software framework toward integration, reproducibility and validation in 3D electron microscopy. J Struct Biol. 195(1):93-9. de Ruijter A.J., van Gennip A.H., Caron H.N., Kemp S., van Kuilenburg A.B.. 2003. Histone deacetylases (HDACs): characterization of the classical HDAC family. Biochem J. 370(3):737-49. Deans C. & Maggert K.A. 2015. What do you mean, “Epigenetic”? GENETICS. 199(4): 887-896. Dechassa M.L., Zhang B., Horowitz-Scherer R., Persinger J., Woodcock C.L., Peterson C.L.,

Bartholomew B. 2008. Architecture of the SWI/SNF-Nucleosome Complex. Mol Cell Biol. 28(19): 6010–6021. References 115

Dege C. & Hagman J., 2014. Mi-2/NuRD chromatin remodeling complexes regulate B and T- development and function. Immunol Rev. 2014 Sep;261(1):126-40. Deng L., Tang J., Yang H., Cheng C., Lu S., Jiang R., Sun B.. 2017. MTA1 modulated by miR-30e contributes to epithelial-to-mesenchymal transition in hepatocellular carcinoma through an ErbB2-dependent pathway. Oncogene. 36(28):3976-3985. Denhardt D.T. 2018. Effect of stress on human biology: Epigenetics, adaptation, inheritance, and social significance. J Cell Physiol. 233(3):1975-1984. Deshpande A.M., Dai Y.S., Kim Y., Kim J., Kimlin L., Gao K., Wong D.T. 2009. Cdk2ap1 is required for epigenetic silencing of Oct4 during murine differentiation. J Biol Chem. 284(10):6043-7. Denner D.R. & Rauchman M. 2013. Mi-2/NuRD is required in renal progenitor cells during embryonic kidney development. Dev Biol. 375(2):105-16. Denslow, S. A. & Wade, P. A. 2007. The human Mi‐2/NuRD complex and gene regulation. Oncogene 26: 5433–5438. Desai M.A., Webb H.D., Sinanan L.M., Scarsdale J.N., Walavalkar N.M, Ginder G.D., Williams D.C.. Jr. Desai et al., 2015. An intrinsically disordered region of methyl-CpG binding domain protein 2 (MBD2) recruits the histone deacetylase core of the NuRD complex. Nucleic Acids Res. 43(6):3100-13.

Desrochers G., Cappadocia L., Lussier-Price M., Ton A.T., Ayoubi R., Serohijos A., Omichinski J.G., Angers A.. 2017. Molecular basis of interactions between SH3 domain- containing proteins and the proline-rich region of the ubiquitin ligase Itch. J Biol Chem. 292(15):6325-6338. Dhar S., Kumar A., Gomez C.R., Akhtar I., Hancock J.C., Lage J.M., Pound C.R., Levenson A.S.. 2017. MTA1-activated Epi-microRNA-22 regulates E-cadherin and prostate cancer invasiveness. FEBS Lett. 591(6):924-933. Di Lorenzo A.; Bedford M. T. 2011 Histone arginine methylation. FEBS Lett. 585: 2024–203. Ding X., Liu S., Tian M., Zhang W., Zhu T., Li D., Wu J., Deng H., Jia Y., Xie W., Xie H., Guan

J.S. 2017. Activity-induced histone modifications govern Neurexin-1 mRNA splicing and memory preservation. Nat Neurosci. 20(5):690-699. Dorigo B., Schalch T., Bystricky K., Richmond T.J. 2003. Chromatin fiber folding: requirement for the histone H4 N-terminal tail. J. Mol. Biol. 327:85–96. Dorigo B., Schalch T., Kulangara A., Duda S., Schroeder R.R., Richmond T.J. 2004. Nucleosome arrays reveal the two-start organization of the chromatin fiber. Science, 306:1571– 1573. 116 References dos Santos R.L., Tosti L., Radzisheuskaya A., Caballero I.M., Kaji K., Hendrich B., Silva J.B. 2014. MBD3/NuRD facilitates induction of pluripotency in a context-dependent manner. Cell Stem Cell. 15(1):102-10. Doskocil J. & Sormova Z. 1965. The occurrence of 5-methylcytosine in baterial deoxyribonucleid acids. Biochim Biophys Acta, 95:513-5. Dovey O.M., Foster C.T., Conte N., Edwards S.A., Edwards J.M., Singh R., Vassiliou G., Brandley A., Cowley S.M. 2013. Histone deacetylase 1 and 2 are essential for normal T-cell development and genomic stability in mice. Blood 121(8): 1335-1344. Duncan E.A., Anest V., Cogswell P., Baldwin A.S. 2006. The kinases MSK1 and MSK2 are required for epidermal -induced, but not tumor necrosis factor-induced, histone H3 Ser10 phosphorylation. J. Biol. Chem. 281: 12521-12525. Dunn D.B. & Smith J.D. 1958. The occurrence of 6-methylaminopurine in deoxyribonucleic acids. Biochem J. 68(4):627-36. Ehrlich M., Gama-Sosa M.A., Carreira L.H., Ljungdahl L.G., Kuo K.C., Gehrke C.W. 1985. DNA methylation in thermophilic bacteria: N4-methylcytosine, 5-methylcytosine, and N6- methyladenine. Nucleic Acids Res, 13(4):1399-412. Emre N.C., Ingvarsdottir K., Wyce A., Wood A., Krogan N.J., Henry K.W., Li K., Marmorstein R., Greenblatt J.F., Shilatifard A., Berger S.L. 2005. Maintenance of low histone ubiquitylation by Ubp10 correlates with telomere-proximal Sir2 association and gene silencing. Mol. Cell. 17: 585–594. ENCODE Project Consortium, 2012. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature. 489(7414): 57–74. Ertekin A., Aramini J.M., Rossi P., Leonard P.G., Janjua H., Xiao R., Maglaqui M., Lee H.W., Prestegard J.H., Montelione G.T. 2012. Human cyclin-dependent kinase 2-associated protein 1 (CDK2AP1) is dimeric in its disulfide-reduced state, with natively disordered N-terminal region. J Biol Chem. 287(20):16541-9. Evans T. & Felsenfeld G. 1989. The erythroid-specific transcription factor eryf1: A new finger protein. Cell, 58(5): 877-885. Fabbri M., Garzon R., Cimmino A., Liu Z., Zanesi N., Callegari E., Liu S., Alder H., Costinean S., Fernandez-Cymering C., Volinia S., Guler G., Morrison C.D., Chan K.K., Marcucci G., Calin G.A., Huebner K., Croce C.M. 2007. MicroRNA-29 family reverts aberrant methylation in lung cancer by targeting DNA methyltransferases 3A and 3B. Proc Natl Acad Sci U S A. 104(40):15805-10. Fan J.Y., Rangasamy D., Luger K., Tremethick, D.J. 2004. H2A.Z alters the nucleosome surface to promote HP1a-mediated chromatin folding. Mol. Cell 16: 1-20. Farthing C.R., Ficz G., Ng R.K., Chan C.F., Andrews S., Dean W., Hemeberger M., Reik W. 2008. Global Mapping of DNA Methylation in Mouse Promoters Reveals Epigenetic Reprogramming of Pluripotency, Genes., PLOS, References 117

Feng L. & Chen X. 2015. Epigenetic regulation of germ cells-remember or forget? Curr Opin Genet Dev. 31:20-7. Feng Q., Cao R., Xia L., Erdjument-Bromage H., Tempst P., Zhang Y. 2002. Identification and functional characterization of the p66/p68 components of the MeCP1 Complex. Mol Cell Biol. 22(2): 536. Finch J.T. & Klug A. (1976). Solenoidal model for superstructure in chromatin. Proc Natl Acad Sci U S A 73:1897–1901. Fire A., Xu S., Montgomery M.K., Kostas S.A., Driver S.E., Mello C.C. 1998; Potent and specific genetic interference by double stranded RNA in caenorhabditis elegans. Nature, 391(6669): 806-11. Fischle W., Wang Y., Allis C.D. 2003. Histone and chromatin cross-talk. Curr Opin Cell Biol, 15(2): 172-83. Fischle W., Tseng B.S., Dormann H.L., Ueberheide B.M., Garcia B.A., Shabanowitz J., Hunt, D.F. Funabiki H., Allis C. D., 2005; Regulation of HP1-chromatin binding by histone H3 and phosphorylation. Nature, 438: 1116–1122 Fitzgerald D.J., Berger P., Schaffitzel C., Yamada K., Richmond T.J., Berger, I. 2006. Protein complex expression by using multigene baculoviral vectors. Nature methods, 3 (12), 1021–1032. Fraga M.F., Ballestar E., Paz M.F., Ropero S., Setien F., Ballestar M.L., Heine-Suñer D., Cigudosa J.C., Urioste M., Benitez J., Boix-Chornet M., Sanchez-Aguilera A., Ling C., Carlsson E., Poulsen P., Vaag A., Stephan Z., Spector T.D., Wu Y.Z., Plass C., Esteller M. 2005. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A. 102(30):10604-9. Frank J., Radermacher M., Penczek P., Zhu J., Li Y., Ladjadj M., Leith A. 1996. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J Struct Biol. 116(1):190-9. Freitag M. & Selker E.U., 2005. Controlling DNA methylation: many roads to one modification. Curr Opin Genet Dev, 15(2): 191-9. Fujita H., Fujii R., Aratani S., Amano T., Fukamizu A., Nakajima T. 2003b. Antithetic effects of MBD2a on gene regulation. Mol Cell Biol. 23(8): 2645-2657. Fujita N., Jaye D.L., Kajita M., Geigerman C., Moreno C.S., Wade P.A. 2003a. MTA3, a Mi-2/NuRD complex subunit, regulates an invasive growth pathway in breast cancer. Cell, 113(2): 207-219. Fuhrmann J. & Thompson P.R., 2016. Protein Arginine Methylation and Citrullination in Epigenetic Regulation. ACS Chem Biol. 11(3): 654–668. Gao S., Li X., Zang J, Xu W., Zhang Y.. 2017. Preclinical and Clinical Studies of Chidamide (CS055/HBI-8000), An Orally Available Subtype-selective HDAC Inhibitor for Cancer Therapy. Anticancer Agents Med Chem. 17(6):802-812. 118 References

Gardiner-Garden M. & Frommer M. 1987. CpG islands in vertebrate genomes. J Mol Biol, 196(2): 261-82. Ge Q., Nilasena D.S., O’Brien C.A., Frank M.B., Targoff IN 1995. Molecular analysis of a major antigenic region of the 240-kD protein of Mi-2 autoantigen. J Clin Invest 96:1730– 1737 Geidushek E. P., Nakamoto T., Weiss S. B. (1961) The enzymatic synthesis of RNA: Complementary interaction with DNA. Proc. Natl. Acad. Sci. U. S. A. 47(9), 1405– 1415. Gerstein M. & Krebs W. 1998. A database of macromolecular motions. Nucleic Acids Res. 26(18):4280-90. Giannuzzi L.A., Drown J.L., Brown S.R., Irwin R.B., Stevie F.A..1998. Applications of the FIB lift-out technique for TEM specimen preparation 41(4):285-90. Giet R. & Glover D.M. 2001. Drosophila is required for histone H3 phosphorylation and condensin recruitment during chromosome condensation and to organize the central spindle during cytokinesis. J. Cell Biol. 152: 669-682. Gnanapragasam M.N., Scarsdale J.N., Amaya M.L., Webb H.D., Desai M.A., Walavalkar N.M., Wang S.Z., Zhu S.Z., Ginder G.D., Williams D.C. Jr. 2011. p66α–MBD2 coiled-coil interaction and recruitment of Mi-2 are critical for globin gene silencing by the MBD2–NuRD complex. Proc Natl Acad Sci U S A. 108(18): 7487–7492. Goll M.G. & Bestor T.H. 2005. Eukaryotic cytosine methyltransferases. Annu Rev Biochem, 74: 481-514. Gong Z., Brackertz M., Renkawitz R. 2006. SUMO modification enhances p66-mediated transcriptional repression of the Mi-2/NuRD complex. Mol Cell Biol 26:4519–4528. Gothel S.F. & Marahiel M.A. 1999. Peptidyl-prolyl cis-trans isomerases, a superfamily of ubiquitous folding catalysts. Cell. Mol. Life Sci. 55: 423–436. Gowher H., Leismann O., Jeltsch A. 2000. DNA of Drosophila melanogaster contains 5- methylcytosine. EMBO, 19(24):6918-23. Grant, T. & Grigorieff N. 2015. Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6. eLife. 4(e06980):1-19.

Greer E.L., Blanco M.A., Gu L., Sendric E., Liu J., Aristizábal-Corrales D., Hsu C.H., Aravind L.,

He C., Shi Y. 2015. DNA Methylation on N6-Adenine in C. elegans. Cell, 161(4):868- 78 Grewal S.I. & Jia S. 2007 Heterochromatin revisited. Nat Rev Genet. 8(1):35-46. Grunstein M. 1997. Histone acetylation in chromatin structure and transcription. Nature 389:349– 352. References 119

Guillemette S., Serra R.W., Peng M., Hayes J.A., Konstantinopoulos P.A, Green M.R., Cantor S.B.. 2015. Resistance to therapy in BRCA2 mutant cells due to loss of the nucleosome remodeling factor CHD4. Genes Dev. 29(5):489-94. Günther K., Rust M., Leers J., Boettger T., Scharfe M., Jarek M., Bartkuhm M., Renkawitz R. 2013. Differential roles for MBD2 and MBD3 at methylated CpG islands, active promoters and binding to exon sequences. Nucleic Acids Res. 41(5): 3010–3021. Gursoy-Yuzugullu O, House N, Price BD. 2016. Patching Broken DNA: Nucleosome Dynamics and the Repair of DNA Breaks. J Mol Biol. 428(9 Pt B): 1846-60. Hahn S. 2004. Structure and mechanism of the RNA Polymerase II transcription machinery. Nat Struct Mol Biol. 11(5): 394–403. Haines T.R., Rodenhiser D.I., Ainsworth P.J. 2001. Allele-specific non-CpG methylation of the Nf1 gene during early mouse development. Dev Biol., 240(2):585-98. Hainer S.J., McCannell K.N., Yu J., Ee L.S., Zhu L.J., Rando O.J. Fazzio T.G. 2016. DNA methylation directs genomic localization of Mbd2 and Mbd3 in embryonic stem cells. eLife. 5: e21964. Hamiche A., Kang J.G., Dennis C., Xiao H., Wu C. 2001. Histone tails modulate nucleosome mobility and regulate ATP-dependent nucleosome sliding by NURF. Proc Natl Acad Sci U S A., 98(25): 14316–14321. Hao B., Oehlmann S., Sowa M.E., Harper J.W., Pavletich N.P. 2007. Structure of a Fbw7-Skp1- Cyclin E Complex: Multisite-Phosphorylated Substrate Recognition by SCF Ubiquitin Ligases. Mol.Cell 26: 131-143. Hassa P.O., Haenni S.S., Elser M., and Hottiger M.O. 2006. Nuclear ADP-ribosylation reactions in mammalian cells: where are we today and where are we going?. Microbiol. Mol. Biol. Rev. 70: 789–829. He H., Kong S., Liu F., Zhang S., Jiang Y., Liao Y., Jiang Y., Li Q., Wang B., Zhou Z., Wang H., Huo R. 2015. Rbbp7 Is Required for Uterine Stromal Decidualization in Mice. Biol Reprod. 93(1): 13. He J., Shen S., Lu W., Zhou Y., Hou Y., Zhang Y., Jiang Y., Liu H., Shao Y.. 2016. HDAC1 promoted migration and invasion binding with TCF12 by promoting EMT progress in gallbladder cancer. Oncotarget. 7(22):32754-64. He Y.F., Li B.Z., Li Z., Liu P., Wang Y., Tang Q., Ding J., Jia Y., Chen Z., Li L., Sun Y., Li X., Dai Q., Song C.X., Zhang K., He C., Xu G.L. 2011. Tet-mediated formation of 5- carboxylcytosine and its excision by TDG in mammalian DNA. Science, 333(6047): 1303-7. Hebbes T.R., Thorne A.W., Crane-Robinson C. 1988. A direct link between core histone acetylation and transcriptionally active chromatin. EMBO J. 7(5):1395-402. 120 References

Hell J.W. 2014. CaMKII: claiming center stage in postsynaptic function and organization. Neuron, 81(2):249-265. Hendrich B., Guy J., Ramsahoye B., Wilson V.A., Bird A. 2001. Closely related proteins MBD2 and MBD3 play distinctive but interacting roles in mouse development. Genes Dev. 15(6):710-23. Henikoff S. 2005. Histone modifications: Combinational complexity or cumulative simplicity? Prot Natl Acad Sci U S A. 102(15):5308-9. Henry K.W., Wyce A., Lo W.S., Duggan L.J., Emre N.C.T., Kao C.F., Pillus L., Shilatifard A., Osley M.A., Berger S.L. 2003; Transcriptional activation via sequential histone H2B ubiquitylation and deubiquitylation, mediated by SAGA-associated Ubp8. Genes Dev. 17(21): 2648-2663. Hershey A.D. & Chase M. (1952). Independent functions of viral protein and nucleic acid in growth of bacteriophage. The Journal of General Physiology. 36 (1): 39–56. Hewish D. R. & Burgoyne L. A. 1973. Chromatin sub-structure. The digestion of chromatin DNA at regularly spaced sites by a nuclear deoxyribonuclease. Biochem Biophys Res Commun 52: 504-510. Heymann J.B., Belnap D.M. 2007. Bsoft: image processing and molecular modeling for electron microscopy. J Struct Biol. 157(1):3-18. Hnilicová J., Hozeifi S., Dušková E., Icha J., Tománková T., Staněk D. 2011. Histone deacetylase activity modulates alternative splicing. PLoS One. 6(2):e16727. Holliday R. 1994. Epigenetics: an overview. Dev. Genet. 15(6): 453–457. Hong L., Schroth G.P., Matthews H.R., Yau P., Bradbury E.M. 1993. Studies of the DNA binding properties of histone H4 amino terminus. Thermal denaturation studies reveal that acetylation markedly reduces the binding constant of the H4 "tail" to DNA. J Biol Chem. 268(1):305-14. Hottiger M.O., Hassa P.O., Lüscher B., Schüler H., Koch-Nolte F. 2010. Toward a unified nomenclature for mammalian ADP-ribosyltransferases. Trends Biochem Sci. 35(4):208-19. Hu G. & Wade P.A. 2012. NuRD and Pluripotency: A Complex Balancing Act Cell Stem Cell. 10(5): 497–503. Hubbert C., Guardiola A., Shao R., Kawaguchi Y., Ito A., Nixon A., Yoshida M., Wang X.F., Yao T.P.. 2002. HDAC6 is a microtubule-associated deacetylase. Nature. May 23;417(6887):455-8. Hughes-Hallett D., Gleason A.M, McCallum W.G., Flath D.E., Lock P.F., Tucker T.W., Lomen D.O., Lovelock D., Mumford D., Osgood B.G., Quiney D., Rhea K., Tecosky- Feldman J. 2005. Calculus (4th ed.). Wiley. p.252. References 121

Humphrey G.W., Wang Y., Russanova V.R., Hirai T., Qin J., Nakatani Y., Howard B.H. 2001. Stable histone deacetylase complexes distinguished by the presence of SANT domain proteins CoREST/kiaa0071 and Mta-L1. Journal of Biological Chemistry, 276 (9): 6817–6824. Hunter T. & Karin M. 1992. The regulation of transcription by phosphorylation. Cell. 70: 375–87. Ikura T., Ogryzko V.V., Grigoriev M., Groisman R., Wang J., Horikoshi M., Scully R., Qin J., Nakatani Y. 2000. Involvement of the TIP60 histone acetylase complex in DNA repair and apoptosis. Cell. 102(4):463-73. Imhof A. & Becker P.B. 2001. Modifications of the histone N-terminal domains. Evidence for an "epigenetic code"? Moll Biotechnol, 17(1): 1-13. Iwasaki W., Miya Y., Horikoshi N., Osakabe A., Taguchi H., Tachiwana H., Shibata T., Kagawa W., Kurumizaka H. 2013. Contribution of histone N-terminal tails to the structure and stability of nucleosomes. FEBS Open Bio., 3: 363-369. Jähner D, Stuhlmann H, Stewart CL, Harbers K, Löhler J, Simon I, Jaenisch R. 1982 De novo methylation and expression of retroviral genomes during mouse embryogenesis. Nature, 298(5875): 623-8. Jamaladdin S. Kelly R.D.W., O’Regan L., Dovey Oliver M. Hodson G.E., Millard C.J., Portolan N., Fry A.M., Schwabe J.W.R., Cowley S.M. 2014. Histone deacetylase (HDAC) 1 and 2 are essential for accurate cell division and the pluripotency of embryonic stem cells. Proc Natl Acad Sci U S A 111: 9840-9845. Jenuwein T. & Allis C.D. 2001, Translating the histone code. Science 293(5532): 1074-80. Job G., Brugger C., Xu T., Lowe B.R., Pfister Y., Qu Chunxu., Shanker S., Baños Sanz J.I., Partridge J.F., Schalch T. 2016. SHREC Silences Heterochromatin via Distinct Remodeling and Deacetylation Modules. Mol Cell, 62(2): 207-221. Jones, P. A. 2002 DNA methylation and cancer. Oncogene21, 5358-5360. Jones P.A. & Baylin S.B. 2002. The fundamental role of epigenetic events in cancer. Nat Rev Genet. 3(6):415-28. Jones P.L., Veenstra G.J., Wade P.A., Vermaak D., Kass S.U., Landsberger N., Strouboulis J., Wolffe A.P. 1998. Methylated DNA and MeCP2 recruit histone deacetylase to repress transcription. Nat Genet., 19(2):187-91. Ju B.G., Lunyak V.V., Perissi V., Garcia-Bassets I., Rose D.W., Glass C.K., Rosenfeld M.G. 2006. A topoisomerase IIbeta-mediated dsDNA break required for regulated transcription. Science. 312: 1798–1802. Jump D.B., Butt T.R., Smulson M. 1979. Nuclear protein modification and chromatin substructure. 3. Relationship between poly (adenosine diphosphate) ribosylation and different functional forms of chromatin. Biochemistry, 18: 983 – 990. 122 References

Kadoch C., Hargreaves D.C., Hodges C., Elias L., Ho L., Ranish J., Crabtree G.R. 2013. Proteomic and bioinformatic analysis of mammalian SWI/SNF complexes identifies extensive roles in human malignancy. Nat Genet. 45(6):592-601. Kafri T., Ariel M., Brandeis M., Shemer R., Urven L., McCarrey J., Cedar H., Razin A. 1992. Developmental pattern of gene-specific DNA methylation in the mouse embryo and germ line. Genes Dev., 6(5):705-14. Kaji K., Caballero I.M., MacLeod R., Nichols J., Wilson V.A., Hendrich B 2006. The NuRD component Mbd3 is required for pluripotency of embryonic stem cells. Nat Cell Biol. 8(3):285-92. Kaji K., Nichols J., Hendrich B. 2007. Mbd3, a component of the NuRD co-repressor complex, is required for development of pluripotent cells. Development (Cambridge, Engl) 134:1123–32. Kai L., Samuel S.K., Levenson A.S.. 2010. Resveratrol enhances p53 acetylation and apoptosis in prostate cancer by inhibiting MTA1/NuRD complex. Int J Cancer. 126(7):1538-48. Kamakaka R.T. & Biggins, S. 2005. Histone Variants: Deviants? Genes & Development, 19, 295- 316. Kan P.Y., Caterino T.L., Hayes J.J. 2009. The H4 Tail Domain Participates in Intra- and Internucleosome Interactions with Protein and DNA during Folding and Oligomerization of Nucleosome Arrays. Mol Cell Biol., 29(2): 538-546. Kay P.H., Pereira E., Marlow S.A., Turbett G., Mitchell C.A., Jacobsen P.F., Holliday R., Papadimitriou J.M. 1994. Evidence for adenine methylation within the mouse myogenic gene Myo-D1. Gene, 151(1-2):89-95. Kapoor-Vazirani P., Kagey J.D., Vertino P.M. 2011. SUV420H2-Mediated H4K20 Trimethylation Enforces RNA Polymerase II Promoter-Proximal Pausing by Blocking hMOF- Dependent H4K16 Acetylation. Mol Cell Biol., 31(8): 1594–1609. Karras G.I., Kustatscher G., Buhecha H.R., Allen M.D., Pugieux C., Sait F., Bycroft M., Ladurner A.G. 2005. The macro domain is an ADP-ribose binding module. EMBO J. 24(11):1911-20. Kastner B., Fischer N., Golas M.M., Sander B., Dube P., Boehringer D., Hartmuth K., Deckert J., Hauer F., Wolf E., Uchtenhagen H., Urlaub H., Herzog F., Peters J.M., Poerschke D., Lührmann R. Stark H. (2008) GraFix: sample preparation for single-particle electron cryomicroscopy. Nature Methods, 5: 53–55. Kehle J., Beuchle D., Treuheit S., Christen B., Kennison J.A., Bienz M., Muller J. 1998. dMi-2, a hunchback-interacting protein that functions in polycomb repression. Science 282: 1897–1900. Kelly R.D.W. & Cowley S.M. 2013. The physiological roles of histone deacetylase (HDAC) 1 and 2: complex co-stars with multiple leading parts. Biochem Soc Trans 41(3): 741-749. References 123

Keniry A., Gearing L.J., Jansz N., Liu J., Holik A.Z., Hickey P.F., Kinkel S.A., Moore D.L., Breslin K., Chen K., Liu R., Phillips C., Pakusch M., Biben C., Sheridan J.M., Kile B.T., Carmichael C., Ritchie M.E., Hilton D.J., Blewitt M.E. 2016. Setdb1-mediated H3K9 methylation is enriched on the inactive X and plays a role in its epigenetic silencing. Epigenetics Chromatin, 9: 16. Kidder B.L. & Palmer S. 2012. HDAC1 regulates pluripotency and lineage specific transcriptional networks in embryonic and stem cells. Nucleic Acids Res. 40(7):2925-39. Kim J.Y., Kwak P.B., Weitz C.J.. 2014. Specificity in circadian clock feedback from targeted reconstitution of the NuRD corepressor. Mol Cell. 2014 Dec 18;56(6):738-48. Kim M.S., Chung N.G., Kang M.R., Yoo N.J., Lee S.H.. 2011. Genetic and expressional alterations of CHD genes in gastric and colorectal cancers. Histopathology. 58(5):660-8. Kloet S.L., Baymaz H.I., Makowski M., Groenewold V., Jansen P.W.T.C., Berendsen M., Niazi H., Kops G.J. Vermeulen M. 2014. Towards elucidating the stability, dynamics and architecture of the nucleosome remodeling and deacetylase complex by using quantitative interaction proteomics. FEBS J., 282(9): 1774-85. Kolla V., Naraparaju K., Zhuang T., Higashi M., Kolla S., Blobel G.A., Brodeur G.M.. 2015. The tumour suppressor CHD5 forms a NuRD-type chromatin remodelling complex. Biochem J. 468(2):345-52. Koludrovic D., Laurette P., Strub T., Keime C., Le Coz M., Coassolo S., Mengus G., Laure L., Davidson I. 2015. Chromatin-Remodelling Complex NURF Is Essential for Differentiation of Adult Melanocyte Stem Cells. PLoS Genet. 11(10): e1005555. Kornberg R.D. (1974) Chromatin structure: a repeating unit of histones and DNA. Science, 184(4139): 868-71. Kouzarides T. 2007. Chromatin Modifications and Their Function. Cell, 128(4): 693-705. Krogan N.J., Kim M., Tong A., Golshani A., Cagney G., Canadien V., Richards D.P., Beattie B.K., Emili A., Boone C., Shilatifard A., Buratowski S., Greenblatt J. 2003. Methylation of histone H3 by Set2 in Saccharomyces cerevisiae is linked to transcriptional elongation by RNA polymerase II. Mol. Cell. Biol. 23: 4207–4218. Kuo A.J., Song J., Cheung P., Ishibe-Murakami S., Yamazoe S., Chen J.K., Pater D.J., Gozani O. 2012. The BAH domain of ORC1 links H4K20me2 to DNA replication licensing and Meier–Gorlin syndrome. Nature 484:115–119. Kuzmichev A., Jenuwein T., Tempst P., Reinberg D., 2004. Different Ezh2-Containing Complexes Target Methylation of Histone H1 or Nucleosomal Histone H3. Cell, 14(2): 183-193. Laherty C.D., Yang W.M., Sun J.M., Davie J.R., Seto E., Eisenman R.N. 1997. Histone deacetylases associated with the mSin3 corepressor mediate mad transcriptional repression. Cell, 89 (3): 349–356. 124 References

Lai A.Y. & Wade P.A. 2011. Cancer biology and NuRD: a multifaceted chromatin remodelling complex. Nature Reviews. Cancer, 11 (8), 588–596. Lai D., Wan M., Wu J., Preston-Hurlburt P., Kushwaha R., Grundstrom T., Imbalzano A.N., Chi T. 2009. Introduction of TLR4-target genes entails calcium/calmodulin-dependent regulation of chromatin remodeling. Proc. Natl. Acad. Sci. U.S.A., 106: 1169-1174. Landry, J., Sutton, A., Hesman, T., Min, J., Xu, R.M., Johnston, M., and Sternglanz, R. 2003. Set2- catalyzed methylation of histone H3 represses basal expression of GAL4 in Saccharomyces cerevisiae. Mol. Cell. Biol. 23: 5972–5978. Larochelle M. & Gaudreau L. 2003. H2A.Z has a function reminiscent of an activator required for preferential binding to intergenic DNA. EMBO J. 22: 4512-4522. Larsen D.H., Poinsignon C., Gudjonsson T., Dinant C., Payne M.R., Hari F.J., Danielsen J.M.R., Menard P., Sand J.C., Stucki M., Lukas C., Bartek J., Andersen J.S., Lukas J. 2010. The chromatin-remodeling factor CHD4 coordinates signaling and repair after DNA damage. J Cell Biol. 190(5): 731–740. Latham K.E., Sapienza C., Engel N. 2012. The epigenetic lorax: gene–environment interactions in human health. Epigenomics. 4(4): 383–402. Lauffer B.E., Mintzer R., Fong R., Mukund S., Tam C., Zilberleyb I., Flicke B., Ritscher A., Fedorowicz G., Vallero R., Ortwine D.F., Gunzner J., Modrusan Z., Neumann L., Koth C.M., Lupardus P.J., Kaminker J.S., Heise C.E., Steiner P. 2013. Histone Deacetylase (HDAC) Inhibitor Kinetic Rate Constants Correlate with Cellular Histone Acetylation but Not Transcription and Cell Viability. J.Biol.Chem. 288: 26926-26943. Laugese A.. & Helin K.. 2014. Chromatin repressive complexes in stem cells, development, and cancer. Cell Stem Cell. 14(6):735-51. Le Guenzennec X., Vermeullen M., Brinkman A., Hoeijmakers W.A.M., Cohen A., Lasonder E., Stunnenberg H.G. 2006. MBD2/NuRD and MBD3/NuRD, two distinct complexes with different biochemical and functional properties. Mol. Cell Biol., 26: 843-851. Lee D.Y., Hayes J.J., Pruss D., Wolffe A.P. 1993. A positive role for histone acetylation in transcription factor access to nucleosomal DNA. Cell, 72(1):73-84. Lee J.T., Davidow L.S., Warshawsky D. 1998. Tsix, a gene antisense to Xist at the X-inactivation centre. Nat Genet. 21(4):400-4. Lee J.T. 2012 Epigenetic regulation by long noncoding RNAs. Science. 338:1435–9. Leschziner A.E. 2011. Electron microscopy studies of nucleosome remodelers. Curr. Opin. Struct. Biol. 21:709–718. Lev Mahor G., Yearim A., Ast G. 2015. The alternative role of DNA methylation in splicing regulation. Trends in Genetics, 31(5): 274-280. Levinger L. and Varshavsky A. 1982. Selective arrangement of ubiquitinated and D1 protein- containing nucleosomes within the Drosophila genome. Cell 28: 375 – 385. References 125

Li D.Q., Pakala S.B., Reddy S.D., et al. 2010 Revelation of p53-independent function of MTA1 in DNA damage response via modulation of the p21 WAF1-proliferating cell nuclear antigen pathway. J Biol Chem.;285(13):10044–10052. Li G.C. & Wang Z.Y. 2006. Retinoblastoma suppressor associated protein 46 (RbAp46) attenuates the beta-catenin/TCF signaling through up-regulation of GSK-3beta expression. Anticancer Res. 26(6B):4511-8. Li R., He Q., Han S., Zhang M., Liu J., Su M., Wei S., Wang X., Shen L.. 2017. MBD3 inhibits formation of liver cancer stem cells. Oncotarget. 8(4):6067-6078. Li X. & Zhao X. 2008. Epigenetic Regulation of Mammalian Stem Cells. Stem Cells Dev. 17(6): 1043–1052. Li X., Jia S., Wang S., Wang Y., Meng A.. 2009. Mta3-NuRD complex is a master regulator for initiation of primitive hematopoiesis in vertebrate embryos. Blood. 114(27):5464-72. Liang J., Wan M., Zhang Y., Gu P., Xin H., Jung S.Y., Qin JUN., Wong J., Cooney A.J., Liu D., Songyang Z. 2008. Nanog and Oct4 associate with unique transcriptional repression complexes in embryonic stem cells. Nat Cell Biol, 10: 731-739. Liew C.K., Simpson R.J.Y., Kwan A.H.Y., Crofts L.A., Loughlin F.E., Matthews J.M., Crossley M., Mackay J.P. 2005. Zinc fingers as protein recognition motifs: Structural basis for the GATA-1/Friend of GATA interaction. PNAS 102 (3): 583-588. Lejon S., Thong S.Y., Murthy A., AlQarni S., Murzina N.V., Blobel G.A., Laue E.D., Mackay J.P. 2011. Insights into association of the NuRD complex with FOG-1 from the crystal structure of an RbAp48.FOG-1 complex. J Biol Chem 286: 1196–1203. Li D.Q. & Kumar R. 2010. Mi-2/NuRD complex making inroads into DNA-damage response pathway Cell Cycle. 9(11): 2071–2079. Lillycrop K.A., Phillips E.S., Jackson A.A., Hanson M.A., Burdge G.C. 2005. Dietary protein restriction of pregnant rats induces and folic acid supplementation prevents epigenetic modification of hepatic gene expression in the offspring. J Nutr 135(6):1382-6. Lin C.W., Wang L.K., Wang S.P., Chang Y.L., Wu Y.Y., Chen H.Y., Hsiao T.H., Lai W.Y., Lu H.H., Chang Y.H., Yang S.C., Lin M.W., Chen C.Y., Hong T.M., Yang P.C.. 2016. Daxx inhibits hypoxia-induced lung cancer cell metastasis by suppressing the HIF-1α/HDAC1/Slug axis. Nat Commun. 7:13867. Liu B., Yip R.K.H., Zhou Z. 2012. Chromatin remodeling, DNA damage repair and aging. Curr. Genomics. 13(7): 533–47. Liu J., Wang H., Ma F., Xu D., Chang Y., Zhang J., Wang J., Zhao M., Lin C., Huang C, Qian H., Zhan Q.. 2015. MTA1 regulates higher-order chromatin structure and histone H1- chromatin interaction in-vivo. Mol Oncol. 9(1):218-35. 126 References

Liu Y., Long Y., Xing Z., Zhang.D 2015. C-Jun recruits the NSL complex to regulate its target gene expression by modulating H4K16 acetylation and promoting the release of the repressive NuRD complex. Oncotarget. 6(16): 14497–14506. Loidl P. 1994. Histone acetylation: facts and questions. Chromosoma. 103(7):441-9. Lorch Y., Maier-Davis B., Kornberg R.D. 2006. Chromatin remodeling by nucleosome disassembly in vitro. Proc Natl Acad Sci U S A. 103(9):3090-3. Low J.K., Webb S.R., Silva A.P., Saathoff H., Ryan D.P., Torrado M., Brofelth M., Parker B.L., Shepherd N.E., Mackay J.P.. 2016. In most of the cases CHD4 is the helicase contained in hNuRD complexes. J Biol Chem. 291(30):15853-66. Lu Y., Loh Y.H., Li H., Cesana M., Ficarro S.B., Parikh J.R., Salomonis N., Toh C.X., Andreadis S.T., Luckey C.J., Collins J.J., Daley G.Q., Marto J.A. 2014. Alternative splicing of MBD2 supports self-renewal in human pluripotent stem cells. Cell Stem Cell. 15(1):92-101. Luo J., Su F., Chen D., Shiloh A., Gu W.. 2000. Deacetylation of p53 modulates its effect on cell growth and apoptosis. Nature. 408(6810): 377-81. Ludwigsen J., Pfennig S., Singh A., Schlinder C., Harrer., Forné I., Zacharias M., Mueller-Planitz F. 2017. Concerted regulation of ISWI by an autoinhibitory domain and the H4 N- terminal tail. Elife., 6: e21477 Lusser A., Urwin D.L., Kadonaga J.T. 2005. Distinct activities of CHD1 and ACF in ATP- dependent chromatin assembly. Nat Struct Mol Biol. 12(2):160-6. MacKintosh C. 1998. Regulation of cytosolic enzymes in primary metabolism by reversible protein phosphorylation. Curr Opin Plant Biol. 1: 224–9. Maeshima K., Imai R., Tamura S., Nozaki T. (2014). Chromtin as dynamic 10-nm fibers. Chromosoma. 123(3): 225-37. Mahadevan L.C., Willis A.C., Barratt M.J. 1991. Rapid histone H3 phosphorylation in response to growth factors, phorbol esters, okadaic acid, and protein synthesis inhibitors. Cell 65(5):775-83. Mansfield R.E., Musselman C.A., Kwan A.H., Oliver S.S., Garske A.L., Davrazou F., Denu J.M., Kutateladze T.G., Mackay J.P. 2011. Plant homeodomain (PHD) fingers of CHD4 are histone H3-binding modules with preference for unmodified H3K4 and methylated H3K9. J Biol Chem 286:11779–11791. Marabini R., Masegosa I.M., San Martin M.C., Marco S., Fernandez J.J., de la Fraga L.G., Vaquerizo C., Carazo J.M. 1996, Xmipp: An image processing package for electron microscopy. J. Structural Biology, 116: 237-240. Marchese F.P. & Huarte M. 2014. Long non-coding RNAs and chromatin modifiers. Their place in the epigenetic code. Epigenetics. 9(1): 21–26. References 127

Marino S. & Nusse R. 2007. Mutants in the Mouse NuRD/Mi2 Component P66α Are Embryonic Lethal. PLoS One. 2(6): e519. Martin C. & Zhang Y. 2005. The diverse functions of histone lysine methylation. Nat Rev Mol Cell Biol. 6(11):838-49. Matsumoto M., Toraya T. 2008. cDNA cloning, expression, and characterization of methyl-CpG- binding domain type 2/3 proteins from starfish and sea urchin. Gene. 420: 125–134. Matsuo K., Shintani S., Tsuji T., Nagata E., Lerman M., McBride J., Nakahara Y., Ohyama H., Todd R., Wong D.T. 2000. p12(DOC-1), a growth suppressor, associates with DNA polymerase alpha/primase. FASEB J. 14(10):1318-24. Maudet C., Sourisce A., Dragin L., Lahouassa H., Rain J.C., Bouaziz S, Ramirez B.C., Margottin- Goguet F. HIV-1 Vpr induces the degradation of ZIP and sZIP, adaptors of the NuRD chromatin remodeling complex, by hijacking DCAF1/V prBP. PLoS One. 8: (10):e77320. Meehan R.R., Lewis J.D., McKay S., Kleiner E.L., Bird A.P. 1989. Identification of a mammalian protein that binds specifically to DNA containing methylated CpGs. Cell, 58(3): 499- 507. Messner S., Altmeyer M., Zhao H., Pozivil A., Roschitzki B., Gehrig P., Rutishauser D., Huang D., Caflisch A., Hottiger M.O. 2010. PARP1 ADP-ribosylates lysine residues of the core histone tails. Nucleic Acids Res. 38(19): 6350–6362. Messner S. & Hottiger M.O. 2011.Histone ADP-ribosylation in DNA repair, replication and transcription. Trends Cell Biol. 21(9): 534-42. Millard C.J., Watson P.J., Celardo I., Gordiyenko Y., Cowley S.M., Robinson C.V., Fairall L., Schwabe J.W.R. 2013. Class I Hdacs Share a Common Mechanism of Regulation by Inositol Phosphates. Mol.Cell 51: 57. Millard C.J. & Fairall L. 2014. Towards an understanding of the structure and function of MTA1. Cancer Metastasis Rev 33:857. Millard C.J., Varma N., Saleh A., Morris K., Watson P.J., Bottrill A.R., Fairall L., Smith, C.J., Schwabe J.W. 2016. The structure of the core NuRD repression complex provides insights into its interaction with chromatin. Elife 5: e13941-e13941. Miller K.M., Tjeertes J.V., Coates J., Legube G., Polo S.E., Britton S., Jackson S.P. 2010. Human HDAC1 and HDAC2 function in the DNA-damage response to promote DNA nonhomologous end-joining. Nat Struct Mol Biol. 17(9):1144-51. Miyake Y., Keusch J.J., Wang L., Saito M., Hess D., Wang X., Melancon B.J., Helquist P., Gut H., Matthias P. 2016. Structural insights into HDAC6 tubulin deacetylation and its selective inhibition. Nat Chem Biol.12(9):748-54. Mohamed Ariff I., Mitra A., Basu A.. 2012. Epigenetic regulation of self-renewal and fate determination in neural stem cells. J Neurosci Res. 90(3):529-39. 128 References

Monk M., Boubelik M., Lehnert S. 1987. Temporal and regional changes in DNA methylation in the embryonic, extraembryonic and germ cell lineages during mouse embryo development. Development, 99: 371–382. Montgomery R.L., Davis C.A., Potthoff M.J., Haberland M., Flelitz J., Qi X., Hill J.A., Richardson J.A., Olson E.N. 2007. Histone deacetylases 1 and 2 redundantly regulate cardiac morphogenesis, growth, and contractility. Genes Dev. 21(14): 1790–1802. Moore S.C., & Ausió J. 1997. Major role of the histones H3–H4 in the folding of the chromatin fiber. Biochem. Biophys. Res. Commun. 230:136–139. Mor N., Rais Y., Sheban D., Peles S., Aguilera-Castrejon A., Zviran A., Elinger D., Viukov S.,

Geula S., Krupalnik V., Zerbib M., Chomsky E., Lasman L., Shani T., Bayerl J., Gafni

O., Hanna S., Buenrostro J.D., Hagai T9, Masika H. Vainorius G., Bergman Y., Greenleaf W.J., Esteban M.A., Elling U., Levin Y., Massarwa R., Merbl Y.,

Novershtern N., Hanna J.H. 2018. Neutralizing Gatad2a-Chd4-Mbd3/NuRD Complex Facilitates Deterministic Induction of Naive Pluripotency. Cell Stem Cell. 23(3):412- 425.e10. Moritz L.E. & Trievel R.C. 2018. Structure, mechanism, and regulation of polycomb-repressive complex 2. J Biol Chem. ;293(36):13805–13814. Morra R., Lee B.M., Shaw H., Tuma R., Mancini E.J.. 2012. Concerted action of the PHD, chromo and motor domains regulates the human chromatin remodelling ATPase CHD4. FEBS Lett. 586(16):2513-21. Morrison A.J. & Shen X. 2009. Chromatin remodelling beyond transcription: the INO80 and SWR1 complexes. Nat Rev Mol Cell Biol. 10(6):373-84. Moser M.A., Hagelkruys A., Seiser C. 2014. Transcription and beyond: the role of mammalian class I lysine deacetylases. Chromosoma. 123(1-2): 67–78. Mouysset J., Gilberto S., Meier M.G., Lampert F., Belwal M., Meraldi P., Peter M.. 2015. CRL4(RBBP7) is required for efficient CENP-A deposition at centromeres. J Cell Sci. ;128(9):1732-45. Murawsky C.M., Brehm A., Badenhorst P., Lowe N., Becker P.B., Travers A.A. 2001. Tramtrack69 interacts with the dMi-2 subunit of the Drosophila NuRD chromatin remodelling complex. EMBO Rep 2:1089–1094. Murzina N.V., Pei X.Y., Zhang W., Sparkes M., Vicente-Garcia J., Pratap J.V., McLaughlin S.H., Ben-Shahar T.R., Verreault A., Luisi B.F., Laue E.D. 2008. Structural basis for the recognition of histone H4 by the histone-chaperone RbAp46. Structure 16:1077–1085. Murray K. 1964. The occurrence of epsilon-N-methyl Lysine in histones. Biochemistry, 3:10-5. Musselman C.A., Mansfield R.E., Garske A.L., Davrazou F., Kwan A.H., Oliver S.S., O’Leary H., Denu J.M., Mackay J.P., Kutateladze T.G. 2009a. Binding of the CHD4 PHD2 finger to histone H3 is modulated by covalent modifications. Biochem J 423:179–187. References 129

Musselman C.A., Mansfield R.E., Garske A.L., Davrazou F., Kwan A.H., Olivier S.S., O’Leary H., Denu J.M., Mackay J.P., Kutateladze T.G. 2009b. Binding of the CHD4 PHD2 finger to histone H3 is modulated by covalent modifications. Biochem J 423:179–187. Musselman C.A., Ramírez J., Sims J.K., Mansfield R.E., Oliver S.S., Denu J.M., Mackay J.P., Wade P.A., Hagman J., Kutateladze T.G.. 2012. Bivalent recognition of nucleosomes by the tandem PHD fingers of the CHD4 ATPase is required for CHD4-mediated repression. Proc Natl Acad Sci U S A. 109(3):787-92. Nakagawa S. & Kageyama Y., 2014. Nuclear lncRNAs as epigenetic regulators-beyond skepticism. Biochim Biophys Acta. 1839(3):215-22. Nan X., Ng H.H., Johnson C.A., Laherty C.D., Turner B.M., Eisenman R.N., Bird A.. 1998. Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature, 939(6683): 386-9. Narlikar G.J. 2010. A proposal for kinetic proof reading by ISWI family chromatin remodeling motors. Curr Opin Chem Biol. 14(5): 660-665. Nathan D., Ingvarsdottir K., Sterner D.E., Bylebyl G.R., Dokmanovic M., Dorsey J.A., Whelan K.A., Krsmanovic M., Lane W.S., Meluh P.B., Johnson E.S., Berger S.L. 2006. Histone sumoylation is a negative regulator in Saccharomyces cerevisiae and shows dynamic interplay with positive-acting histone modifications. Genes Dev. 20: 966– 976. Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F. 1994. The ancient regulatory-protein family of WD-repeat proteins. Nature, 371(6495):297-300. Nelson C.J., Santos-Rosa H., Kouzarides T. 2006. Proline isomerization of histone H3 regulates lysine methylation and gene expression. Cell. 126: 905–916. Nesterova T.B., Popova B.C., Cobb B.S., Norton S., Senner C.E., Tang Y.A., Spruce T., Rodriguez T.A., Sado T., Merkenschlager M., Brockdorff N. 2008. Dicer regulates Xist promoter methylation in ES cells indirectly through transcriptional control of Dnmt3a. Epigenetics & Chromatin. 1(1): 2. Newman J. 2004. Novel buffer systems for macromolecular crystallization. Acta Crystallogr D Biol Crystallogr. 60:610-2. Nicolson, G. L., Nawa A., Toh Y., Taniguchi S., Nishimori K., Moustafa A. 2003. Tumor metastasis-associated human MTA1 gene and its MTA1 protein product: role in epithelial cancer cell invasion, proliferation and nuclear regulation. Clin. Exp. Metastasis 20: 19–24. Nie T., Hui X., Mao L., Nie B, Li K., Sun W.,Gao X., Tang X., Xu Y., Jiang B., Tu Z., Li P., Ding

K., Han W., Zhang S., Xu A., Ding S., Liu P, Patterson A., Cooper G., Wu D. 2016. Harmine Induces Adipocyte Thermogenesis through RAC1-MEK-ERK-CHD4 Axis. Sci Rep. 6: 36382. 130 References

Nio K., Yamashita T., Okada H., Kondo M., Hayashi T., Hara Y., Nomura Y., Zeng S.S., Yoshida M., Hayashi T., Sunagozaka H., Oishi N., Honda M., Kaneko S.. 2015. Defeating EpCAM(+) liver cancer stem cells by targeting chromatin remodeling enzyme CHD4 in human hepatocellular carcinoma. J Hepatol. 63(5):1164-72. Nishino Y., Eltsov M., Joti Y., Ito K., Takata H., Takahashi Y., Hihara S., Frangakis A.S., Imamoto N., Ishikawa T., Maeshima K. (2012) Human mitotic chromosomes consist predominantly of irregularly folded nucleosome fibres without a 30 nm chromatin structure. EMBO J. 31(7): 1644-53. Nitarska J., Smith J.G., Sherlock W.T. Hillege M. M.G., Nott, A., Barshop W.D., Vashisht A.A., Wohlschlegel J.A., Mitter R., Riccio A. 2016. A Functional Switch of NuRD Chromatin Remodeling Complex Subunits Regulates Mouse Cortical Development. Cell Rep. 17(6):1683-1698. Nguyen V.Q., Ranjan A., Stengel F., Wei D., Aebersold R., Wu C., Leschziner A.E. 2013. Molecular Architecture of the ATP-Dependent Chromatin-Remodeling Complex SWR1. Cell, 154(6): 1220–1231. Nurse N.P., Jimenez-Useche I., Smith I.T., Yuan C. 2013. Clipping of Flexible Tails of Histones H3 and H4 Affects the Structure and Dynamics of the Nucleosome. Biophys J., 104(5): 1081–1088. O’Shaughnessy A. & Hencdrich B. 2013. CHD4 in the DNA-damage response and cell cycle progression: not so NuRDy now. Biochem Soc Trans 41(Pt 3): 777–782. Ogryzko V.V., Kotani T., Zhang X., Schiltz R.L., Howard T., Yang X.J., Howard B.H., Qin J., Nakatani Y. 1998. Histone-like TAFs within the PCAF histone acetylase complex. Cell. 94(1):35-44. Ohki I., Shimotake N., Fujita N., Jee J.G., Ikegame T., Nakao M., Shirakawa M. 2001. Solution Structure of the Methyl-CpG Binding Domain of Human MBD1 in Complex with Methylated DNA. Cell, 105(4): 487-497. Olias P., Etheridge R.D., Zhang Y., Holtzman M.J., Sibley L.D. 2016. Toxoplasma Effector Recruits the Mi-2/NuRD Complex to Repress STAT1 Transcription and Block IFN-γ- Dependent Gene Expression. Cell Host Microbel: 20(1):72-82. Olins A.L. & Olins D.E. (1974). Spheroid chromatin units (v bodies). Science, 183(4122): 330-2. Omichinsky J.G., Clore G.M., Schaad O., Felsenfeld G., Trainor C., Apella E., Stahl S.J., Gronenborn A.M. 1993. NMR structure of a specific DNA complex of Zn-containing DNA binding domain of GATA-1. Science, 261(5120): 438-446. Ordentlich P., Downes M., Xie W., Genin A., Spinner N.B., Evans R.M. 1999. Unique forms of human and mouse nuclear receptor corepressor SMRT. Proc Natl Acad Sci USA 96:2639–2644. References 131

Ori A., Banterle N., Iskar M., Andrés-Pons A., Escher C., Khanh Bui H., Sparks L., Solis-Mezarino V., Rinner O., Bork P., Lemke E.A., Beck M.. 2013. Cell type-specific nuclear pores: a case in point for context-dependent stoichiometry of molecular machines. Mol. Syst. Biol., 9: 648-659. Palmer D.K., O’Day K., Margolis R.L. 1990. The centromere specific histone CENP-A is selectively retained in discrete foci in mammalian sperm nuclei. Chromosoma. 11(1): 32-36. Pamblanco M., Poveda A., Sendra R., Rodríguez-Navarro S., Pérez-Ortín J.E., Tordera V. 2001. Bromodomain factor 1 (Bdf1) protein interacts with histones. FEBS Lett. 496:31–35. Park S.S., Skaar D.A., Jirtle R.L., Hoyo C.. 2017. Epigenetics, obesity and early-life cadmium or lead exposure. Epigenomics. 9(1):57-75. Patel J., Pathak R.R., Mujtaba S.. 2001. The biology of lysine acetylation integrates transcriptional programming and metabolism. Nutr Metab (Lond). 8:12. Paulson J.R. and Taylor S.S. 1982. Phosphorylation of histones 1 and 3 and non histone high mobility group 14 by an endogenous kinase in HeLa metaphase chromosomes. J. Biol. Chem. 257: 6064 – 6072. Pearls C.J., Couto C.A.M., Wang H.Y., Borer C., Kiely R., Lakin N.D. 2012. The role of ADP- ribosylation in regulating DNA double-strand break repair. Cell Cycle, 11(1): 48-56. Pegoraro G., Kubben N., Wickert U., Göhler H., Hoffmann K., Misteli T.. 2009. Ageing-related chromatin defects through loss of the NURD complex.Ageing-related chromatin defects through loss of the NURD complex. Nat Cell Biol. 11(10):1261-7. Pettersen E.F., Goddard T.D., Huang C.C., Couch G.S., Greenblatt D.M., Meng E.C., Ferrin T.E.. 2004. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 25(13):1605-12. Pickles L.M., Roe S.M., Hemingway E.J., Stifani S., Pearl L.H. 2002. Crystal Structure of the C- Terminal Wd40 Repeat Domain of the Human Groucho/Tle1 Transcriptional Corepressor. Structure, 10: 751. Pollex T, & Heard E. 2012. Recent advances in X-chromosome inactivation research. Curr Opin Cell Biol. 24:825–32. Polo S.E., Kaidi A., Baskcomb L., Galanty Y., Jackson S.P. 2010. Regulation of DNA-damage responses and cell-cycle progression by the chromatin remodelling factor CHD4. EMBO J. 29(18):3130-9. Puckett R.E. & Lubin F.D. 2011. Epigenetic mechanisms in experience-driven memory formation and behavior. Epigenomics. 3(5): 649–664. Punjani A., Rubinstein J’L’, Fleet D.J., Brubaker M.A.. 2017. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods. 14(3):290-296. 132 References

Qi W., Chen H., Xiao T., Wang R., Li T., Han L., Zeng X. 2016. Acetyltransferase p300 collaborates with chromodomain helicase DNA-binding protein 4 (CHD4) to facilitate DNA double-strand break repair. Mutagenesis. 31(2):193-203. Qian Y.W., Wang Y.C., Hollingsworth R.E.Jr., Jones D., Ling N., Lee E.Y. 1993. A retinoblastoma- binding protein related to a negative regulator of Ras in yeast. Nature 364:648–652. Quénet D., El Ramy R., Schreiber V., Dantzer F.. 2009. The role of poly(ADP-ribosyl)ation in epigenetic events. Int J Biochem Cell Biol. 41(1):60-5. Radermacher M. 1988. Three-dimensional reconstruction of single particles from random and non random tilt series. J. Electron Microsc. Tech., 9(4):359-394. Rais Y., Zviran A., Geula S., Gafni O., Chomsky E., Viukov S., Mansour A.A., Caspi I., Krupalnik V., Zerbib M., Maza I., Mor N., Baran D. warnering the MPS’s cave, Weinberger L., Jaitin D.A., Lara-Astiaso D., Blecher-Gonon, R. Shipony Z., Mukamel Z., Hagai T., Gilad S, Amann-Zalcensterin D., Tanay A., Amit I., Novershtern N 2013. Deterministic direct reprogramming of somatic cells to pluripotency. Nature, 502(7469):65–70. Ramachandran G.N. & Sasisekharan V. 1968. Conformation of polypeptides and proteins. Adv. Protein Chem. 23: 283–438. Ramírez J., Dege C., Kutateladze T.G., Hagman J.. 2012. MBD2 and multiple domains of CHD4 are required for transcriptional repression by Mi-2/NuRD complexes. Mol Cell Biol. 32(24):5078-88. Rayman J.B. & Kandel E.R. 2017. Functional Prions in the Brain. Cold Spring Harb Perspect Biol. 9(1). Razin A. & Riggs A.D., 1980. DNA methylation and gene function. Science, 210(4470): 604-10. Rea S., Eisenhaber F., O'Carroll D., Strahl B.D., Sun Z.W., Schmid M., Opravil S., Mechtler K., Ponting C.P., Allis C.D., Jenuwein T. 2000. Regulation of chromatin structure by site- specific histone H3 methyltransferases. Nature 406: 593 – 599. Reddy B.A., Bajpe P.K., Bassett A., Moshkin Y.M., Kozhevnikova E., Bezstarosti K., Demmers J.A.A., Travers A.A., Verrijzer C.P. 2010. Drosophila Transcription Factor Tramtrack69 Binds MEP1 To Recruit the Chromatin Remodeler NuRD. Mol Cell Biol. 30(21): 5234–5244. Reichert N., Choukrallah M.A., Matthias P.. 2012. Multiple roles of class I HDACs in proliferation, differentiation, and development. Cell Mol Life Sci. 69(13):2173-2187. Ren R., Mayer B.J., Cicchetti P., Baltimore D. 1993. Identification of a ten-amino acid proline-rich SH3 binding site. Science, 259(5098): 1157-1161. Reyes-Turcu F.E. & Grewal S.I. 2012. Different means, same end-heterochromatin formation by RNAi and RNAi-independent RNA processing factors in fission yeast. Curr Opin Genet Dev. 22(2):156-63. References 133

Reynolds N., Latos P., Hynes-Allen A., Loos R., Leaford D., O’Shaughnessy A., Mosaku O., Signolet J., Brennecke P., Kalkan T., Costello I., Humphreys P., Mansfield W., Nakagawa K., Stroubolules J., Behrens A., Bertone P., Hendrich B. 2012. NuRD Suppresses Pluripotency Gene Expression to Promote Transcriptional Heterogeneity and Lineage Commitment. Cell Stem Cell. 10(5): 583–594. Robinson P.J., Trnka M.J., Bushnell D.A., Davis R.E., Mattei P., Burlingame A.L., Kornberg R.D. 2016. Structure of a Complete Mediator-RNA Polymerase II Pre-Initiation Complex. Cell, 166(6): 1411-1422. Rohou A. & Grigorieff N. 2015. CTFFIND4: Fast and accurate defocus estimation from electron micrographs. J Struct Biol. 192:216–221. Ryan D.P. & Owen-Hughes T. 2011. Snf2-family proteins: chromatin remodellers for ny occasion. Curr Opin Chem Biol. 15(5): 649-656. Sainsbury S., Bernecky C., Cramer P. 2015. Structural basis of transcription initiation by RNA polymerase II. Nat Rev Mol Cell Biol. 16(3):129-43. Saito M. & Ishikawa F. 2002. The mCpG-binding domain of human MBD3 does not bind to mCpG but interacts with NuRD/Mi2 components HDAC1 and MTA2. J. Biol. Chem. 277: 35434–35439. Santos F., Hendrich B., Reik W., Dean W. 2002. Dynamic Reprogramming of DNA Methylation in the Early Mouse Embryo. Developmental Biology, 241(1): 172-182. Santosh Kumar H.S., Kumar V., Pattar S., Telkar S. 2016. Towards the construction of an interactome for Human WD40 protein family. Bioinformation. 12(2): 54–61. Sassone-corsi P., Mizzen C.A., Cheung P., Crosio C., Monaco L., Jacquot S., Hanauer A., Allis C.D. 1999. Requirement of Rsk-2 for rpidermal growth factor-activaated phosphorylation of histone H3. Science, 285(5429): 886-891. Savino A.M., Sarno J., Trentin L., Vieri M., Fazio G., Bardini M., Bugarin C., Fossati G., Davis K.L., Gaipa G., Izraeli S., Meyer L.H., Nolan G.P., Biondi A., Te Kronnie G., Palmi C., Cazzaniga G.. 2017. The histone deacetylase inhibitor givinostat (ITF2357) exhibits potent anti-tumor activity against CRLF2-rearranged BCP-ALL. Leukemia. (11):2365-2375. Scheres S.H.W. 2012. Relion: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180(3): 519-30. Sen P., Shah P.P., Nativio R., Berger S.L. 2016. Epigenetic Mechanisms of Longevity and Aging. Cell. 166(4):822-839. Sengupta N. & Seto E. 2004. Regulation of histone deacetylase activities. J Cell Biochem. 93(1):57-67. Schweet R. & Heintz R. 1966. Protein Synthesis. Annual Review of Biochemistry. 35: 723–758. 134 References

Schaft D., Roguev A., Kotovic K.M., Shevchenko A., Sarov M., Neugebauer K.M., and Stewart A.F. 2003. The histone 3 lysine 36 methyltransferase, SET2, is involved in transcriptional elongation. Nucleic Acids Res. 31: 2475–2482. Schmidberger J.W., Sharifi Tabar M., Torrado M., Silva A.P., Landsberg M.J., Brillault L., AlQarni S., Zeng Y.C., Parker B.L., Low J.K., Mackay J.P.. 2016.The MTA1 subunit of the nucleosome remodeling and deacetylase complex can recruit two copies of RBBP4/7. Protein Sci. 25(8):1472-82. Sedat J. & Manuelidism L. 1978. A direct approach to the structure of eukaryotic chromosomes. Cold Spring Harb Symp Quant Biol. 42 Pt 1: 331-50. Shi Y., Lan F., Matson C., Mulligan P., Whetstine J.R., Cole P.A., Casero R.A., Shi Y. 2004. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell. 119(7):941-53. Shiio Y. & Eisenman R.N. 2003. Histone sumoylation is associated with transcriptional repression. Proc Natl Acad Sci U S A. 100(23):13225-30. Shilatifard, A. 2006. Chromatin modifications by methylation and ubiquitination: implications in the regulation of gene expression. Annu. Rev. Biochem. 75: 243–269. Shimbo T., Du Y., Grimm S.A., Dhasarathy A., Mav D., Shah R.R., Shi H., Wade P.A. 2013. MBD3 localizes at promoters, gene bodies and enhancers of active genes. PLoS Genet 9:e1004028. Shimono Y., Murakami H., Kawai K., Wade P.A., Shimokata K., Takahashi M. 2003. Mi-2 beta associates with BRG1 and RET finger protein at the distinct regions with transcriptional activating and repressing abilities. J Biol Chem 278: 51638–51645. Sillibourne J.E., Delaval B., Redick S., Sinha M., Doxsey S.J. 2007. Chromatin Remodeling Proteins Interact with Pericentrin to Regulate Centrosome Integrity. Molecular Biology of the Cell, 18(9): 3225-3709. Silva A.P., Ryan D.P., Galanty Y., Low J.K., Vandevenne M., Jackson S.P., Mackay J.P. 2016. The N-terminal Region of Chromodomain Helicase DNA-binding Protein 4 (CHD4) Is Essential for Activity and Contains a High Mobility Group (HMG) Box-like-domain That Can Bind Poly(ADP-ribose). J.Biol.Chem. 291: 924-938. Simpson V.J,. Johnson T.E., Hammen R.F. 1986. Caenorhabditis elegans DNA does not contain 5- methylcytosine at any time during development or aging.Nucleic Acids Res, 14(16):6711-9. Smeenk G., Wiegant W.W., Vrolijk H., Solari A.P., Pastink A., van Attikum H.. 2010.The NuRD chromatin-remodeling complex regulates signaling and repair of DNA damage. J Cell Biol. 190(5):741-9. Smith T.F., Gaitatzes C., Saxena K., Neer E.J. 1999. The WD repeat: a common architecture for diverse functions. Trends Biochem Sci. 24(5):181-5. References 135

Smits A.H. Jansen P.W.T.C., Poser I., Hyman A.A., Vermeulen M. 2013) Stoichiometry of chromatin-associated protein complexes revealed by label-free quantitative mass spectrometry-based proteomics. Nucl. Acid Res., 41(1): e28. Smolle M. & Workman J.L. 2013. Transcription-associated histone modifications and cryptic transcription. Biochim Biophys Acta. 1829(1):84-97. Sorensen A.B., Søndergaard M.T., Overgaard M.T.. 2013. Calmodulin in a heartbeat. FEBS J., 280(21):5511-5532. Sperlazza J., Rahmani M., Beckta J., Aust M., Hawkins E., Wang S.Z., Zu Zhu S., Podder S., Dumur C., Archer K., Grant S., Ginder G.D.. 2015. Depletion of the chromatin remodeler CHD4 sensitizes AML blasts to genotoxic agents and reduces tumor formation. Blood. 126(12):1462-72. Spruijt C.G., Luijsterburg M.S., Menafra R., Lindeboom R.G., Jansen P.W., Edupuganti R.R., Baltissen M.P., Wiegant W.W., Voelker-Albert M.C., Matarese F., Mensinga A., Poser I., Vos H.R., Stunnenberg H.G., van Attikum H., Vermeulen M.. 2016. ZMYND8 Co- localizes with NuRD on Target Genes and Regulates Poly(ADP-Ribose)-Dependent Recruitment of GATAD2A/NuRD to Sites of DNA Damage. Cell Rep. 17(3):783-798. Stewart D.E., Sarkar A., Wampler J.E. 1990. Occurrence and role of cis peptide bonds in protein structures. J. Mol. Biol. 214: 253–260. Strahl B.D. & Allis C.D. 2000. The language of covalent histone modifications. Nature. 403(6765): 41-5. Sugiyama T., Cam H.P., Sugiyama R., Noma K., Zofall M., Kobayashi R., Grewal S.I.S. 2007. SHREC, an Effector Complex for Heterochromatic Transcriptional Silencing. Cell, 128(3): 491-504. Sun F.L., Cuaycong M.H., Elgin S.C. 2001. Long-range nucleosome ordering is associated with gene silencing in Drosophila melanogaster pericentric heterochromatin. Mol Cell Biol. 21(8):2867-79. Sun Y., Yang Y., Shen H., Huang M., Wang Z., Liu Y., Zhang H., Tang T.S., Guo C. 2016. iTRAQ- based chromatin proteomic screen reveals CHD4-dependent recruitment of MBD2 to sites of DNA damage. Biochem Biophys Res Commun. 471(1):142-8. Sundaramoorthy R. 2019. Nucleosome remodelling: structural insights into ATP-dependent remodelling enzymes Essays in Biochemistry 63: 45–58 Tamaru H., Zhang X., McMillen D., Singh P.B., Nakayama J., Grewal S.I., Allis C.D., Cheng X., Selker E.U. 2003. trimethylated lysine 9 of histone H3 is a mark for DNA methylation in Neurospora crassa. Nat Genet., 34(1):75-9. Tang G., Peng L., Baldwin P.R., Mann D.S., Jiang W., Rees I., Ludtke S.J.. 2007. EMAN2: an extensible image processing suite for electron microscopy. J Struct Biol. 157(1):38-46. 136 References

Tao C.C., Hsu W.L., Ma Y.L, Cheng S.J., Lee E.H.. 2017. Epigenetic regulation of HDAC1 SUMOylation as an endogenous neuroprotection against Aβ toxicity in a mouse model of Alzheimer's disease. Cell Death Differ. 24(4):597-614. Taunton J., Hassing C.A., Schereiber S.L. 1996. A mammalian histone deacetyolase related to the yeat transcripotional regulat Rpd3p. Science 272(5260): 408-11. Thomas J.O. & Kornberg R.D. 1975 An octamer of histones in chromatin and free in solution. Proc Natl Acad Sci U S A. 72(7):2626-30. Tim-Aroon T., Jinawath N., Thammachote W., Sinpitak P., Limrungsikul A., Khongkhatithum C. Wattanasirichaigoon D.. 2017. 1q21.3 deletion involving GATAD2B: An emerging recurrent microdeletion syndrome. Am J Med Genet A. 173(3):766-770. Todd R., McBride J., Tsuji T., Donoff R.B., Nagai M., Chou M.Y., Chiang T., Wong D.T.. 1995. Deleted in oral cancer-1 (doc-1), a novel oral tumor suppressor gene. FASEB J. 9(13):1362-70. Toh Y., Pencil S.D., Nicolson G.L. 1994. A novel candidate metas-tasis-associated gene, mta1, differentially expressed in highly metastatic mammary adenocarcinoma cell lines. cDNA cloning, expression, and protein analyses. J Biol Chem 269:22958–22963. Tomita Y., Lee M.J., Lee S., Tomita S., Chumsri S., Cruickshank S., Ordentlich P., Trepel J.B.. 2016. The interplay of epigenetic therapy and immunity in locally recurrent or metastatic estrogen receptor-positive breast cancer: Correlative analysis of ENCORE 301, a randomized, placebo-controlled phase II trial of exemestane with or without entinostat. Oncoimmunology. 5(11):e1219008. Tong J. K., Hassig C.A., Schnitzler G.R., Kingston R.E., Schreiber S.L. 1998. Chromatin deacetylation by an ATP-dependent nucleosome remodelling complex. Nature, 395: 917–921. Torchy M.P., Hamiche A., Flaholz B.P. 2015. Structure and function insights into the NuRD chromatin remodeling complex. Cell Mol Life Sci. 72(13): 2491-2507. Torrado M., Low J.K.K., Silva A.P.G., Schmidberger J.W., Sana M., Sharifi Tabar M., Isilak M.E.1, Winning C.S., Kwong C., Bedward M.J., Sperlazza M.J., Williams D.C. Jr, Shepherd N.E., Mackay J.P.. 2017. Refinement of the subunit interaction network within the nucleosome remodelling and deacetylase (NuRD) complex. FEBS J. 284(24):4216- 4232. Tsukada .Y, Fang J., Erdjument-Bromage H., Warren M.E., Borchers C.H., Tempst P., Zhang Y. 2006. Histone demethylation by a family of JmjC domain-containing proteins. Nature. 439(7078):811-6. Turner B.M. 2000. Histone acetylation and an epigenetic code. Bioessays. 22(9): 836-45. Turner B.M. 2007 Defining an epigenetic code. Nat Cell Biol. 9(1): 2-6 References 137

Tsuji T., Duh F.M., Latif F., Popescu N.C., Zimonjic D.B., McBride J., Matsuo K., Ohyama H., Todd R., Nagata E., Terakado N. Sasaki, A., Matsumura T., I. Lerman M., Wong D.T. 1998. Cloning, mapping, expression, function, and mutation analyses of the human ortholog of thehamster putative tumor suppressor gene Doc-1. J. Biol. Chem. 273:6704–6709. Tsukiyama T., Palmer J., Landel C.C., Shiloach J., Wu C. 1999. Characterization of the imitation switch subfamily of ATP-dependent chromatin-remodeling factors in Saccharomyces cerevisiae. Genes Dev. 13(6):686-97. Urquhart A.J., Gatei M., Richard D.J., Khanna K.K.. 2011. ATM mediated phosphorylation of CHD4 contributes to genome maintenance. Genome Integr. 2(1):1. Van Heel M., Keegstra W. 1981. IMAGIC: A fast, flexible and friendly image analysis software system. Ultramicroscopy, 7: 113-130. van Holde K. & Zlatanova J. 2007. Chromatin fiber structure: were is the problem now?. Semin Cell Dev Biol. 18(5): 651-8. van Nocker S. & Ludwig P. 2003. The WD-repeat protein superfamily in Arabidopsis: conservation and divergence in structure and function. BMC Genomics. 2003; 4: 50. van Leeuwen F., Gafken P.R., Gottschling D.E. 2002. Dot1p modulates silencing in yeast by methylation of the nucleosome core. Cell. 109(6):745-56. Vanyushin B.F. 2006. DNA methylation in plants. Curr Top Microbiol Immunol, 301:67-122. Värv S., Kristjuhan K., Lõoke M., Mahlakõiv T., Paapsi K., Kristjuhan A. 2010. Acetylation of H3 K56 Is Required for RNA Polymerase II Transcript Elongation through Heterochromatin in Yeast. Mol Cell Biol. 30(6): 1467-1477. Vásquez-Doorman C. & Petersen C.P. 2016. The NuRD complex component p66 suppresses photoreceptor neuron regeneration in planarians. Regeneration (Oxf). 3(3):168-78. Venkatesh S. & Workman J.L. 2015. Histone exchange, chromatin structure and the regulation of transcription. Nat Rev Mol Cell Biol. 16(3): 178-89. Verreault A., Kaufman P.D., Kobayashi R., Stillman B. 1996. Nucleosome assembly by a complex of CAF-1 and acetylated histones H3/H4. Cell, 87(1): 95-104. Vettese-Dadey M., Grant P.A., Hebbes T.R., Crane- Robinson C., Allis C.D., Workman J.L. 1996. Acetylation of histone H4 plays a primary role in enhancing transcription factor binding to nucleosomal DNA in vitro. EMBO J. 15(10):2508-18. Vidal M. & Gaber R.F. 1991. RPD3 encodes a second factor required to achieve maximum positive and negative transcriptional states in Saccharomyces cerevisiae. Mol Cell Biol. 11(12):6317-27. Vignali M., Hassan A.H., Neely K.E., Workman J.L. 2000. ATP-dependent chromatin remodeling complexes. Mol Cell Biol. 20(6): 1899-1910. 138 References

Wade P.A., Jones P.L., Vermaak D., Wolffe A.P. 1998. A multiple subunit Mi-2 histone deacetylase from Xenopus laevis cofractionates with an associated Snf2 superfamily ATPase. Curr Biol, 8: 843–846. Watt F. & Molloy P.L. 1988. Cytosine methylation prevents binding of DNA of a HeLa cell tanscrption factor required for optimal expression of the adenovirus late promoter. Genes Dev, 2(9):1136-43. Waddington C. H. 1942. The epigenotype. Endeavour 1, 18–20. Walavalkar N.M., Gordon N., Williams D.C.Jr. 2013. Unique Features of the Anti-parallel, Heterodimeric Coiled-coil Interaction between Methyl-cytosine Binding Domain 2 (MBD2) Homologues and GATA Zinc Finger Domain Containing 2A (GATAD2A/p66α) J Biol Chem. 288(5): 3419-3427. Waldron L., Steimle J.D., Greco T.M., Gomez N.C., Dorr K.M., Kweon J., Temple B., Yang X.H., Wilcewski C.M., Davis I.J., Cristea I.M., Moskowitz I.P., Conlon F.L. 2016. The Cardiac TBX5 Interactome Reveals a Chromatin Remodeling Network Essential for Cardiac Septation. Dev. Cell. 2016; 36 (3): 262-275. Walsh C.P., Chaillet J.R., Bestor. 1998. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet., 20(2):116-7. Wang H., Zhai L., Xu J., Joo H.Y., Jackson S., Erdjument-Bromage H., Tempst P., Xiong Y., Zhang Y. 2006. Histone H3 and H4 ubiquitylation by the CUL4-DDB-ROC1 ubiquitin ligase facilitates cellular response to DNA damage. Mol. Cell.22: 383–394. Wang L., Zhang J., Duan J., Gao X., Zhu W., Lu X., Yang L., Zhang J., Li G., Ci W., Li W., Zhou Q., Aluru N., Tang F., He C., Huang X., Liu J. 2014. Programming and inheritance of parental DNA methylomes in mammals. Cell, 157(4):979-991. Wang X., Moore S.C., Laszckzak M., Ausió J. 2000. Acetylation increases the alpha-helical content of the histone tails of the nucleosome. J Biol Chem., 275(45):35013-20. Wang Y., Wysocka J., Sayegh J., Lee Y.H., Perlin J.R., Leonelli L., Sonbuchner L.S., McDonald C.H., Cook R.G., Dou Y., Roeder R.G., Clarke S., Stallcup M.R., Allis C.D., Coonrod S.A. 2004. Human PAD4 regulates histone arginine methylation levels via demethylimination. Science. 306: 279–283. Wang Z., Zang C., Cui K., Schones D.E., Barski A., Peng W., Zhao K. 2009. Genome-wide mapping of HATs and HDACs reveals distinct functions in active and inactive genes. Cell. 138(5):1019-31. Wang Z., Kang J., Deng X., Guo B., Wu B., Fan Y.. 2017. Knockdown of GATAD2A suppresses cell proliferation in thyroid cancer in vitro. Oncol Rep. 37(4):2147-2152. Watson A.A., Mahajan P., Mertens H.D., Deery M.J., Zhang W., Pham P., Du X., Bartke T., Zhang W., Edlich C., Berridge G., Chen Y., Burgess-Brown N.A., Kouzarides T., Wiechens N., Owen-Hughes T., Svergun D.I., Gileadi O., Laue E.D.. 2012. The PHD and References 139

chromo domains regulate the ATPase activity of the human chromatin remodeler CHD4. J Mol Biol. 422(1):3-17. Weaver J.R. & Bartolomei M.S. 2014. Chromatin regulators of genomic imprinting. Biochim Biophys Acta1839(3): 168-77. Wei Y., Yu L., Bowen J., Gorovsky M.A., Allis C.D. 1999. Phosphorylation of histone H3 is required for proper chromosome condensation and segregation. Cell 97: 99-109. Weiss K., Terhal P.A., Cohen L., Bruccoleri M., Irving M., Martinez A.F., Rosenfeld J.A, Machol

K., Yang Y., Liu P., Walkiewicz M., Beuten J., Gomez-Ospina N., Haude K., Fong C.T., Enns G.M., Bernstein J.A., Fan J., Gotway G., Ghorbani M.; DDD Study, van Gassen K., Monroe G.R., van Haaften G., Basel-Vanagaite L., Yang X.J., Campeau P.M., Muenke M.. 2016 . De Novo Mutations in CHD4, an ATP-Dependent Chromatin Remodeler Gene, Cause an Intellectual Disability Syndrome with Distinctive Dysmorphisms. Am J Hum Genet. 99(4):934-941. Weiss S. B., Nakamoto T. 1961. Net synthesis of ribonucleic acid with a microbial enzyme requiring deoxyribonucleic acid and four ribonucleoside triphosphates. J. Biol. Chem. 236, PC18-PC20. Willemsen M.H., Nijhof B., Fenckova M., Nillesen W.M., Bongers E.M., Castells-Nobau A., Asztalos L., Viragh E., van Bon B.W., Tezel E., Veltman J.A., Brunner H.G., de Vries B.B., de Ligt J., Yntema H.G., van Bokhoven H., Isidor B., Le Caignec C., Lorino E., Asztalos Z., Koolen D.A., Vissers L.E., Schenck A., Kleefstra T. 2013. GATAD2B loss-of-function mutations cause a recognisable syndrome with intellectual disability and are associated with learning deficits and synaptic undergrowth in Drosophila. J Med Genet. 50(8): 507-14. Witt A.E., Lee C.W., Lee T.I., Azzam D.J., Wang B., Caslini C., Petrocca F., Grosso J., Jones M., Cohick E.B, Gropper A.B, Wahlestedt C., Richardson A.L., Shiekhattar R., Young R.A., Ince T.A.. 2017. Identification of a -specific function for the histone deacetylases, HDAC1 and HDAC7, in breast and ovarian cancer. Oncogene. 36(12):1707-1720. Whetstine J.R., Nottke A., Lan F., Huarte M., Smolikov S., Chen Z., Spooner E., Li E., Zhang G., Colaiacovo M., Shi Y. 2006. Reversal of histone lysine trimethylation by the JMJD2 family of histone demethylases. Cell, 125(3):467-81. Wiggs K.R., Chruszcz M., Su X., Minor W., Khorasanizadeh S. Structure of CHD4 double chromodomains depicts cooperative folding for DNA binding TO BE PUBLISHED, PDB ID: 4O9I. Wilting R.H., Yanover E., Heideman M.R., Jacobs H., Horner J., van der Torre J., DePinho R.A., Dannenberg J.H. 2010. Overlapping functions of Hdac1 and Hdac2 in cell cycle regulation and haematopoiesis. EMBO J. 29(15):2586-97. 140 References

Woodage T., Basrai M.A., Baxevanis A.D., Hieter P., Collins F.S. 1997. Characterization of the CHD family of proteins. Proc Natl Acad Sci USA 94:11472–11477 Woodcock C.L., Sweetman H.E., Frado L.L. 1976 Structural repeating units in chromatin. II. Their isolation and partial characterization. Exp Cell Res. 97: 111-9. Wu G., Xu G., Schulman B.A., Jeffrey P.D., Harper J.W., Pavletich N.P. 2003. Structure of a beta- TrCP1-Skp1-beta-Catenin complex: destruction motif binding and lysine specificity of the SCFbeta-TrCP1 ubiquitin ligase. Mol.Cell, 11: 1445-1456 Wu J., Xu W. 2012. Histone H3R17me2a mark recruits human RNA polymerase-associated factor 1 complex to activate transcription. Proc. Natl. Acad. Sci. USA. 109:5675–5680. Wu T.P., Wang T., Seeting M.G., Lai Y., Zhu S., Lin K., Liu Y., Byrum S.D., Mackintosh S.G., Zhong M., Tackett A., Wang G., Hon L.S., Fang G., Swenberg J.A., Xiao A.Z. 2016.

DNA methylation on N6-adenine in mammalian embryonic stem cells. Nature, 532: 329-333. Wutz A. 2013. Epigenetic regulation of stem cells : the role of chromatin in cell differentiation. Adv Exp Med Biol. 786:307-28. Wysocka J., Swigut T., Xiao, H., Milne T.A., Kwon S.Y., Landry J., Kauer M., Tackett A.J., Chait B.T., Badenhorst P., Wu C., Allis C.D. 2006. A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling. Nature.; 442: 86–90. Xiao H., Sandaltzopoulos R., Wang H.M., Hamiche A., Ranallo R., Lee K.M., Fu D., Wu C. 2001. Dual functions of largest NURF subunit NURF301 in nucleosome sliding and transcription factor interactions. Mol Cell. 8(3):531-43. Xin J., Zhang Z., Su X., Wang L., Zhang Y., Yang R. 2017. Epigenetic Component p66a Modulates Myeloid-Derived Suppressor Cells by Modifying STAT3. J Immunol. 198(7):2712- 2720. Xu Y., Wang J., Fu S., Wang Z.. 2014. Knockdown of CDK2AP1 by RNA interference inhibits cell growth and tumorigenesis of human glioma. Neurol Res. 36(7):659-65. Xue Y., Wong J., Moreno G.T., Young M.K., Côté J., Wang W. 1998. NURD, a novel complex with both ATP-dependent chromatin-remodeling and histone deacetylase activities. Mol Cell, 2: 851–861. Yamaguchi T., Cubizolles F., Zhang Y., Reichert N., Kohler H., Seiser C., Matthias P. 2010. Histone deacetylases 1 and 2 act in concert to promote the G1-to-S progression. Genes Dev. 24(5):455-69. Yang D. & Aira G. 2011. Structure and binding of the H4 histone tail and the effects of lysine 16 acetylation. Phys Chem Chem Phys. 13(7):2911-21. Yang X.J. & Seto E. 2008. The Rpd3/Hda1 family of lysine deacetylases: from bacteria and yeast to mice and men. Nat Rev Mol Cell Biol. 9(3): 206–218. References 141

Yildirim O., Li R., Hung J.H., Chen P.B., Dong X., Ee L.S., Weng Z., Rando O.J., Fazzio T.G. 2011. Mbd3/NURD complex regulates expression of 5-hydroxymethylcytosine marked genes in embryonic stem cells. Cell. 147(7):1498-510. Yokoyama H., Nakos K., Santarella-Mellwig R., Rybina S., Krijgsveld J., Koffa M.D., Mattaj I.W. 2013. CHD4 Is a RanGTP-Dependent MAP that Stabilizes Microtubules and Regulates Bipolar Spindle Formation. Cell, 23(24): 2443-2451. Yoshikawa K. & Yoshikawa Y. 2002. Compaction and condensation of DNA. Pharmaceutical Perspectives of Nucleic Acid-Based Therapy, chapter 8. You A., Tong J.K., Grozinger C.M., Schreiber S.L. 2001. CoREST is an integral component of the CoREST- human histone deacetylase complex. Proc Natl Acad Sci USA 98:1454– 1458. Zakrzewski F., Schmidt M., Van Lijsebettens M., Schmidt T.DNA methylation of retrotransposons, DNA transposons and genes in sugar beet (Beta vulgaris L.).Plant J., 90(6):1156-1175. Zhang H., Stephens L.C., Kumar R. 2006. Metastasis tumor antigen family proteins during breast cancer progression and metastasis in a reliable mouse model for human breast cancer. Clin. Cancer Res. 12: 1479–1486. Zhang W., Aubert A., Gomez de segura J.M., Karuppasamy M., Basu S., Murthy A.S., Diamante A., Drury T.A., Balmer J., Cramard J., Watson A.A., Lando D., Lee S.F., Palyret M., Kloet S.L., Smits A.H., Deery M.J., Vermeulen M., Hendrich B., Klenerman D., Schaffitzel C., Berger I., Laue E.D. 2016. The Nucleosome Remodeling and Deacetylase Complex NuRD Is Built from Preformed Catalytically Active Sub- modules. J Mol Biol. 428(14): 2931–2942. Zhang Y., LeRoy G., Seelig H.P., Lane W.S., Reinberg D. 1998. The dermatomyositis-specific autoantigen Mi2 is a component of a complex containing histone deacetylase and nucleosome remodeling activities. Cell, 95: 279–289. Zhang Y., Iratni R., Erdjument-Bromage H., Tempst P., Reinberg D. 1997. Histone Deacetylases and SAP18, a Novel Polypeptide, Are Components of a Human Sin3 Complex. Cell, 89(3): 357-364. Zhang Y., Sun Z.W., Iratni R., Erdjument-Bromage H., Tempst P., Hampsey M., Reinberg D. 1998. SAP30, a novel protein conserved between human and yeast, is a component of a histone deacetylase complex. Mol Cell 1:1021–1031. Zhang Y. & Li Y. 2010. The Expanding Mi-2/NuRD Complexes: A Schematic Glance. Proteomics Insights 3: 79–109. Zheng J., Xue H., Wang T., Jiang Y., Liu B., Li J., Liu Y., Wang W., Zhang B., Sun M. 2011. miR- 21 downregulates the tumor suppressor P12 CDK2AP1 and stimulates cell proliferation and invasion. J Cell Biochem. 112(3):872-80. 142 References

Zhao H., Han Z., Liu X., Gu J., Tang F., Wei G., Jin Y. 2017. The chromatin remodeler protein Chd4 maintains embryonic stem cell identity by controlling pluripotency- and differentiation- associated genes. J Biol Chem. 292: 8507-8519. Zhu B., Zheng Y., Pham A.D., Mandal S.S., Erdjument-Bromage H., Tempst P., Reinberg, D. 2005. Monoubiquitination of human histone H2B: the factors involved and their roles in HOX gene regulation. Mol. Cell. 20: 601–611. Zhu S., He C., Deng S., Li X., Cui S., Zeng Z., Liu M., Zhao S., Chen J., Jin Y., Chen H., Deng S., Liu Y., Wang C., Zhao G.. 2016. MiR-548an, Transcriptionally Downregulated by HIF1α/HDAC1, Suppresses Tumorigenesis of Pancreatic Cancer by Targeting Vimentin Expression. Mol Cancer Ther. 15(9):2209-19. Zimmerman S.M. & Kim S.K. 2014. The GATA transcription factor/MTA-1 homolog egr-1 promotes longevity and stress resistance in Caenorhabditis elegans. Aging Cell.13(2):329-39. Zivanov J., Nakane T., Forsberg B.O., Kimanius D., Hagen W.J. Lindahl E. Scheres SH. 2018. new tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife. 9;7. Zolochevska O. & Figueiredo M.L. 2010. Novel tumor growth inhibition mechanism by cell cycle regulator cdk2ap1 involves antiangiogenesis modulation. Microvasc Res. 80(3):324- 31. Zupkovitz G., Tischler J., Posch M., Sadzak I., Ramsauer K., Egger G., Grausenburger R., Schweifer N., Chiocca S., Decker T., Seiser C. 2006. Negative and Positive Regulation of Gene Expression by Mouse Histone Deacetylase 1. Mol Cell Biol. 26(21): 7913-28. Zupkovitz G., Grausenburger R., Brunmeir R., Senese S., Tischler J., Jurkin J., Rembold M., Meunier D., Egger G., Lagger S., Chiocca S., Propst F., Weitzer G., Seiser C. 2010. The cyclin-dependent kinase inhibitor p21 is a crucial target for histone deacetylase 1 as a regulator of cellular proliferation. Mol Cell Biol 30: 1171–1181. References 143

Acknowledgements

I would like to thank all the people whose work and support made possible this PhD work.

First, my supervisor Prof. Christiane Schaffitzel, for giving me the opportunity of participating in this project, as well as her guidance during these years.

I would like to thank my thesis jury members and reviewers: Prof. Andrea Dessen, Dr. Juan Reguera, Prof. Helen Saibil, Prof. Patrick Schultz and Prof. Frank Sobott, for offering their expertise and their time to evaluate my work.

Likely, I would like to thank my thesis advisory committee: Prof. Andrea Dessen, Prof. Andrew Mc Carthy and Prof. Martin Beck for their participation in the annual reports and their useful advice.

I would like to thank our collaborators Ernest Laue and Michiel Vermeulen and their groups, particularly Wei Zhang and Susan Kloet for their direct implication in the project, protocols, samples and constructs. Their expertise in different fields allowed us to obtain the results we present in this work.

I’m grateful as well to the members of the Berger and Schaffitzel groups in EMBL Grenoble for their help and support along the project, and particularly to Dr. Manikandan Karuppasamy and Alice Aubert for their direct implication in the project.

I would finally like to thank the platforms that made possible this work, as well as the people managing them and their IT support teams: EMBL Grenoble, EMBL Heidelberg and the IBS electron microscopy platform, as well as the 4DCellFate consortium for providing funding and a collaborative environment for this research. A Appendix

SPIDER script 3D reconstruction. This script can perform reference free 3D reconstruction, and therefore be used for ab initio reconstruction.

This script was created by modifying a helical 3D reconstruction SPIDER script by Dr. Jaime Martin-Benito (CNB, Madrid). MD SET MP ; Set number of OMP processors 0

; +++++ Image stack name ; IMPORTANT - path cannot be more than 80 characters ; IMPORTANT - .dat extension must be omitted FR L 2Dclasses/clean_pmmr_df_2dclasses_for_spider ; path to stack and name, no extension

; +++++ Suffix for volume files ; IMPORTANT - no spaces allowed between and name FR L clean_PMMR_b2_spider_ini_vol_

; +++++ Iteration parameters ; NOTE - For new run, set start_cycle = 2 [start_cycle] = 2 ; start cycle (set to 1 greater than starting volume) [end_cycle] = 500 ; end cycle; +++++ Pixel size

; IMPORTANT - must be set twice as shown below. 1st instance is for use ; by Spider, 2nd is used by externally called programs FR L ; pixel size (Angstroms) 1.94 [pixel_size] = 1.94 ; pixel size (Angstroms) [diameter] = 320 ; diameter of the particle (Angstroms)

[search_range] = 4 ; search range (pixels) divided by # nshifts [max_ring] = 30 ; maximum ring (pixels) [nshift] = 1 ; # shifts used to generate ref projs (Ed's trick) [ang_incr] = 15 ; increment for azimuthal search (degrees) [%] = 100 ; percent of images selected due to correlation with base volume (only when using correlation to select the images)

; +++++ Out-of-plane search parameters ; NOTE - Keep noutp = 0 to skip search over out-of-plane angles [noutp] = 0; ; Number of out of plane deviations in each direction [outp_incr] = 5.0; ; Step size for out of plane deviations

; ------; ----- Should not need to change anything below this point ----- ; ------

; Numerical constants [ninety] = 90.0 [mninety] = -90.0 [zero] = 0.0

; Turn off verbose ouput MD VB OFF

; Get rid of preexisting files that may cause problems VM rm -f newboxes.dat VM rm -f angsym.dat VM rm -f fangles_ran.dat VM rm -f ftzdsk_ran.dat VM rm -f refproj.dat VM rm -f tmp.dat VM rm -f angleproj.dat VM rm -f anglerecover.dat

; Calculate number of ref projs (360/n*[ang_incr]) [nproj] = (INT(360.0/([ang_incr])))*(INT(360.0/([ang_incr]))) [nproj/2] = (INT(360.0/([ang_incr])))

; Get size of images in stack file fi x [nrow],[nsam] ; Extract number of rows and pixels/row from image @0001 ; image name (2,12) ; nrow,nsam = header locations 2,12

[img_dim] = [nrow]

; Make sure that images are square if([nrow].ne.[nsam]) then VM echo "Images in input stack are not square - terminating program" en d endif ; Obtain number of images in stack file fi x [nimg],[num_row],[num_sam] @ (26,2,12)

VM echo "{*****[nimg]}"

; ------Generate starting map if one does not exist ------if([start_cycle].eq.2) then

iq fi [exists] 001

if([exists].eq.0) then

; Generate list of random orientations [dummy]=0 do [k]=1,[nimg] [rho] = (RAN([dummy])*360) [azimuth] = (RAN([dummy])*360) [phi] = (RAN([dummy])*360) sd [k], [rho], [azimuth], [phi] fangles_ran enddo

sd e fangles_ran

; Reconstruct map using back projection

bp 32f ; back projection, Sampled, interpolated in Fourier space @***** ; template for image files (1-[nimg]) ; range of image numbers to use fangles_ran ; projection angles file * ; apply point group symmetry 001 ; reconstructed volume 001_1 ; reconstructed volume first half 001_2 ; reconstructed volume second half

endif endif ; ------END Generate starting map if one does not exist ------

; Now iterate, starting from previous 3D reconstruction do [cycle] = [start_cycle], [end_cycle]

;if ([cycle].eq.11) then ; [ang_incr] = 6 ;endif ;if ([cycle].eq.21) then ; [ang_incr] = 3 ;endif

[prev_cycle]=[cycle]-1 fi x [nrow],[nsam] ; Extract number of rows and pixels/row from image @00001 ; image name (2,12) ; nrow,nsam = header locations 2,12

; Obtain number of images in stack file fi x [nimg],[num_row],[num_sam] @ (26,2,12)

VM echo "{*****[nimg]}"

VM echo "voy por aqui {***[cycle]}"

; Generate reference projections due to azimuthal rotation if ([cycle].gt.2) then VM echo "Angular sampling of {***[ang_incr]} degrees" endif

[counter] = 0 ; global counter for projections [coop] = 0 ; counter for out-of-plane angles [mnoutp] = -[noutp] ; spider can't handle minus sign in loop range variable do [p]=[mnoutp],[noutp] ; Loop over out-of-plane projections

[coop] = [coop]+1

; Generate list of projection angles do [i]=1,[nproj/2] [outp] =[p]*[outp_incr] ; out of plane angle [azimuth]=([i]-1.)*[ang_incr] ; azimuthal angle do [j]=1,[nproj/2] [counter]=[counter]+1 [rho]=([j]-1.)*[ang_incr] ; rho angle

; Combining two sets of rotations sa e, [phi],[theta],[psi] [rho],[azimuth],[zero] [ninety],[outp],[mninety]

; Write projection angles to doc file ; NOTE! PJ 3Q angles = (PSI,theta,PHI) - PJ 3 angles = (PHI,theta,PSI) sd [counter],[psi],[theta],[phi] angleproj

sd [counter],[i],[j] anglerecover

enddo enddo

[counter]=0 ;reset the counter

; Generate projections for this out-of-plane tilt pj 3q ; Calculate projections of volume {***[prev_cycle]} ; input volume ([img_dim]-1) ; projection radius (1-[nproj]) ; records to use from angleproj angleproj ; doc file containing projection angles tmp@***** ; output

; Increment counter to account for block of [nproj] projections [counter] = [counter] + [nproj]

; Copy new projections to end of refproj do [i]=1,[nproj] [j]=([coop]-1)*[nproj]*[nshift] + [i] cp tmp@{*****[i]} refproj@{*****[j]} enddo

; Generate additional reference projections due to shifts ; if([nshift].gt.1) then ; [nshift_minus_one]=[nshift]-1 ; do [i]=1,[nshift_minus_one] ; [yshift]=[i]*[rise]/([pixel_size]*[nshift]) ; do [j]=1,[nproj] ; [counter] = [counter] + 1 ; [k] = [j] + ([p]+[noutp])*[nproj]*[nshift] ; sh ; Shift with Fourier interpolation ; refproj@{*****[k]} ; input image ; refproj@{*****[counter]} ; output image ; ([zero],[yshift]) ; xshift, yshift ; enddo ; enddo ; endif

; Close and then remove angleproj sd e angleproj sd e anglerecover

VM rm -f angleproj.dat enddo

; Find the reference projection that best matches each image VM rm -f fparamzds+{***[cycle]}.dat [total_proj]=[nproj]*[nshift]*([noutp]*2 + 1)

MD VB ON ap sh ; refproj@***** ; template for reference projections (1-[total_proj]) ; range of projections to use ([search_range],1) ; search range (1,[max_ring],1,1) ; first and last range * ; Angle file @***** ; template for images (1-[nimg]) ; range of images to use * ; Image angles file (0) ; Rotational restriction for experimental images (0) ; Check mirrored and shifts fparamzds+{***[cycle]} ; results file projection number etc. sd e ; Close document file fparamzds+{***[cycle]}

MD VB OFF

VM echo "Fhtagn!"

; Go through fparamzds+ file, screen out bad ptles based on in-plane ; deviation and write out fgoodparamzds file

[ngood]=0 VM rm -f fgoodparamzds+{***[cycle]}.dat VM rm -f fangles{***[cycle]}.dat VM rm -f fgoopazim{***[cycle]}.dat VM rm -f fparamzds+{***[cycle]}_neg VM rm -f fparamzds+{***[cycle]}_neg_ordenado do [k]=1,[nimg]

[img_num]=[k] ud ic,[k],[junk1],[junk2],[junk3],[best_proj],[junk4], [inplane_rotx],[xshiftx],[yshiftx],[junk5],[junk6],[cc], [inplane_rot],[xshift],[yshift] fparamzds+{***[cycle]}

[mcc]=-[cc] sd [k],[best_proj],[mcc],[inplane_rot],[xshift],[yshift], [img_num] fparamzds+{***[cycle]}_neg enddo

DOC SORT fparamzds+{***[cycle]}_neg fparamzds+{***[cycle]}_neg_ordenado (2) Y do [k]=1,[nimg] ud ic,[k],[best_proj],[mcc],[inplane_rot],[xshift],[yshift], [img_num] fparamzds+{***[cycle]}_neg_ordenado

[cc]=-[mcc] sd [k],[best_proj],[cc],[inplane_rot],[xshift],[yshift], [img_num] fparamzds+{***[cycle]}_ordenado enddo sd e fparamzds+{***[cycle]}_neg

;sd e ; fparamzds+{***[cycle]}neg_ordenado sd e fparamzds+{***[cycle]}_ordenado ;sd e ; fparamzds+{***[cycle]} ;ud ice ; fparamzds+{***[cycle]}_neg ud ice fparamzds+{***[cycle]}_neg_ordenado ud ice fparamzds+{***[cycle]} ;ud ice ; fparamzds+{***[cycle]}_ordenado

[nimg]=[nimg]*([%]/100)

VM echo "{*****[nimg]}" do LB11 [k]=1,[nimg] ud ic,[k],[best_proj],[cc],[inplane_rot],[xshift],[yshift], [img_num] fparamzds+{***[cycle]}_ordenado

; If we made it this far, ptle within allowed in-plane deviation [ngood]=[ngood]+1 sd [ngood],[best_proj],[cc],[inplane_rot],[xshift],[yshift], [img_num] fgoodparamzds+{***[cycle]}

; --- Converting index of the best projection to angles and shifts ---

ud ic, [best_proj],[callisto],[galileo] anglerecover

[azimuth]=([callisto]-1.)*[ang_incr] [rho]=([galileo]-1.)*[ang_incr]

; Figure out what block (starting from zero) we are in. ; All values within a block have same out-of-plane angle ;[block] = INT(([best_proj]-1)/([nproj]*[nshift]))

; Calculate out-of-plane angle ;[outp] = [outp_incr]*([block]-[noutp])

; Figure out index within a block ;[best_in_block] = [best_proj] - [block]*[nproj]*[nshift]

; Calculate "composite" y-shift accounting for shifts in ref projs ;[jshft]=INT(([best_in_block]-1)/[nproj]) ;[composite_yshift]=[yshift] - [jshft]*[rise]/([pixel_size]*[nshift])

; Calculate azimuthal angle ;[azimuth]=([best_in_block] - [jshft]*[nproj] - 1.)*[ang_incr]

; Combine two sets of rotations sa e, [phi],[theta],[psi] [rho],[azimuth],[zero] [ninety],[outp],[mninety] ; Write new document file containing angles

sd [ngood],[psi],[theta],[phi] fangles{***[cycle]}

sd [ngood],[best_proj],[outp],[azimuth] fgoopazim{***[cycle]}

; Write new stack file w/ images corrected for shift and in- plane rotation RT SQ ; Rotate and shift - quadratic interpolation @{*****[img_num]} ; input image newboxes@{*****[ngood]} ; output image [inplane_rot] ; rotation [xshift],[yshift] ; shift

LB11 ud ice ; Terminate access to in-core image of document file fparamzds+{***[cycle]} ud ice ; Terminate access to in-core image of document file fparamzds+{***[cycle]}_neg ud ice ; Terminate access to in-core image of document file fparamzds+{***[cycle]}_neg_ordenado ud ice ; Terminate access to in-core image of document file fparamzds+{***[cycle]}_ordenado

sd e ; Close document file fangles{***[cycle]} sd e ; Close document file fgoopazim{***[cycle]} sd e ; Close document file fgoodparamzds+{***[cycle]} sd e fparamzds+{***[cycle]}_ordenado

VM rm fparamzds+{***[cycle]}_neg.dat VM rm fparamzds+{***[cycle]}_neg_ordenado.dat

; Reconstruct map using back projection bp 32f ; back projection, 3D Fourier space newboxes@***** ; template for image files (1-[ngood]) ; range of image numbers to use fangles{***[cycle]} ; projection angles file * ; apply point group symmetry {***[cycle]} ; reconstructed volume {***[cycle]}_1 {***[cycle]}_2

RF 3 {***[cycle]}_1 {***[cycle]}_2 (0.5) (0.6,1.4) C (90.0) (3) fscc{***[cycle]}

;limit the resolution

UD N [maxkey] fscc{***[cycle]}

DO [k]=3,[maxkey] [j]=[maxkey]-([k]-3) UD IC,[j],[NORM-FREQ],[DPHJ],[FSC] fscc{***[cycle]} if ([FSC].gt.0.5) GOTO LB22 ENDDO

LB22 UD ICE fscc{***[cycle]} if ([NORM-FREQ].gt.0.35) then [NORM-FREQ]=0.35 endif [DPH]=[NORM-FREQ]+0.15

if([cycle].eq.2) then VM rm -f resolution.dat endif

[RESOLUTION]= [pixel_size]/[NORM-FREQ]

sd [cycle], [NORM-FREQ], [RESOLUTION] resolution

sd e ; Close document file resolution

[new_ang_incr]=INT((360/((3.14159265*[diameter])/[RESOLUTION]) )-1.) if ([new_ang_incr].lt.[ang_incr]) then [ang_incr]=[new_ang_incr] endif VM echo "Current resolution is {****[RESOLUTION]} A"

FQ {***[cycle]} {***[cycle]}_FSC (7) [NORM-FREQ],[DPH]

VM mv [volsuf]{***[cycle]}.dat [volsuf] {***[cycle]}_sin_refinar.dat

VM mv [volsuf]{***[cycle]}_FSC.dat [volsuf]{***[cycle]}.dat

; now cleanup

UD E

VM rm -f ccdoc.dat VM rm -f ccdocs.dat VM rm -f [volsuf]{***[cycle]}r.dat VM rm -f [volsuf]{***[cycle]}rs.dat VM rm -f ccdoc1.dat VM rm -f ccdocs1.dat VM rm -f [volsuf]{***[cycle]}r_1.dat VM rm -f [volsuf]{***[cycle]}rs_1.dat VM rm -f ccdoc2.dat VM rm -f ccdocs2.dat VM rm -f [volsuf]{***[cycle]}r_2.dat VM rm -f [volsuf]{***[cycle]}rs_2.dat VM rm -f [volsuf]{***[cycle]}_1.dat VM rm -f [volsuf]{***[cycle]}_2.dat VM rm -f s[volsuf]{***[cycle]}_1.dat VM rm -f s[volsuf]{***[cycle]}_2.dat VM rm -f ftzdsk{***[cycle]}_1.dat VM rm -f ftzdsk{***[cycle]}_2.dat

[fhtagn]=INT([cycle]/3) [ia]=([fhtagn]*3)+1 if([cycle].gt.20)then if([cycle].ne.[ia])then VM rm -f [volsuf]{***[prev_cycle]}_sin_refinar.dat VM rm -f [volsuf]{***[prev_cycle]}.dat endif endif enddo ; End loop over cycles

; Cleaning up files VM rm -f newboxes.dat VM rm -f refproj.dat VM rm -f tmp.dat VM rm -f angleproj.dat en B Appendix

Bash script for particle classification based on their stability of the particles along 2D or 3D classifications. Please note: This script is optimised for use in the EMBL computing cluster environment, were it was written. It might need to be adapter for use in other environments. /bin/bash

#This script is intended to localize those particles that can't be assigned to a stable class in 2D or 3D classification. Since it was written to be used with Relion it will read the .star format that Relion outputs. It should be quite portable between different Relion versions, so long nothing radical changes in the output format (if the output you will analyse can be used in Relion 1.4 or previous that is a good signal that this script will work properly).

#Results from this script need to be manually evaluated before taking a decision about them. Gnuplot is a good and broadly available tool (in linux machines) to help you visualize the output file, but any plotting program should be fine for this purpose.

#IMPORTANT: this program is divided in 3 scripts. The current script will create the other two and will run them. After the job is done they will be erased, but the actual code can be obtained from lines 55 to 137 and 150 to 231 of the current script. #To execute this script, first copy it to your working folder and make it executable.

#IMPORTANT2:Your working folder must contain the *data.star files that you want to analyse, and no more *data.star files. This files can be all the *data.star files from a 2D or 3D classification, or part of them (for example, the last 25 iterations). Less files speed up the process, but the more files give a more detailed analysis. While is possible to use alternate files (only even files, for example) to speed up the process while keeping information from earlier iterations, be careful while doing so, and better avoid it completely when the number of classes is low (this situation needs a more detailed analysis to get meaningful information). #While running this script for a 3D classification ignore the files from iteration 0. In case you obtained you dataset from a previous 2D classification (most likely if you followed a relion workflow) the *data.star from this iteration will contain information from the 2D classification. The impact in the final result will be minimal if any, but not using it is the correct way of doing it. #Then set up the variables as explained and save the script. #IMPORTANT3: class_shift_analysis_master.bash is the script controlling the other two, and the one that should be modified and run by the user. None of the other two needs to be modified. #cp class_shift_analysis_master.bash path/to/your/folder/. #cd path/to/your/folder #chmod +x class_shift_analysis_master.bash

#Execute it: #Login into a mpi environment (sky50 or sky70). This script is intended to run with Slurm, if the mpi queue system is a different one modifications will need to be done to lines 271 and 273 #./class_shift_analysis_master.bash #As you see from the command, the script is executed directly, instead of sending it to Slurm. The master script will handle this for you: the computing part will separated in different parts to speed up the process and sent to Slurm, while the original script will remain in the original node and will only display a progress bar.

#READ the explanations first to know how to fill each field. When doing so, always write without leaving a space after the "=", and make sure you leave one before the #. Avoid spaces and simbols like "*", "#", or others that might have a special meaning in bash scripting

first_file=data.star #The full name of the first file of the *data.star files you are going to use. Actually, any of them is good. k=3 #Number of classes used in the classification. This is important if the number is 2, since the equation handling the points will change in this case. For any number higher than 2 the result will be the same, full equation will be applied. output=plot_points.txt #output text file with the list of particles and their values. You will be able to plot the results from here, as explained when the script finishes.

#First version: v1.0, March 2016 #Last update: v1.2 , December 2016 #Written by Jesus, complains to him. ####################################################### #nothing needs to be modified below here date_start=$(date) export first_file export k export output echo "first file: $first_file" rm -f class_shift_analysis_top.bash class_shift_analysis_bottom.bash

#write class_shift_analysis_top.bash read -d '' top_script <<\EOF #!/bin/bash #this script is for use with class_shift_analysis_master.bash script, which will create it on run. Any previous script with the same name will be erased from the current folder.

#First version: v1.0, March 2016 #Last update: v1.1 , March 2016 #Written by Jesus, complains to him. ([email protected]) ####################################################### #nothing needs to be modified below here data=*data.startop number=0 while read line; do name=`echo $line | awk '{if ( NF > 2 ) print $'''${im_name_pos}'''}'` #print name if column number greater than 2 if [ ! -z $name ]; then #if name exists and is not empty #echo $name number=$(( $number + 1 )) counter=0 xzero=0 xs=0 points=0 anticounter=1 fail=0 for input in $data; do #echo $input class=`grep -m 1 $name $input | awk '{print $'''${im_class_pos}'''}'` #echo $class if [ $counter -gt 0 ]; then if [ $class -eq $prev_class ]; then if [ $fail -eq 1 ]; then xzero=$counter xs=0 fi let points=" $points + ( 4 * ( $counter - $xzero - $xs ) * $counter ) + ( 4 * ( ${counter} ** 2 ) ) " anticounter=0 fail=0 else anticounter=$(( ${anticounter} + 1 )) xs=$(( $xs + 1 )) fail=$(( ${fail} + 1 )) fi if [ $counter -gt 1 ] && [ $k -gt 2 ]; then if [ $prev_class -eq $prev2_class ]; then let points=" $points + ( ${counter} ** 2 ) " if [ $fail -eq 2 ]; then anticounter=1 fi fi fi if [ $anticounter -gt 1 ]; then xzero=$counter xs=0 fi fi counter=$(( ${counter} + 1 )) prev2_class=$prev_class prev_class=$class done printf "%-s %-s %-s\\n" "$name $number $points" >> ${output} fi done < ${first_file}top touch top_done.txt exit

EOF echo "$top_script" >> class_shift_analysis_top.bash chmod +x class_shift_analysis_top.bash

#write class_shift_analysis_bottom.bash read -d '' bottom_script <<\EOF #!/bin/bash

#this script is for use with class_shift_analysis_master.bash script, which will create it on run. Any previous script with the same name will be erased from the current folder.

#First version: v1.0, March 2016 #Last update: v1.1 , March 2016 #Written by Jesus, complains to him. ([email protected]) ####################################################### #nothing needs to be modified below here

data=*data.starbottom number=$other_half_data while read line; do name=`echo $line | awk '{if ( NF > 2 ) print $'''${im_name_pos}'''}'` #print name if column number greater than 2 if [ ! -z $name ]; then #if name exists and is not empty #echo $name number=$(( $number + 1 )) counter=0 xzero=0 xs=0 points=0 anticounter=1 fail=0 for input in $data; do #echo $input class=`grep -m 1 $name $input | awk '{print $'''${im_class_pos}'''}'` #echo $class if [ $counter -gt 0 ]; then if [ $class -eq $prev_class ]; then if [ $fail -eq 1 ]; then xzero=$counter xs=0 fi let points=" $points + ( 4 * ( $counter - $xzero - $xs ) * $counter ) + ( 4 * ( ${counter} ** 2 ) ) " anticounter=0 fail=0 else anticounter=$(( ${anticounter} + 1 )) xs=$(( $xs + 1 )) fail=$(( ${fail} + 1 )) fi if [ $counter -gt 1 ] && [ $k -gt 2 ]; then if [ $prev_class -eq $prev2_class ]; then let points=" $points + ( ${counter} ** 2 ) " if [ $fail -eq 2 ]; then anticounter=1 fi fi fi if [ $anticounter -gt 1 ]; then xzero=$counter xs=0 fi fi counter=$(( ${counter} + 1 )) prev2_class=$prev_class prev_class=$class done printf "%-s %-s %-s\\n" "$name $number $points" >> ${output} fi done < ${first_file}bottom touch bottom_done.txt exit

EOF echo "$bottom_script" >> class_shift_analysis_bottom.bash chmod +x class_shift_analysis_bottom.bash

#write variables and half datasets data=*data.star head_size=$(awk '{if (NF < 3) {count++} else {exit}} END { print count }' $first_file) all_data=$(wc -l $first_file | awk '{print $1}') all_data=$(( $all_data - $head_size )) let half_data=" ( $all_data / 2 ) + 1 " let other_half_data=" $all_data - $half_data " let other_half_data_head=" ( $other_half_data + $head_size ) " im_name_pos=`grep -m 1 '_rlnImageName #' $first_file | sed "s|_rlnImageName #||"` im_name_pos=$(echo -e "${im_name_pos}" | tr -d '[[:space:]]') im_class_pos=`grep -m 1 '_rlnClassNumber #' $first_file | sed "s|_rlnClassNumber #||"` im_class_pos=$(echo -e "${im_class_pos}" | tr -d '[[:space:]]') export im_name_pos export im_class_pos export other_half_data echo lines \in top half\: $other_half_data_head echo lines \in bottom half\: $half_data

for input in $data; do rm -f ${input}bottom rm -f ${input}top tail -n $half_data $input >> ${input}bottom head -n $other_half_data_head $input >> ${input}top done

#run jobs rm -f ${output} srun -n 1 -J top ./class_shift_analysis_top.bash & srun -n 1 -J bottom ./class_shift_analysis_bottom.bash &

#wait for jobs to start until [ -f ${output} ]; do sleep 5 echo 'waiting for the servers' done echo 'job in progress' until [ -f bottom_done.txt ]; do

#progress bar current=$(wc -l ${output} | awk '{print $1}')

let progress="( ${current} * 100 ) / ${all_data}" fill=$(( ${progress} * 4 / 10 )) empty=$(( 40 - ${fill} )) _fill=$(printf "%${fill}s") _empty=$(printf "%${empty}s")

printf "\rProgress : [${_fill// /#}${_empty// /-}] ${progress}%%"

sleep 60 done #wait until jobs are done until [ -f top_done.txt ]; do current=$(wc -l ${output} | awk '{print $1}') ##math block let progress="( ${current} * 100 ) / ${all_data}" fill=$(( ${progress} * 4 / 10 )) empty=$(( 40 - ${fill} )) _fill=$(printf "%${fill}s") _empty=$(printf "%${empty}s") printf "\rProgress : [${_fill// /#}${_empty// /-}] ${progress}%%" #"${spin[$spiny]} $ {spin[$spinx]}" sleep 60 done

#finishing block rm top_done.txt bottom_done.txt rm -f class_shift_analysis_top.bash class_shift_analysis_bottom.bash rm -f *data.starbottom rm -f *data.startop current=$(wc -l ${output} | awk '{print $1}') let progress="( ${current} * 100 ) / ${all_data}" fill=$(( ${progress} * 4 / 10 )) empty=$(( 40 - ${fill} )) _fill=$(printf "%${fill}s") _empty=$(printf "%${empty}s") printf "\rProgress : [${_fill// /#}${_empty// /-}] ${progress}%%" date_end=$(date) echo \ finished echo / echo / echo / echo ------echo ======echo Started at ${date_start} echo "use 'gnuplot -p -e \"plot '${output}' u 2:3\"' to view an overview of particle scores" echo "or 'gnuplot -p -e \"set boxwidth 1; plot '${output}' u 3:(1) smooth frequency with boxes\"' for frequency plot" echo Finised at ${date_end} echo ======echo ------echo / echo / echo / exit Résumés

Le complexe de remodelage des nucléosomes et de désacétylation des histones (NuRD) est l'un des principaux régulateurs épigénétiques du génome. Il contribue à la formation et au maintien de l'hétérochromatine, une structure très dense d'ADN et de protéines réprimant la transcription. La NuRD joue un rôle central dans les processus biologiques pertinents tels que la régulation de la pluripotence ou la tumorigenèse. Malgré cela, sa structure et son mécanisme d'action restent largement inconnus. Cela est dû en grande partie à l'hétérogénéité inhérente de composition et de conformation de NuRD. Dans ce travail de doctorat, nous avons essayé de surmonter ces défis en utilisant une approche multidisciplinaire. Nous avons combiné la purification par affinité de complexes et de sous-unités NuRD endogènes et recombinants, la réticulation, la cryo-microscopie électronique et la spectrométrie de masse. Grâce à cela, nous avons pu identifier un sous-complexe d'histone désacétylase stable à activité indépendante (PMMR) et résoudre sa structure à 16,6 Å. Nous avons également obtenu des structures à faible résolution de plusieurs sous-complexes plus petits. L'étude de ces sous-complexes nous a permis de proposer un processus d'assemblage Holo- NuRD, conduisant à la publication d'un article scientifique. Mots-clés: NuRD, régulation épigénétique, remodelage de la chromatine, histone désacétylase, Cryo-EM

The nucleosome remodeling and histone deacetylase (NuRD) complex is one of the main epigenetic regulators of the genome. It contributes to the formation and maintenance of the heterochromatin, a tightly packed structure of DNA and proteins that represses transcription. NuRD plays a central role in relevant biological processes such as pluripotency regulation or tumorigenesis. Despite that, its structure and action mechanism remain unknown. This is largely due to the inherent compositional and conformational heterogeneity of NuRD. In this PhD work we tried to overcome these challenges using a multidisciplinary approach. We combined affinity purification of endogenous and recombinant NuRD complexes and subunits, cross-linking, electron cryo-microscopy and mass spectrometry. Thanks to that we were able to identify a stable histone deacetylase subcomplex with independent activity (PMMR) and solve its structure at 16.6 Å. We also obtained low resolution structures of several smaller subcomplexes. Studying these subcomplexes allowed us to propose a Holo-NuRD assembly pathway, leading to the publication of a scientific paper. Key words: NuRD, Epigenetic regulation, Chromatin remodelling, Histone deacetylase, Cryo-EM