Université Grenoble 1 (Joseph Fourier), Sciences and Geography

No. assigned by the library

Thesis by Tim GRÜNE
Doctorate: Chemistry and Life Sciences (Biology)
Discipline: Molecular and Cellular Aspects of Biology

Structural studies on ISWI, an ATP-dependent nucleosome remodelling factor

Thesis supervised by: Christoph W. MÜLLER

European Molecular Biology Laboratory (EMBL), Grenoble

Public defence on 3 October 2003

Jury:

Christoph W. MÜLLER, EMBL Grenoble, thesis supervisor
Elena CONTI, EMBL Heidelberg, reviewer
Félix REY, CNRS Gif-sur-Yvette, reviewer
Hans GEISELMANN, UJF Grenoble, president of the jury
Saadi KHOCHBIN, UJF Grenoble, jury member

Abstract

The Imitation Switch, or ISWI, from D. melanogaster is an essential enzyme that uses the energy of ATP hydrolysis to rearrange nucleosomes in chromatin. It plays an important role in gene expression because access to DNA, and especially to promoter sites, is altered by nucleosome positioning. ISWI can thus act both as an enhancer and as a repressor of transcription. In all eukaryotes, a large number of complexes involved in chromatin remodelling can be found. Despite their diversity and variety of function, all these complexes contain an ATPase with a homologous, so-called SNF2 domain that is conserved throughout eukaryotes. Four groups of remodelling ATPases can be distinguished, SNF2, SNF2L, CHD1, and INO80, of which only the first three have been further characterised according to the conserved domains they contain besides the SNF2 domain. These complexes have been known for more than 15 years, and a large pool of data is available to characterise a process that, together with covalent modifications, alters chromatin structure and has an important influence on processes such as transcription, DNA repair, and replication. To date, however, detailed information that might shed light on how remodelling is carried out has been missing, and only coarse hypotheses have been proposed. The work summarised in this document began in January 2000 with the aim of obtaining structural information about ISWI. The starting point was a clone and a protocol that allowed the enzyme to be produced recombinantly, but the path was certainly not straight. Many different approaches were tried, many of which are not described here. Acf1 and ISWI form one of the complexes found in vivo. The possibility that Acf1 might stabilise ISWI was considered, but Acf1 expression could not be detected in bacteria, nor could the complex be produced in insect cells in amounts sufficient for structural studies.
Binding to substrates such as the N-terminal tail of histone H4, ATP and non-hydrolysable ATP analogues, or cruciform DNA was investigated but did not produce useful results. Given that ATPases often undergo large conformational changes, it was not unexpected that crystallisation trials with the full-length protein (Mr = 120 kDa) failed. One of the first experiments was therefore to identify subdomains of the protein by restricted proteolysis. It showed that the enzyme consists of two flexibly linked main parts: the N-terminus, which makes up two thirds of the protein, and the C-terminus, about one third. These two parts were subcloned, but only the C-terminal part proved to be stable and gave crystals. Many N-terminal clones could be purified but were difficult to concentrate, even more so than the full-length protein. Crystals of the C-terminal fragment of ISWI were obtained in July 2001. Since then, most of the effort went into improving the crystals and data collection, but the structure could only be solved with a new crystal form that suddenly appeared in May 2002. The introduction of this thesis provides a short overview of chromatin remodelling complexes, with emphasis on the ISWI family. Chromatin remodelling covers a wide network of intertwined actions that include histone tail modifications, transcription, silencing, and the condensation and decondensation of chromatin. No attempt was made to cover the literature fully. The in vitro assays used to characterise function and interaction with substrate are explained in general terms, and only results specific to ISWI are described in further detail. The part following the introduction is dedicated to the techniques used to gain insight into the structure of ISWI. These obviously include protein crystallisation and crystallography (with emphasis on the phase problem), but also a short section on circular dichroism. The main result of this work is the crystal structure of ISWI [691:991] at 1.9 Å resolution.
The elongated structure consists of three domains, which are described separately in detail. The most N-terminal domain presents a new fold and has been named the “Hand” domain. It is in direct contact with the following SANT domain. The last domain, SLIDE, is separated from the rest of the molecule by a straight, 50 Å-long spacer helix. The fragment has an asymmetric distribution of acidic and basic residues between the N-terminal domains and the C-terminal domain. Most of the molecular surface is negatively charged, but the SLIDE domain in particular contains some positive patches. Further analysis showed that these patches are probably the main area where the C-terminal part of ISWI contacts the nucleosomal DNA, via a classical helix-turn-helix motif. The water, glucose, and glycerol molecules found in the structure are described in detail because they are important for crystal contacts. The description includes a beautiful arrangement in which a water molecule sits right on one of the symmetry axes and forms a pentagonal water ring with two more water molecules from the asymmetric unit and their symmetry mates. Most of the water molecules in the structure are concentrated at the C-terminus of the protein, where they form part of the interface between two protein molecules. Functional interpretation of the structure was based on the homology of both the SANT and the SLIDE domain with DNA-binding proteins such as the oncogene product c-Myb and the homeodomain protein Pax6. The SLIDE domain is only distantly related to SANT domains, which had been proposed to bind DNA. However, we have now found strong evidence that it is in fact the more remotely related SLIDE domain that contacts the DNA directly, whereas the helix of the SANT domain that, according to the structural homology, should contact the DNA is too negatively charged to bind it.
The thesis finishes with a suggestion of how the C-terminal fragment of ISWI might bind the nucleosome and hence act as a substrate recognition module for full-length ISWI.
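The asymmetric charge distribution mentioned above can be illustrated with a crude residue count. The sketch below is only a stand-in and not necessarily the analysis used in this work: it scores Lys and Arg as +1 and Asp and Glu as −1 at neutral pH (an assumption that ignores histidine, the chain termini and pKa shifts in the folded protein) and reports the net charge of a sliding window. The function names and the toy sequence are hypothetical.

```python
"""Crude sliding-window charge profile of a protein sequence (illustrative sketch)."""

# Assumption: Lys/Arg carry +1 and Asp/Glu carry -1 at neutral pH;
# His, the termini and pKa shifts in the folded protein are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def residue_charges(seq: str) -> list[int]:
    """Approximate side-chain charge of every residue in `seq`."""
    return [CHARGE.get(aa, 0) for aa in seq.upper()]


def net_charge(seq: str) -> int:
    """Approximate net charge of the whole sequence."""
    return sum(residue_charges(seq))


def charge_profile(seq: str, window: int = 15) -> list[int]:
    """Net charge of each consecutive window of `window` residues."""
    charges = residue_charges(seq)
    if not 1 <= window <= len(charges):
        raise ValueError("window must lie between 1 and len(seq)")
    total = sum(charges[:window])
    profile = [total]
    for i in range(window, len(charges)):
        # Slide the window by one residue: add the new one, drop the oldest.
        total += charges[i] - charges[i - window]
        profile.append(total)
    return profile


if __name__ == "__main__":
    # Hypothetical toy sequence: an acidic half followed by a basic half.
    toy = "EDAEELDEAS" + "KRAKLKRSKA"
    print(net_charge(toy))
    print(charge_profile(toy, window=10))
```

With a real sequence, a profile that is negative over most windows and positive only near one end would reflect the kind of asymmetry described for ISWI [691:991]; the window size trades local detail against noise.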

Note for reading: As in this abstract, I sometimes refer to the names of the domains in the structure of the C-terminal fragment of ISWI. To better understand the text, I recommend that the reader first look at figure 13.1 to get an overview of the model, the domain names and their borders. Furthermore, I sometimes use the term ISWI-C. With this term I refer to any of the C-terminal clones of ISWI I prepared during this work where a distinction is not necessary. These are mainly the ones that crystallised: ISWI [691:991], ISWI [701:991] and ISWI [713:991].
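The bracket notation above denotes an inclusive, 1-based residue range of full-length ISWI. A small sketch, with hypothetical helper names, shows how such a range maps onto construct length and onto a 0-based, end-exclusive Python slice, which is an easy place to introduce off-by-one errors:

```python
def fragment_length(start: int, end: int) -> int:
    """Number of residues in the inclusive, 1-based range [start:end]."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def to_slice(start: int, end: int) -> slice:
    """0-based, end-exclusive Python slice for the 1-based range [start:end]."""
    fragment_length(start, end)  # validates the range
    return slice(start - 1, end)


# The three crystallised ISWI-C constructs share the same C-terminus (residue 991).
for start, end in [(691, 991), (701, 991), (713, 991)]:
    print(f"ISWI [{start}:{end}]: {fragment_length(start, end)} residues")
```

Applied to a full-length sequence string, `full_length_seq[to_slice(691, 991)]` would return the 301 residues of ISWI [691:991].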

Résumé¹

The Imitation Switch protein of Drosophila melanogaster is an essential enzyme that uses the energy of adenosine triphosphate hydrolysis to rearrange nucleosomes in chromatin. It plays an important role in gene expression because access to DNA and to promoter sites is modified by the positioning of nucleosomes. ISWI can therefore act as an activator but also as a repressor of transcription. In all eukaryotes, a large number of multiprotein complexes involved in chromatin rearrangement can be found. Despite their diversity and variety of function, all these complexes contain an ATPase with a highly conserved homologous domain named SNF2. A more detailed characterisation of these ATPases involved in chromatin remodelling has made it possible to distinguish four groups: SNF2, SNF2L, CHD1 and INO80. In the fifteen years since the discovery of these complexes, a large amount of information has been collected to characterise the chromatin remodelling process which, together with covalent modifications, affects the structure of chromatin and therefore has an important influence on processes such as replication, DNA repair and transcription. Until now, however, the lack of detailed information that could shed light on the mechanism of nucleosome remodelling has only led to hypotheses. The work summarised in this document began in January 2000 with the aim of obtaining structural information on ISWI. The starting point was a clone and a protocol for producing recombinant ISWI enzyme in E. coli. But the path to results was a branched one: a multitude of different experiments were carried out, many of which are not described here. For example, ISWI forms with Acf1 a complex that can be found in vivo.
The possibility of stabilising ISWI by association with Acf1 was considered. Unfortunately, no expression of Acf1 was detected in bacteria, and the amounts of complex produced in insect cells were insufficient for structural studies. The binding of ISWI to substrates including the N-terminal tail of histone H4, ATP, non-hydrolysable ATP analogues and cruciform DNA was studied but did not produce useful results. Since ATPases undergo large conformational changes during their enzymatic activity, it is not surprising that crystallisation trials with the full-length protein (Mr = 120 kDa) failed. One of the first experiments was therefore a restricted proteolysis of the full-length protein to identify subdomains. It showed that ISWI consists of two main parts connected by a flexible region: the N-terminal domain makes up two thirds of the protein, the C-terminal domain about one third. These two parts were subcloned, but only the C-terminal part proved stable and yielded crystals. Numerous clones of the N-terminal part were purified, but they were difficult to concentrate, even more so than the full-length protein. Crystals of the C-terminal part of ISWI were obtained in July 2001. Since then, most of the effort has been devoted to improving the crystals and collecting data; the structure could be solved with a new crystal form that suddenly appeared in May 2002. The introduction of this thesis provides a short overview of chromatin remodelling complexes, with emphasis on the ISWI family. Chromatin remodelling covers a wide network of intertwined actions involving modifications of the histone tails, activation and repression of transcription, and condensation and decondensation of chromatin.
No attempt has been made to cover the literature completely. The in vitro assays characterising function and interaction with the substrate are explained in general terms, and only the results specific to ISWI are described in detail. The part following the introduction is devoted to the techniques used to obtain the structure of ISWI. These of course include protein crystallisation and crystallography (with a detailed account of the phase problem), but also a short section on circular dichroism. The main result of this work is the structure of ISWI [691:991] at 1.9 Å resolution. The elongated structure consists of three domains, described separately in detail. The N-terminal domain presents a new fold and has been named the HAND domain. It is in direct contact with the following domain, SANT. The last domain, SLIDE, is separated from the rest of the molecule by a straight helix 50 Å long. The fragment has an asymmetric distribution of acidic and basic residues between the N-terminal and C-terminal domains. Most of the molecular surface is negatively charged, but the SLIDE domain in particular contains some positive patches. The subsequent analysis showed that this region is probably the main area through which the C-terminal part of ISWI contacts the nucleosomal DNA, via a classical helix-turn-helix motif. The water, glucose and glycerol molecules found in the structure are described in detail because they are important for crystal contacts. The description includes a beautiful arrangement in which a water molecule sits exactly on one of the symmetry axes and forms a pentagonal ring with two other water molecules from the asymmetric unit and their symmetry mates.
Most of the water molecules of the structure are concentrated at the C-terminal domain of the protein, where they form part of the interface between two protein molecules. The functional interpretation of the structure was based on the homology of the SANT and SLIDE domains with DNA-binding proteins such as the product of the oncogene c-Myb or the protein Pax6. The SLIDE domain is related to the SANT domain, which had been proposed to be a DNA-binding domain. However, we now have strong evidence that it is in fact the SLIDE domain that contacts the DNA, whereas the helix of the SANT domain that should bind DNA is too negatively charged to perform this function. The thesis ends with a model suggesting how the C-terminal domain of ISWI could bind the nucleosome, thereby acting as a substrate recognition module for full-length ISWI.

¹ I should thank Cédric for correcting my initial translation of the abstract, which resulted in a complete rewrite...

Note for easier reading: I sometimes refer to the names of the domains in the structure of the C-terminal fragment of ISWI. To better understand the text, I recommend that the reader first look at figure 13.1 to get an overview of the model, the domain names and their boundaries. In addition, I sometimes use the term ISWI-C, which then refers to any clone of the C-terminal domain of ISWI prepared during this work where a precise distinction is not necessary. These are mainly the following clones, which were crystallised: ISWI [691:991], ISWI [701:991] and ISWI [713:991].

Acknowledgments

Since I often ask before I think, I am bound to forget many people in this list who helped me during my stay of nearly four years at the EMBL. Yet I want to try to make it as complete as my memory allows.

Andreas, for being a living encyclopaedia and patiently answering all my chemistry questions.

Andreas, Carlo, Raimond & Serge, for the lessons in crystallography.

Annie, for looking after and buying the chemicals and other things I needed.

Annie and Monique, for tidying up all the mess I left behind in the wet lab.

Christoph, for guiding me well through the work of my thesis. At several decisive steps he gave me good advice and still allowed me a lot of freedom to develop my own ideas.

Denis, Jean-Marie, and Jean-Pierre, who had answers to all technical and less technical questions about broken and unbroken lab equipment.

The GUC and SAM swimming team, and Phillippe.

Fabrice, Jan & Mark, for their help with cloning and more. Fabrice strengthened my understanding of oligo design and PCR.

Guy Schoehn, for the time he spent at the electron microscope with the ACF complex and ISWI (unfortunately, I could not reproduce the preparations to confirm what he saw).

Jean-Pierre, who was of particular importance to the lab work: he built a 25 cm extension to both desk and bench, without which my back would have suffered severely.

Kreischi, because he was the first to give me the idea that it helps to tackle problems by thinking.

Mark & Serge, for interesting and entertaining discussions about computing and programming.

Martine, for all the small and big questions that arose in the lab.

Raimond, for introducing me to his beamline and letting me play a lot with it. He spent several hours of his spare time explaining things to me and measuring (such as non-existent platinum signals).

Stephen Curry, who is responsible for the basics on which this work is built; I profited from his great skill in teaching crystallography and molecular biology.

The members of my thesis advisory committee, for listening to all of my progress reports and for useful discussions. After my first progress report, Saadi Khochbin suggested carrying out binding studies with cruciform DNA.

Mila, because she showed me how to solve problems by thinking. And for countless other things. She also found the two pairs that stabilise the “Hand” domain, and the beautiful arrangement of water molecules described in this work in Figure 13.6 (page 75).

Contents

I Introduction

1 Structural features of the nucleosome
1.1 Structure of the nucleosome core particle

2 Chromatin remodelling complexes
2.1 The unifying SNF2 domain
2.2 The three subgroups of SNF2 remodelling enzymes
2.3 ISWI-containing complexes
2.4 In vitro characterisation of chromatin remodelling enzymes
2.5 Substrate dependence and nucleosome sliding
2.6 Importance of ISWI in vivo

3 Scope of this thesis

II Methods — theoretical background

4 Circular dichroism
4.1 Theory
4.2 Data evaluation — wavelength scan and melting curve

5 Protein crystallisation
5.1 Crystallisation techniques

6 Protein crystallography
6.1 The phase problem
6.1.1 Molecular replacement
6.1.2 Experimental phasing
6.2 Phase improvement — density modification
6.2.1 Automatic model building
6.3 TLS refinement

III Materials and Methods

7 Purification and crystallisation
7.1 Subcloning
7.2 Expression and purification of recombinant proteins in E. coli
7.2.1 Protein expression
7.2.2 Production of selenomethionine-substituted protein
7.2.3 Purification of 6xHis-tagged proteins
7.2.4 Purification with the IMPACT T7 system
7.2.5 Concentration measurement
7.3 Crystallisation of the C-terminal subclones of ISWI
7.4 Harvesting and freezing of crystals

8 Additional experiments for protein characterisation
8.1 Bandshift assays with ISWI [691:991] and cruciform DNA
8.2 Restricted proteolysis
8.3 Circular dichroism


9 Data collection, processing and refinement
9.1 Data collection
9.2 Data processing
9.2.1 Generation of tagged reflections for free R calculation
9.3 Phasing
9.3.1 MAD location of selenium sites
9.4 Model building and refinement
9.5 Other useful programs
9.6 Superposition of the nucleosome and DNA

IV Results

10 Protein characterisation
10.1 Restricted proteolysis
10.1.1 Subcloning and protein expression
10.2 Circular dichroism
10.2.1 Measurements — wavelength scans
10.2.2 Measurements — melting curve
10.3 Binding of ISWI [691:991] to cruciform DNA

11 Crystallogenesis of ISWI [691:991]
11.1 Hexagonal space group
11.2 Monoclinic space group
11.3 Production of crystals for phasing
11.3.1 Hexagonal space group
11.3.2 Monoclinic space group

12 Data collection and processing
12.1 Data statistics
12.2 Low resolution data
12.3 Density modification and automated building — resolve
12.4 Molecular replacement with data from the hexagonal crystal form

13 Description of the structure of ISWI-C
13.1 Overall structure
13.2 “Hand” domain — a new fold
13.3 SANT domain
13.4 SLIDE domain
13.5 Solvent molecules in the structure

14 Comparison with known structures
14.1 Interpretation of structural homology
14.1.1 Consequences for nucleosome recognition by ISWI

15 Discussion and perspective

V Appendix

A Secondary structure prediction of full-length ISWI

B List of clones

C Calculating the slope of CD data

D Article

List of Tables

2.1 Subfamilies of SNF2

9.1 Settings for collection of data sets

10.1 Secondary structure prediction for ISWI [691:991] from CD spectra

11.1 Tests with heavy-metal derivatives on the rotating anode

12.1 Data set statistics
12.2 Summary of solve refinement

14.1 Results of DALI search for structural homologues

A.1 Domain prediction by SMART

B.1 N-terminal subclones of ISWI (pProExHtb)
B.2 C-terminal subclones of ISWI (pProExHtb)

List of Figures

1.1 Model of the nucleosome core particle

4.1 CD spectra of poly-L-lysine in three conformations

5.1 Crystallisation methods — vapour diffusion
5.2 Crystallisation methods — liquid phase diffusion

6.1 Experimental phasing

7.1 Purification effect of second Ni column

10.1 Domain composition of ISWI
10.2 Trypsin digestion of full-length ISWI
10.3 Secondary structure prediction of ISWI [691:991]
10.4 Temperature-dependent expression of ISWI-C
10.5 Circular dichroism — ISWI [691:991] and ISWI
10.6 Melting curve and first derivative of ISWI [691:991]
10.7 EMSA of ISWI [691:991] with DNA

11.1 Crystal growth by reversed salting-in: phase diagram and example pictures
11.2 Examples of micro-seeding

12.1 Sample diffraction images of high- and low-resolution passes
12.2 Comparison of completeness for high- and low-resolution passes

13.1 Structure of ISWI [691:991]
13.2 Fold and stabilisation of the “Hand” domain
13.3 Interaction interface between “Hand” and SANT domain
13.4 The SLIDE domain
13.5 Binding of the glucose molecule
13.6 A special water configuration at the interface between two molecules
13.7 Location of the special water molecule within the structure

14.1 Superposition of SANT and SLIDE domains with their closest structural neighbours
14.2 Residues of SANT and SLIDE contacting DNA
14.3 Binding possibilities of ISWI [691:991] at the nucleosome
14.4 Two possibilities of how ISWI [691:991] contacts the tails of H3 and H4
14.5 Basic residues on the spacer helix support the model of how ISWI [691:991] contacts the DNA

C.1 Effect of window size in the Perl script used to determine the inflection point of melting curve data

Part I

Introduction


Chapter 1

Transcription regulation — structural features of the nucleosome

Eukaryotic organisms maintain their genome as chromatin, a dynamic assembly of DNA, RNA, and proteins. Its diversity and dynamics (one speaks of chromatin fluidity) allow tight control over nuclear processes such as transcription, replication, and repair that are required to maintain the cell’s viability. Chromatin has various faces. During cell division, it condenses into what is called heterochromatin, and chromosomes can be distinguished under a light microscope; condensation into the highly compacted heterochromatin makes the DNA mostly inaccessible and suppresses transcription, a process called silencing (Voet and Voet, 1995; Kornberg and Lorch, 1999). During the remaining time of the cell cycle, however, the chromosomes can no longer be distinguished; chromatin appears as a mixture of heterochromatin and the loosely packed euchromatin. At this stage, DNA can be accessed by a vast machinery present in the nucleus. Transcription, for example, involves more than a hundred proteins that regulate the proper functioning of RNA polymerases, which themselves are complexes in the megadalton range. These proteins need access to the DNA, notably the promoter sites, in order to initiate and carry out transcription. The fact that DNA can be compacted into higher-order structures that ultimately form the chromosomes (as shown, e.g., by electron microscopy (Voet and Voet, 1995)) indicates that the genome is not stored as bare DNA but with the help of repetitive elements that direct folding and unfolding. There are several degrees of compaction between free DNA and chromatin. The first is the nucleosome: the DNA double helix wrapped around a protein octamer built from two copies each of four different histones. With the aid of linker histones and trans-acting proteins, nucleosomes form fibres of 10–30 nm diameter (depending on the ionic strength).
These are assembled into even higher-order structures that contain additional proteins such as HP1 (heterochromatin protein 1) and SIR proteins (Hayes and Hansen, 2001). The existence of the nucleosome was first proposed by R. Kornberg in 1974 on the basis of the experimental results available at the time (Kornberg and Thomas, 1974; Kornberg and Lorch, 1999; Voet and Voet, 1995). Of all chromosomal proteins, five are the most abundant, with a combined mass comparable to that of the DNA. These proteins are the histones H1, H2A, H2B, H3, and H4. Histone H1 is present at only half the mass of each of the other four histones and seems to be important for internucleosomal interactions. The nucleosome itself consists of a stretch of double-stranded DNA wrapped around a histone octamer made of an (H3–H4)₂ tetramer flanked by two H2A–H2B dimers. These four core histones are amongst the most conserved proteins, especially H3 and H4 (Kornberg and Lorch, 1999). Digestion of chromatin with micrococcal nuclease (which cuts free double-stranded DNA) showed that 146 bp of DNA are wrapped around the histone octamer, forming the nucleosome core particle; the linker DNA connecting two nucleosomes varies in length around a mean of 50 bp.
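The numbers above lend themselves to a quick back-of-the-envelope estimate. Assuming a 146 bp core plus the 50 bp mean linker quoted here (real linker lengths vary between species and cell types), the sketch below derives the nucleosome repeat length, the fraction of DNA inside core particles, and the approximate number of nucleosomes on a stretch of DNA. The function names are illustrative only.

```python
CORE_BP = 146        # DNA protected by micrococcal nuclease (value from the text)
MEAN_LINKER_BP = 50  # mean linker length between nucleosomes (value from the text)


def repeat_length(core_bp: int = CORE_BP, linker_bp: int = MEAN_LINKER_BP) -> int:
    """Nucleosome repeat length: one core particle plus one linker."""
    return core_bp + linker_bp


def core_fraction(core_bp: int = CORE_BP, linker_bp: int = MEAN_LINKER_BP) -> float:
    """Fraction of the DNA that is wrapped inside core particles."""
    return core_bp / repeat_length(core_bp, linker_bp)


def nucleosomes_on(region_bp: int, core_bp: int = CORE_BP,
                   linker_bp: int = MEAN_LINKER_BP) -> int:
    """Whole repeat units that fit on a DNA stretch of `region_bp` base pairs."""
    return region_bp // repeat_length(core_bp, linker_bp)


print(repeat_length())            # 196
print(round(core_fraction(), 3))  # 0.745
print(nucleosomes_on(10_000))     # 51
```

Under these assumptions roughly three quarters of the DNA is wrapped in core particles, which gives a sense of how much sequence, promoters included, can be occluded by nucleosomes.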

1.1 Structure of the nucleosome core particle

An important contribution to the understanding of the nucleosome was its X-ray structure at 2.8 Å resolution (Luger et al., 1997), since complemented by several structures at higher resolution, e.g. (Davey et al., 2002). The structure revealed the shape of a flat cylinder with a diameter of 100 Å and a height of 60 Å. Many hydrogen bonds and salt bridges result in strong interactions between the proteins and the phosphate backbone; they also make the DNA deviate from its canonical straight double-helical form and wrap 1¾ times around the histone octamer. These contacts render the nucleosome a rather stable complex under physiological conditions. Its globular structure minimises the surface and protects the DNA; the histone tails are therefore important signals to the “outside world” (see Figure 1.1). With up to 40 residues, they make up as much as one third of the histone mass and reach far out from the nucleosome globule. The outermost histone tail can extend at least 45 Å from the DNA surface, nearly half the diameter of the nucleosome. Even though the contacts between proteins and DNA are not sequence specific, the curvature of the DNA does depend on its sequence. This influences the position of the nucleosome, and positioned nucleosomes appear to lie close to promoters, regulatory elements, or other special sites (Kornberg and Lorch, 1999).

(a) view of half the nucleosome perpendicular to the dyad; (b) view along the dyad axis, rotated by 90° around the x-axis

FIGURE 1.1: Model of the nucleosome core particle at 1.9 Å resolution (PDB code 1kx5; atoms with zero occupancy removed). Figure 1.1(a) shows only one copy of each histone and half the DNA for clarity. The “handshake” motif that forms the dimerisation interface within the H3–H4 (blue–green) and H2A–H2B (red–yellow) pairs is clearly visible. The side view of Figure 1.1(b) illustrates the compactness of the nucleosome, which occludes major parts of the DNA double helix.

Protection of the DNA and providing an important cornerstone in chromatin compaction are rather passive roles of the nucleosome. It is also involved, directly and indirectly, in transcription regulation: directly because, as mentioned above, the nucleosome can hide promoter sites and thus inhibit expression of a particular gene; indirectly because the histone tails function as signals and binding anchors for many factors that direct the subsequent steps. Two basic and important processes that modify nucleosomes are currently at the centre of discussion:

1. Covalent histone tail modifications. As mentioned above, the histone N-termini are long and flexible, reaching out from the nucleosome core. Their residues are subject to various modifications, and many DNA-dependent pathways depend on the modification state of the histone tails (for reviews, see Strahl and Allis, 2000; Iizuka and Smith, 2003). The following modifications seem especially important:

Acetylation of lysines, mostly at the amino-terminal ends of H3 and H4, is associated with transcription activation. Histone acetyltransferases (HATs) and histone deacetylases (HDACs) add and remove these marks. Acetylation neutralises the positive charge of the lysine and thereby alters its binding behaviour towards other proteins.
Phosphorylation of serine 10 of histone H3 is often (but not always) associated with chromatin condensation; other sites in other histones, including H1, can also be phosphorylated.

Methylation of lysines or arginines, mostly of histones H3 and H4; mono-, di- and trimethylation can be observed. Methylation of histone tails is related to DNA methylation, and both events are usually associated with transcription repression, even though the opposite has also been reported (Bernstein et al., 2002).

2. ATP-dependent nucleosome remodelling. A large number of complexes are capable of changing the positions of nucleosomes on DNA. Central to these complexes is an ATP-dependent subunit. These subunits share a common ATPase domain, strongly homologous across all eukaryotes, that classifies them as members of the SNF2 family. This subgroup of the DEAD/H helicases is unified by a stretch of several hundred amino acids containing characteristic, highly conserved motifs. Remodelling is important both for repression and for activation of transcription.

Histone tail modifications and nucleosome remodelling are not independent processes but often occur together. Remodelling that suppresses transcription has been observed to be accompanied by methylation and deacetylation, and remodelling that enhances transcription by acetylation (Santoro et al., 2002; Tariq et al., 2003).

Chapter 2

Chromatin remodelling complexes

2.1 The unifying SNF2 domain

The first chromatin remodelling activity was found in yeast, in the SWI/SNF complex (mating-type switching / sucrose non-fermenting). This 2 MDa complex contains about eleven subunits, including the ATP-dependent enzyme SWI2/SNF2. Homologues of SWI2/SNF2 were soon found in many eukaryotes. Database searches in the Drosophila melanogaster genome revealed Brahma as a close relative and, more distantly related, ISWI (imitation switch). A comprehensive study classified a large range of these proteins and found a central SNF2 domain characterised by several conserved motifs. Seven of these motifs identify them as members of the DEAD/H class of helicases (Eisen et al., 1995; Bork and Koonin, 1993). The second of these motifs is the well-known Walker motif A, GXnGK[TS], which many proteins use for magnesium-mediated ATP binding. The authors’ phylogenetic analyses divided the SNF2 family into 15 subfamilies, but only three of them are chromatin remodelling enzymes: SNF2, SNF2L (now often named ISWI), and CHD1 (also called Mi-2). This subdivision is based on differences in the sequences adjacent to the SNF2 domain, which are described in the following section. One difference between these groups is the size of the complexes they form. SNF2 builds the largest complexes, with more than ten subunits. The SNF2L / ISWI-like proteins are found in the smallest complexes, with two to four subunits. The sizes of Mi-2-containing complexes lie in between. To date, no eukaryote is known that lacks a member of the family spanned by these three chromatin remodelling enzymes. One paralogue of SNF2/SWI2 does not follow this scheme and cannot be classified into any of the three groups: Ino80 (ORF YGL150C) from yeast, a 170 kDa protein, shares the homology of the ATPase domain (PFAM expectation value for the SNF2_N domain: 10⁻¹⁰⁹), but neither PFAM nor SMART indicates any known domains beyond that.
It is a member of a 12-subunit complex and seems to be directly involved in DNA repair, since deletion mutants are more sensitive to UV and ionising radiation than wild-type cells (Shen et al., 2000; Ebbert et al., 1999; Steger et al., 2003).
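The Walker A motif quoted above can be located in a protein sequence with a regular expression. The Python sketch below is only an illustration: it assumes the canonical spacer of four arbitrary residues for the Xn of the text (GX4GK[TS], the classic P-loop), the function name is hypothetical, and the test sequence is a toy peptide containing a Ras-like P-loop rather than an SNF2-family protein. A match indicates sequence similarity only, not ATPase function.

```python
import re


def find_walker_a(sequence: str, spacer: int = 4) -> list[tuple[int, str]]:
    """Return (1-based start, matched motif) for each match of G-X(spacer)-G-K-[TS].

    The default spacer of four residues is an assumption (canonical P-loop);
    the text writes the spacer generically as Xn.
    """
    # A zero-width lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(G.{" + str(spacer) + "}GK[TS]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Toy sequence with a Ras-like P-loop, chosen for illustration only.
print(find_walker_a("MTEYKLVVVGAGGVGKSALTIQ"))  # [(10, 'GAGGVGKS')]
```

The spacer is kept as a parameter because the text leaves n open; a stricter search would also check the surrounding helicase motifs.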

2.2 The three subgroups of SNF2 remodelling enzymes

Currently, three classes of chromatin remodelling enzymes are distinguished: SNF2, SNF2L or ISWI, and CHD1 or Mi-2. Databases like PFAM (Bateman et al., 2002) and SMART (Schultz et al., 1998) split the SNF2 domain into two subdomains, named DEXDc and HELICc in SMART, and SNF2_N and helicase_C in PFAM. HELICc and helicase_C differ only slightly in their consensus, whereas DEXDc is a more general definition than SNF2_N. The first family, SNF2, contains a bromodomain C-terminal to the SNF2 domain, the second, SNF2L / ISWI, a SANT domain, and the third group, CHD1 / Mi-2, an N-terminal chromo domain, often accompanied by a PHD finger. The domain substructure is sketched in Table 2.1.

                        SNF2                       SNF2L / ISWI                CHD1 / Mi-2
ATPase domain           DEXDc and HELICc encompass the ATPase domain common to all three groups
Adjacent domain         C-terminal bromodomain     C-terminal SANT domain      N-terminal chromo domain
Complex size            10–20 subunits / complex   2–4 subunits / complex      5–8 subunits / complex
Activity                nucleosome transfer in     nucleosome transfer         acetyltransferase activity
                        cis and trans              in cis

TABLE 2.1: Subfamilies of SNF2 are distinguished according to the domain structure adjacent to the SNF2-domain (here split into DEXDc and HELICc). SNF2-like proteins have a C-terminal bromodomain, SNF2L-like proteins a C-terminal SANT domain, and CHD1-like proteins an N-terminal chromo domain that is generally preceded by a PHD finger.


The bromodomain of the SNF2 group binds histone tails and is associated with transcriptional activation. The name is derived from brahma and the analogy to chromo domains. It is found not only in SNF2-like proteins but also in acetyltransferases. The bromodomain specifically binds acetylated histone tails. Details of the interaction with residues 16–19 of histone H4 were revealed by the crystal structure of the 110-residue bromodomain of Gcn5p in complex with residues 15–29 of histone H4 acetylated at Lys16 (Owen et al., 2000). However, as in the case of the chromo domain, function must not be inferred from homology without great caution, and bromodomains in different proteins may exhibit different functions.

The chromo domain of the CHD1 group also occurs in HP1, a protein associated with heterochromatin, where the domain binds methylated H3 tails. This is not the case for Mi-2, and it has been suggested that the chromo domain in Mi-2 interacts with DNA rather than with histones (Bouazoune et al., 2002).

The SANT domain is part of the fragment presented in this work and will be described in greater detail. The name is derived from the four proteins that initially defined it: Swi3 (a subunit of the SWI/SNF complex), Ada2 (a subunit of the acetyltransferase complex SAGA), the corepressor N-CoR, and the general transcription factor TFIIIB. All these proteins interact with nucleosomal DNA. Because of its strong homology to the DNA-binding repeats of the oncogene product c-Myb, the SANT domain has been proposed to bind DNA as well (Aasland et al., 1996). In the known structure of c-Myb, each of the three homologous repeats consists of a globular three-helix bundle. DNA is bound through the helix-turn-helix motif formed by the second and third helix. Many proteins use the helix-turn-helix motif to bind the major groove of double-stranded DNA. Recent results, however, indicate that SANT domains are very important as histone-tail presenting modules. The effect of mutations in the SANT domains of several subunits of yeast chromatin modifying complexes was investigated (Boyer et al., 2002): Swi3 is one of 17 subunits of the SWI/SNF complex; Ada2 is present in several HAT complexes containing the enzyme Gcn5, including the SAGA complex; Rsc8 is a subunit of RSC, whose ATPase, like that of the SWI/SNF complex, belongs to the SNF2 family. In this study all deletions seriously impeded the viability of the yeast cells. Importantly, the SANT domain of Ada2 has a large impact on the interaction of the SAGA complex with the H3 tail. In pull-down assays with GST-H3 tails, the intact SAGA complex kept both Gcn5 and Ada2 attached to the glutathione beads. A deletion mutant of Ada2 lacking ten residues of the putative third helix of the SANT domain shifted a major part of the complex to the unbound fraction. The authors also point out that they could not detect any in vivo effect of three independent point mutations of residues in Swi3.
The corresponding residues in Myb, on the other hand, play critical roles in its DNA binding. A second study (Sterner et al., 2002) tested the effects of various mutations of the SANT domain in Ada2 on the activity of the SAGA complex and found that the second half of the SANT domain is important for binding to Gcn5. The corresponding part in Myb is in contact with DNA and therefore less likely to be involved in protein-protein contacts. Our analysis of the C-terminus of ISWI and our binding studies confirm that the ISWI SANT domain is probably not involved in direct DNA binding (Grüne et al., 2003).

2.3 ISWI containing complexes

ISWI was found in Drosophila melanogaster in the following complexes:

1. ACF, the ATP-utilising chromatin assembly and remodelling factor, or CHRAC, the chromatin accessibility complex. With current assays, ACF and CHRAC cannot be distinguished. Both contain ISWI and Acf1 (270 kDa). CHRAC was purified with two additional polypeptides, Chrac14 and Chrac16 (14 and 16 kDa, respectively), but apart from this difference in composition, no difference between CHRAC and ACF has yet been reported (Corona et al., 2000; Ito et al., 1999).

2. NURF, the nucleosome remodelling factor. Apart from ISWI, NURF contains Nurf301, which is similar but not identical to Acf1; Nurf55, which is also found in the chromatin assembly factor CAF-1; and Nurf38, an inorganic pyrophosphatase.

As the literature shows, defining these complexes has not been easy. They were identified and purified on the basis of different biochemical assays. NURF was found to be required for chromatin remodelling induced by the GAGA factor (Tsukiyama and Wu, 1995). CHRAC was identified by its ability to mobilise nucleosomes in a manner that gives restriction enzymes enhanced access to DNA packed into chromatin. ACF was identified by a NAP-1-assisted chromatin assembly assay (Kadonaga, 1998). Acf1, a 270 kDa protein, was initially reported not to be part of CHRAC, whereas topoisomerase II was copurified with CHRAC. This is probably why most people still refer to ACF and CHRAC as distinct complexes, even though they cannot be distinguished by their activities. NURF, however, is clearly distinct. As discussed in Section 2.6, loss of Nurf301 is not compensated by CHRAC/ACF, and unlike NURF, which renders nucleosomes stochastically distributed on a multinucleosomal array, CHRAC/ACF have the opposite effect and cause regular spacing. Considering the complexity of interactions involved in nuclear processes, it might not always be possible to define a complex as an isolated, stable entity. Different stages of functioning may require different manifestations. Studies by Memedula and Belmont (Memedula and Belmont, 2003), for example, provide evidence that BAF155 and BAF170, both members of mammalian SWI/SNF-like complexes, are recruited more than an hour after Brg1 or Brm, the active enzymes of those complexes. Something similar might be true for Chrac14 and Chrac16, i.e., these two proteins might be required for only some of the modes of function of ACF/CHRAC. In that case, ACF and CHRAC would of course have to be referred to as two distinct complexes.

2.4 In vitro characterisation of chromatin remodelling enzymes

ATP hydrolysis and remodelling activity can be monitored separately in vitro. Hydrolysis is often measured by the release of radioactive phosphate from [γ-32P]ATP. Sequencing gel electrophoresis after enzymatic digestion of multinucleosomal arrays can be used to check remodelling activity: the protection of DNA by nucleosomes alters the observable digestion pattern produced by DNase I or MNase (micrococcal nuclease). Electrophoretic mobility shift assays (EMSA) are used to detect interactions between substrate and complexes. These techniques have been widely used to characterise the function of chromatin remodelling proteins and complexes (Corona et al., 1999; Clapier et al., 2001; Havas et al., 2000; Whitehouse et al., 2003). In vivo all chromatin remodelling factors act as complexes, but in vitro ISWI has been shown to remodel chromatin without the aid of cofactors. The rate of hydrolysis is substrate dependent. ISWI by itself, as well as ISWI-containing complexes, requires a nucleosomal substrate with an intact H4 tail and DNA extending beyond the nucleosome core for full activity. The activity of ISWI is further stimulated in the presence of Acf1. The ATPase activity of other complexes is already stimulated by free DNA (SWI/SNF) or by nucleosome core particles lacking histone tails (Mi-2). The following list summarises the most frequently used assays for remodelling complexes and enzymes (Kingston and Narlikar, 1999).

ATP hydrolysis can be monitored qualitatively by thin layer chromatography of radioactively labelled ATP, and quantitatively with a scintillation counter or a phosphor imager. In the case of ISWI, unfortunately, the rate of hydrolysis is very low, which impedes kinetic studies (Whitehouse et al., 2003).

Nucleosome assembly and remodelling by NAP-1, supported by a chromatin remodelling enzyme, results in protection of the DNA, which can be visualised by agarose gel electrophoresis after limited digestion with MNase. MNase cannot cut DNA protected by the histone core; hence, changes in the protection pattern can be used to demonstrate nucleosome movement. For example, a multinucleosomal array with randomly placed nucleosomes results in a smear, whereas an evenly spaced array produces distinct bands, because the enzyme cleaves only at locations that are not occupied by nucleosomes. DNase I prefers sites where the minor groove faces away from the histone octamer, which allows higher-resolution mapping than the MNase assay.

Transcription enhancement. This can be measured with labelled nucleotides that are incorporated into the transcribed RNA. NURF, for example, enhances binding of the transcription factor GAGA and hence the amount of RNA produced during the assay.

Nucleosome sliding. Mononucleosomes assembled on DNA with overhangs, i.e., on fragments longer than 150 bp, can be separated by EMSA according to the position of the histone octamer: nucleosomes at the edge of the DNA migrate faster than nucleosomes at a central position.
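For a regularly spaced array, the ladder produced by limited MNase digestion can be reduced to a single number: the sizes of the mono-, di-, tri-, ... nucleosome bands grow linearly with the number of nucleosomes, and the slope is the nucleosome repeat length. A minimal Python sketch of this straight-line fit follows; it assumes that band sizes have already been read off a gel against a size marker, and the function name and band sizes are hypothetical. A randomly spaced array gives a smear instead of bands, so the fit does not apply there.

```python
def repeat_length(band_sizes_bp: list[float]) -> tuple[float, float]:
    """Fit size = slope * n + intercept to the ladder bands n = 1, 2, 3, ...

    Returns (slope, intercept); the slope estimates the nucleosome repeat length in bp.
    """
    if len(band_sizes_bp) < 2:
        raise ValueError("at least two ladder bands are needed")
    n = list(range(1, len(band_sizes_bp) + 1))
    n_mean = sum(n) / len(n)
    s_mean = sum(band_sizes_bp) / len(band_sizes_bp)
    # Ordinary least squares for a straight line.
    cov = sum((ni - n_mean) * (si - s_mean) for ni, si in zip(n, band_sizes_bp))
    var = sum((ni - n_mean) ** 2 for ni in n)
    slope = cov / var
    return slope, s_mean - slope * n_mean


# Hypothetical band sizes (bp) of the mono- to tetranucleosome bands.
slope, intercept = repeat_length([170, 355, 535, 720])
print(f"repeat length ~ {slope:.0f} bp, intercept {intercept:.1f} bp")
```

Using all bands rather than the spacing between two neighbours averages out reading errors of individual bands.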

Assays testing these functions were originally used to discover many remodelling complexes. An important step towards understanding the remodelling mechanism was the finding that ISWI works without the support of other complex subunits (Corona et al., 1999). ISWI hydrolyses ATP in the presence of nucleosomes, remodels nucleosomal arrays and facilitates chromatin assembly. Unfortunately, quantification and comparison between different experiments, especially for the ISWI group, are impeded by the low activity and poor reproducibility (Martens and Winston, 2003; Whitehouse et al., 2003; J. Brzeski, personal communication). The following observations illustrate the influence of the other subunits within the complexes. ISWI and CHRAC/ACF render a nucleosomal array evenly spaced, while NURF causes a stochastic distribution (as does the SWI/SNF complex, whose ATPase SNF2 belongs to the SNF2 subgroup). On the other hand, ISWI moves a nucleosome with a DNA overhang to the edge of the DNA, while CHRAC/ACF centres the nucleosome. Only the large complexes RSC and SWI/SNF from the SNF2 group have been reported to displace nucleosomes in trans (Panigrahi et al., 2003; Längst et al., 1999).
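Because the ATPase activity of ISWI is low, the spontaneous hydrolysis measured without enzyme matters when the thin layer chromatography assay of Section 2.4 is quantified. The Python sketch below is one plausible way to do the arithmetic, not the protocol used in this work: it assumes [γ-32P]ATP, so that only the released phosphate and the remaining ATP carry the label, integrated spot signals from a phosphor imager, and an initial-rate regime in which hydrolysis is linear in time. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Integrated phosphor-imager signals of one TLC lane (arbitrary units)."""
    pi_signal: float   # spot of released, labelled phosphate
    atp_signal: float  # spot of remaining, labelled ATP


def fraction_hydrolysed(lane: Lane) -> float:
    # With gamma-labelled ATP only Pi and ATP carry the label (ADP does not),
    # so this ratio does not depend on the specific activity of the label.
    return lane.pi_signal / (lane.pi_signal + lane.atp_signal)


def hydrolysis_rate(sample: Lane, no_enzyme: Lane, atp_um: float, minutes: float) -> float:
    """Background-corrected ATP turnover in uM per minute (initial rate assumed)."""
    corrected = fraction_hydrolysed(sample) - fraction_hydrolysed(no_enzyme)
    return max(corrected, 0.0) * atp_um / minutes


# Hypothetical reaction: 100 uM ATP, 30 min, plus a control lane without enzyme.
rate = hydrolysis_rate(Lane(300, 700), Lane(50, 950), atp_um=100, minutes=30)
print(f"{rate:.2f} uM ATP per min")
```

Negative corrected values are clamped to zero, since with a very weak enzyme they only reflect scatter between lanes.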

2.5 Substrate dependence and nucleosome sliding

ISWI by itself hydrolyses ATP only weakly. Only the N-terminal tail of H4 and some linker DNA extending beyond the nucleosome core are critical for its full activity. The activity is further enhanced when ISWI acts in complex with Acf1. Nucleosomes reconstituted from recombinant histones with all tails except that of H4 removed do not reduce the activity of ISWI, whereas ISWI becomes inactive with nucleosomes that contain all tails except the N-terminal tail of H4. Further analysis assigned an important role to a small epitope between Gly10 and Arg19: with the first ten residues removed from histone H4 (∆10-H4), ISWI maintains a normal hydrolysis level, while the level is reduced to background with a ∆19-H4 mutant. In contrast to ATP hydrolysis, nucleosome sliding is nearly abolished if any of the four tails is missing. Sliding is only reduced, but still clearly detectable, if ISWI acts as part of the CHRAC complex and any one of the tails of H2A, H2B, or H3 is removed (again, a missing H4 tail abolishes sliding even for CHRAC) (Brehm et al., 2000; Clapier et al., 2001; Clapier et al., 2002). Both double- and single-stranded DNA induce ATPase activity of ISWI, but at a much lower level than nucleosomal substrate (Boyer et al., 2000; Corona et al., 1999; Whitehouse et al., 2003). This reveals further differences between the subgroups of the SNF2 family: the SNF2 group is stimulated equally by free DNA and by nucleosomes, with or without tails. Mi-2 requires nucleosomes but not tails, and with nucleosome-free DNA the activity of Mi-2 is reduced to about 30%. Current models of how nucleosome sliding or remodelling works are based on two ideas:

Twisting. Chromatin remodelling complexes and enzymes have been reported to alter superhelical torsion (Havas et al., 2000). This raised the idea that the DNA could be “screwed” around the histone core. This would increase torsional stress and require a means of release, especially in multinucleosomal arrays. Torsional stress can be released by topoisomerase I, but in vitro assays work without this enzyme.

Bulging or looping. Remodelling is not inhibited by nicked DNA, which is expected to interrupt the propagation of torsion (Aoyagi and Hayes, 2002). Therefore, a mechanism purely based on twisting of the DNA can probably be excluded. A “bulging” or “looping” mechanism has been proposed instead, in which the remodelling complex forms a loop of DNA that propagates around the histone octamer (Längst et al., 1999).

A combination of both ideas is also being considered, but the available experiments are not detailed enough for a more precise model. Several recent publications support the second mechanism. The loss of approximately 40 bp worth of histone-DNA interactions upon remodelling by SWI/SNF has been reported (Bazett-Jones et al., 1999), as well as the formation of a DNA loop of up to 50 bp by SWI/SNF during nucleosome remodelling (Kassabov et al., 2003). Finally, the translocase abilities of ISWI were tested by the displacement of triplex DNA placed 0–60 bp away from an edge-positioned nucleosome. At distances between 40 and 50 bp, triplex displacement was no longer detected (Whitehouse et al., 2003). This work also established the first link between proteins of the SNF2 family and the superfamily of DEAD/H helicases to which they belong. No strand-separating activity has been reported for any of the chromatin remodelling enzymes. The triplex displacement assay, however, revealed a 3′→5′ preference for ISWI: the triplex was not removed when a five or ten base pair gap was introduced into the 3′→5′ strand, whereas a gap in the opposite strand had no effect, and triplex removal was as efficient as with uninterrupted double-stranded DNA. The authors state that strand specificity is a property of DNA helicases and that many members of the DEAD/H family show 3′→5′ specificity.

2.6 Importance of ISWI in vivo

Many experiments have been published that try to elucidate the remodelling mechanisms and their links to other gene regulators, but little is known about their role in vivo. Two very important, complementary contributions were published recently (Badenhorst et al., 2002; Deuring et al., 2000). In the latter publication, Drosophila mutants expressing C-terminally truncated forms of ISWI, ISWI[1:800] and ISWI[1:953], were examined. Neither protein could be detected in embryos¹, and the mutants die in the late larval or early pupal stage even though no phenotypic anomalies were found. Survival up to these stages was attributed to the presence of maternally provided ISWI; in adults, ISWI is mainly expressed in oocytes and testes, perhaps only to provide a stock for the offspring. Local overexpression in the eye disc of ISWI-K159R, a point mutant that cannot hydrolyse ATP (Corona et al., 1999), results in severe malformation of the adult eye. At the molecular level, the male X chromosome of the mutant embryos was heavily deformed; an effect on autosomes was also detected but was more subtle. In these studies, full-length ISWI does not colocalise significantly with the transcription activator GAGA. This seems to contrast with the enhancement of GAGA-dependent activation by ISWI in vitro (Tsukiyama et al., 1995), as well as with the strongly reduced expression of heat shock protein 70 (hsp70), ultrabithorax (ubx), and engrailed (en), all three targets of GAGA, in the mutants. However, since Drosophila has more than one complex containing ISWI as its ATPase (CHRAC/ACF and NURF), it was unclear whether these findings could be attributed directly to the malfunctioning of ISWI.

¹ The authors suggest that the C-terminus of ISWI is essential for its proper folding; this is supported by the observation, reported later in this work, that recombinant N-terminal subclones of full-length ISWI are very unstable and difficult to purify.

Therefore, the experiments were repeated with Nurf301 (Badenhorst et al., 2002), the large subunit of NURF, which led to very similar results. Since the functionality of the other two ISWI-containing complexes present in Drosophila should not be affected by the Nurf301 mutation, these results highlight basic differences between the NURF complex and ACF or CHRAC. To my knowledge, similar investigations using Acf1, the common large subunit of ACF and CHRAC, are still missing.

Chapter 3

Scope of this thesis

Despite the amount of information available about chromatin remodelling complexes, enzymes and their function, structural information that might help to explain the remodelling mechanism has been missing. Therefore, a collaboration was initiated between the crystallography group of Christoph Müller, who is interested in DNA-protein interactions, and the group of Peter Becker, whose work has contributed much to the understanding of chromatin remodelling, especially of ISWI and CHRAC. The goal of this work was to provide structural information about ISWI in order to help understand the mechanism of remodelling. Crystallising one of the ISWI-containing complexes would have been too ambitious for a four-year project. The results of D. Corona from P. Becker’s group (Corona et al., 1999) had shown that the remodelling activity is intrinsic to ISWI and does not require the fully assembled complexes. This made the isolated enzyme a particularly suitable target for shedding light on the mechanism of chromatin remodelling.

Part II

Methods — theoretical background


Chapter 4

Circular dichroism

4.1 Theory

The secondary structure elements of proteins (α-helix, β-strand and random coil regions) have different absorption coefficients for light with negative and positive helicity. Therefore, analysis of the change of polarisation of light can reveal information about conformation and stability. The most general expression of a homogeneous plane electromagnetic wave is given by (Jackson, 1998)

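$$ \mathbf{E}(\mathbf{x},t) = (\boldsymbol{\varepsilon}_1 E_1 + \boldsymbol{\varepsilon}_2 E_2)\, e^{i\mathbf{k}\cdot\mathbf{x} - i\omega t} \tag{4.1} $$

with ε1 and ε2 being two linearly independent unit vectors in the plane perpendicular to the direction of propagation k, and E1 and E2 two complex amplitudes. A wave is said to have positive helicity (or to be left circularly polarised in optics) if the total amplitude sweeps counterclockwise around k, and negative helicity in the opposite case. Since proteins modify the state of circular polarisation, it is convenient to change the basis by introducing

$$ \boldsymbol{\varepsilon}_\pm = \frac{1}{\sqrt{2}}(\boldsymbol{\varepsilon}_1 \pm i\boldsymbol{\varepsilon}_2), \qquad \mathbf{E}(\mathbf{x},t) = (\boldsymbol{\varepsilon}_+ E_+ + \boldsymbol{\varepsilon}_- E_-)\, e^{i\mathbf{k}\cdot\mathbf{x} - i\omega t} \tag{4.2} $$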
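As a quick numerical check of Eq. 4.2, the helicity amplitudes follow from projecting the field onto the conjugate basis vectors, E± = (E1 ∓ iE2)/√2; this explicit form is derived here from the definition of ε± and is not stated in the text. The Python sketch treats the two transverse components as plain complex numbers; the function names and example amplitudes are illustrative assumptions.

```python
import cmath
import math

SQRT_HALF = 1 / math.sqrt(2)


def linear_to_circular(e1: complex, e2: complex) -> tuple[complex, complex]:
    """Helicity amplitudes (E+, E-) of Eq. 4.2, obtained by projecting on conj(eps_+/-)."""
    return SQRT_HALF * (e1 - 1j * e2), SQRT_HALF * (e1 + 1j * e2)


def circular_to_linear(e_plus: complex, e_minus: complex) -> tuple[complex, complex]:
    """Inverse transformation back to the linear amplitudes (E1, E2) of Eq. 4.1."""
    return SQRT_HALF * (e_plus + e_minus), 1j * SQRT_HALF * (e_plus - e_minus)


def magnitude_and_phase(amplitude: complex) -> tuple[float, float]:
    """Split a complex amplitude into a and delta, with amplitude = a * exp(i * delta)."""
    return abs(amplitude), cmath.phase(amplitude)


# E2 = i * E1: the field is proportional to eps_+, so only E+ should survive.
e_plus, e_minus = linear_to_circular(1.0, 1j)
print(magnitude_and_phase(e_plus), magnitude_and_phase(e_minus))
```

Because the transformation is unitary, |E+|² + |E−|² equals |E1|² + |E2|², so the total intensity does not depend on the chosen basis.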

The (complex) amplitude can be separated into magnitude and phase,

$$ E_\pm = a_\pm e^{i\delta_\pm} $$

The four Stokes parameters are hence defined and expressed as

$$ s_0 = |\boldsymbol{\varepsilon}_+^* \cdot \mathbf{E}|^2 + |\boldsymbol{\varepsilon}_-^* \cdot \mathbf{E}|^2 = a_+^2 + a_-^2 \tag{4.3} $$
$$ s_1 = 2\,\mathrm{Re}\left[(\boldsymbol{\varepsilon}_+^* \cdot \mathbf{E})^* (\boldsymbol{\varepsilon}_-^* \cdot \mathbf{E})\right] = 2 a_+ a_- \cos(\delta_- - \delta_+) \tag{4.4} $$
$$ s_2 = 2\,\mathrm{Im}\left[(\boldsymbol{\varepsilon}_+^* \cdot \mathbf{E})^* (\boldsymbol{\varepsilon}_-^* \cdot \mathbf{E})\right] = 2 a_+ a_- \sin(\delta_- - \delta_+) \tag{4.5} $$
$$ s_3 = |\boldsymbol{\varepsilon}_+^* \cdot \mathbf{E}|^2 - |\boldsymbol{\varepsilon}_-^* \cdot \mathbf{E}|^2 = a_+^2 - a_-^2 \tag{4.6} $$

The Stokes parameters can be measured experimentally and fully determine the state of the wave after interaction with the protein solution. Because of their handedness, proteins have different and, more importantly, complex refractive indices for ε+ and ε−, i.e., the two components of the wave are absorbed differently. This effect is called circular dichroism. The ellipticity of the sample is defined via the difference in absorbance ΔI for the positive and negative components of the wave, the sample concentration c, and the path length Δ through the sample. This quantity is wavelength dependent. Using Beer’s law, the ellipticity can be expressed in terms of the measurable values s0 and s3 only:

$$ \Theta(\lambda) = \frac{I_+ - I_-}{c \times \Delta} = \frac{1}{c \times \Delta} \log_{10} \frac{(s_0 + s_3)^2}{(s_0 - s_3)^2} \tag{4.7} $$

4.2 Data evaluation — wavelength scan and melting curve

Figure 4.1 shows the spectra of poly-L-lysine in three different conformations, i.e., purely α-helical, purely β-turn, and random coil (Greenfield and Fasman, 1969). Ideally, the signal of any protein ought to be a linear superposition of these three spectra, weighted by the fraction of each conformation present in the structure of the protein. The data could be analysed by fitting to a theoretical curve; however, to compensate for possible deviations due to tertiary structure elements, recorded data are normally compared to databases assembled from proteins with known structure. This method requires accurate knowledge of the protein concentration, and it is not easy to tell a priori how accurate the prediction is. CD data from a wavelength scan may be more useful for detecting conformational changes induced by ligand binding or by different solvent conditions.

[Figure 4.1 plot: Θ [°cm²/dmol] versus wavelength [nm], 190–250 nm, for α-lysine, β-lysine and coiled lysine]

FIGURE 4.1: CD spectra of three different conformations of poly-L-lysine: α-helical, β-turn, and random coil (data from Greenfield and Fasman, 1969). A general spectrum can be considered a linear superposition of these three curves. The data between 190 nm and 200 nm contain information important for the evaluation.

A second, more reliable deduction about protein properties can be drawn from a “melting curve”: the CD signal is measured at a fixed wavelength while the temperature of the sample is increased. As the protein denatures (“melts”), the signal is reduced. Normally the melting curve contains one or more inflection points. The temperature of the ith inflection point is defined as the melting temperature Tm^i. To maximise the signal difference, a wavelength should be chosen where the signal is extremal at the starting temperature. The melting temperature can also be affected by ligand binding; in fact, CD measurements of ISWI with and without cruciform DNA would have been an interesting experiment, since both the fragment ISWI[691:991] and full-length ISWI bind weakly to it.

Chapter 5

Protein crystallisation

5.1 Crystallisation techniques

Like salts, proteins can form crystals if their concentration in solution exceeds their solubility. The interactions between protein molecules, however, are much weaker than those between small (especially ionic) molecules, and crystallisation is a rare event compared to disordered aggregation. Finding appropriate conditions is one of the bottlenecks in obtaining a crystal structure.

Vapour diffusion. Because of its simple setup, the vapour diffusion method is often used for protein crystallisation trials. A small volume of protein solution is placed in a closed system together with a large volume of reservoir solution. A concentration difference between the reservoir and the sample causes vapour diffusion between the two solutions until the vapour pressure in the system is at equilibrium. The resulting change of conditions in the protein solution can bring about precipitation of the protein. Under the right conditions this happens through the formation of crystals; in most cases, however, through amorphous aggregation. In practice, 1–10 µl of the purified protein is mixed with the reservoir solution, which contains a precipitant at concentration cP. If the ratio is 1:1, both protein and precipitant concentrations are halved upon mixing and return to their initial values at equilibrium. But due to the presence of the precipitant, the solubility of the protein can now be lower, so that it precipitates. One can vary the mixing ratio and even use different solutions for the reservoir and for mixing with the protein (Grüne, 1999). The two most common vapour diffusion techniques are the sitting drop and the hanging drop method. In the sitting drop method, the protein-precipitant mixture is placed in a small depression or on a bridge above the reservoir. In the hanging drop method, the protein sample is prepared on a cover slip that is turned upside down before sealing the well, as illustrated in Figure 5.1.

[Figure 5.1 sketch: sitting drop with the protein drop on a bridge above the reservoir, and hanging drop]

FIGURE 5.1: Crystallisation by vapour diffusion is based on equilibration via the gas phase so that only volatile compounds can interchange. The two most common methods are the sitting (left) and hanging drop (right).
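The concentration changes described for vapour diffusion follow from a simple mass balance. The Python sketch assumes an idealised drop: only water is exchanged through the vapour phase, the reservoir is so large that its precipitant concentration cP does not change, and equilibrium is reached when the drop has the same precipitant concentration as the reservoir. Real drops deviate from this, e.g. when protein and reservoir solutions contain different constituents; the names and values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Drop:
    volume_ul: float
    protein_mg_ml: float
    precipitant: float  # same unit as the reservoir concentration c_p


def set_up_drop(protein_ul: float, protein_mg_ml: float, reservoir_ul: float, c_p: float) -> Drop:
    """Concentrations right after mixing protein and reservoir solution in the drop."""
    total = protein_ul + reservoir_ul
    return Drop(total, protein_mg_ml * protein_ul / total, c_p * reservoir_ul / total)


def equilibrate(drop: Drop, c_p: float) -> Drop:
    """Idealised equilibrium: only water leaves the drop, reservoir stays at c_p."""
    if drop.precipitant <= 0 or c_p <= 0:
        raise ValueError("the model needs a non-volatile precipitant in drop and reservoir")
    # Protein and precipitant stay in the drop; its volume shrinks until the
    # precipitant concentration equals that of the reservoir.
    final_volume = drop.precipitant * drop.volume_ul / c_p
    protein_amount = drop.protein_mg_ml * drop.volume_ul
    return Drop(final_volume, protein_amount / final_volume, c_p)


# 1:1 drop: 2 ul protein at 10 mg/ml plus 2 ul reservoir solution (c_p = 1.6).
start = set_up_drop(2, 10, 2, 1.6)
print(start)
print(equilibrate(start, 1.6))
```

Changing the mixing ratio to 1 µl protein plus 2 µl reservoir in this model ends at half the initial protein concentration, which is one way the ratio can be used to tune the final conditions.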

Liquid phase diffusion. A very different approach is the liquid phase diffusion method. Here, protein solution and reservoir are not mixed directly but are separated by, e.g., a membrane or a layer of agarose, so that the exchange is again diffusion driven but involves all constituents of the two liquids that are small enough to pass through the barrier, not only volatile ones. This exchange of nonvolatile compounds can lead to very different, often improved results (Hansen et al., 2002).

Screening. Whether or not crystals form depends on a large number of parameters, and there is no or only very limited a priori information about which ones to choose. The following parameters (and more) can influence crystallisation:

nature and concentration of precipitant

protein concentration


[Figure 5.2 sketch: protein solution and reservoir separated by an agarose layer]

FIGURE 5.2: The liquid phase method for crystallisation allows direct contact between protein and reservoir solution, separated only by a material that is impermeable to macromolecules. The material can be agarose or a dialysis membrane.

composition of protein and reservoir solutions, i.e., pH, ionic strength, etc.

temperature

Screening kits mostly probe the dependence on chemical conditions, either in a broad, random manner (matrix screens) or systematically around certain values (grid screens, e.g. pH vs. ionic strength).

Chapter 6

Protein crystallography

6.1 The phase problem

A crystal can be considered as a lattice that is built up by the regular repetition of a unit cell. Every point x inside the unit cell at the origin can be described as

$$ \mathbf{x}_{\text{unit cell}} = x\mathbf{a} + y\mathbf{b} + z\mathbf{c}, \qquad 0 \le x, y, z < 1 $$

where a, b and c are the unit cell vectors and x, y and z the fractional coordinates of the point x. Since the lattice is built up of translations of the unit cell, the coordinates of an arbitrary point can be decomposed into the translation of the unit cell from the origin and the position within the unit cell:

$$ \mathbf{x} = \underbrace{s\mathbf{a} + t\mathbf{b} + u\mathbf{c}}_{\text{translation of unit cell}} + \underbrace{x\mathbf{a} + y\mathbf{b} + z\mathbf{c}}_{\text{relative position in unit cell}}, \qquad s, t, u \in \mathbb{Z},\; 0 \le x, y, z < 1 $$

Due to this periodic structure of the crystal, a plane electromagnetic wave is scattered into distinct reflections. From classical electrodynamics and knowledge about the scattering of individual atoms, one can derive that the electron density within the crystal can be expressed as the Fourier synthesis of the scattered electromagnetic wave

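$$ \rho(\mathbf{x}) = \frac{1}{V_{\text{cell}}} \sum_{\mathbf{h}} F(\mathbf{h})\, e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}} = \frac{1}{V_{\text{cell}}} \sum_{\mathbf{h}} |F(\mathbf{h})|\, e^{i\alpha(\mathbf{h})}\, e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}} \tag{6.1} $$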

$$ F(\mathbf{h}) = \sum_i f_i\, e^{2\pi i\, \mathbf{h}\cdot\mathbf{x}_i} \tag{6.2} $$

The structure factor F(h) is complex and depends on the scattering factors fi of the atoms at positions xi. The measured diffraction intensities are proportional to the square of the modulus of the structure factor,

$$ I(\mathbf{h}) \propto |F(\mathbf{h})|^2 $$

This means that one cannot directly calculate the electron density ρ(x) from a single experiment, because for X-rays the phase angle α of the structure factor can be measured only by multiple-beam interference, similar to holography. Even though first ideas of how this could be achieved experimentally were published more than 70 years ago, this technique is not very suitable for protein crystallography (Weckert and Hümmer, 1997) because of the small scattering power, the densely packed reciprocal lattice, which causes overlaps of the interference profiles, and radiation damage. One is therefore faced with what is known as the phase problem in crystallography. To restore the missing information, several methods can be applied:

Molecular Replacement (MR): If a similar structure is known, phases can be calculated using its coordinates after locating the molecule in the unit cell.

Multiple Isomorphous Replacement (MIR): The native data are compared with two or more derivative data sets collected at the same wavelength. Each derivative contains a small, isomorphous modification that nevertheless scatters strongly enough to alter the diffraction pattern detectably.

Multi-wavelength Anomalous Dispersion (MAD): Data from a crystal containing an anomalous scatterer are collected at different wavelengths. The positions of the anomalous scatterers can be determined from the differences and serve as initial phase information.


Radiation Damage Induced Phasing (RIP): The difference between two data sets, collected before and after exposing the crystal to a high dose rate to induce radiation damage, can be used as phasing information.

The latter three methods differ mainly in their realisation and experimental procedure and less in the underlying theory.
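The phase problem can be illustrated in one dimension. The Python sketch below computes structure factors with Eq. 6.2 for two point atoms in a unit cell of length 1, synthesises the density with Eq. 6.1, and then repeats the synthesis with only the moduli |F(h)|, i.e., with all phases set to zero. Point atoms, the single dimension, the truncation at |h| ≤ 20, the omitted 1/Vcell factor and the atomic positions are simplifying assumptions chosen for illustration.

```python
import cmath
import math


def structure_factors(atoms: list[tuple[float, float]], h_max: int) -> dict[int, complex]:
    """F(h) = sum_i f_i exp(2 pi i h x_i) for h = -h_max..h_max (Eq. 6.2 in 1D).

    atoms: list of (scattering factor f_i, fractional coordinate x_i).
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(F: dict[int, complex], x: float) -> float:
    """rho(x) = sum_h F(h) exp(-2 pi i h x) for a unit cell of length 1 (Eq. 6.1)."""
    # The imaginary part vanishes (up to rounding) when F(-h) = conj(F(h)).
    return sum(Fh * cmath.exp(-2j * math.pi * h * x) for h, Fh in F.items()).real


def highest_peak(F: dict[int, complex], n_grid: int = 100) -> float:
    """Grid position of the density maximum."""
    grid = [i / n_grid for i in range(n_grid)]
    return max(grid, key=lambda x: density(F, x))


# Two hypothetical point atoms given as (f, x).
F_true = structure_factors([(6.0, 0.2), (8.0, 0.55)], h_max=20)
F_no_phase = {h: complex(abs(Fh)) for h, Fh in F_true.items()}  # moduli only

print(highest_peak(F_true))      # 0.55, the heavier atom
print(highest_peak(F_no_phase))  # 0.0, the wrong map collapses onto the origin
```

Both sets of structure factors give identical intensities |F(h)|², yet only the one with the correct phases reproduces the atomic positions.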

6.1.1 Molecular replacement

For molecular replacement to work, two conditions must be met:

1. sufficient structural homology between the test model and the structure of interest;
2. correct orientation of the test model inside the unit cell.

It can be applied, for example, in ligand binding studies, or when crystals in different space groups were measured and the structure in one of them has already been solved. In that case, lack of structural homology is less of a worry, but in most cases a different protein must be used, and differences between the two structures can seriously hamper successful phasing. In order to reduce the computing time, the positioning of the test model is usually separated into two steps, the rotational and the translational search, i.e., two three-dimensional searches are carried out instead of one six-dimensional search. For the rotational search, the coordinates of the centre of mass of the molecule are not required because the Patterson function is used. The Patterson function is defined as the convolution of the electron density with itself. It is related to the diffraction intensities by

$$ P(\mathbf{u}) := \int_{V_{\text{cell}}} d^3x\, \rho(\mathbf{x})\,\rho(\mathbf{x}-\mathbf{u}) = \frac{1}{V_{\text{cell}}} \sum_{\mathbf{h}} I(\mathbf{h}) \cos(2\pi\, \mathbf{u}\cdot\mathbf{h}) \tag{6.3} $$

To interpret the Patterson function, note that the integrand is maximal where both ρ(x) and ρ(x − u) are maximal. This is the case if u is an interatomic vector, which is independent of the position of the molecule in the unit cell. Since the Patterson function requires only the intensities, it can be calculated from the diffraction data: the test model is rotated, its Patterson function is calculated and compared with the Patterson function from the experimental data. The translational search involves the symmetry operators of the space group and requires more computational power (Brünger, 1997).
The rotational search depends only on the Patterson symmetry of the space group (its Laue class and lattice type); only the translational search distinguishes between the 65 space groups available for protein crystals. Recent programs take advantage of the increased computing power available nowadays and directly apply six-dimensional searches, e.g. in combination with Monte Carlo methods (Glykos and Kokkinidis, 2000) or with maximum likelihood approaches (Read, 2001).
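A one-dimensional toy sketch can illustrate Equation 6.3 and why the rotational search does not need the position of the molecule. It assumes a periodic density sampled on N grid points, so that the integral becomes a sum and V_cell becomes N. The Patterson function computed from the intensities alone equals the autocorrelation of the density, and translating the "molecule" leaves it unchanged. The function names are illustrative and not taken from any crystallographic package.

```python
import cmath
import math


def structure_factors(rho):
    """F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N) for a periodic 1D density on N grid points."""
    n = len(rho)
    return [sum(r * cmath.exp(2j * math.pi * h * x / n) for x, r in enumerate(rho))
            for h in range(n)]


def patterson_from_intensities(intensities):
    """Discrete analogue of the right-hand side of Eq. 6.3: (1/N) * sum_h I(h) cos(2*pi*u*h/N)."""
    n = len(intensities)
    return [sum(i_h * math.cos(2 * math.pi * u * h / n) for h, i_h in enumerate(intensities)) / n
            for u in range(n)]


def patterson_direct(rho):
    """Discrete analogue of the left-hand side of Eq. 6.3: sum_x rho(x) * rho(x - u), periodic."""
    n = len(rho)
    return [sum(rho[x] * rho[(x - u) % n] for x in range(n)) for u in range(n)]


# Toy "molecule": two atoms with weights 1 and 2 at grid points 3 and 7 of a 16-point cell.
rho = [0.0] * 16
rho[3], rho[7] = 1.0, 2.0
shifted = rho[-5:] + rho[:-5]  # the same molecule translated by +5 grid points

p_data = patterson_from_intensities([abs(f) ** 2 for f in structure_factors(rho)])
p_shifted = patterson_from_intensities([abs(f) ** 2 for f in structure_factors(shifted)])
```

Both lists have their origin peak at u = 0 and peaks at the interatomic vectors u = 4 and u = 12 (that is, −4); the translation is invisible to the Patterson function, whereas a rotation (in 3D) would change the vectors.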

6.1.2 Experimental phasing

The other three methods mentioned above are based on very similar ideas, but the goal is achieved in different ways. If the coordinates of a small subset of atoms are known, their structure factors, i.e., amplitudes and phases, can be calculated. With their help, the phases of all reflections can be deduced algebraically up to a discrete ambiguity. Using two data sets for comparison with the native one, the ambiguity can be resolved. It is important for these approaches that the native data differ only additively from the altered data, i.e., only in the intensities of the reflections and not in their positions. From the difference, the positions of the scatterers that induce the change can be determined, e.g. by interpretation of the difference Patterson map. Figure 6.1, the "Harker" construction, explains how the phases are found once the coordinates of the additional scatterers are known. For ideal isomorphous replacement, i.e., if the positions of the atoms in the native crystal are not disturbed, the following relation holds:

$$\mathbf{F}_{p+h} = \mathbf{F}_p + \mathbf{F}_h \tag{6.4}$$

This is an equation between complex quantities, but for $\mathbf{F}_{p+h}$ and $\mathbf{F}_p$ only the moduli are known, so their phases can only be determined up to a discrete twofold ambiguity. Figure 6.1 shows the graphical solution of the problem. $\mathbf{F}_h$ denotes the contribution from the additional atoms, e.g. the heavy atoms of a derivative or the anomalous scatterers. $|\mathbf{F}_p|$ denotes the amplitude from the native (protein-only) crystal and is proportional to the square root of the measured intensities. $|\mathbf{F}_{p+h}|$ denotes the combined structure factor amplitude and is derived from the measured intensities of a derivative crystal (protein plus extra scatterers). The "Harker" construction provides the geometric solution of Equation 6.4 when only the structure factor $\mathbf{F}_h$ of the extra scatterers and the moduli of the native and derivative reflections are known. One circle with radius $|\mathbf{F}_{p+h}|$ is placed at the origin. The vector $\mathbf{F}_h$, placed at the origin, determines the centre of the second circle, whose radius is $|\mathbf{F}_p|$. The vectors from the origin to the two intersections of the circles are the two possible solutions for the structure factor $\mathbf{F}_{p+h}$. Since the replacement between native and derivative crystal should be isomorphous, the additional signal must be strong enough to be detected while perturbing the native crystal as little as possible. This leads to the various methods that were listed earlier:


FIGURE 6.1: Graphical solution of Equation 6.4 when the phases of $\mathbf{F}_{p+h}$ and $\mathbf{F}_p$ are unknown. Once the coordinates of the small number of scatterers that induce a change in the diffraction pattern are known, the phases of all reflections can be calculated up to a discrete twofold ambiguity. $|\mathbf{F}_p|$: structure factor amplitude from the native data set; $|\mathbf{F}_{p+h}|$: structure factor amplitude from the altered data set (protein and extra scatterers); $\mathbf{F}_h$: structure factor of the extra scatterers (e.g. a heavy-metal compound or anomalous scatterers).

Heavy metals. The scattering power of an atom increases with the number of its electrons. Elements like mercury (Z = 80), gold (Z = 79), or platinum (Z = 78) therefore scatter much more strongly than the most common atoms in proteins, carbon, nitrogen, and oxygen (Z = 6, 7, and 8, respectively). Heavy atoms can be incorporated into the crystal by co-crystallisation or by soaking the crystal in a solution containing the heavy-atom compound.

Anomalous absorption. Through interaction with the incoming electromagnetic wave, an electron can absorb enough energy for a transition between shells of the atom if the photon energy exceeds the energy difference between the two shells. For many atoms, this transition energy lies in the range accessible with synchrotron radiation. To account for the anomalous dispersion, the scattering factor $f_i$ in Equation 6.2 must be decomposed into a real and an imaginary part,

$$f_i(\lambda) = f_i^0 + f_i'(\lambda) + i\,f_i''(\lambda)$$

$f'$ and $f''$ have their extrema around the transition energy, but at slightly different wavelengths. Data measured at these two wavelengths can be handled like two different derivatives. Two cases deserve special attention because they have the least impact on the crystal packing:
1. Phosphorus and sulphur (Z = 15 and 16), both of which occur in biological macromolecules, have the absorption edges of their K electrons at about 5 Å. Therefore data are collected at a remote wavelength of about 7 keV (≙ 1.77 Å), where the anomalous signal is still strong enough. The disadvantage is that the signal is rather weak, and highly redundant data must be collected to improve the signal-to-noise ratio.
2. The sulphur in methionine can be replaced with selenium (Z = 34) by recombinant methods. The anomalous signal of selenium is stronger than that of sulphur or phosphorus, and the data need not be collected with such high redundancy as in the first case.

Specific radiation damage. For RIP, atoms are not added but partially disordered by the X-ray beam.
After collection of the first data set, the crystal is exposed to a dose of radiation before a second data set is collected. Apart from general damage, some highly specific changes occur: carboxyl groups lose their definition in the electron density maps, and disulphide bonds elongate and eventually break (Ravelli et al., 2003).
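Under the idealised assumptions of Equation 6.4 (perfect isomorphism, error-free amplitudes, and $\mathbf{F}_h$ already known from the located sites), the Harker construction of Figure 6.1 amounts to intersecting two circles in the complex plane. The following Python sketch is illustrative only: `harker_solutions` is a hypothetical name, and real phasing programs additionally model measurement errors, which this sketch ignores.

```python
import cmath
import math


def harker_solutions(f_ph_mod, f_p_mod, f_h):
    """Two candidate complex F_{p+h} with |F_{p+h}| = f_ph_mod and |F_{p+h} - F_h| = f_p_mod.

    Geometrically: intersect the circle of radius |F_{p+h}| centred at the origin
    with the circle of radius |F_p| centred at F_h (Eq. 6.4 with F_p = F_{p+h} - F_h).
    """
    a = abs(f_h)
    phi_h = cmath.phase(f_h)
    # Law of cosines in the triangle origin / F_h / F_{p+h}.
    cos_delta = (f_ph_mod ** 2 + a ** 2 - f_p_mod ** 2) / (2 * f_ph_mod * a)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against noisy amplitudes
    delta = math.acos(cos_delta)
    return [cmath.rect(f_ph_mod, phi_h + delta), cmath.rect(f_ph_mod, phi_h - delta)]


# Made-up example: a "true" native structure factor and a heavy-atom contribution.
f_p = cmath.rect(3.0, 0.7)
f_h = cmath.rect(1.0, 2.0)
f_ph = f_p + f_h
candidates = harker_solutions(abs(f_ph), abs(f_p), f_h)
```

One of the two candidates reproduces the true $\mathbf{F}_{p+h}$ (and, after subtracting $\mathbf{F}_h$, the native phase); both satisfy the amplitude constraints, which is the twofold ambiguity that a second derivative or wavelength resolves.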

6.2 Phase improvement — density modification

The initial phases are generally quite poor, and it is often difficult to start building the model. The term "density modification" encompasses a set of techniques to improve the initial phases and hence the first electron density map. The original idea was to use the first phases to find the boundaries between (disordered) solvent and protein regions in the unit cell. Setting the electron density within the solvent region to a constant value (solvent flattening) or even inverting it (solvent flipping), and calculating new phases from the modified map, enhances the relative contribution of the protein region and improves the map. Various other techniques can be applied, depending on the information available for the data. As in many other crystallographic applications, maximum likelihood has proven to be more powerful for density modification than conventional methods (Terwilliger, 2000). The program resolve uses a first-order Taylor series of the likelihood function for the structure factors around the initial structure factors to obtain (and improve) their probability distribution.
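The core step of solvent flattening can be sketched on a map stored as a flat list of grid values together with a precomputed solvent mask. Both inputs and the function name are hypothetical; determining the mask and back-transforming the modified map into new phases are not shown.

```python
def flatten_solvent(density, solvent_mask):
    """Replace every grid value flagged as solvent by the mean solvent density."""
    solvent = [rho for rho, is_solvent in zip(density, solvent_mask) if is_solvent]
    if not solvent:
        return list(density)  # nothing to flatten
    mean_solvent = sum(solvent) / len(solvent)
    return [mean_solvent if is_solvent else rho
            for rho, is_solvent in zip(density, solvent_mask)]


# Toy map: the last two grid points are assumed to lie in the solvent region.
flattened = flatten_solvent([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
```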

6.2.1 Automatic model building

The way resolve pre-builds a model can be quoted directly from its manual (online at http://resolve.lanl.gov):
1. Use the best existing amplitudes, phases, and weights to calculate a map
2. Identify locations of helices and strands using an FFT-based correlation search with a standard set of helix/strand templates
3. Using a library of actual helical/beta templates, find the best match to density near each helix/strand
4. Trim the templates down to match the density
5. Extend the templates using a template library of short fragments
6. Assemble fragments into longer fragments
7. Match side-chain density to library of side-chain densities, get probability of each possible sequence alignment, choose those with very high probability
8. Map all the fragments to one asymmetric unit so that they are as close together as possible
9. Write out PDB file with the fragments as a main-chain model.

6.3 TLS refinement

The program refmac5, which was used to refine the model against the data, offers the option of TLS refinement. Normally four parameters per atom are refined for protein structures: the three coordinates and a temperature factor. The temperature factor is a measure of the mean square displacement of an atom. Since data sets from proteins are generally not highly overdetermined (the ISWI [691:991] model has about 2,300 atoms and was refined against 38,000 reflections, i.e., only about a fourfold overdetermination of the problem), the temperature factor is treated as isotropic; only for very high resolution data can it be refined anisotropically, which requires six parameters per atom instead of one, i.e., five extra parameters. TLS refinement applies an anisotropic temperature factor to groups of atoms. Each TLS group introduces 20 extra parameters. For the ISWI [691:991] structure, four groups were defined, which lowered the R_free factor by about 1 percentage point but added only 80 parameters.

Part III

Materials and Methods


Chapter 7

Purification and crystallisation

7.1 Subcloning

The cDNA sequence of full-length ISWI from Drosophila melanogaster was kindly provided by D.F.V. Corona (Corona et al., 1999). All subclones were derived from this vector. For this purpose, oligos of 20–27 nucleotides were designed with a four-base overhang at the 5' end, followed by the mutations required for the restriction site, a stop codon (TAA, TGA, or TAG) for the 3'-end oligo, and sequence matching the original template, extended to a length at which both oligos had similar melting temperatures. The 3' end was designed to end with 1–2 guanines or cytosines, which is supposed to enhance initiation of the polymerase (F. Michel, personal communication). Oligos for PCR were ordered from MWG (http://www.mwgdna.com). The restriction sites were NcoI and KpnI for the pProExHtb vector, which leaves an extra three residues at the N-terminus after TEV cleavage (glycine, alanine, methionine; see Section 7.2.3).

PCR. The annealing temperature was chosen to be approximately 5 K below the lower of the two (theoretical) oligo melting temperatures T_m, and 25–30 PCR cycles were carried out. The elongation temperature was lowered from 72 °C to 68 °C for very long products (>3,000 bp).
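The thesis does not state which formula was used for the theoretical melting temperatures. As an assumption, the sketch below uses the simple Wallace rule (T_m ≈ 2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for primers of 20–27 nucleotides. It then applies the 5 K offset and the 3'-end G/C check described above. The example primers and function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature in deg C (Wallace rule): 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def annealing_temperature(forward, reverse, offset=5):
    """Annealing temperature `offset` K below the lower of the two melting temperatures."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


def has_gc_clamp(oligo, max_gc=2):
    """True if the 3' end carries 1 to max_gc consecutive terminal G/C bases."""
    tail = 0
    for base in reversed(oligo.upper()):
        if base in "GC":
            tail += 1
        else:
            break
    return 1 <= tail <= max_gc


# Hypothetical primer pair (not the ISWI primers used in this work).
fwd = "ATGCATGCATGCATGCATGC"
rev = "TTTTAAAAGGGGCCCATTGC"
t_anneal = annealing_temperature(fwd, rev)
```

For these two primers the Wallace estimates are 60 °C and 58 °C, giving an annealing temperature of 53 °C; a nearest-neighbour model would give more reliable values for primers of this length.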

Ligation. The PCR products were purified with the Qiagen "PCR purification kit", digested with both restriction enzymes in one reaction, purified, and finally ligated into the target vector. The vector had previously been digested and dephosphorylated with alkaline phosphatase (Boehringer Mannheim or Promega).

Transformation. The ligation product was directly transformed into chemically competent cells (XL1-Blue, Stratagene) according to the manufacturer's instructions and plated on agar plates with 50 μg/ml ampicillin.

Analysis. Plasmids were purified from colonies by minipreps (Promega or Qiagen) and analysed on agarose gels after double digests. The sequence of the ISWI [691:991] clone (whose structure was solved) was confirmed by DNA sequencing.

7.2 Expression and purification of recombinant proteins in E. coli

7.2.1 Protein expression

The following are generic instructions that were applied to all constructs expressed in E. coli. Two different systems are described, based on the vectors pProEx for purification with nickel-chelating agarose and pTYB4 / pMYB4 for purification with chitin beads. The host strain for the latter system is ER2566; the yield of full-length ISWI expressed with this vector in BL21(DE3) cells is much lower (data not shown). The expression vector pProExHtb (Life Technologies) carries a bacterial trc promoter, so a T7 system is not required. Nevertheless, all ISWI fragments cloned into this vector were expressed in E. coli BL21(DE3) cells, which carry the (viral) T7 RNA polymerase gene. This was done for no particular reason, and the expression level was satisfactory. Cultures in LB medium with antibiotic(s) were inoculated with 1–10% (v/v) of an overnight culture after its medium had been exchanged once to remove β-lactamase. This was done by centrifugation at 400 × g for 10 min and resuspension of the cell pellet in fresh medium. At an optical density OD600 ≈ 0.6–1.2, the bacteria were transferred from 37 °C to 20 °C, and overnight expression was induced with 0.2–0.5 mM IPTG (isopropyl-β-D-thiogalactoside). Expression at low temperature was crucial for the solubility of the protein (see Section 10.1.1, p. 56). The bacteria were harvested by centrifugation at 4,000–5,000 × g for 20 minutes. The pellet was resuspended in buffer containing 20 mM Tris or Hepes, 0.5 M NaCl, optionally 20% glycerol, pH 7–8, and 1 tablet of protease inhibitor cocktail (EDTA-free, Boehringer Mannheim) per 50 ml of buffer, using 12.5 ml per litre of culture for the less well expressed constructs like flISWI and 20 ml for the better expressed ones like the C-terminal fragments of ISWI. Cells were lysed by sonication at 4 °C, sometimes supported by addition of lysozyme to a concentration of 50–200 μg/ml. The cell debris was spun down by ultracentrifugation at ≈29,000 × g for at least 30 min. The pellet was discarded and the supernatant loaded onto the column material appropriate for the expression system, as described in the following sections, i.e., Ni-chelating agarose for the pProEx system and chitin beads for the pMYB4 / pTYB4 system. For a gel sample, the pellet was resuspended in a volume equal to that of the supernatant for quantitative comparison.

7.2.2 Production of seleno-methionine substituted protein

To produce selenomethionine-substituted ISWI [691:991], a 10 ml preculture in LB + ampicillin was inoculated from a glycerol stock of bacteria and grown during the day. 1 ml of it was used to inoculate an overnight culture of 25 ml M9 minimal medium + ampicillin. This whole preculture was used to inoculate 1 l of M9 minimal medium + ampicillin. The bacteria were grown at 37 °C to OD600 = 0.6. The following amino acids were then added to the given final concentrations:

L-lysine, L-phenylalanine, L-threonine: 100 mg/l
L-isoleucine, L-leucine, L-valine: 50 mg/l
L-selenomethionine: 60 mg/l

This was followed by an additional 15 minutes of growth at 37 °C. Then the culture was transferred to 20 °C and induced with 0.2 mM IPTG. The M9 minimal medium has the following composition:

M9 salts (amount per 1 l; final concentration):
Na2HPO4·7H2O: 21.3 g (≙ Na2HPO4: 18.9 g); 63.7 mM
KH2PO4: 4 g; 29.4 mM
NaCl: 0.67 g; 11.4 mM
NH4Cl: 1.33 g; 24.9 mM

Supplements (amount per 1 l; final concentration):
1 M MgSO4: 2.67 ml; 2.67 mM
1 M CaCl2: 0.133 ml; 0.133 mM
20% carbon source: 26.7 ml; 0.53%
a pinch of vitamin B

The M9 salts were dissolved in deionised water and autoclaved; MgSO4 and CaCl2 must be autoclaved separately and added once the solution has cooled below 50 °C. The carbon source solution must be sterilised by filtration. Glucose is suitable as carbon source; sucrose did not work with the BL21(DE3) cells.
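The final concentrations in the M9 recipe follow from mass divided by molar mass, or from the dilution of a stock solution. The sketch below reproduces the KH2PO4, NaCl, NH4Cl, MgSO4 and CaCl2 entries. It assumes a final volume of 1 l and the standard molar masses of the anhydrous salts, which are not given in the original; the helper names are hypothetical.

```python
# Molar masses in g/mol of the anhydrous salts (standard values, assumed here).
MOLAR_MASS = {"KH2PO4": 136.09, "NaCl": 58.44, "NH4Cl": 53.49}


def millimolar(grams, molar_mass, volume_l=1.0):
    """Final concentration in mM of `grams` of a salt made up to `volume_l` litres."""
    return grams / molar_mass / volume_l * 1000.0


def dilution_mm(stock_molar, added_ml, volume_l=1.0):
    """Final concentration in mM after adding `added_ml` of a `stock_molar` M stock.

    M * ml = mmol, so dividing by the volume in litres gives mM directly.
    """
    return stock_molar * added_ml / volume_l


kh2po4 = millimolar(4.0, MOLAR_MASS["KH2PO4"])   # about 29.4 mM
nacl = millimolar(0.67, MOLAR_MASS["NaCl"])      # about 11.5 mM
nh4cl = millimolar(1.33, MOLAR_MASS["NH4Cl"])    # about 24.9 mM
mgso4 = dilution_mm(1.0, 2.67)                   # 2.67 mM
cacl2 = dilution_mm(1.0, 0.133)                  # 0.133 mM
```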

7.2.3 Purification of 6xHis-tagged proteins

The expression vector pProExHtb provides an N-terminal 6xHis-tag followed by a recognition site for the protease from the tobacco etch virus (TEV). It cleaves the sequence ENLYFQ^G between the glutamine and the glycine. Since all clones were inserted at the NcoI site, the three residues glycine, alanine, and methionine (GAM) precede the target sequences (full-length ISWI starts with Lys3 and is preceded by GAMA). TEV protease was kindly produced and provided by A. Geerlof at the Protein Expression and Purification Facility, EMBL Heidelberg, using a clone by G. Stier (EMBL Heidelberg). The protein was bound to Ni2+-chelating agarose (tridentate iminodiacetic acid from Amersham Pharmacia, 4–7.5 ml per litre of culture). After loading, the column was washed with approximately 15 column volumes of buffer + 50 mM imidazole until no more protein was eluted (monitored by photometry at 280 nm or by Bradford assay). The protein was eluted with a linear gradient from 50 mM to 0.5 M imidazole over 2–3 column volumes, with a fraction size of 0.1–0.2 column volumes. The fractions were analysed by SDS-PAGE and/or a Bradford assay (the latter was sufficient because the high expression level and the limited amount of resin meant that the resin was saturated with target protein, which was therefore very pure after the washing step). Following the protocol provided, the pooled fractions were supplemented with 0.5 mM β-mercaptoethanol¹, 0.5 mM EDTA, and 0.5–1% (m/m with respect to the target protein) TEV protease and shaken overnight at 4 °C. The degree of cleavage was checked by SDS-PAGE, and the protein solution was dialysed in 12,000 MWCO membranes twice against 1 l of buffer without glycerol for about 1 h each and then passed again through the (recycled) nickel resin to remove the His-tagged TEV protease, undigested protein, and contaminants that bind unspecifically to the resin.

¹ The protocol suggests DTT (dithiothreitol) instead of β-mercaptoethanol, but this sometimes led to difficulties with the following Ni column despite extensive dialysis.

[Figure 7.1: SDS-PAGE; lanes pre TEV, post TEV, F/T, F/T concd, column strip, and marker (66, 45, and 31 kDa).]

FIGURE 7.1: Effect and efficiency of the second Ni column. The flow-through (F/T) contains the mostly pure target protein, while the contaminants still present before the application of the sample (lane "post TEV") are now retained on the resin (lane "column strip").

Figure 7.1 shows the effect of the second Ni column. Most contaminants still visible in the input (lane "post TEV") remained on the column (lane "column strip"), while the collected flow-through contained the purified protein (lanes "F/T" and "F/T concd" after concentration). After this step, the protein was divided into 0.5–1 ml aliquots and stored at −20 °C at a concentration of 20 mg/ml. To prevent damage, the formation of ice crystals was inhibited by addition of 10–15% glycerol. Finally, after concentration with a 10 kDa molecular weight cutoff (MWCO) filter, the protein was transferred into a defined buffer (20 mM Tris or Hepes, pH 7–7.5, 0.5 M NaCl) by gel filtration on a Superdex 200 16/60 column (Amersham Pharmacia). The final concentration was achieved by centrifugation with 10 kDa or 5 kDa MWCO concentrators.
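The effect of TEV cleavage on the N-terminus can be checked in silico. The following minimal sketch uses a hypothetical construct with a simplified His-tag leader (the real pProExHtb leader is longer and is not reproduced here) and a placeholder target sequence. It shows that the three vector-derived residues GAM remain attached to the target.

```python
TEV_SITE = "ENLYFQ"  # TEV protease cleaves ENLYFQ^G between Q and G


def tev_cleave(sequence):
    """Return the C-terminal product after cleavage at the first ENLYFQ^G site, or None."""
    pos = sequence.find(TEV_SITE + "G")
    if pos < 0:
        return None  # no cleavage site present
    return sequence[pos + len(TEV_SITE):]


# Hypothetical construct: simplified His-tag + TEV site + GAM (NcoI-derived) + placeholder target.
construct = "MHHHHHH" + "ENLYFQ" + "GAM" + "PEPTIDE"
product = tev_cleave(construct)
```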

7.2.4 Purification with the IMPACT T7 system

The initial ISWI clone, kindly provided by D.F.V. Corona, was used with this purification protocol. It was cloned into the vector pMYB4. In the vectors pMYB4 and pTYB4 (New England Biolabs), the target gene is fused to an intein and a chitin binding domain (CBD). The intein can be cleaved off the target protein under reducing conditions. The CBD binds to chitin beads, so the fusion protein can be immobilised and the contaminants removed by extensive washing of the column (10–15 column volumes). Cleavage is induced by flushing the column with two column volumes of buffer + 30 mM DTT. After ≥12 h, the cleaved target protein can be eluted from the column. After the protein concentration has been measured by a Bradford assay, the main peaks are pooled and further purified by gel filtration on a Superdex 200 16/60 column (Amersham Pharmacia). The buffer system is the same as for the His-tag protocol.

7.2.5 Concentration measurement

The concentrations of all proteins were determined by photometric measurement at λ = 280 nm. The molar extinction coefficient can be approximated with the equation

$$\varepsilon_{280} = n_W \times 5690\,\mathrm{M^{-1}\,cm^{-1}} + n_Y \times 1280\,\mathrm{M^{-1}\,cm^{-1}} + n_C \times 120\,\mathrm{M^{-1}\,cm^{-1}}$$

where $n_W$, $n_Y$, and $n_C$ are the numbers of tryptophans, tyrosines, and cystines (cysteines forming disulphide bonds), respectively (Gill and Hippel, 1989). The extinction coefficients were calculated with the program ProtParam at www.expasy.ch and are listed in Appendix B for the different clones.
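The equation above, combined with the Beer-Lambert law, gives the protein concentration from a measured absorbance. A minimal sketch; the example sequence, absorbance, and molecular weight are made up for illustration, and the number of cystines has to be supplied because it cannot be read from the sequence alone.

```python
# Coefficients from the equation above, in M^-1 cm^-1.
EPSILON_TRP = 5690
EPSILON_TYR = 1280
EPSILON_CYSTINE = 120


def extinction_280(sequence, n_cystines=0):
    """Molar extinction coefficient at 280 nm from the amino acid sequence."""
    seq = sequence.upper()
    return (seq.count("W") * EPSILON_TRP
            + seq.count("Y") * EPSILON_TYR
            + n_cystines * EPSILON_CYSTINE)


def concentration_mg_per_ml(a280, epsilon, molecular_weight, path_cm=1.0):
    """Beer-Lambert: c [M] = A / (epsilon * d); times MW [g/mol] gives g/l = mg/ml."""
    return a280 / (epsilon * path_cm) * molecular_weight


eps = extinction_280("WWYC")                        # 2*5690 + 1280 = 12660
conc = concentration_mg_per_ml(1.0, eps, 25320.0)   # hypothetical 25.32 kDa protein
```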

7.3 Crystallisation of the C-terminal subclones of ISWI

Vapour diffusion. Crystallisation trials were carried out as hanging drops with protein concentrations between 15 and 120 mg/ml. The mixing ratio was usually 1:1 with a total drop volume of 2 μl and a reservoir of 0.3–0.5 ml. The sparse-matrix screens I, II, and I Lite from Hampton Research were tested. Grid screens were both commercial (sometimes modified) and manually produced:

Commercial kits:
PEG 6,000 vs. pH
MPD vs. pH
ammonium sulphate vs. pH
PEG 6,000/lithium chloride vs. pH
ammonium acetate

Manually prepared:
sodium malonate vs. PEG 400–PEG 2,000
sodium chloride vs. PEG 6,000
MPD + ammonium sulphate vs. pH
PEG 6,000 + ammonium sulphate vs. pH

Additive screens (Hampton Research) were used according to the instructions with a reservoir solution of 20 mM Hepes, pH 7.0, 4% PEG 6,000.

Liquid phase diffusion. Liquid phase diffusion was attempted in two different ways:

1. 10 μl dialysis buttons (Hampton Research) with protein concentrations from 5 to 30 mg/ml (in 5 mg/ml steps). The protein solution was either dialysed directly against solution without sodium chloride, or the salt concentration was reduced stepwise (Δc = −0.1 M).
2. Glass capillaries (diameter 0.5 mm) filled with protein solution + 1% agarose². The capillaries were sealed with wax at the bottom. The agarose column was covered with buffer (20 mM Hepes or Tris, pH 7–8, 5% PEG 6,000), and the capillary was sealed with grease. The agarose was dissolved by boiling and mixed with the protein solution at below 30 °C.

Micro seeding. The protein was set up as hanging drops in a batch-like manner, as follows:

Setup 1: 1 μl protein (35–60 mg/ml in 20 mM Tris or Hepes, pH 7–7.5, 0.5 M NaCl) mixed with 4 μl of 125 mM Hepes pH 7.0, 5% PEG 6000; reservoir: 0.1 M Hepes pH 7.0, 4% PEG 6000, 100 mM NaCl.
Setup 2: 1 μl protein (idem) mixed with 9 μl of 111 mM Hepes pH 7.0, 4.44% PEG 6000; reservoir: 0.1 M Hepes pH 7.0, 4% PEG 6000, 50 mM NaCl.

That is, the concentrations were chosen such that the system was near equilibrium. The first setup results in a final sodium chloride concentration of 100 mM, the second in 50 mM. After at least one day of equilibration, the drop was streak-seeded with a cat whisker: a crystal (from a previous preparation) was touched with the whisker, the whisker was washed in reservoir solution and then immersed in the fresh drop. Crystals were harvested within three days to avoid degradation, which became visible after one to two weeks.
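The near-equilibrium design of the two seeding setups can be verified by volume-weighted averaging. A minimal sketch (the helper name `mix` is hypothetical) that reproduces the final NaCl and PEG 6000 concentrations of both drops; the 20 mM buffer of the protein stock is ignored here.

```python
def mix(volumes_ul, concentrations):
    """Final concentration of one component when solutions of the given volumes are combined."""
    total = sum(volumes_ul)
    return sum(v * c for v, c in zip(volumes_ul, concentrations)) / total


# Setup 1: 1 ul protein (0.5 M NaCl) + 4 ul precipitant (5% PEG 6000, no NaCl).
nacl_1 = mix([1, 4], [500.0, 0.0])   # mM
peg_1 = mix([1, 4], [0.0, 5.0])      # %
# Setup 2: 1 ul protein (0.5 M NaCl) + 9 ul precipitant (4.44% PEG 6000, no NaCl).
nacl_2 = mix([1, 9], [500.0, 0.0])   # mM
peg_2 = mix([1, 9], [0.0, 4.44])     # %
```

Both drops end up at about 4% PEG 6000, matching their reservoirs, with 100 mM and 50 mM NaCl, respectively.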

7.4 Harvesting and freezing of crystals

X-rays induce damage in proteins during data collection that degrades the data quality (a fact one can also profit from, as in the case of radiation damage induced phasing, Section 6.1). The damage is reduced if the experiment is carried out at low temperature. Nowadays, protein crystals are usually kept at 100 K in a stream of cold nitrogen gas. In order to avoid the formation of ice crystals, which would both disrupt the protein crystal lattice and overexpose the detector, crystals need to be transferred into a cryosolution that contains the mother liquor (i.e. the solution the crystals were grown in) and a cryoagent (e.g. glycerol or PEG 400; see Rodgers, 1997). The amount of cryoagent depends on the composition of the mother liquor. The lowest sufficient concentration can be found directly by checking whether ice forms when a loop with cryobuffer is held into the cryostream. Otherwise, some guidelines are available, like the formulation sheet of the cryo screen by Hampton Research at www.hamptonresearch.com. Some crystals tolerate an immediate transfer into the required concentration of cryoagent, others need to be transferred stepwise. The latter was the case for both the hexagonal and the monoclinic crystal forms of ISWI [691:991]. Direct transfer into 30% glycerol, 30% PEG 400, or 30% glucose resulted in cracks or dissolution. Therefore, crystals were transferred to the final concentration in increments of 5 percentage points. After each step, an equilibration time of 1–5 min was allowed, with the longest pause between 10% and 15%; otherwise, large crystals in particular were observed to crack. Small crystals were moved with a cryoloop, large ones with a 20 μl pipette so that the accompanying previous solution slowed down the concentration change. Crystals in the final cryosolution were mounted in a cryoloop of appropriate size and frozen in a bath of liquid nitrogen.
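The stepwise transfer described above can be written down as a simple schedule of cryoagent concentrations. A minimal sketch; the function name is hypothetical, and the equilibration times (1–5 min per step, longest between 10% and 15%) are left to the experimenter.

```python
def cryo_steps(final_percent, increment=5):
    """Cryoagent concentrations (%) for stepwise transfer up to final_percent."""
    # range() stops before final_percent, so the final concentration is appended explicitly.
    return list(range(increment, final_percent, increment)) + [final_percent]


schedule = cryo_steps(35)  # e.g. 35% glycerol for the hexagonal crystals
```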

² Note that this setup differs from the one suggested in Section 5.1.

Cryo Agents

1. Hexagonal crystals were frozen with 35% glycerol or 35% PEG 400. No difference between these two conditions was detected.
2. Monoclinic crystals dissolved or cracked in 25% PEG 400, 12.5% MPD, or 12.5% ethanol. Oil completely destroyed any diffraction. Glucose, however, was suitable for freezing: with 4–5% PEG 6000, 35% glucose was necessary to avoid the formation of ice crystals.

Chapter 8

Additional experiments for protein characterisation

8.1 Bandshift assays with ISWI [691:991] and cruciform DNA

Annealing of Holliday junction DNA. The four different oligos for the cruciform DNA (also called four-way junction or Holliday junction DNA) were ordered from MWG with the sequences shown in the scheme below. They were assembled in a total volume of 50 μl under the following conditions, according to the method described in (Angelov et al., 1999):

Buffer: 10 mM Tris-HCl, pH 8.0; 100 mM NaCl; 1 mM EDTA.
Annealing: 3 min at 95 °C, 30 min at 50 °C, 30 min at 44 °C, 5 h at 37 °C, 2 h at 20 °C.

[Scheme: the four oligos annealed into a Holliday junction; the arm sequences in the scheme are 5'-gggccagatctggcgttag, ctaacgccagatctggc-3', 5'-gtcgaattcagcacgagtc, gtgataccgatgcatcggc-3', cactcgtgctgaattcgac-3', 5'-gccgatgcatcggtatca, ggcttacgactagttgag-3', and 5'-ggctcactagtcgtaagc.]

Set-up and incubation. Bandshift assays as described in Section 10.3 can be carried out at either constant protein or constant DNA concentration. As the behaviour of the protein was of interest, the concentration of the DNA was varied. The total volume was 40 μl with 50 pmol of ISWI [691:991]; the amount of DNA was increased from 5 pmol to 2 nmol. The lowest possible salt concentration, 54.3 mM, was set by the high salt concentration of the initial protein solution. The salt concentration of the samples with a lower protein-to-DNA ratio was adjusted to the same value; traces of EDTA came from the oligo resuspension buffer. The ISWI [691:991] stock solution contained 10 μM protein and 80 mM NaCl.

Signal detection by non-denaturing PAGE. The native gel with 50 mM Tris, 50 mM glycine (pH 8.9), 5% acrylamide (75:1 acrylamide:bisacrylamide) was kept at 4 °C and pre-run at 100 V for 0.5 h before the samples were loaded in 10–15% glycerol. The same voltage was applied for 2.5 h before the gel was stained, first with 0.5 μg/ml ethidium bromide and subsequently with Coomassie Blue to detect the protein.

8.2 Restricted proteolysis

Full-length ISWI was treated with trypsin, chymotrypsin, subtilisin, and elastase at a 500:1 (m/m) ratio. Aliquots were taken between 5 min and 3 h, and the reaction was stopped by addition of PMSF. The results were analysed by 10% SDS-PAGE. No notable differences between the enzymes were observed. The clearest result, obtained with trypsin, was used for N-terminal sequencing of the three major bands. The lowest band turned out to be a double band and could only be resolved with knowledge of the sequence (Keith Ashman, personal communication).


8.3 Circular dichroism

CD spectra of both full-length ISWI and ISWI [691:991] were recorded with a Jasco J-810 spectrometer using a 1 mm cuvette. The buffer contained 50 mM potassium phosphate, pH 7.5, and 0.5 M ammonium sulphate. The buffer exchange was carried out by multiple cycles of dilution and concentration. The software provided with the instrument accumulates measurements but does not report errors. Therefore several scans were recorded separately, and the standard error of the mean,

$$\sigma_\Theta = \sqrt{\frac{\sum_{i=1}^{N} (\Theta_i - \bar{\Theta})^2}{N(N-1)}},$$

was calculated with standard scripting tools (awk, perl, shell built-ins, etc.). The perl script used to calculate the first derivative of the melting curve is described in Appendix C.

Chapter 9

Data collection, processing and refinement

9.1 Data collection

Some tests of crystal freezing were done at the in-house rotating anode. All data were collected at ESRF beam lines at 100 K. The settings are summarised in Table 9.1. A fluorescence scan of the hexagonal crystal form was measured at ID14-4 with the help of R. Ravelli. MAD data from the selenomethionine-substituted crystals were collected at the CRG beam line BM14 with the help of H. Belrhali. The program chooch (Evans and Pettifer, 2001) calculated the peak and inflection point from the fluorescence scan. The ESRF software to control data collection is ProDC.

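| crystal form | hexagonal | monoclinic | monoclinic | monoclinic | monoclinic | monoclinic | monoclinic |
| data set | native | SeMet (peak) | SeMet (infl.) | SeMet (remote) | native 1 | native 2 | native 2 |
| beam line | ID14-4 | BM14 | BM14 | BM14 | ID14-2 | ID14-4 | ID14-4 |
| detector | ADSC Q4 | MarCCD (165 mm) | MarCCD (165 mm) | MarCCD (165 mm) | ADSC Q4 | ADSC Q4 | ADSC Q4 |
| distance (mm) | 250 | 180 | 180 | 180 | 180 | 250 | 150 |
| λ (Å) | 1.0707 | 0.9788 | 0.9793 | 0.9184 | 0.9330 | 0.9393 | 0.9393 |
| ΔΦimage | 0.5° | 1° | 1° | 1° | 1° | 2.0° | 0.25° |
| ΔΦtotal | 70° | 297° | 180° | 127° | 180° | 308° | 180° |

TABLE 9.1: Settings of the different data sets. "SeMet" stands for the MAD data collection at the peak, inflection point, and remote wavelengths. ΔΦimage gives the oscillation range per image, ΔΦtotal the total rotation range of the data set.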
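The wavelengths in Table 9.1 can be converted to photon energies with λ[Å] = hc/E ≈ 12.398/E[keV]. A minimal sketch with hypothetical helper names: the SeMet peak wavelength corresponds to about 12.67 keV, close to the selenium K absorption edge, and 7 keV corresponds to about 1.77 Å.

```python
HC_KEV_ANGSTROM = 12.39842  # h*c expressed in keV * Angstrom


def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def kev_to_wavelength(energy_kev):
    """Wavelength in Angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


peak_kev = wavelength_to_kev(0.9788)    # SeMet peak from Table 9.1
remote_s = kev_to_wavelength(7.0)       # about 1.77 Angstrom
```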

9.2 Data processing

All data were processed with xds (Kabsch, 2002). Usually the default settings were used as far as quality and rejection criteria were concerned. For the second native data set, the weighting factor that controls the number of misfits rejected in the CORRECT step was reduced from 1.5 to 1.0, both to reduce the Rmeas factor and to allow comparison with data processed with mosflm (compare Chapter 12). Data were iteratively integrated and corrected (INTEGRATE and CORRECT steps), and the parameters (cell dimensions, axes, distances, beam divergence, etc.) were updated until convergence (within the given accuracy). This normally led to the lowest Rmeas. For comparison, the high resolution pass of data set "native 2" was processed with mosflm¹, but these data were not used in model building or refinement. Data were scaled with xscale (part of XDS). After the final resolution limits had been decided, the images were re-integrated and corrected with these limits. Scaling was repeated until all reflections not obeying the Wilson distribution (as reported in the output file XSCALE.LP) had been removed.
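For reference, Rmeas is the redundancy-independent R factor, $R_\mathrm{meas} = \sum_h \sqrt{n_h/(n_h-1)} \sum_j |I_{hj} - \bar{I}_h| \,/\, \sum_h \sum_j I_{hj}$. A minimal sketch of this standard definition, not the xds implementation, assuming the Miller indices have already been reduced to their unique representatives; reflections measured only once are skipped.

```python
import math
from collections import defaultdict


def r_meas(observations):
    """Rmeas from (hkl, intensity) pairs with hkl already reduced to the unique index."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # a single measurement carries no information on agreement
        mean = sum(intensities) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Made-up intensities: two multiply measured reflections and one singleton.
obs = [((1, 0, 0), 10.0), ((1, 0, 0), 12.0),
       ((0, 1, 1), 5.0), ((0, 1, 1), 5.0), ((0, 1, 1), 8.0),
       ((2, 0, 0), 100.0)]
r = r_meas(obs)
```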

9.2.1 Generation of tagged reflections for free R calculation

Within every data set used for refinement, 5% of the reflections for the monoclinic space group and 10% of the reflections for the hexagonal space group were tagged for cross-validation (Brünger, 1993; Brünger, 1992). Scaled reflections were converted to CNS format with xdsconv, which automatically flagged 5% of the reflections as test set. This format could then be converted to mtz format with f2mtz from the CCP4 program suite (Collaborative Computational Project Number 4, 1994).

¹ Pre-release version 3.2.3b was kindly provided by Harry Powell and Olof Svensson, because the official release could not handle the amount of data.


The initial data set was the peak data set from the MAD collection, for which the test set was assigned randomly. To maintain the independence of the test set, each subsequent data set was referenced to the preceding one, i.e., the first native data set to the peak data and the second native data set to the first native one. Low resolution data sets were referenced to their corresponding high resolution counterparts. This means that reflections already tagged in the reference data set were kept tagged and 5% of the additional reflections were randomly tagged.
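The inheritance rule for the test set can be sketched as follows. The function name and the representation of reflections as Miller-index tuples are assumptions for illustration; the actual tagging was done with the programs named above.

```python
import random


def extend_test_set(reference_all, reference_tagged, reflections, fraction=0.05, seed=0):
    """Free-R test set for a new data set that inherits the tags of a reference data set.

    reference_all:    all Miller indices of the reference data set
    reference_tagged: indices tagged as test set in the reference
    reflections:      all Miller indices of the new data set
    """
    reflections = set(reflections)
    kept = reflections & set(reference_tagged)             # tags carried over unchanged
    additional = sorted(reflections - set(reference_all))  # reflections new in this data set
    rng = random.Random(seed)                              # fixed seed for reproducibility
    n_new = round(fraction * len(additional))
    return kept | set(rng.sample(additional, n_new))


# Toy example: a reference with 100 reflections (5 tagged) and 200 new reflections.
ref_all = {(h, 0, 0) for h in range(1, 101)}
ref_tagged = {(h, 0, 0) for h in range(1, 6)}
new_data = ref_all | {(h, 1, 0) for h in range(1, 201)}
free = extend_test_set(ref_all, ref_tagged, new_data)
```

Untagged reflections of the reference stay in the working set, so no reflection that influenced earlier refinement rounds enters the test set.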

9.3 Phasing

9.3.1 MAD - location of selenium sites

The selenium sites were located with the program solve (Terwilliger, 2002). The three data sets (peak, inflection point, remote) were merged and scaled separately; the input scattering factors for selenium were tabulated values. The unit cell dimensions were taken from the peak data set, and the resolution was restricted to 20.0–2.6 Å. solve was set up to scale the three data sets together, refine the scattering factors, and look for four anomalous scatterers (there are four methionines in the sequence of GAM-ISWI [691:991]). All but the N-terminal selenium were found without ambiguity, their locations supported by very good statistics (Table 12.2).

9.4 Model building and refinement

Several programs from the CCP4 program suite were used for small tasks (Collaborative Computational Project Number 4, 1994). These included merging data sets and transferring the phases from data set “SeMet” to “native 1” for the first few refinement cycles. Most of these programs were run via the graphical user interface, and versions 4.2.1 and 4.2.2 were used. The model was built in repeated cycles of maximum-likelihood refinement against the data, followed by visual inspection and correction. resolve (Terwilliger, 2000) was used for density modification and automatic model building. refmac5 was used for refinement, initially including the phases from the MAD experiment and later without them. In the last few rounds, TLS refinement was applied using the groups Ala697–Gln795, Gly796–Gln850, Asp851–Arg882 and Tyr883–Glu977. The weight between the geometrical restraints and the crystallographic data was set to 0.1–0.2 for the data from the monoclinic crystals (high-resolution data) and to 0.05 for those from the hexagonal crystals. ono was used for model building (Jones et al., 1991). Water molecules were initially placed automatically with arp_waters. At a later stage of refinement, peaks in the difference density maps were located using peakmax and watpeak with a threshold of $I/\sigma_I \geq 4$, and were then inspected with ono. procheck was used for model validation in addition to the refmac5 statistics. turbo (available at http://afmb.cnrsmrs.fr) was tried initially but was used for only a few residues.
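The automatic peak search that precedes the manual inspection of water positions can be illustrated with the numpy sketch below. It is not the algorithm of peakmax or watpeak. It assumes a difference map sampled on a regular grid covering the whole unit cell, and it interprets the threshold of 4 as peak height in units of the map's rms deviation. It reports grid points that pass this threshold and are higher than all 26 neighbours. A real water-picking step would also check distances to protein atoms.

```python
import numpy as np


def difference_map_peaks(rho, threshold_sigma=4.0):
    """Grid indices of local maxima at least threshold_sigma * rms above the mean.

    rho: 3-D array sampling a difference map over the whole unit cell, so
    neighbours are taken with periodic wrap-around (np.roll).
    """
    rho = np.asarray(rho, dtype=float)
    deviation = rho - rho.mean()
    sigma = np.sqrt(np.mean(deviation ** 2))  # rms deviation of the map
    is_peak = deviation >= threshold_sigma * sigma
    # Keep only points strictly higher than all 26 neighbours.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if dx == dy == dz == 0:
                    continue
                neighbour = np.roll(rho, shift=(dx, dy, dz), axis=(0, 1, 2))
                is_peak &= rho > neighbour
    return [tuple(int(i) for i in idx) for idx in np.argwhere(is_peak)]
```

Negative difference density never passes the threshold. A shoulder next to a higher peak is rejected by the neighbour comparison.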

9.5 Other useful programs

rasmol was used for coordinate file inspection. Figures were prepared using molscript (Kraulis, 1991) and the Raster3D suite (Merritt and J., 1997). These were used in combination with ono and the perl script plt2mol.pl by Charlie Bond, which converts density maps from ono to molscript format. Surfaces were computed with grasp or msms (Sanner et al., 1996). Graphs were produced with gnuplot. Images were manipulated with gimp and the ImageMagick collection, and many other helpful programs available on most Unix systems, such as awk and perl, were also used. A few small programs were written in C++, e.g. for oligo design or for preparing data for display with gnuplot. nucplot (Luscombe et al., 1997) helped with the interpretation of the superpositions with DNA-binding proteins.

9.6 Superposition of the nucleosome and DNA

The ISWI [691:991] model and the homologues returned by DALI were superimposed using lsqman or ono. For those homologues that contained DNA and that were fitted to the nucleosome, the DNA coordinates were subsequently merged with the ISWI [691:991] coordinates. The DNA chains of the nucleosome structure with PDB code 1kx5 run from $I_{-73}$ to $I_{73}$ and from $J_{-73}$ to $J_{73}$. The two base pairs highlighted in Figure 14.2(b) were superimposed with the nucleotides $(I_{-k}, I_{-k+1})$ and $(J_{k-1}, J_{k})$, with $k \in [-72, 73]$, using the atoms O5′ and C1′. (The DNA in that figure was merged from the structure with PDB code 1h88. There, the bases close to the loop-helix Sl3 (light blue) were B19 and B20, and the distal ones were C8 and C9.) Because of the symmetry of the nucleosome, it does not matter which of the two template strands is superimposed with strand I and which with strand J, as long as $k$ runs through the whole interval $[-72, 73]$.

Part IV

Results


Chapter 10

Protein characterisation

Apart from determining the crystal structure of ISWI [691:991], a few additional experiments were undertaken to characterise either this fragment or full-length ISWI. The following three sections describe:

Restricted proteolysis. This technique was used to identify stable subdomains of full-length ISWI that might be more suitable for crystallisation.

Circular dichroism. The secondary structure content of ISWI and ISWI [691:991] could not be reliably determined from the CD spectra. However, the melting temperature of the C-terminal fragment could be measured.

Substrate binding. Band-shift assays showed that full-length ISWI and ISWI [691:991] bind to cruciform DNA, a substrate known to resemble the curvature of the DNA wrapped around the histone octamer of the nucleosome.

10.1 Restricted proteolysis

Full-length ISWI is a 120 kDa enzyme. Proteins of this size are unlikely to consist of a single stable core. Furthermore, flexible regions and loops often inhibit crystallisation. Given its complex function and its many interactions with chromatin and proteins, a multidomain composition is a safe assumption. Domain databases such as SMART or PFAM (Schultz et al., 1998; Bateman et al., 2002) support this notion, as they predict four known domains and several low-complexity regions (Figure 10.1). This putatively flexible architecture may prevent crystallisation of full-length ISWI. Limited proteolysis is a common approach to identify stable fragments within a multidomain protein. Figure 10.2 shows the result of restricted proteolysis with trypsin, analysed by 12% SDS-PAGE. The results with chymotrypsin, elastase and subtilisin looked similar.

FIGURE 10.1: Domain composition of ISWI as predicted by SMART. Short pink stripes are low-complexity regions. The PFAM database yields a very similar result but distinguishes between the DEXDc domain and the SNF2_N domain (see text). “SLIDE” is not yet a separate database entry but was defined in the present work. It is related to the “SANT” domain.

The gel that had been sent for sequencing contained only 10% polyacrylamide. This was not enough to distinguish visually between bands “C1” and “C2”, but the sequencing signal could be deciphered with knowledge of the ISWI sequence. According to the sequencing results, the two large fragments “A” and “B” cover the N-terminal two thirds of full-length ISWI. “A” begins at residue Met−8 and thus includes the FLAG peptide at the N-terminus of the clone. “B” begins with residue Lys77, just before the beginning of the DEXDc domain. The molecular weights of all fragments were estimated from the gel by comparison with molecular weight standards. On this basis, both band “B” (a little more than 70 kDa) and band “A” (about 80 kDa) end before the beginning of the C-terminal fragments in the lower doublet on the gel. These start at Lys693 and Ala713, and their estimated molecular weight is just below 40 kDa. The C-termini of the fragments could have been determined more precisely by mass spectrometry. However, this requires rather clean cleavage, with only a few species per visible band on the gel. Given the quality of the gel shown in Figure 10.2, such an undertaking would not have been very promising. Another problem is the high salt concentration these fragments require for solubility. Even though mass spectrometry is performed under denaturing conditions, these fragments could not always be successfully precipitated, desalted and measured. Mass spectrometry after restricted proteolysis of the ISWI [691:991] fragment was impeded by exactly these problems, and the masses of the two resulting fragments could not be determined.

[Gel image: lanes Marker, 0′, 10′, 30′ and 3 h of trypsin digestion; marker bands at 116, 97, 66, 45 and 31 kDa. Fragments: A, −8 to ≈700, ≈80 kDa; B, 77 to ≈700, ≈67 kDa; C1/C2, 693/713 to the C-terminus, ≈39 kDa.]

FIGURE 10.2: Trypsin digestion of full-length ISWI and results from N-terminal sequencing. Cartoons on the right indicate the “guesstimated” length of the fragments. N-terminal starting residues result from sequencing, and C-terminal ones are estimated from the marker on SDS-PAGE. Fragment “A” includes nine residues from the FLAG peptide that was used for antibody detection.

Fragment sizes were therefore “guesstimated” by gel electrophoresis using standard markers. Subclones were then designed with additional information from secondary structure prediction (e.g. “PredictProtein” at www.expasy.ch). Figure 10.3 presents the secondary structure prediction for ISWI [691:991], together with the actual distribution known from the structure. The prediction for full-length ISWI can be seen in Appendix A. The agreement between the prediction and the secondary structure derived from the model of ISWI [691:991] is very good, with only a few discrepancies. The helical content of the fragment is higher than predicted, but this is mostly due to differences at the boundaries of otherwise correctly predicted helices. The program missed the transition loop from the SANT domain to the spacer helix (Thr845–Gln850) and predicted a long continuous helix instead. It also failed to indicate the short helix Sli, which is inserted at the end of the loop between helices Sl2 and Sl3 of the SLIDE domain. Overall, the clones were reasonably designed, although the C-terminus could have been cut a little shorter.
The last 14 residues were completely disordered in the model. On the other hand, loop regions and N- and C-terminal extensions can be important for folding and solubility. A clone starting at Phe731, which lacked the long loop region between helices H1 and H2, could not be expressed in E. coli. This indicates improper folding or aggregation (see also Appendix B, Table B.2 for a list of subclones). N-terminal sequencing after restricted proteolysis had returned two starting points for the C-terminal fragment, Lys693 and Ala713. Helix H1, which lies between these two positions, was correctly predicted.

10.1.1 Subcloning and protein expression

The results from restricted proteolysis were used to design both N-terminal and C-terminal subclones of full-length ISWI. Appendix B lists most of the clones prepared in the course of this work. Three factors were crucial for the successful expression and purification of all fragments:
- expression at low temperature;
- high ionic strength in all buffers (0.4–0.5 M sodium chloride);
- the use of tridentate iminodiacetic acid as chelating agent.

Expression at low temperature. Like full-length ISWI (Corona et al., 1999), all clones had to be expressed at 20 °C. Figure 10.4 compares the expression and purification of the C-terminal construct at the two temperatures. Most of the protein expressed at 37 °C is in the cell debris (pellet, lane “P”). The strongest band in the “post TEV” column on the right is slightly lower than the corresponding band in “fractions 1–6”. This indicates that some of the target protein, and not only contaminants, was eluted from the column and cleaved by TEV protease. The low expression level at 37 °C could be explained by the low thermal stability of the protein, as detected by circular dichroism (next section).

Nickel chelating agent. Notably, none of the C-terminal fragments bound to nickel chelated by nitrilotriacetic acid (Ni-NTA Sepharose, data not shown). NTA is a tetradentate chelating agent. Tridentate iminodiacetic acid (IDA), which coordinates the ion from only three rather than four sides, had to be used instead. IDA gives the His-tag easier access to the divalent ion (usually Ni2+ or Co2+), at the cost of lower stringency (Qiagen, 2001; Römpp, 1962).

Yield of expression. The expression vector pProExHtb worked very well in most cases. For ISWI [691:991], the expression level was on average 30–40 mg of protein per litre of medium after all purification and concentration steps. A yield of 60 mg protein per litre of medium was measured once, after the first purification step. Most C-terminal fragments expressed at similar levels. The exceptions were the two shortest ones, ISWI [731:991] (C241) and ISWI [741:991] (C251), which did not express detectably. The discussion of the structure in Section 13.2 shows that the region Lys721–Phe731 has two important functions. It covers the hydrophobic residues of the subsequent helix, and the short loop that ends with Phe731 interacts with the hydrophobic core of the SANT domain. Removal of these ten residues might therefore lead to improper folding of the protein and hence to a reduced expression level (see, for example, Conte et al., 2000). The N-terminal and full-length subclones yielded less protein but were still clearly overexpressed, as visible on SDS-PAGE of crude extract (data not shown). However, unlike the C-terminal clones, the N-terminal ones and full-length ISWI tended to aggregate throughout the purification process.

           ....,....70...,....71...,....72...,....73...,....74...,....75
residue    ERKANYAVDAYFREALRVSEPKAPKAPRPPKQPIVQDFQFFPPRLFELLDQEIYYFRKTV
predicted  .....HHHHHHHHHHHH..LLLLLLLLLLLLLLLL.LLLLLLHHHHHHHHHHHHHHH...
structure  ??????HHHHHHHHHH??????LLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHH
domains    H1 H2

           ....,....76...,....77...,....78...,....79...,....80...,....81
residue    GYKVPKNTELGSDATKVQREEQRKIDEAEPLTEEEIQEKENLLSQGFTAWTKRDFNQFIK
predicted  ..EE.LLLLLLLL..HHHHHHHHH.LLLLLLL.HHHHHHHHHH...... HHHHHHHHH
structure  LLLLLLL?LLLL????HHHHHHHHHHHHHLLLHHHHHHHHHHHHLLLLLLHHHHHHHHHH
domains    H3 H4 Sa1

           ....,....82...,....83...,....84...,....85...,....86...,....87
residue    ANEKYGRDDIDNIAKDVEGKTPEEVIEYNAVFWERCTELQDIERIMGQIERGEGKIQRRL
predicted  HHHH.L..HHHHHHHH.LLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...HHHHHHH
structure  HHHHHHLLHHHHHHHHLLLLHHHHHHHHHHHHHHHHLLLLHHHHHHHHHHHHHHHHHHHH
domains    Sa2 Sa3 spa–

           ....,....88...,....89...,....90...,....91...,....92...,....93
residue    SIKKALDQKMSRYRAPFHQLRLQYGNNKGKNYTEIEDRFLVCMLHKLGFDKENVYEELRA
predicted  HHHHHHHHHHH..LLL...... LLLLLLLLL....HH..HHHHH.LLL.HHHHHHHHH
structure  HHHHHHHHHHHHHLLHHHHHLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLLLHHHHHHHH
domains    –cer Sl0 Sl1 Sl2

           ....,....94...,....95...,....96...,....97...,....98...,....99
residue    AIRASPQFRFDWFIKSRTALELQRRCNTLITLIERENIELEEKERAEKKKKAPKGSVSAGS
predicted  HHH.LLL...... HHHHHHH.HHHHHHHH..LLHHHHHHHHHHHHH.LLLL.LLLL
structure  HHHHLLLLLLLHHHHHLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH??????????????
domains    Sli Sl3

FIGURE 10.3: Secondary structure prediction of ISWI [691:991] compared with the secondary structure observed in the refined model. Only predictions with a reliability greater than 82% are shown. Key: “?”: residues not present in the refined structure, “H”: α-helix, “L”: loop region, “E”: β-strand, “.”: reliability < 82%. The “domains” entries give the names of the helices as used for the structure (see Chapter 13, page 70). H1–4: “Hand” domain; Sa1–3: SANT domain; spacer: spacer helix; Sl1–3: SLIDE domain.

10.2 Circular dichroism

10.2.1 Measurements — wavelength scans

CD spectra were recorded for both full-length ISWI and ISWI [691:991] with a Jasco J-810 spectrometer. However, the wavelength scans could not be evaluated usefully, most likely because of the high salt content required to keep the samples soluble. Most salts and buffers absorb in the UV range below 200 nm, which makes the signal useless there. At the high salt concentrations required for ISWI and ISWI [691:991], this applies even to the salts recommended for CD experiments, such as ammonium sulphate or sodium fluoride. As seen in Figure 4.1 (page 32), all three conformations (α-helix, β-strand and random coil) have large extrema below 200 nm, and this region can be important for data fitting. Figure 10.5 presents the spectra of ISWI [691:991] and ISWI-K159R. The spectrum of ISWI-K159R does not differ much from that of full-length ISWI, but a better spectrum was available for it, including standard deviations from ten measurements. According to the spectra, ISWI-K159R is also mostly α-helical. The secondary structure prediction for full-length ISWI (Appendix A) supports this. As mentioned above, however, the information content is poor because of the lack of data at short wavelengths. With knowledge of the structure of ISWI-C, the data were re-evaluated with a cut-off at 198 nm. The resulting estimates did not fit the data very well. Table 10.1 lists the results of the models available in the instrument's software, together with the values observed in the structure. All interpretations suggest a mostly α-helical protein. However, they deviate considerably in the helical fraction and in the proportions of the other secondary structure elements, so their meaning remained unclear until the structure was solved.
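Evaluating a CD spectrum, as discussed above, amounts to expressing the measured spectrum as a combination of reference spectra for the individual secondary structure types. The Python sketch below shows this principle with an unconstrained linear least-squares fit restricted to λ ≥ 198 nm. It is not one of the methods in the instrument's software. The Gaussian-shaped reference spectra mentioned below are synthetic stand-ins, not real basis spectra.

```python
import numpy as np


def fit_secondary_structure(wavelengths, spectrum, basis, cutoff=198.0):
    """Least-squares decomposition of a CD spectrum into reference spectra.

    wavelengths: 1-D array (nm); spectrum: measured ellipticity on that grid;
    basis: dict name -> reference spectrum sampled on the same grid.
    Only points with wavelength >= cutoff are used, mimicking the removal
    of the data below 198 nm. Returns fractions normalised to sum to 1.
    """
    mask = np.asarray(wavelengths) >= cutoff
    names = list(basis)
    a = np.column_stack([np.asarray(basis[n])[mask] for n in names])
    coeffs, *_ = np.linalg.lstsq(a, np.asarray(spectrum)[mask], rcond=None)
    return dict(zip(names, coeffs / coeffs.sum()))
```

With noise-free synthetic input built from Gaussian bands (e.g. 89% “helix” plus 11% “coil”), the fit recovers the input fractions. Two caveats apply to real data. The fit is unconstrained, so individual fractions may become negative. In addition, discarding the points below 198 nm removes the region where the reference spectra differ most, which makes the fitted fractions less well determined.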

[Gel image, two panels: 20 °C (left) and 37 °C (right). Lane labels include Marker, p.I., S/N, P, F/T, w0, w50, fractions 1–6, fractions 23–30 and post TEV.]

FIGURE 10.4: Comparison of expression of ISWI [691:991] at 20 °C (left) and 37 °C (right) by SDS-PAGE. Abbreviations: S/N: supernatant; P: pellet; F/T: flow-through; w0, w50: washes with 0 mM and 50 mM imidazole, respectively; p.I.: post induction.

[Plot: ellipticity Θ [°] versus wavelength (180–270 nm) for ISWI-K159R, ISWI [691:991] and α-lysine. The data are cut at λ = 198 nm, and the wavelengths λ = 208 nm and λ = 222 nm are marked.]

FIGURE 10.5: CD spectra of ISWI-C, full-length ISWI (mutant K159R) and α-lysine. Data were scaled to the ISWI data. The pseudo-peaks at 198 nm are noise, caused by absorption overwhelming the signal below this wavelength. Error bars could be plotted for ISWI-K159R because ten spectra were recorded manually. For ISWI [691:991], ten scans were accumulated automatically, and the Jasco software does not calculate the errors.

10.2.2 Measurements — melting curve

The melting curve of a protein monitors the temperature-dependent decrease of the signal at a fixed wavelength. It provides information about the stability of the protein, and a melting temperature Tm is given by an inflection point of the curve. Several inflection points mark several unfolding events and indicate a multidomain or oligomeric protein. To obtain an error estimate, the melting curve of ISWI [691:991] was measured on two separate sample preparations, with four and five measurements, respectively. Unfolding of ISWI [691:991] is irreversible (data not shown). Figure 10.6 shows the results of the measurements at λ = 208 nm. The first derivative was calculated with a perl script described in Appendix C, where the meaning of the window width is also explained. The small maxima at 26 °C and 57 °C are considered to be noise artefacts. The units of the ordinate are arbitrary and not scaled to protein concentration. The first derivative shows two major inflection points: a sharp peak at 38.5 °C and a second, broader one at 50 °C. The structure consists of three domains in two parts that do not interact (“Hand”/SANT domains vs. SLIDE domain, see Chapter 13), so the two parts may unfold separately. The partial melting at 38.5 °C could explain why this fragment was not soluble when expressed at 37 °C (see Figure 10.4).

Two very different melting temperatures have also been measured for c-Myb, a transcription factor with substantial structural homology to both the SANT and the SLIDE domain of ISWI [691:991]. c-Myb contains three DNA-binding repeats in tandem, all with a fold similar to the three-helix bundles of SLIDE and SANT. The first and third repeats have melting temperatures of 61 °C and 57 °C, respectively, but the middle repeat is less stable, with Tm = 43 °C (Ogata et al., 1996; Sarai et al., 1993). The c-Myb melting curve is broadened between 30 °C and 80 °C. This is similar to ISWI [691:991], whose curve stretches out between 30 °C and 60 °C. In c-Myb, the differences between the three repeats are due to a cavity in the hydrophobic core of the second repeat, opened by the small hydrophobic residue Val103. The other two repeats have large hydrophobic residues at this position, Leu51 and Ile155, and mutating the valine to a bulkier residue also makes the second repeat more stable. No corresponding cavity could be found in either the SLIDE or the SANT domain, where the corresponding residues are Phe808 and Leu910 (Figure 14.1(c), page 78). The reason for the two-step melting curve therefore remains unclear. The sequences of SANT and SLIDE are certainly different enough to allow for a similar interpretation. However, the melting curve of ISWI [691:991] could equally be unrelated to the observations for c-Myb. The SANT domain could, for example, be stabilised by its interaction with the “Hand” domain.

model      α      β      turn   coil
Fasman     89%    -4%    0      14%
Chen       37%    14%    0      48%
Bolotina   45%    13%    30%    13%
Chang      36%    32%    2%     30%
Yang       53%    26%    0      21%
observed   89%    0      0      11%

TABLE 10.1: Secondary structure content of ISWI [691:991] estimated from the CD spectra, compared with the values found in the structure.

[Plots: (a) Data from two independent samples (4 and 5 measurements) and their average (4+5 samples), scaled together: ellipticity Θ [°] versus temperature, 20–80 °C. (b) Overlay of the 9-sample data and its first derivative (qualitatively): data and slope [°/K] (window = 4) versus temperature, 20–70 °C.]

FIGURE 10.6: Melting curve and first derivative of ISWI [691:991] measured at λ = 208 nm. The two inflection points are at 38.5 °C and 50 °C, respectively.
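The first derivative used to locate the inflection points of the melting curve can be approximated by fitting a straight line within a sliding window, as in the Python sketch below. The perl script actually used is described in Appendix C. Two points are assumptions of this sketch: that its window width corresponds to `half_width` here (rather than to the full number of points), and that the window is truncated at the ends of the curve.

```python
import numpy as np


def windowed_slope(temperature, signal, half_width=4):
    """First derivative of a melting curve by local linear regression.

    For every point, a straight line is fitted to the samples within
    +/- half_width positions (truncated at the ends of the curve), and
    its slope is returned.
    """
    t = np.asarray(temperature, dtype=float)
    y = np.asarray(signal, dtype=float)
    slopes = np.empty_like(y)
    for i in range(len(t)):
        lo, hi = max(0, i - half_width), min(len(t), i + half_width + 1)
        slopes[i] = np.polyfit(t[lo:hi], y[lo:hi], 1)[0]
    return slopes
```

For a synthetic sigmoidal transition centred at 38.5 °C, the slope peaks at 38.5 °C. A wider window suppresses noise but also broadens and lowers sharp features.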

10.3 Binding of ISWI [691:991] to cruciform DNA

The curvature of cruciform DNA is similar to that of the DNA at the entry/exit site of the nucleosome (Angelov et al., 1999). Electrophoretic mobility shift assays (EMSA) with ISWI [691:991] showed that this fragment interacts with both double-stranded and cruciform DNA. However, only with cruciform DNA did the experiment show distinct bands of both DNA and protein on a native gel. Duplex DNA used as a control only aggregated in the gel wells together with the protein. One problem of native gels with any ISWI fragment is the strongly salt-dependent solubility. At the concentrations required for detection by Coomassie Blue staining, most of the protein precipitated. However, with an excess of DNA the protein was able to migrate, formed distinct bands and eventually made the free DNA disappear. The experiment was run at constant protein concentration and stained with both Coomassie Blue and ethidium bromide, so that both protein and DNA could be detected. With an excess of DNA, only one band is visible for the protein on the gel in Figure 10.7, but at least two are visible for the DNA. It has been suggested that ISWI [691:991] forms oligomers with several protein molecules per molecule of cruciform DNA, and that these are “dissolved” by the addition of excess DNA (S. Khochbin, personal communication). It remains unclear, though, why the lanes with ratios 1:20 and 1:40 contain DNA bands with no protein associated with them. The bands at about 700 bp appear stronger than the corresponding bands in the lanes with ratios 1:2 and 1:4, but a Coomassie Blue signal is missing.

FIGURE 10.7: EMSA of ISWI [691:991] with Holliday junction DNA. Numbers on top of the lanes denote protein:DNA ratios, and the ISWI [691:991] concentration was kept constant. Left: ethidium bromide staining to detect DNA. Right: Coomassie Blue staining of the same gel to detect protein.

Chapter 11

Crystallogenesis of ISWI [691:991]

11.1 Hexagonal space group

First crystals grew five days after setting up drops with ISWI [691:991] at a protein concentration of 60 mg/ml. The protein was mixed 1:1 with a reservoir containing 4% PEG 8000¹ and 0.1 M Tris, pH 8.5. The drop volume increased with time, which showed that the high sodium chloride concentration in the drop (initially 0.5 M/2 = 0.25 M) overcame the dilute PEG 8000. The crystals thus grew by a reverse salting-in effect. At 40–60 mg/ml protein, the solubility of ISWI [691:991] decreases faster than its concentration as salt and protein in the drop are diluted with water (Figure 11.1(a)). PEG was necessary for crystallogenesis, although as little as 0.5% was sufficient to obtain crystals. No crystals were produced by drops placed over a PEG-free reservoir that had either been mixed with PEG-free reservoir solution or not been mixed at all (i.e., kept at a protein concentration of 60 mg/ml). The crystals grew in clusters and only rarely as single crystals, which diffracted to 9 Å (Figure 11.1(b)). A pseudo-batch setup with the addition of 1% 2,5-hexanediol gave a slight improvement. In this setup, 1 µl protein solution was mixed with 1 µl of 4% PEG 6000, 0.1 M Hepes pH 7.0, 4% 2,5-hexanediol and 4 µl water. This yielded a few better-diffracting crystals, one of which was used to collect a data set to 3.3 Å resolution (data set statistics on page 65). ISWI [691:991] requires a high ionic strength for solubility. The protein can be concentrated to 120 mg/ml at 0.4–0.5 M sodium chloride and still be crystallised, but only to less than 2 mg/ml at 0.2 M NaCl (formation of a pellet after centrifugation, data not shown). Standard vapour diffusion, as described in Section 5.1, has the disadvantage that the protein is abruptly mixed with the reservoir solution. For ISWI [691:991], this corresponds to a dilution of the ionic strength, which leads to precipitation.
The C-terminal fragment crystallised from precipitate, but such cases are probably exceptional, and the crystallisation conditions are certainly not ideal. To overcome this problem, crystallisation in capillaries was tested as a way of trying liquid-phase diffusion. Crystals could be grown, but they looked very similar. One crystal that looked suitable for data collection dissolved when the capillary was opened. Trials with dialysis buttons, in which the salt concentration was lowered daily in steps of 0.1 M, only led to aggregation of the protein. In another attempt, sodium chloride was replaced with ammonium acetate, a volatile salt. Slow diffusion above a salt-free reservoir was expected to improve crystal quality, but no crystals were obtained this way at all.
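The starting composition of a drop follows from simple dilution. For example, the 0.5 M NaCl of the protein buffer becomes 0.25 M after 1:1 mixing with the reservoir. The hypothetical helper `mix` below makes such calculations explicit. It gives only the composition immediately after mixing. During the reverse vapour diffusion described in Section 11.1, water entering the drop then dilutes all components by the same factor, so the ratio of salt to protein stays constant while both concentrations fall.

```python
def mix(components):
    """Concentrations after mixing several solutions.

    components: list of (volume, {solute: concentration}) tuples, with all
    volumes in the same unit. Returns {solute: concentration} in the
    mixture, i.e. before any vapour-diffusion equilibration changes the
    drop volume.
    """
    total = sum(volume for volume, _ in components)
    result = {}
    for volume, solutes in components:
        for name, conc in solutes.items():
            result[name] = result.get(name, 0.0) + conc * volume / total
    return result
```

Mixing 1 µl of protein (60 mg/ml in 0.5 M NaCl) with 1 µl of reservoir (4% PEG 8000, 0.1 M Tris) gives a drop with 30 mg/ml protein, 0.25 M NaCl and 2% PEG 8000.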

11.2 Monoclinic space group

On one occasion, a crystal of very different morphology was obtained with the pseudo-batch setup described in the previous section, without any change of conditions. Although several thousand drops were set up, this happened only once. This crystal form proved to be mechanically much more stable and turned out to be suitable for microseeding. With this method, the crystals could easily be reproduced and improved. At 50 mM sodium chloride (see Methods, Section 7.3), crystals grew faster than at 100 mM but remained smaller, probably because of the lower protein concentration. When seeded into low-salt conditions, crystals were visible under the microscope within minutes. At 100 mM NaCl, crystals were often only visible the next day. The presence of PEG was important for crystal growth, but its concentration mattered less: crystals grew between 0.5% and 5% PEG 6,000. PEG with a mean molecular weight between 4,000 and 20,000 Da was used successfully for crystal growth. No crystals could be produced with low-molecular-weight PEG (400–1,500 Da), and PEGs heavier than 20,000 Da were not tested. The N-terminally truncated constructs ISWI [713:991] and ISWI [701:991] could be microseeded with crystals of ISWI [691:991] to give crystals of similar shape and nearly identical quality.

¹ Polyethylene glycol with an average molecular weight of 8,000 Da.

61 62 CHAPTER 11. CRYSTALLOGENESIS OF ISWI [691:991]

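[Schematic phase diagram: protein concentration versus salt concentration. The solubility curve separates the “soluble” and “insoluble” regions and has a salting-in and a salting-out branch; ci marks the starting concentration.]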

(a) Salting-in: since the solubility curve of the protein in this example has a concave segment, the protein solubility can drop below the protein concentration as protein and salt are diluted together, starting at ci. The increase of solubility with salt concentration is called “salting-in”, the opposite “salting-out”.

(b) Examples of the initial, hexagonal crystal form. The left side shows a typical cluster. The right side shows a single crystal (diameter ≈ 0.1 mm), which diffracted to ≈ 9 Å.
(c) Co-crystallisation with 2,5-hexanediol improved diffraction to 3.4 Å, even though the crystals looked smaller and less regular.
(d) Crystals grown in a 0.1 mm capillary in 1% agarose.

FIGURE 11.1: Crystals of ISWI-C grew by inverted salting-in. Figure 11.1(a) sketches the corresponding phase diagram, and Figure 11.1(b) gives two examples of the original crystals. Addition of 2,5-hexanediol improved the crystals (Figure 11.1(c)), but growth by liquid-phase diffusion in capillaries did not (Figure 11.1(d)).

11.3 Production of crystals for phasing

11.3.1 Hexagonal space group

Before the new crystal form arose, attempts were made to solve the structure from the poorly diffracting hexagonal crystals (Section 11.1). Two different approaches were pursued.

1. Co-crystallisation with heavy metals. The following heavy metal compounds were tried at concentrations between 1 mM and 5 mM (note that a protein concentration of 60 mg/ml corresponds to 1.6 mM protein):

Pt(NH3)2Cl2, K2Pt(CN)4, K2PtCl4, KAuCl4, HgAc2, mercurochrome, phenyl mercuric hydroxide, EuCl3

Soaks (instead of co-crystallisation) dissolved the crystals or, in the case of mercurochrome, resulted in flexible “crystals”.

2. Crystals of selenomethionine-substituted protein appeared in one out of 24 drops. They dissolved upon opening of the well.

Co-crystallisation of ISWI [691:991] with Pt(NH3)2Cl2 and KAuCl4 (≈ 0.1–0.2 mM) yielded crystals in a few cases. Only the platinum derivative was tested on beamline ID14-4, but no anomalous signal was detected.
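The statement that 60 mg/ml corresponds to 1.6 mM implies an effective molecular weight of 60/1.6 = 37.5 kDa for the construct. This value is derived from the two numbers in the text rather than from the sequence. Using it, the compound concentrations above can be expressed as molar ratios, as in the short Python sketch below (function names are illustrative). Taking the concentrations at face value, the 1–5 mM trials correspond to roughly a 0.6- to 3-fold excess of compound over protein. The ≈ 0.1–0.2 mM conditions that gave crystals are sub-stoichiometric (about 0.06–0.13).

```python
def mg_per_ml_to_mM(conc_mg_per_ml, molecular_weight_kDa):
    """Convert a protein concentration from mg/ml to mM.

    1 mg/ml is 1 g/l; a protein of M kDa has M * 1000 g/mol,
    so 1 mg/ml corresponds to 1/M mM.
    """
    return conc_mg_per_ml / molecular_weight_kDa


def molar_excess(compound_mM, protein_mM):
    """Number of heavy-atom compound molecules per protein molecule."""
    return compound_mM / protein_mM
```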

(a) This series of pictures shows the importance of diluting the micro-seeds. The cat whisker was washed in reservoir solution between seeding events, resulting in fewer and better-formed crystals.


(b) Crystals grown at 50 mM NaCl. Some were suitable for data collection, but very large trunks with defects also formed. A higher salt concentration (100 mM) yielded larger crystals at a slower growth rate.

FIGURE 11.2: Examples of crystals generated by micro-seeding. Crystals grew within minutes without NaCl in the reservoir, or within 1–2 days with 50–100 mM NaCl.

11.3.2 Monoclinic space group

Obtaining phase information for the second crystal form was nearly straightforward. When this form appeared, it was not certain whether it could be reproduced, so the original crystal was harvested into 4% PEG 6,000, 0.1 M Hepes pH 7.0, 2 mM KAuCl4. Once the seeding method had been established, both heavy-metal derivatives and crystals of selenomethionine-substituted protein could be grown.

conc.    compound                     crystal shape                diffraction
5.0 mM   Pt(NH3)2Cl2                  o.k.                         not tested
2.0 mM   K2Pt(CN)4                    o.k.                         ≈ 4 Å
2.0 mM   K2PtCl4                      coarse/cracks                not tested
0.5 mM   mercurochrome                o.k.                         7 Å (badly frozen)
1.0 mM   phenyl mercuric hydroxide    still visible but wrecked    not tested
1.0 mM   EuCl3                        o.k.                         ≈ 4 Å

TABLE 11.1: Diffraction tests on the in-house rotating anode with heavy-atom derivatives of the monoclinic crystal form.

Soaking in heavy metal compounds. Crystals were soaked overnight in the heavy metal compounds listed in Table 11.1 (all in 0.1 M Hepes pH 7.0, 5% PEG 6,000). Back-soaking was carried out by transfer into cryo buffer (see next section) to remove non-specifically bound compound. Diffraction was tested on the in-house rotating anode, on which a native crystal diffracted to 3.2 Å. Some of these derivatives were later tested at the synchrotron and diffracted to better than 2 Å (see below).

Seeding of selenomethionine-substituted protein. Unlike the hexagonal crystal form, the monoclinic form could also be seeded into drops containing selenomethionine-substituted protein. For no particular reason, this was done in 4% PEG 20,000 instead of PEG 6,000. One of these crystals was sufficient to locate the selenium atoms by a three-wavelength MAD experiment and hence to solve the phase problem.

Chapter 12

Data collection and processing

12.1 Data statistics

Hexagonal crystals. The 3.3 Å data set for the hexagonal space group was collected on ID14-4 with the help of R. Ravelli. This protein had been crystallised in the presence of Pt(NH3)2Cl2, but no signal was detected in an X-ray fluorescence scan. Pt(NH3)2Cl2 is rather insoluble, and the drop had been prepared from a suspension with a nominal concentration of 30 mM. The low solubility of the compound could explain why not even the mother liquor in which the crystals grew showed a fluorescence signal.

data set            λ (Å)    resolution (Å)  Rmeasᵃ  I/σI   unique   mult.  compl.  Bᵇ (Å²)  unit cellᶜ (Å)
space group P65
native              1.071    25–3.4          5.9%    12.0   12138    2.21   98.7%   (94.7)   103.3×103.3×75.0
                                             38.0%    3.0     918    2.16   89.5%
space group C2
SeMet peak          0.8788   20–2.4          6.4%    12.3   37953    3.10   97.6%   52.9     110.1×67.2×84.5, β = 125.1°
                                             23.4%    4.7    4031    2.87   91.1%
SeMet infl. point   0.9793   20–2.4          6.9%     8.8   37774    1.89   97.2%   52.9     110.1×67.2×84.5, β = 125.1°
                                             23.8%    3.6    3906    1.77   88.3%
SeMet remote        0.9184   20–2.4          6.9%     8.7   30219    1.70   77.5%   53.6     110.1×67.3×84.5, β = 125.0°
                                             31.0%    3.4    3469    1.65   77.8%
native 1            0.9390   20–2.0          3.8%    18.4   33318    3.69   98.5%   48.5     110.0×66.8×83.5, β = 124.9°
                                             51.8%    3.0    4481    3.71   98.4%
native 2            0.9393
  high                       20–1.9          4.5%    16.5   38074    3.68   98.0%   44.8     109.5×66.4×82.9, β = 124.4°
                                             33.3%    4.6    5333    3.61   97.0%
  low                        35–3.0          6.3%    23.0    9826    6.20   98.9%   39.2     109.1×66.2×82.5, β = 124.3°
                                             14.4%   11.4     878    6.14   96.4%
  mergedᵈ                    35–1.9          7.0%    17.1   38365    5.24   99.0%   46.1     109.4×66.3×82.7, β = 124.4°
                                             33.1%    4.6    5428    3.62   98.5%

TABLE 12.1: Statistics for data collection and processing. Figures for the highest resolution shell (0.1 Å wide) are given in the second line of each entry. mult.: average multiplicity of reflections, compl.: completeness of data; settings like distance can be found in Table 9.1, page 9.1

nh nh h n −1 i |Ih−Ihi| a = r h Rmeas P Pnh nh: multiplicity of reflection h h i Ihi bfrom Wilson statistics:P P data are split into resolution shells (depending on Θ) and the mean values per shell fitted to Θ logI = A − 2B sin2 fi λ fl

Data below 3Å resolution are required for reasonable fitting, i.e., the value for the hexagonal space group is not meaningful. c ◦ ◦ ◦ for P 65: α = β = 90 γ = 120 , for C2: α = γ = 90 ddata were merged prior to scaling; the cell dimensions are the means weighted with the number of reflections
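The two statistics defined in footnotes (a) and (b) can be illustrated with a few lines of Python. This is a minimal sketch under simplifying assumptions, not the code of xds or of any scaling program: observations are plain (hkl, intensity) pairs, reflections measured only once are left out of Rmeas entirely (the factor √(n_h/(n_h−1)) is undefined for n_h = 1), and the Wilson fit uses the mean shell intensities directly, whereas a full Wilson plot would first normalise them by the summed squared atomic scattering factors. The function names are hypothetical.

```python
import math
from collections import defaultdict


def r_meas(observations):
    """Multiplicity-weighted R factor as in footnote (a).

    observations: iterable of (hkl, intensity); hkl identifies the unique
    reflection h (any hashable value, e.g. a tuple of Miller indices).
    Reflections observed only once are skipped entirely (assumption).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # sqrt(n/(n-1)) is undefined for a single observation
        mean = sum(intensities) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(mean - i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def wilson_fit(shells):
    """Least-squares fit of ln<I> = A - 2B sin^2(theta)/lambda^2 (footnote (b)).

    shells: list of (d_mean, mean_intensity) per resolution shell, d in Angstrom,
    mean_intensity > 0. Uses sin^2(theta)/lambda^2 = 1/(4 d^2) (Bragg's law).
    Returns (A, B) with B in Angstrom^2.
    """
    xs = [1.0 / (4.0 * d * d) for d, _ in shells]
    ys = [math.log(mean_i) for _, mean_i in shells]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, -slope / 2.0


# Toy data: two unique reflections, each measured twice.
obs = [((1, 0, 0), 10.0), ((1, 0, 0), 12.0), ((0, 2, 0), 5.0), ((0, 2, 0), 5.0)]
print(round(r_meas(obs), 4))  # 0.0884, i.e. 2*sqrt(2)/32
```

Because of the √(n_h/(n_h−1)) factor, Rmeas does not decrease artificially with increasing multiplicity, which is why it is preferred over Rmerge when comparing data sets such as the low, high and merged passes of “native 2”.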

Monoclinic crystals Some of the heavy metal derivatives listed in Table 11.1 were measured at ID14-2. Most likely because liquid nitrogen dripped onto the crystals, none of the data sets could be merged successfully, despite the good quality of the diffraction images and diffraction beyond 2 Å. The MAD data sets were collected at the CRG (Collaborating Research Group) beam line BM14 with the help of H. Belrhali. Peak and inflection point were determined from the fluorescence scan with the program chooch (Evans and Pettifer, 2001). To improve the data quality, the peak wavelength (λ = 0.9788 Å) was collected with high redundancy over 270°. Only 180° were collected for the inflection point (λ = 0.9793 Å). During collection of the remote wavelength (λ = 0.9184 Å), the beam was interrupted after 127°, resulting in only 77% completeness for this data set; it was not necessary to complete it. Table 12.2 reports some of the figures calculated by the program solve. Three of the four selenium sites present in the sequence of ISWI [691:991] were found. The fourth one is not part of the wild-type sequence of ISWI [691:991] but results from cloning. It lies very close to the N-terminus and is therefore probably disordered, so it is not surprising that it could not be located. Somewhat better statistics would have been achieved by using only the data sets collected at the peak and inflection point wavelengths.

Several native data sets were collected: two from construct ISWI [691:991], each with a high and a low resolution pass, and one from construct ISWI [701:991]. All crystals diffracted to well beyond 2 Å, and reflections beyond 1.7 Å could be observed. Radiation damage prevented collection to very high resolution, as the crystals deteriorated before data collection was complete. More careful collection might have avoided this, but for protein crystals the information gained from data in this resolution range is not very high.
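The choice of the peak and inflection-point wavelengths can be illustrated with a simplified sketch. chooch itself derives f″ from the fluorescence scan and f′ via a Kramers–Kronig transformation; the hypothetical function below uses only two common proxies: the fluorescence maximum for the f″ peak, and the steepest rise of the absorption edge (maximum of the first derivative) for the f′ minimum at the inflection point. Energies are converted to wavelengths with λ [Å] = 12.39842 / E [keV]. The scan values in the usage example are invented for illustration.

```python
HC_KEV_ANGSTROM = 12.39842  # h*c, so that lambda [A] = 12.39842 / E [keV]


def mad_wavelengths(energies_kev, fluorescence):
    """Pick peak and inflection-point wavelengths from an edge scan.

    Proxies (assumption, not chooch's Kramers-Kronig procedure):
      peak       -> energy with the highest fluorescence signal (~max f'')
      inflection -> midpoint of the steepest rising step (~min f')
    Returns (lambda_peak, lambda_inflection) in Angstrom.
    """
    if len(energies_kev) != len(fluorescence) or len(energies_kev) < 2:
        raise ValueError("need at least two matching scan points")

    peak_index = max(range(len(fluorescence)), key=lambda i: fluorescence[i])

    best_slope = None
    inflection_energy = None
    for i in range(len(energies_kev) - 1):
        de = energies_kev[i + 1] - energies_kev[i]
        slope = (fluorescence[i + 1] - fluorescence[i]) / de  # finite difference
        if best_slope is None or slope > best_slope:
            best_slope = slope
            inflection_energy = 0.5 * (energies_kev[i] + energies_kev[i + 1])

    return (HC_KEV_ANGSTROM / energies_kev[peak_index],
            HC_KEV_ANGSTROM / inflection_energy)


# Invented scan around the Se K edge (energies in keV, arbitrary counts).
scan_e = [12.650, 12.652, 12.654, 12.656, 12.658, 12.660, 12.662, 12.664]
scan_f = [1.0, 1.1, 1.5, 3.0, 5.5, 6.2, 5.8, 5.6]
peak, infl = mad_wavelengths(scan_e, scan_f)
print(f"peak {peak:.4f} A, inflection {infl:.4f} A")
```

As in the experiment described above, the peak lies at slightly higher energy (shorter wavelength) than the inflection point; the remote wavelength is chosen well away from the edge.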

wavelength | λ | completeness, total | completeness, 2.6–2.73 Å | f′ input | f′ refined | f″ input | f″ refined
peak | 0.9788 Å | 97.5% | 92.7% | −7 | −7.4 | 5 | 4.7
infl. pt. | 0.9793 Å | 96.8% | 91.7% | −9.22 | −9.7 | 2.5 | 1.3
remote | 0.9184 Å | 83.0% | 79.4% | −3 | −2.9 | 4 | 3.0

site no. | occupancy | x | y | z (fractional coordinates) | B-factor (Å²)
1 | 0.946 | 0.6394 | 0.0046 | 0.4441 | 42.1
2 | 0.924 | 0.3097 | 0.1249 | 0.4650 | 51.4
3 | 0.973 | 0.5533 | 0.1434 | 0.2798 | 60.0
Z-score = 18.9, figure of merit = 52.0%

TABLE 12.2: Summary of results from the program solve. 2.6–2.73 Å denotes the highest resolution shell. f′ and f″ are the anomalous scattering factors of selenium at the given wavelengths. The bottom table shows the unique solution for three selenium sites that was found with the program solve. All three have high occupancy and reasonable B-factors.
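The selenium positions in Table 12.2 are fractional coordinates; to obtain distances in Å they have to be converted to Cartesian coordinates with the unit cell. The sketch below assumes the common orthogonalisation convention (a along x, b in the xy plane) and uses the C2 cell of the SeMet peak data from Table 12.1. It ignores crystal symmetry and origin choice, so it gives the distance between the listed positions only, not necessarily the shortest distance between sites in the crystal. The function names are illustrative.

```python
import math


def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix M with (x, y, z) = M . (u, v, w); angles in degrees.

    Convention (assumed): a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(m, frac):
    """Apply the orthogonalisation matrix to a fractional coordinate triple."""
    return tuple(sum(m[r][k] * frac[k] for k in range(3)) for r in range(3))


# C2 cell of the SeMet peak data set (Table 12.1).
M = orthogonalisation_matrix(110.1, 67.2, 84.5, 90.0, 125.1, 90.0)
site1 = frac_to_cart(M, (0.6394, 0.0046, 0.4441))
site2 = frac_to_cart(M, (0.3097, 0.1249, 0.4650))
print(f"{math.dist(site1, site2):.1f} A between sites 1 and 2 (no symmetry applied)")
```

For a monoclinic cell the matrix reduces to x = a·u + c·cosβ·w, y = b·v, z = c·sinβ·w, which is why β enters the conversion while α and γ (both 90°) drop out.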

12.2 Low resolution data

For both native data sets a high and a low resolution pass were collected (see Table 12.1). Initially, the importance of the low resolution pass was underestimated, and it was not used for refinement. Table 12.1 shows that Rmeas increased from 4.5% to 7.0% when the low and high resolution passes of “native 2” were merged, which is why this step was initially not undertaken. Later, the electron density map was compared with the map obtained from the same data set processed with the program mosflm (Leslie, 1992), and it became obvious that the low resolution data add a lot of important information and improve the map quality; for example, the loop region in the “Hand” domain (see structure description, Chapter 13) between residues Lys756 and Thr765 was better defined. One difference between the programs mosflm and xds is how they deal with overloaded reflections. xds rejects these reflections completely, whereas mosflm carries out profile fitting: whatever part of a reflection is reliably measured is used to fit the overloaded region to a standard profile determined from other reflections in the same detector area. It is recommended, however, not to use this option (Dauter, 1999), especially for applications that are sensitive to low resolution data, such as phasing or autotracing. Overloaded reflections occur mainly at low resolution: reflections close to the beam axis, which are on average intrinsically stronger anyway, need a larger angular range and therefore more time to pass completely through the Ewald sphere, so they accumulate more intensity than reflections at higher scattering angles. Overloads can therefore often not be avoided if one wants to measure to the highest possible resolution. Figure 12.1 illustrates the difference between the high and low resolution pass.
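The idea behind profile fitting of an overloaded reflection can be shown with a minimal sketch: pixels below the saturation limit are used to scale a standard profile by least squares, and the scaled profile then supplies the intensity, including the saturated pixels. This is an unweighted toy version with invented numbers and a hypothetical function name; the actual procedure in mosflm (variance weighting, background modelling, determination of the standard profile from neighbouring reflections) is more involved.

```python
def profile_fitted_intensity(counts, profile, saturation):
    """Estimate a reflection's intensity from a standard profile.

    counts:     background-subtracted pixel values of the spot
    profile:    standard profile for the same pixels (assumed to come from
                strong, unsaturated reflections in the same detector area)
    saturation: pixel value at or above which a pixel counts as overloaded

    The scale k minimising sum((c - k*p)^2) over unsaturated pixels is
    k = sum(c*p) / sum(p*p); the intensity is then k * sum(profile).
    """
    usable = [(c, p) for c, p in zip(counts, profile) if c < saturation]
    if not usable:
        raise ValueError("every pixel is overloaded; nothing to fit")
    scale = sum(c * p for c, p in usable) / sum(p * p for _, p in usable)
    return scale * sum(profile)


profile = [1.0, 2.0, 4.0, 2.0, 1.0]
true_counts = [1000.0 * p for p in profile]       # true intensity: 10000
clipped = [min(c, 3000.0) for c in true_counts]   # detector saturates at 3000
print(sum(clipped))                                        # 9000.0 (naive sum)
print(profile_fitted_intensity(clipped, profile, 3000.0))  # 10000.0
```

In this idealised case the fit recovers the true intensity exactly, whereas summing the clipped pixels underestimates it. With real data, however, the result depends entirely on how well the standard profile matches the saturated core, which is the reason for the caution cited above.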

12.3 Density modification and automated building — resolve

Density modification carried out by the program resolve was of enormous help for starting model building. It built 86 residues with the correct sequence in five segments, plus six additional segments comprising 60 alanines and glycines. These parts included only two of the three selenium atoms that had been found by solve. This preliminary model, with the remaining selenium site as an anchor, was sufficient to correct and complete most of the model by manual building with the program ono. After only one round of building, the model could therefore be refined against data set “native 1” (see Table 12.1). To reduce model bias, the experimental phases were kept for the first few cycles of refinement and building, until the model was nearly complete. Once data set “native 2” was available, the model was refined against it; random noise of 2 Å on average was added to the coordinates of the model built so far, again in order to reduce model bias.

(a) High resolution pass: detector distance 150 mm, 4 s/frame, ΔΦ = 0.25°; the edge of the frame corresponds to 1.7 Å resolution, the dark water ring is at about 3.6 Å.
(b) Low resolution pass: detector distance 250 mm, 0.5 s/frame, ΔΦ = 2.0°; the edge of the frame corresponds to 2.6 Å resolution.
(c) The magnified view of the high resolution image clearly shows the overloaded reflections (marked white), which reach out to 4 Å resolution.

FIGURE 12.1: High and low resolution pass for the same crystal orientation of data set “native 2”. For the low resolution pass (right image), the detector distance is much longer and the oscillation range eight times larger. In the high resolution pass, overloaded reflections (marked white) reach out to 4 Å resolution. The inner dark ring in Figure 12.1(a) corresponds to the ring in Figure 12.1(b), at about 3.6 Å resolution.

[Plot: number of reflections (logarithmic scale, left axis, 10 to 100000) and completeness of data (right axis, 0 to 1) versus dmin [Å] (2 to 20 Å), for the low and the high resolution pass.]

FIGURE 12.2: Distribution and completeness of reflections by resolution shells. The low resolution pass (35–3.0 Å, ΔΦ = 2°) from data set “native 2” is represented by blue boxes, the high resolution pass by green impulses (20–1.9 Å, ΔΦ = 0.25°). Each shell spans up to the next lower resolution marked, i.e., the 10 Å value includes the 10–20 Å bin. In addition to the absolute number of reflections (logarithmic scale, left axis), the completeness is shown in the corresponding colour (scale on the right axis).
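Adding random noise to a partial model before refining it against a new data set is a simple way to reduce model bias. The text only states the average shift of 2 Å. The sketch below assumes a uniformly random direction and a shift length drawn uniformly between 0 and twice the requested mean, which is one possible way to obtain that average; it is not necessarily what the program used in this work did. The function name is hypothetical.

```python
import math
import random


def perturb_coordinates(coords, mean_shift=2.0, seed=None):
    """Shift every atom by a random vector whose length averages mean_shift.

    coords: list of (x, y, z) in Angstrom. Direction: uniform on the sphere
    (normalised 3D Gaussian vector). Length: uniform on [0, 2*mean_shift],
    whose expectation is mean_shift (assumed distribution).
    """
    rng = random.Random(seed)
    shifted = []
    for atom in coords:
        while True:
            d = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(x * x for x in d))
            if norm > 1e-12:  # guard against a (practically impossible) zero vector
                break
        length = rng.uniform(0.0, 2.0 * mean_shift)
        shifted.append(tuple(c + length * x / norm for c, x in zip(atom, d)))
    return shifted


model = [(0.0, 0.0, 0.0)] * 20000
noisy = perturb_coordinates(model, mean_shift=2.0, seed=1)
mean = sum(math.dist(p, q) for p, q in zip(model, noisy)) / len(model)
print(f"average shift: {mean:.2f} A")
```

A fixed seed makes the perturbation reproducible, which is convenient when comparing refinement runs.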

12.4 Molecular replacement with data from the hexagonal crystal form

Having solved the structure from the monoclinic crystal form, it was thought to be straightforward to use the model for molecular replacement with the low resolution data from the initial, hexagonal crystal form. However, using the full fragment did not produce any hits at all. Therefore several combinations of the three domains of the protein were tested as search models. With only the SLIDE domain as search model, the rotation function gave a clear hit. However, in the subsequent translation search the peak flattened out, and the calculated phases were not good enough to trace the model. In fact, density was only seen around the SLIDE domain, and there was no extra density that could have accommodated the remaining parts. Fixing the SLIDE domain and searching with all or parts of the rest of the model led to similar problems. The search was carried out with the programs beast, amore, molrep, and cns_solve. The best results were achieved with beast, but they were not sufficient to solve the structure. With beast, all hexagonal Laue groups were tested: the rotation search showed that the space group is either P65 or P61 (the two cannot be distinguished by a rotation search), and the translation search with only the SLIDE domain (residues Arg869–Glu977) showed a slight preference for space group P65.

Chapter 13

Description of the structure of ISWI-C

A note on the model The coordinates and structure factors have been deposited in the Protein Data Bank (PDB) via the EBI, UK, under entry code 1OFC. This model was refined only against the high resolution data of data set “native 2” (see Table 9.1, p. 49); the low resolution data were only included at a later stage. The differences are small and do not affect the following description and interpretation of the model unless otherwise stated. Once refinement of the second model has reached a satisfactory state, it may be deposited, too. The work was published together with biochemical data produced in the laboratory of P. B. Becker. The structural and functional data suggest that the C-terminus of ISWI is responsible for substrate recognition (Grüne et al., 2003). The article can be found in Appendix D. Not only could ISWI [691:991] be microseeded; crystals of similar shape and morphology could also be produced for the N-terminal truncations ISWI [701:991] and ISWI [713:991] when the seeding technique was applied (see also Section 11.2, p. 61). Data were collected from ISWI [691:991], the largest of all C-terminal constructs, and from ISWI [701:991], a truncation of 11 residues at the N-terminus. Refinement against these data with refmac5, starting from the model of ISWI [691:991], resulted in very similar statistics. Hence no major differences between the two constructs were found apart from the missing first residues¹. These data were therefore not refined further. For statistics about the quality of the model, I refer the reader to the article in Appendix D. The deposited data were carefully checked with the program procheck, and no serious deviations from standard mean values were found. The model refined against the second, more complete data set, with high and low resolution passes merged, shows some more deviations that still require correction, but, as mentioned above, this model has not been finished yet.

13.1 Overall structure

Figure 13.1 shows the model of ISWI [691:991] in ribbon representation. It includes the naming conventions that will be used throughout the following discussion. α-helices are numbered within each of the three domains from N- to C-terminus, preceded by a one- or two-letter code that identifies the domain: “H” for “Hand” (blue), “Sa” for SANT (green), and “Sl” for SLIDE (yellow). The circular dichroism measurement (Figure 10.5) indicated a mainly α-helical structure. Although those data had not been interpreted at the time of recording, the final model confirmed this result: 270 residues are folded in helices, i.e. 63% of the full sequence. The structure has a remarkable shape. It can be split into three domains arranged in tandem. This leads to an elongated structure that appears to contain flexible hinges. However, the following sections, in which each domain is described, show that there are specific interactions between the different regions that indicate restricted flexibility. Seen from the “side”, as on the right hand side of Figure 13.1, the molecule resembles a seahorse facing right, with the top domain (“Hand” domain) as the head, the central SANT domain as its back fin, and the SLIDE domain at the bottom of the figure as the curled tail. With this analogy in mind, the left hand side of the figure shows the “front” view. The most N-terminal part has been named the “Hand” domain; it runs from Ala697 to Gln795. It is immediately followed by the SANT domain, Gly796 to Leu849. The SLIDE domain, Ala885 to Glu977 (the last residue for which electron density can be seen), is separated from the N-terminal part by the straight “spacer” helix, Gln850 to Arg884, which spans 30 residues and measures nearly 50 Å. The whole fragment fits in a box with side dimensions of 96 Å × 48 Å × 47 Å.

The domain definition for the SANT domain matches the one from the SMART domain database (Schultz et al., 1998), which locates the SANT domain between residues Gly796 and Arg845. The SLIDE domain extends beyond the second “SANT” domain that was predicted to span from Lys898 to Leu962: in the structure, a short helix-loop segment, “Sl0”, precedes it at the N-terminus, and the C-terminal helix is twice as long as the sequence that is part of the SANT domain consensus.

¹ If ISWI [701:991] is refined with the model of ISWI [691:991], there is clearly negative density around the N-terminal helix H1; if it is refined with the model of ISWI [691:991] with residues 697–706 removed, there is no missing electron density. In the shorter construct this helix is therefore disordered, even though the anchoring residues Tyr701 and Phe702 are present (but not the leading residues Ala697 to Ala700). This also means that the short N-terminal helix is not crucial for the overall fold of the “Hand” domain.

FIGURE 13.1: Structure of ISWI [691:991]. The right view is rotated by 90° around the y-axis. The three domains and the spacer helix are distinguished by colour. Domain borders are given on the left hand side. Helices are consecutively numbered per domain from N- to C-terminus. Parts that could not be modelled are indicated in cyan. These include the N- and C-terminus and two loop regions in the “Hand” domain.

13.2 “Hand” domain — a new fold

Neither the domain databases PFAM and SMART nor the DALI server recognise or classify the “Hand” domain, so it can be assumed to represent a new fold. The sequence of the “Hand” domain is only present in proteins of the ISWI family. It consists of four helices, H1–H4, and long loop regions that give it the shape of a left hand (sic!): helix H1 is the thumb. The following loop runs antiparallel to helix H2, and both together form the palm. The second loop-turn-helix motif (H3, see below) forms the fingers of the bent hand. Helix H4 crosses H2 at an angle of 30° and leads to the SANT domain.

(a) Helices H2 and H3 are both flanked by preceding loops. Helices H3 and H4 wrap around helix H2.
(b) The short helix H1 is embedded in a strongly hydrophobic area opened by helix H2 and its preceding loop.

FIGURE 13.2: Fold and stabilisation of the “Hand” domain. See text for details.

There are two loop-turn-helix motifs in the “Hand” domain: both helix H2 and helix H3 pack against loops that run antiparallel to them. These two loop-turn-helix motifs stand approximately perpendicular to each other, as shown in Figure 13.2(b). The N-terminal extension of the loop before helix H2, marked with an asterisk in Figures 13.2(a) and 13.2(b), runs parallel to the loop-H3 motif. The latter is stabilised mostly by two hydrophobic pairs, Tyr745↔Pro780 and Phe746↔Pro719, and by a hydrogen bond between Val754 and Glu772 (N↔Oε1, Δ = 3.0 Å). The two hydrophobic pairs are “crossed” in the sense that the tyrosine, which precedes the phenylalanine in the sequence, packs against the proline that comes later in the sequence, and vice versa. This reflects the fold of the “Hand” domain, in which the loop-H3-H4 segment winds around H2, see Figures 13.2(a) and 13.2(b). In the crystal, the bend near these two pairs packs against helix Sa1 of the SANT domain of a symmetry-related molecule, either directly, as in Lys753↔Glu834 (2.85 Å), or mediated by water molecules (see also Section 13.5). Trypsin digestion of full-length ISWI yields the C-terminal third of ISWI, with two cleavage sites, Lys693 and Ala713, which flank the short helix H1. Its residues Phe702 and Tyr701 are anchored in a hydrophobic “bed” opened by the loop-H2 motif with residues Phe723, Val725, Phe731, Phe736, Leu735, Leu739, and Ile743. However, the absence of H1 does not alter the solubility significantly: clone ISWI [713:991], which lacks the whole helix, could be purified and concentrated similarly to ISWI [691:991]. It is noteworthy, though, that the missing helix did alter the crystallisation behaviour: with construct ISWI [713:991], the hexagonal crystal form never grew voluminous crystals, only clusters of needles, as depicted on the left hand side of Figure 11.1(b) on page 62.

13.3 SANT domain

The SANT domain of ISWI is a classical three-helix bundle containing a helix-turn-helix motif, as it occurs in many DNA-binding proteins (Jones et al., 2003). When the SANT domain was first described (Aasland et al., 1996), the authors noticed significant homology to the bona fide DNA-binding modules of c-Myb, an oncogene product with the same . Because of the high sequence similarity between c-Myb and SANT domains, the authors suggested that SANT domains, which are present in proteins that interact with chromatin, also bind DNA. Analysis of the structure presented here, in combination with biochemical data, argues strongly against this notion (see the following chapter). The SANT domain of ISWI is the smallest of the three domains, with only about 50 residues. It interacts with the “Hand” domain through a few hydrophobic residues: Phe728 and Phe730 of the “Hand” domain bind to the strongly hydrophobic core of the SANT domain, where Phe730 is sandwiched between Val841 and Leu849, and Phe728 faces Phe805. Additional interactions with the spacer helix, for example the hydrogen bond between Asp727 (Oδ2) and Asp851 (N) mediated by water “W17” (Figure 13.3(a)), stabilise the orientation of the domain within the molecule. Through van der Waals interactions with symmetry-related molecules, three hydrophobic residues are shielded from the solvent: Phe797 with Trp942, Ile836 with Phe940, and Trp843 with His888.

(a) Interaction between “Hand” and SANT domain. (b) Hydrophobic contacts with a symmetry-related molecule.

FIGURE 13.3: The interaction between the “Hand” and SANT domains is of both hydrophobic and electrostatic nature. The short turn preceding helix H2 reaches into the hydrophobic core of the SANT domain with Phe730 and Phe728. The latter packs against Phe805, while the former is sandwiched between Val841 and Leu849. The oxygen of the same phenylalanine also forms a hydrogen bond with Nη2. Water W17 bridges Asp727 and Asp851. Figure 13.3(b) shows the hydrophobic pocket in the SANT domain that is covered by a symmetry-related molecule in the crystal packing.

13.4 SLIDE domain

The SLIDE domain is separated from the N-terminal part of the fragment by the spacer helix, so the two parts of the molecule have no direct contacts. The SMART database relates the SLIDE domain to SANT domains with a rather high expectation value of about 0.01, i.e., it is only distantly related to SANT domains². Unlike that of the SANT domain, the sequence of the SLIDE domain is specific to ISWI-like proteins. As defined by the structure, the SLIDE domain extends beyond the SANT domain both N- and C-terminally. The spacer helix is followed by a long loop that leads into the main core of the SLIDE domain; this loop contains a short helix, Sl0, and packs against the third helix, Sl3. Between the second and the third helix, a loop with a short helical turn is inserted. The consensus sequence for SANT domains ends about halfway through the C-terminal helix Sl3; in the SLIDE domain, this helix is extended and crosses the spacer helix.

13.5 Solvent molecules in the structure

Solvent molecules are not only randomly distributed within the crystal but also form specific hydrogen bonds with the protein. During refinement it is not always easy, especially for water molecules, to decide whether an electron density peak is due to an atom or simply noise, and a model can easily be overfitted (i.e., the crystallographic R-factor artificially reduced) by placing spurious solvent molecules. Apart from 49 water molecules detected in the asymmetric unit of ISWI [691:991], two glucose molecules and one glycerol molecule could be placed. The glucose originated from the cryo buffer in which the crystals were frozen. The attentive reader may have noticed the absence of glycerol in the preparation of the crystals (see page 61), but I relied on the judgement of R. Ravelli, who spotted density with the shape typical for glycerol. There are two possible sources. The first is the membrane of the filter used for the final concentration step: membrane filters normally contain traces of glycerol, but the amount is probably not enough to saturate most protein molecules in solution. The second, and more likely, source is the glycerol used to store the protein at −20 °C. Because of the large yield, not all protein was processed immediately, and in order to reduce damage due to freezing, 10–15% glycerol was added to aliquots after the second nickel column. An aliquot was then purified by gel filtration, but the glycerol may have been bound strongly enough to remain at its defined position.

FIGURE 13.4: The three helices of the SLIDE domain corresponding to the SANT domain are coloured yellow; the rest of the SLIDE domain is shown in orange.

² The expectation value (E-value) is the number of matches with at least the observed score that would be expected by chance alone; an E-value of 0.01 thus means that, roughly, 100 searches of the same size against random sequences would be needed to obtain one equally good match. A low E-value therefore indicates a significant similarity.

FIGURE 13.5: One of the two glucose molecules. It shows van der Waals interaction with Phe887 from a symmetry-related molecule (residues from the SLIDE domain in yellow), but mostly electrostatic interactions with both water molecules and Asp818. His888 is the same residue as shown in Figure 13.3(b).

One of the two glucose molecules is shown in Figure 13.5. It is located at the surface of the SANT domain, where it forms hydrogen bonds to Asp818 (O1, O2↔Oδ1) and to a water molecule. This water molecule is coordinated directly by Arg817 and indirectly, via a second water molecule, by a second molecule. The carbon ring of the glucose is at the proper distance for van der Waals interaction with the aromatic ring of Phe887 of a symmetry-related molecule. The majority of water molecules are concentrated at the surface of the SLIDE domain. As a little gift, the crystal structure of ISWI [691:991] contained a beautiful configuration of water molecules and a glycerol molecule: a water molecule sitting right on the crystallographic twofold axis is part of a five-membered ring, with distances of 2.7 Å to the neighbouring water molecules, as they occur in crystalline ice (Gerthsen and Vogel, 1993). The second water ring is not completely planar but distorted by about 35°, as can be seen on the left hand side of Figure 13.6. This is most likely because the corresponding water molecule forms hydrogen bonds with the Nη atom of Lys898 and with the main chain nitrogen atom of Glu904. A third, rather distorted, water ring seems to be mimicked with the aid of two oxygen atoms from the glycerol.

FIGURE 13.6: A special water configuration at the interface between two molecules. Dark molecules are symmetry related to the bright ones. The symmetry axis runs through the central water molecule. Fo − Fc map contoured at 2σ. The bottom figure shows the location within the structure and the symmetry axis through the water molecule.

FIGURE 13.7: Location of the special water molecule within the structure. The water ring from Figure 13.6 lies at the interface between two molecules in the crystal. The symmetry axis that runs through the central water molecule is indicated.

Chapter 14

Comparison with known structures

The DALI server at the EBI, Hinxton, provides an interface for finding similarities between folds. Exhaustive three-dimensional searches covering all possible relative orientations against all known structures would be far too costly. Instead, the DALI program applies a Monte Carlo method to compare distance matrices of the Cα atoms of two sets of coordinates. A distance matrix contains the distances between all Cα atoms within one set; this is enough (indeed redundant) information to reconstruct the Cα trace of a protein, except for its chirality: because distances are unchanged by a reflection, the distance matrix of the mirror image of a set of coordinates is identical to that of the original (Holm and Sander, 1993).
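The statement that a Cα distance matrix cannot encode chirality is easy to check numerically. The sketch below builds the distance matrix for a few made-up Cα positions and for their mirror image (x → −x): the matrices are identical, while the signed volume spanned by four atoms, a simple handedness indicator, changes sign. This only illustrates the invariance; it is not part of the DALI algorithm itself, and all names and coordinates are illustrative.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mirror(coords):
    """Reflect through the yz plane; this inverts the handedness."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(p0, p1, p2, p3):
    """Triple product (p1-p0) . ((p2-p0) x (p3-p0)); its sign encodes chirality."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    w = [b - a for a, b in zip(p0, p3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(a * b for a, b in zip(u, cross))


# Made-up C-alpha coordinates (Angstrom).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (4.0, 4.5, 3.3)]
same = all(abs(x - y) < 1e-12
           for row_a, row_b in zip(distance_matrix(ca), distance_matrix(mirror(ca)))
           for x, y in zip(row_a, row_b))
print(same, signed_volume(*ca), signed_volume(*mirror(ca)))
```

Because of this, a DALI comparison would report a structure and its mirror image as identical; the correct hand has to come from elsewhere, in practice from the coordinates themselves.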

description | pdb | alt. | Z | rmsd | aligned | ident.
“Hand”, Ala697–Gln795 | | | 19.1σ | | 91 |
acyl-CoA dehydrogenase | 3mde | | 2.1σ | 6.2 Å | 71 | 6%
elongation factor Ts fragment | 1tfe | | 2.1σ | 4.6 Å | 54 | 9%
phosphoenolpyruvate carboxylase | 1fiy | | 2.1σ | 3.5 Å | 42 | 2%
SANT, Gly796–Glu863 | | | 17.2σ | | 68 |
myb proto-oncogene product | 1mbe | | 4.5σ | 2.6 Å | 48 | 25%
engrailed homeodomain | 2hdd | | 4.4σ | 2.7 Å | 49 | 6%
pit-1 fragment (ghf-1) | 1au7 | | 4.3σ | 2.5 Å | 49 | 10%
htrf1 (telomeric repeat binding protein) | 1ba5 | | 4.2σ | 2.4 Å | 44 | 16%
b-myb fragment | 1a5j | | 4.1σ | 3.1 Å | 50 | 4%
Pax-6 (homeobox protein) | 6pax | | 4.0σ | 2.6 Å | 51 | 4%
SLIDE, Gly864–Glu977 | | | 24.7σ | | 113 |
Pax-6 (homeobox protein) | 6pax | idem | 5.6σ | 2.2 Å | 61 | 10%
htrf1 (telomeric repeat binding protein) | 1ba5 | 1iv6 | 4.6σ | 1.7 Å | 50 | 22%
myb proto-oncogene product | 1mbe | 2hdd | 4.2σ | 1.5 Å | 46 | 24%
mrf-2 modulator recognition factor | 1ig6 | | 4.1σ | 2.3 Å | 52 | 25%
b-myb fragment | 1a5j | | 4.1σ | 2.3 Å | 52 | 25%

TABLE 14.1: Results of the DALI search for structural homologues, separated by domain. Listed entries are: short description, the PDB code, an alternative structure with DNA used for superposition with the nucleosome, the Z-score, the root mean square deviation of the aligned Cα atoms, the number of aligned residues, and the percentage of sequence identity within the alignment. The first row of each domain gives the domain's own Z-score and its length in residues. Search carried out in June 2003.

Structural homology provides an important starting point for interpreting the meaning of a structure or of structural features, and may lead to functional hints. To improve the search results, ISWI [691:991] was split into its subdomains, which were submitted separately. The search is summarised in Table 14.1. The complete list contains three entries for the “Hand”, 58 entries for the SANT and 87 entries for the SLIDE domain above the cutoff of 2σ above expected. For the “Hand” domain, no proper match was found: pairs with a Z-score below 2σ are considered structurally dissimilar, and although the three reported entries have a Z-score just above this limit, their root mean square deviations (3.5–6.2 Å) show that no significant matches existed in the PDB by June 2003. Since the “Hand” domain is not even recognised as a domain from its sequence by the SMART or PFAM databases, it represents a new domain with a new fold that appears to be characteristic of ISWI-like proteins. It was described in the previous chapter, but no indications of its function could be deduced in this work. A recent result reports that the region just before the disordered loop preceding helix H3 shows homology to a patch in the N-terminal tail of histone H3 and is susceptible to acetylation (Lys753, Lys756). A peptide corresponding to this ISWI sequence is acetylated by two H3 acetylases, GCN5 and p300, but not by the H4 acetylase MOF (P. Becker, private communication).

The situation is very different for both the SANT and the SLIDE domain. The similarity between these two domains is reflected by the fact that most proteins in Table 14.1 match both segments. All of the top matches of the DALI search are DNA-binding proteins that bind to the major groove via a helix-turn-helix motif, which allows them to contact specific bases, and most of them bind in a sequence-specific manner; c-Myb, for example, binds DNA both specifically and non-specifically (Lüscher and Eisenman, 1990; Graf, 1992). It was a surprise, though, that the SLIDE domain scores much better than the SANT domain, because the latter reaches a much better expectation value in a sequence-based domain search against other SANT domains. Furthermore, the SLIDE domain contains large loop regions that are not covered by the homologues listed in Table 14.1. Since the resemblance between the SANT domain and the c-Myb DNA-binding repeats has led to the suggestion that SANT domains are DNA-binding modules, and since the fragment ISWI [691:991] does induce a bandshift with cruciform DNA (Section 10.3), both SANT and SLIDE were compared with some of the top DALI hits. As a result, the SLIDE domain most likely represents a DNA-binding module, whereas DNA binding by the SANT domain can be excluded with high probability. This conclusion is explained in the following sections.

(a) SANT and SLIDE superimposed onto DNA-binding repeat R2 from c-Myb (PDB code 1hdd, a homologue of PDB code 1mbe, but with DNA). Note that only the N- and C-terminal helices superimpose, while the middle one is out of register. SANT: rmsd = 1.72 Å (37 atoms), SLIDE: rmsd = 1.38 Å (45 atoms).
(b) SANT and SLIDE superimposed onto Pax-6 (PDB code 6pax). As with c-Myb, the SANT domain superimposes only with the first and third helix. SANT: rmsd = 1.56 Å (36 atoms), SLIDE: rmsd = 1.51 Å (52 atoms).

(c) Sequence alignment corresponding to the superposition of SANT and SLIDE with human c-Myb, repeats 2 and 3 (secondary structure elements Sa1, Sa2, Sa3 for SANT, residues 800–840; Sl1, Sl2, Si, Sl3 for SLIDE, residues 900–960):
SANT, 796-846        GFTAWTKRDFNQFIKANEKYG---RDDIDNIAKDV------egktpeEVIEYNAVFwerc------
c-Myb 94-138 (R2)    ---PWTKEEDQRVIKLVQKYGP-k--RWSVIAKHL------KGRIGKQCRERWHNHL------V
c-Myb 146-190 (R3)   ---SWTEEEDRIIYQAHKRLG----NRWAEIAKLL------PGRTDNAIKNHWNSTMR------I
SLIDE, 898-975       KGKNYTEIEDRFLVCMLHKLGFDKENVYEELRAAIRASPQFRFDWFIKSRTALELQRRCNTLITLIERENIELEEKERL
Bold column: Val103 of c-Myb R2, which is responsible for its low thermal stability (compare Section 10.2.2, p. 58, melting curve of ISWI [691:991]).

(d) Sequence alignment corresponding to the superposition of SANT and SLIDE with Pax-6 (residue numbering 800–840 for SANT, 900–960 for SLIDE). Boxes indicate overlapping helices.
SANT, 796-850        GFTAWTKRDFNQFIKANEKYG------RDDI-DNIAK-DVE------GKTPEEVIEYNAVFWERCTELQ------
Pax-6, 72-133        KPRVATPEVVSKIAQYKQECPSI---FAWEIRDRLLSEGVCTNDN----IPSVSSINRVLRNLASEKQQ------
SLIDE, 900-966       --KNYTEIEDRFLVCMLHKLGFDKENVYEELRAAIRASPQFRFDWFIKSRTALELQRRCNTLITLIERE------

FIGURE 14.1: Superposition of SANT and SLIDE domains with their closest structural neighbours. The orientation of the DNA is shown. Both figures are oriented in the same way with respect to the SLIDE domain. Superpositions calculated with the programs ono and lsqman (Jones et al., 1991; Kleywegt, 2002).

14.1 Interpretation of structural homology: ISWI-C as a substrate recognition module

(a) The SANT domain contacting DNA, resulting from the superposition with the c-Myb (repeat 3)/DNA complex.

(b) SLIDE in the orientation resulting from the superposition with the c-Myb/DNA complex. Base pairs used for superposition with the nucleosome are highlighted.

SANT, 796-846: GFTAWTKRDFNQFIKANEKYG---RDDIDNIAKDV------EGKTPEEVIEYNAVFWERC------
c-Myb 146-190: ---SWTEEEDRIIYQAHKRLG----NRWAEIAKLL------PGRTDNAIKNHWNSTMR------
SLIDE, 898-975: KGKNYTEIEDRFLVCMLHKLGFDKENVYEELRAAIRASPQFRFDWFIKSRTALELQRRCNTLITLIERENIELEEKER
Helix labels in the figure: Sa1–Sa3 (SANT), Sl1–Sl3 (SLIDE).

(c) Sequence alignment with unfavourable residues highlighted in red and favourable ones in cyan.

FIGURE 14.2: Residues of SANT and SLIDE contacting DNA. Both fragments were superimposed with c-Myb (PDB code 1h88) which oriented them with respect to the DNA from the c-Myb structure. Residues favouring DNA contacts are coloured cyan, those not supporting DNA binding red.

Since all investigated structural homologues bind DNA in a similar fashion, namely in the major groove through a helix-turn-helix motif, an obvious next step was to investigate whether either of the two domains of ISWI [691:991] could bind DNA in this manner. Taking the orientation of the DNA as it resulted from the superposition with c-Myb (repeat 3, Figure 14.1(a)), the residues of the SANT and SLIDE domains were checked for possible contacts with the DNA. The results are shown in Figure 14.2. The sequence alignment corresponding to the superpositions with c-Myb highlights the residues that favour (cyan) or hinder (red) DNA binding. The figure shows the negative charge distribution of SANT: most of its residues close to the DNA argue for a role of SANT other than DNA binding. Binding to double-stranded DNA in this fashion would also result in severe stereochemical clashes between the DNA and the rest of the structure, mostly the “Hand” domain. The SANT domain might instead present an interface for the histone H3 tail, as has been suggested for the SANT domain of Ada2 (Boyer et al., 2002). As already pointed out, this analysis favours the SLIDE domain as a direct DNA-binding module. Only very few residues in Figure 14.2(c) are highlighted as unfavourable contacts, and they are outweighed by the large number of supporting residues. One has to bear in mind that the model presented here is only very coarse: the deviation of the Cα atoms is rather large and the sequence homology is low. Upon binding, the unfavourable residues may change conformation to more favourable positions. The same argument applies to the stereochemical clashes still present. The loops before helix Sl1 and between Sl1 and Sl2 (Figure 14.2(b)) may undergo slight movements to fit the phosphate backbone better.
Furthermore, the possible models were not investigated with respect to the sequence specificity of the SLIDE domain. EMSAs with ISWI∆SLIDE and ISWI∆SANT mutants showed that removal of the SANT domain does not abolish binding of the protein to cruciform DNA, whereas removal of the SLIDE domain does (Grüne et al., 2003). This confirms the idea developed above that only the SLIDE domain binds to DNA, while the SANT domain seems to have a different function.
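The superpositions in this chapter (Figure 14.1, calculated with lsqman) and the four-atom fit onto nucleosomal DNA in Section 14.1.1 all reduce to an optimal least-squares rigid-body fit of matched atoms followed by an r.m.s.d. The sketch below is not lsqman; it is an independent, dependency-free Python illustration of Horn's quaternion method, in which the minimal r.m.s.d. follows from the largest eigenvalue of a 4×4 matrix built from the cross-covariance of the two centred atom sets. The matched atoms must be listed in the same order in both sets. All function names are hypothetical, and the coordinates are invented toy values, not atoms from c-Myb, ISWI or the nucleosome.

```python
import math


def centred(coords):
    """Return a copy of the coordinates with their centroid moved to the origin."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - c[i] for i in range(3)] for p in coords]


def largest_eigenvalue(a, max_sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposition_rmsd(mobile, target):
    """Minimal r.m.s.d. after optimal rigid-body superposition (Horn's quaternion method)."""
    if len(mobile) != len(target) or not mobile:
        raise ValueError("need two equally long, non-empty lists of matched atoms")
    x, y = centred(mobile), centred(target)
    # cross-covariance S[a][b] = sum_i x_i[a] * y_i[b]
    s = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = largest_eigenvalue(m)
    g = sum(v * v for p in x + y for v in p)  # sum of squared norms of both sets
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))


# Toy example (invented coordinates): four atoms and a rotated, shifted copy.
atoms_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
atoms_b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0), (5.0, 5.0, 6.0)]
print(f"r.m.s.d. = {superposition_rmsd(atoms_a, atoms_b):.3f} Å")  # essentially zero
```

The Jacobi eigen-solver is only there to keep the example free of dependencies; with a linear-algebra library one would more commonly use an SVD-based Kabsch fit, which gives the same r.m.s.d.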

14.1.1 Consequences for nucleosome recognition by ISWI

Since ISWI is a chromatin remodelling enzyme and hence interacts directly with the nucleosome, the question arises whether these results can be used to deduce information about nucleosome recognition by ISWI. If the orientation of the SLIDE domain with respect to the DNA is correct, one can take the short fragment of double-stranded DNA and superimpose it onto the DNA in the structure of the nucleosome. Since the nucleosomal DNA is curved, in contrast to the short DNA fragments present in all the structures considered, only two adjacent base pairs were used in the superposition, namely those close to where helix Sl3 and the preceding loop embrace the DNA (see Figure 14.2(b); atoms O5′ and C1′ were used). Even with only four atoms superimposed, the root mean square deviation lay between 0.5 Å and 1.2 Å. In the structure of c-Myb¹, these nucleotides are the ones that form the strongest contacts with the protein.

The ISWI [691:991] structure docked to DNA from c-Myb was used to check for possible binding modes of the protein to the nucleosome. All possible superpositions were investigated. Only the c-Myb orientation was used because, as shown in Figure 14.3(a), the differences in orientation for the other superpositions are small and would not have yielded new results. With this approach there are only two principal ways in which ISWI could bind to the nucleosome. In all other cases, there are large stereochemical clashes between the two models that could not be avoided by rearrangements of side chains or loop regions. The two possibilities are displayed in Figure 14.3(b). In the first orientation (the “single contact” conformation, labelled “I”), only the SLIDE domain contacts the DNA. The “Hand” and SANT domains stand away from the DNA, although they might still contact the histones in the centre of the nucleosome. If the ISWI [691:991] fragment were rotated by about 30° around the radial axis of the nucleosome cylinder, the entire fragment would run along the DNA, but in that case the SLIDE domain would deviate considerably from the c-Myb orientation with respect to the DNA. In the second position (the “double contact” conformation, labelled “II”), the ISWI [691:991] fragment is in contact with two DNA helices via the SLIDE domain and the “Hand”/SANT domain. (Since the DNA wraps around the histone core 1 3/4 times, two helices lie next to each other along the disk, except where the DNA enters and leaves it.) The ISWI [691:991] fragment resembles a clamp set against the side of the nucleosome. In both positions I and II, the N-terminal end of the ISWI [691:991] fragment, i.e., the position where it is connected to the N-terminal part of full-length ISWI, faces away from the nucleosome, so that no stereochemical clashes are possible.

¹ Even though c-Myb has a worse Z-score than Pax-6, its r.m.s.d. is much lower, and the contacts between the SLIDE domain and the DNA resulting from the superposition with c-Myb are more convincing than those from the Pax-6-like orientation.

(a) Similar orientation from the three top scores of DALI. The superposition with the nucleosomal DNA would have led to results very similar to the ones described here, based on the superposition with c-Myb.

(b) Two binding possibilities for ISWI to the nucleosome. I: “single-contact” conformation, II: “double-contact” conformation. Only the superposition via c-Myb is shown; for htrf1 and Pax-6 the result is very similar, as can be seen from Figure 14.3(a).

FIGURE 14.3: The first three results from Table 14.1 superimposed via the DNA. The orientation of the ISWI [691:991] fragment does not differ much.
As the DNA is wrapped around the nucleosome, the fragment can be rotated around the axis of the disk in discrete steps, moving from one major groove to the next. There are 14 locations where the major groove points outwards, seven if symmetry-related positions are counted as one. At the “back” of the nucleosome, i.e., opposite to where the DNA enters and leaves the histone core, the ISWI [691:991] model would collide with the very C-terminal helix of histone H2B, but all other positions cannot be distinguished further with this approach. To interpret the results further, one has to keep in mind two biochemical results:

1. The importance of the amino-terminal end of histone H4 (Clapier et al., 2001; Clapier et al., 2002). ISWI activity relies on the presence of the Arg17-His18-Arg19 motif of histone H4, and it has been shown that the interaction of ISWI depends mainly on the intact SLIDE domain and less on the SANT domain (Grüne et al., 2003).

2. The similarity between the ISWI SANT domain and the Ada2 SANT domain. The SANT domain of Ada2 is crucial for the activity of the histone H3 acetylase Gcn5 (Sterner et al., 2002), and it has been proposed to interact with the H3 tail. Since the SANT domains of ISWI and Ada2 are homologous, the H3 tail might be important for ISWI, too (sequence alignment in Figure 3B of Grüne et al., 2003; see Appendix D).

Of the 14 possible positions, there are two where ISWI [691:991] is close to the N-terminal tails of both histones H3 and H4, one for the “double contact” conformation and one for the “single contact” conformation. They are depicted in Figures 14.4(a) and 14.4(b), respectively. Here the SLIDE domain can contact the H4 tail and the SANT domain can contact the H3 tail (the reader should keep in mind that the structure of the nucleosome contains only short parts of the flexible histone tails, which could not be built entirely).
All positions derived from the superposition with c-Myb have in common that the negatively charged surface of the SANT domain faces away from the DNA. In fact, the side that would be in contact with the DNA contains some basic residues that could interact with the DNA backbone. In addition, a series of basic residues on the spacer helix (Arg854, Arg861, Lys865, Arg868) all point in the same direction, towards the DNA backbone, as displayed in Figure 14.5. The “Hand” domain also contributes some arginines and lysines that are candidates for DNA binding. Two of them (Arg707 and Lys712) are located in the disordered loop between Leu706 and Lys715.
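The statement that Arg854, Arg861, Lys865 and Arg868 all point towards the DNA backbone can be cross-checked with a helical-wheel estimate. The sketch assumes that this stretch of the spacer helix is an ideal α-helix with 3.6 residues per turn, i.e. 100° of rotation per residue; real helices deviate somewhat, so this is a plausibility check only, not a substitute for the crystal structure. The function name is hypothetical.

```python
def wheel_angles(residues, degrees_per_residue=100.0):
    """Angular position of each residue on an ideal helical wheel, relative to the first.

    Assumes an ideal alpha-helix; angles are wrapped into the interval (-180, 180] degrees.
    """
    ref = residues[0]
    angles = []
    for r in residues:
        a = ((r - ref) * degrees_per_residue) % 360.0
        if a > 180.0:
            a -= 360.0
        angles.append(a)
    return angles


basic = {"Arg854": 854, "Arg861": 861, "Lys865": 865, "Arg868": 868}
angles = wheel_angles(list(basic.values()))
for name, angle in zip(basic, angles):
    print(f"{name}: {angle:+.0f} deg")
print(f"angular spread: {max(angles) - min(angles):.0f} deg")
```

Under this assumption the four side chains fall within a 60° sector of the wheel (0°, −20°, +20°, −40°), consistent with a single basic stripe along one face of the helix. The spacings i+7, i+11 and i+14 are close to multiples of 3.6 residues (7.2, 10.8, 14.4), which is why they line up.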

(a) The “double contact” position of ISWI [691:991] in proximity of histones H3 and H4. The SLIDE domain contacts one double helix, the SANT and “Hand” domains contact the “other” one. Right image rotated by 90° about the x-axis.

(b) The “single contact” position of ISWI [691:991] in proximity of histones H3 and H4. Both the SLIDE domain and the SANT and “Hand” domains contact the “same” double helix. Right image rotated by ≈ −90° about the y-axis and tilted towards the viewer.

FIGURE 14.4: The two positions in which the ISWI [691:991] fragment is closest to the N-terminal tails of both histones H3 and H4. Note that the N-termini of the histones in the structure lack several residues.

FIGURE 14.5: Support for the model of the orientation of the ISWI [691:991] fragment towards the DNA comes from the observation that a series of basic residues along the spacer helix all face towards the DNA. All acidic residues of the SANT domain are shown in red; they concentrate on the part of the SANT domain that faces away from the DNA.

Chapter 15

Discussion and perspective

This work started with the objective of adding structural information about dISWI to the pool of knowledge about chromatin remodelling. It resulted in the X-ray structure of the C-terminal third of ISWI, which covers about 300 of the 1,027 residues of the full-length protein. Even though the structure of the N-terminal part of the protein could not be solved, the results still provide important information for the field of chromatin remodelling. The N-terminal part of ISWI comprises the ATPase domain common to all chromatin remodelling enzymes, while the C-terminus is specific to the ISWI subfamily. The structure of ISWI [691:991] presents itself as an elongated molecule with a shape resembling a seahorse. Three domains can be distinguished: the N-terminal “Hand” domain, the central SANT domain and the C-terminal SLIDE domain. The model has a total length of 90–95 Å, but the body (“Hand” and SANT domains) is only about 20 Å by 30 Å wide. The C-terminal tail “curls” up with a 45 Å-long helix running orthogonally to the long axis. A nearly straight, 50 Å-long spacer helix separates the SLIDE domain from the N-terminal part, while the “Hand” and the SANT domains are connected via van der Waals and electrostatic interactions. The fragment has a remarkably inhomogeneous charge distribution. The “Hand”-SANT domain has a theoretical pI of 4.6, whereas the pI of the SLIDE domain is above 9. The surface of the SANT domain and one side of the “Hand” domain are mostly acidic, while the two lateral faces of the cylindrically shaped molecule display basic patches. The sequences of the ISWI SANT and ISWI SLIDE domains are homologous to each other; in fact, the SLIDE domain is still classified as a SANT domain by the SMART database, but this work strongly suggests that they are functionally different. The similarity between SANT domains and bona fide DNA-binding modules like those of c-Myb led to the suggestion that SANT domains are also involved in DNA binding.
Recently, evidence has been published that this is not the case; instead, SANT domains may contact histone tails and/or be involved in complex formation. With the combination of biochemical and structural data, the two domains can now be distinguished. The SLIDE domain is more closely related to c-Myb than the SANT domain, despite large insertions in its sequence. The SLIDE domain is specific to ISWI-like proteins and is probably involved in DNA binding. The SANT domain occurs in several other proteins unrelated to remodelling. Structurally, it is only weakly related to c-Myb-like proteins, and direct interaction with DNA via the helix-turn-helix motif can now be excluded. The negative surface charge of this module makes it a good candidate for interaction with the lysine- and arginine-rich regions of the N-terminal histone tails. The analysis of the SANT and SLIDE domains with respect to DNA binding finally led to a model of how the C-terminal end of ISWI interacts with the nucleosome. Crystallographic models often lead to speculations, many of which have turned out to be wrong, e.g. the predicted binding of DNA to the TATA box binding protein (Nikolov et al., 1992; Kim et al., 1993b; Kim et al., 1993a), or the model of heterodimerisation between the nuclear mRNA export protein TAP and p15 (Suyama et al., 2000; Fribourg et al., 2001). In the latter example the prediction was even confirmed by mutagenesis, but the model was disproven by the crystal structure. The power of pictures is strong, and when presenting such speculations, care must be taken that the reader does not mistake what is being presented for experimental evidence. However, hypotheses are legitimate if they can be tested, and future experiments must either support or reject them. The models in Figures 14.4(a) and 14.4(b) provide several starting points for such an analysis.

1. Point mutations in the SLIDE domain. Comparison of the SLIDE domain with the structural homologues that bind DNA via the HTH motif should indicate residues that are important for sequence-specific contacts; introducing such residues could increase the affinity of the SLIDE domain for DNA.

2. A series of basic residues on the spacer helix (Arg854, Arg861, Lys865 and Arg868) all point towards the phosphate backbone and might be important for binding. They, too, can be mutated into acidic residues to check whether this impedes nucleosome binding.

3. Shortening of the spacer helix. If the “clamp” position of Figure 14.4(a) is correct, the length of the spacer helix plays an important role. It fixes the distance and relative orientation (the angle between them with respect to the axis formed by the spacer helix) of the hook-like protrusions of the SLIDE and the “Hand” domains. Deletion mutants would disrupt this configuration but should not alter the overall structure.

The first of these possibilities is the most promising because it should result in signal enhancement. Considering the weak binding of ISWI to nucleosomal substrates, which has always been difficult to detect, “negative” mutations might not be as convincing. The orientation between the ISWI [691:991] fragment and the nucleosome was chosen because it conforms to several previously published results. It is close to both the H3 and H4 N-terminal tails and, in the model where it contacts both double helices, also to the entry site of the DNA into the nucleosome. The importance of the H4 N-terminal tail for ISWI remodelling activity has been shown directly (Clapier et al., 2001). Since the H4 tail regulates the activity of ISWI but has no influence on the activity of either SNF2/SWI2 or Mi2 complexes, there is reason to assume that the interaction takes place in the C-terminal part of ISWI, because this part is specific to ISWI-like chromatin remodellers. An influence of the H3 N-terminal tail has not yet been shown but was deduced from the homology between the ISWI SANT domain and the Ada2 SANT domain. Ada2 is part of the SAGA complex, a histone H3 acetylase. Removal of the SANT domain reduces acetylation, and the SANT domain of Ada2 has therefore been suggested to recognise the histone H3 tail. The proximity to the entry/exit site of the DNA is supported by the fact that ISWI ATPase activity requires overhanging DNA and is less active with a minimal nucleosome as substrate. One might again argue that the N-terminal part of ISWI, which is twice as large as the ISWI [691:991] fragment, might bind to this overhang. However, the ATPase domain provides the energy that is used for remodelling.
Considering the stable nature of the nucleosome, most of this energy ought to be required to disrupt the histone-DNA interactions, and it would be inefficient to transfer the energy from the site of hydrolysis to the carboxyl end of the enzyme. Given the symmetry of the nucleosome, a second ISWI molecule could be placed at the symmetry-related position. Dimerisation of ISWI has not been reported, and gel filtration indicates that ISWI by itself is a monomer, but the molecular mass of the CHRAC complex has been estimated at 600 kDa, which could only be explained by two copies of each of its four constituents ISWI, Acf1, Chrac14 and Chrac16. A striking difference between the enzyme and the complex is the directionality of nucleosome sliding. While CHRAC moves the nucleosome to a central position, sliding by ISWI results in an end position of the nucleosome. ISWI acting as a monomer and CHRAC as a dimer would be the simplest explanation for this symmetry breaking, but there is no experimental evidence for this notion. The affinity between ISWI [691:991] and the histone tails could in principle be investigated by NMR. At 36 kDa the entire fragment is still too large, but thanks to the known structure, two subclones could be designed with the SLIDE and the “Hand”/SANT domains separated. With molecular weights of 15 kDa and 20 kDa, respectively, NMR experiments become more feasible. Titration with peptides of the N-terminal histone tails could show whether and where specific interaction takes place. The drawback of NMR in this case is that solutions of high ionic strength lower the signal, and the 500 mM NaCl required to solubilise ISWI [691:991] is far too much. The “Hand”/SANT fragment has already been cloned successfully, and preliminary tests showed that it is soluble at 10 mg/ml in only 100 mM NaCl. Another problem is the design of the peptides. The entire tails of H4 and H3 are still very long and expensive to produce in sufficient amounts.
For H4, one should test the region around the R17-H18-R19 motif. For a concise answer to all the questions that arose from the analysis of the structure of ISWI [691:991], structural information about a complex between ISWI and the nucleosome will be necessary. Approaches to this surely very difficult goal may include electron microscopy, where the now available structure of the C-terminus of ISWI can help to locate the enzyme within the complex. The interaction between ISWI and the nucleosome is probably not very strong, as indicated by the electrophoretic mobility shift assays, which have proven difficult to produce. A way to “trap” the remodelling molecule in one conformation should be found, which might make the complex more suitable for crystallisation.

Part V

Appendix


Appendix A

Secondary structure prediction of full-length ISWI

The following is the secondary structure prediction of full-length ISWI from www.expasy.ch. Positions with probabilities below 82% are denoted by a period “.”; “H” means helix, “L” loop and “E” β-strand. The whole protein shows a very low content of β-strands. This is in agreement with the circular dichroism data, even though those data are only weak (p. 57).

....,....1....,....2....,....3....,....4....,....5....,....6 residue MSKTDTAAVEATEENSNETTSDAATSSSGEKEAEFDNKIEADRSRRFDFLLKQTEIFTHF predicted LLL..HHHHH.HH.LLLLL.....LLLLL..HHHHHHHHHH.HHHHHHHHHHH..HHHHH

....,....7....,....8....,....9....,....10...,....11...,....12 residue MTNSAKSPTKPKGRPKKIKDKDKEKDVADHRHRKTEQEEDEELLAEDSATKEIFRFDASP predicted ...LLLLLLLLLLLLLL.L.LLLLLLLLLLLL.L..HHHHHHHH..LLLLL....E.LLL

....,....13...,....14...,....15...,....16...,....17...,....18 residue AYIKSGEMRDYQIRGLNWMISLYENGINGILADEMGLGKTLQTISLLGYLKHFKNQAGPH predicted L...LL...HHHHHHHHHHHHHHH..LL..E...LL..HHHHHHHHHHHHHHHH..LLLE

....,....19...,....20...,....21...,....22...,....23...,....24 residue IVIVPKSTLQNWVNEFKKWCPSLRAVCLIGDQDTRNTFIRDVLMPGEWDVCVTSYEMCIR predicted EEEELL...HHHHHHHHH.LLL.EEEEEE.LL...HHHHHHH..LLL..EEEE....HHH

....,....25...,....26...,....27...,....28...,....29...,....30 residue EKSVFKKFNWRYLVIDEAHRIKNEKSKLSEILREFKTANRLLITGTPLQNNLHELWALLN predicted HHHHHH..L.EEEEE...... LLLL..HHHHHHH..LL.EEEE.LL...L.HHHHHHHHH

....,....31...,....32...,....33...,....34...,....35...,....36 residue FLLPDVFNSSEDFDEWFNTNTCLGDDALITRLHAVLKPFLLRRLKAEVEKRLKPKKEMKI predicted H....LLLL.HHHHH..LL..LLL..HHHHHHHHHH.HHHHHHHHHHHHH.LLLL.EEEE

....,....37...,....38...,....39...,....40...,....41...,....42 residue FVGLSKMQRDWYTKVLLKDIDVVNGAGKVEKMRLQNILMQLRKCTNHPYLFDGAEPGPPY predicted E....HHHHHHHHHHHH...... LLLLLL..HHHHHHHHHHH..L...LLLLLLLLLL

....,....43...,....44...,....45...,....46...,....47...,....48 residue TTDTHLVYNSGKMAILDKLLPKLQEQGSRVLIFSQMTRMLDILEDYCHWRNYNYCRLDGQ predicted LLLL.....LL.HHHHHHHHHHHHH.LL.EEE....HHHHHHHHHHHH..LL..EE.LLL

....,....49...,....50...,....51...,....52...,....53...,....54 residue TPHEDRNRQIQEFNMDNSAKFLFMLSTRAGGLGINLATADVVIIYDSDWNPQMDLQAMDR predicted LL.H.HHHHHHHHLLLLLL.EEEEEEE.LLL...... LLEEEEE.LLLLLL.HHHHHHH

....,....55...,....56...,....57...,....58...,....59...,....60 residue AHRIGQKKQVRVFRLITESTVEEKIVERAEVKLRLDKMVIQGGRLVDNRSNQLNKDEMLN predicted HHH.LLLL.EEEEE...... HHHHHHHHHHHH....HH...... LL..HHHH

....,....61...,....62...,....63...,....64...,....65...,....66 residue IIRFGANQVFSSKETDITDEDIDVILERGEAKTAEQKAALDSLGESSLRTFTMDTNGEAG predicted HHHHHHHH.....LLLLL...HHHHHHH...HHHHHHHHHHH.L.HHHH...LLLLLLLL

....,....67...,....68...,....69...,....70...,....71...,....72 residue TSSVYQFEGEDWREKQKLNALGNWIEPPKRERKANYAVDAYFREALRVSEPKAPKAPRPP predicted L...E..LL.HHHHHHHHHH....LLLLLL.....HHHHHHHHHHHH..LLLLLLLLLLL

....,....73...,....74...,....75...,....76...,....77...,....78 residue KQPIVQDFQFFPPRLFELLDQEIYYFRKTVGYKVPKNTELGSDATKVQREEQRKIDEAEP predicted LLLLL.LLLLLLHHHHHHHHHHHHHHH.....EE.LLLLLLLL..HHHHHHHHH.LLLLL

....,....79...,....80...,....81...,....82...,....83...,....84 residue LTEEEIQEKENLLSQGFTAWTKRDFNQFIKANEKYGRDDIDNIAKDVEGKTPEEVIEYNA predicted LL.HHHHHHHHHH...... HHHHHHHHHHHHH.L..HHHHHHHH.LLLLHHHHHHHHH

....,....85...,....86...,....87...,....88...,....89...,....90 residue VFWERCTELQDIERIMGQIERGEGKIQRRLSIKKALDQKMSRYRAPFHQLRLQYGNNKGK predicted HHHHHHHHHHHHHHHHHHHH...HHHHHHHHHHHHHHHHHH..LLL...... LLLLLL

....,....91...,....92...,....93...,....94...,....95...,....96 residue NYTEIEDRFLVCMLHKLGFDKENVYEELRAAIRASPQFRFDWFIKSRTALELQRRCNTLI predicted LLL....HH..HHHHH.LLL.HHHHHHHHHHHH.LLL...... HHHHHHH.HHHH

....,....97...,....98...,....99...,....100..,....101..,....102 residue TLIERENIELEEKERAEKKKKAPKGSVSAGSGSASSNTPAPAPQPKASQKRKSEVVATSS predicted HHHH..LLHHHHHHHHHHHHH.LLLL.LLLLLLLLLLLLLLLLLLL..LLLL.EEEEE.L

....,....1 residue NSKKKKK predicted LLLLLLL
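The statement that full-length ISWI has a very low β-strand content can be checked by tallying the prediction strings above. The helper below counts helix (H), strand (E) and loop (L) symbols and the below-threshold periods in one or more prediction lines. Whitespace is ignored because the extracted lines contain stray blanks; some lines also appear to be shorter than 60 symbols, so a tally over all blocks may fall slightly short of the full length. The example uses two blocks copied from above (residues 1–60 and 181–240); the function name is hypothetical.

```python
from collections import Counter


def ss_composition(*prediction_lines):
    """Count predicted states: H (helix), E (strand), L (loop), '.' (below threshold)."""
    counts = Counter()
    for line in prediction_lines:
        counts.update(ch for ch in line if not ch.isspace())
    return {state: counts.get(state, 0) for state in ("H", "E", "L", ".")}


block_1_60 = "LLL..HHHHH.HH.LLLLL.....LLLLL..HHHHHHHHHH.HHHHHHHHHHH..HHHHH"
block_181_240 = "EEEELL...HHHHHHHHH.LLL.EEEEEE.LL...HHHHHHH..LLL..EEEE....HHH"
comp = ss_composition(block_1_60, block_181_240)
total = sum(comp.values())
for state, n in comp.items():
    print(f"{state}: {n:3d} ({100.0 * n / total:.1f}%)")
```

For these two blocks, strands account for 14 of 120 positions and all of them lie in the second block, which falls within the predicted ATPase (DEXDc) region of Table A.1; passing every block to the same function gives the protein-wide fractions.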

name      begin  end   E-value
SEG       6      23    –
SEG       68     86    –
AT-hook   69     81    1.99
SEG       96     107   –
DEXDc     124    316   3.94 × 10−43
HELICc    461    545   7.1 × 10−27
SEG       614    623   –
SEG       71     723   –
SANT      796    845   5.53 × 10−7
SANT      898    962   1.11 × 10−2
COIL      951    980   –
SEG       971    1010  –

TABLE A.1: Domain prediction by SMART, Version 3.1. SEG: low-complexity region, COIL: coiled-coil region.

Appendix B

List of clones

The following table provides a concise overview of most of the clones prepared during the work for this thesis. The names are chosen to indicate the regions spanned by each clone:

1. either from the fragments determined after limited proteolysis (p. 56): “B” for starting position Ala 88, which is at the end of a predicted loop between Met 69 and Thr 95, and C1 or C2 for the two trypsin cleavage sites determined by N-terminal sequencing,

2. or according to the domain names of the SMART database: AT: AT-hook, DEX: DEXDc, HEL: HELICc.

All clones carry the residues glycine, alanine and methionine (GAM) N-terminally to the start position (except fl-ISWI, which starts with GAMA). Solubility required expression at 20 °C and 0.4–0.5 M sodium chloride.

abbrev.   subsequence  residues  Mr [Da]    pI    ε280nm [1/(M cm)]  ε280nm/Mr [l/g]
fl-ISWI   3–1027       1029      118985.4   8.45  118,630            0.997
BC        88–991       907       105651.6   7.30  119,230            1.129
N-terminal clones
flDEX     2–687        369       42548.6    7.23  47,750             1.122
BC1       88–712       628       72697.1    6.54  90,520             1.245
BC2       88–685       601       69494.5    6.26  87,960             1.266
ATC2      63–685       627       72402.9    7.67  87,960             1.215
AThel     63–583       524       60848.3    8.90  75,300             1.238
ATDEX     63–367       309       35816.4    9.00  47,750             1.333
DEXhel    126–582      460       53526.1    8.90  74,020             1.383
DEXC2     126–685      563       65080.8    7.25  86,680             1.332
HelC2     369–601      321       36864.8    5.44  40,090             1.087

TABLE B.1: List of N-terminal ISWI subclones (pProEx-Htb) and their characteristics. Subsequences refer to full-length ISWI. fl-ISWI begins with GAMA, all remaining ones with GAM.
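The pI and ε280nm columns of Tables B.1 and B.2 are sequence-derived quantities, and the last column is ε280nm divided by Mr (e.g. 118,630 / 118,985.4 ≈ 0.997 for fl-ISWI), i.e. the absorbance at 280 nm of a 1 g/l solution in a 1 cm cell. The sketch below shows how such values can be estimated, under two assumptions that should be treated as such. First, the pKa set is one common textbook-style choice and will not reproduce the pI values in the tables, which may have been computed with a different scale. Second, the absorption coefficients are the Gill and von Hippel values (Trp 5,690, Tyr 1,280, cystine 120 M−1 cm−1). This choice is an inference, not stated in the thesis, but it fits Table B.2: C2 exceeds C1 by 2,560 = 2 × 1,280, and the extra residues 691–712 contain exactly two tyrosines (Tyr696 and Tyr701 in the Appendix A sequence). All names are hypothetical.

```python
# Assumed pKa set (a common textbook-style choice, not necessarily the scale used for the tables).
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
# Assumed absorption coefficients at 280 nm in 1/(M cm) (Gill & von Hippel values).
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5690, 1280, 120


def net_charge(sequence, ph):
    """Net charge at a given pH, summing Henderson-Hasselbalch terms of ionisable groups."""
    positive = 1.0 / (1.0 + 10.0 ** (ph - PKA_POSITIVE["N_TERM"]))
    negative = 1.0 / (1.0 + 10.0 ** (PKA_NEGATIVE["C_TERM"] - ph))
    for aa in sequence.upper():
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10.0 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10.0 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """pH of zero net charge, found by bisection (the net charge falls monotonically with pH)."""
    while high - low > tol:
        mid = 0.5 * (low + high)
        if net_charge(sequence, mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def extinction_coefficient(sequence, n_cystines=0):
    """Molar absorption coefficient at 280 nm estimated from Trp, Tyr and cystine content."""
    seq = sequence.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystines


def absorbance_1g_per_l(epsilon, mr):
    """Absorbance of a 1 g/l solution in a 1 cm cell, i.e. epsilon / Mr."""
    return epsilon / mr


# Residues 691-712 of ISWI (present in C2 but not in C1), copied from Appendix A.
extension = "ERKANYAVDAYFREALRVSEPK"
print(extinction_coefficient(extension))                  # 2560 (two Tyr)
print(round(absorbance_1g_per_l(118630, 118985.4), 3))    # 0.997, cf. fl-ISWI row
print(f"pI of the extension: {isoelectric_point(extension):.2f}")
```

Applied to the full clone sequences, the same functions would give estimates comparable to, but not necessarily identical with, the tabulated values, depending on the pKa scale and on how cysteines are treated.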


abbrev.   subsequence  residues  Mr [Da]    pI    ε280nm [1/(M cm)]  ε280nm/Mr [l/g]  expr.
C-terminal clones
C1        713–991      279       32,935.4   8.28  28,710             0.868            ++
C2        691–991      304       35,851.7   8.86  31,270             0.869            ++
C1fl      713–1027     315       36,625.6   9.33  28,710             0.784            ++
C2fl      697–1027     340       39,544.9   9.40  31,270             0.791            ++
C2-11     701–991      294       34,708.5   8.73  29,870             0.861            ++
C2-31     721–991      274       32,416.8   7.78  28,590             0.882            ++
C2-41     731–991      264       31,185.4   7.79  28,590             0.917            -
C2-51     741–991      254       29,956.9   8.29  28,590             0.954            -

TABLE B.2: List of C-terminal ISWI subclones (pProEx-Htb) and their characteristics. Subsequences refer to full-length ISWI. The clones C2-11 to C2-51 start 10–50 residues after C2 (≡ C2-0). The C-terminal clone C2-31 was never fully cleaved by TEV protease. C2-41 and C2-51 were successfully cloned but did not express. The naming of C1 and C2 is a little counter-intuitive, since C1 begins after C2; this is due to the order in which the results were obtained. ε280 denotes the absorption coefficient at 280 nm. All but the last two clones expressed well in E. coli.

Appendix C

Calculating the slope of CD data

The following Perl script was used to plot the slope of the melting-curve data in Section 10.2.2. In order to smooth the output, the slope for data point n is calculated from the data points within a window of width k0 to the left and to the right, i.e.

$$
m_n^{k_0} = \frac{1}{2k_0} \sum_{\substack{k=-k_0 \\ k \neq 0}}^{k_0} \frac{\Theta_{n+k} - \Theta_n}{T_{n+k} - T_n}
= \frac{1}{2k_0\tau} \sum_{k=1}^{k_0} \frac{\Theta_{n+k} - \Theta_{n-k}}{k} \tag{C.1}
$$

where it is assumed that the temperature intervals between two adjacent measurement points are constant, ∆T = τ. The script uses τ = 1 K, since this was the interval of all measurements; the window width k0 is the variable $width. The script requires three mandatory arguments: the data file, the basename for the output file and the window width k0. The input data may contain comment lines beginning with a hash (#) and must contain three entries per line of data: the temperature, the measured CD quantity and its error (which is not used in this script). The number of data points is determined at runtime.

#!/usr/bin/perl
# script to calculate the mean slope of CD melting data from a data file
# containing three entries per line: temperature, value, error(value)
# comments are allowed by lines starting with "#"
use strict;
use warnings;

my @data;      # flat array holding all data triplets
my $dpoints;   # number of data points, determined from input file
my ($n, $k);   # running index variables
my @slope;     # array holding slope
my $tmp;       # temporary variable

sub usage {
    print("Usage: inflection.pl datafile basename width\n");
    print("Data format: temperature value error(value)\n");
}
if (@ARGV < 3) { usage(); exit 1; }    # script requires three parameters
my $width = $ARGV[2];                   # window size k0 given on the command line
die("width must be a positive integer\n") unless ($width =~ /^\d+$/ && $width > 0);

# open input and output file
# output naming: basename-window_size
open(INPUT, "<", $ARGV[0]) || die("cannot open input file $ARGV[0]\n");
open(OUTPUT, ">", "$ARGV[1]-$width") || die("cannot open output file $ARGV[1]-$width\n");

# read and split input data, skipping comment lines
while ($tmp = <INPUT>) {
    next if ($tmp =~ /^\#/);
    chomp $tmp;
    push @data, split(' ', $tmp, 3);    # split on any run of whitespace
}
close(INPUT);

$dpoints = ($#data + 1) / 3;    # total number of data points
die("more than 2*width data points are required\n") if ($dpoints <= 2 * $width);

#####################################################
# the central block                                 #
#####################################################
for ($n = $width; $n < ($dpoints - $width); $n++) {
    $slope[$n] = 0;
    for ($k = 1; $k <= $width; $k++) {    # calculate slope for n-th data point
        $slope[$n] += ($data[3*($n+$k)+1] - $data[3*($n-$k)+1]) / $k;
    }
    $slope[$n] /= 2*$width;
}
#####################################################
# first 'width' data points need special treatment  #
#####################################################
for ($n = 0; $n < $width; $n++) {
    $slope[$n] = 0;
    # points above n: same as central block
    for ($k = 1; $k <= $width; $k++) {
        $slope[$n] += ($data[3*($n+$k)+1] - $data[3*$n+1]) / $k;
    }
    # only n points exist below n
    for ($k = 1; $k <= $n; $k++) {
        $slope[$n] -= ($data[3*($n-$k)+1] - $data[3*$n+1]) / $k;
    }
    $slope[$n] /= 2*$width;    # normalised as in the central block
}
#####################################################
# last 'width' data points need special treatment   #
#####################################################
for ($n = $dpoints - $width; $n < $dpoints; $n++) {
    $slope[$n] = 0;
    # points below n: same as central block
    for ($k = 1; $k <= $width; $k++) {
        $slope[$n] -= ($data[3*($n-$k)+1] - $data[3*$n+1]) / $k;
    }
    # only the remaining points exist above n
    for ($k = 1; $k < $dpoints - $n; $k++) {
        $slope[$n] += ($data[3*($n+$k)+1] - $data[3*$n+1]) / $k;
    }
    $slope[$n] /= 2*$width;    # normalised as in the central block
}

#####################################################
# print results with short header of information    #
#####################################################
printf OUTPUT "#window width %i K\n", $width;
printf OUTPUT "#T slope\n";
for ($n = 0; $n < $dpoints; $n++) {
    printf OUTPUT "%i %f\n", $data[3*$n], $slope[$n];
}
close(OUTPUT);

#EOF
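A quick way to convince oneself that Eq. (C.1) behaves as intended is to apply its central form to synthetic data. The Python sketch below is an illustrative port of the central block of the Perl script only (it omits the special treatment of the first and last k0 points, and the function name is hypothetical). With τ = 1 K the windowed slope reproduces the exact derivative for linear and even quadratic curves, because the symmetric differences cancel the curvature term; for curves with higher-order structure, such as a sharp melting transition, a wider window averages out noise but also broadens and flattens peaks, as seen in Figure C.1.

```python
def windowed_slope(theta, n, k0, tau=1.0):
    """Mean slope at data point n following Eq. (C.1), using k0 points on each side."""
    if k0 < 1 or n - k0 < 0 or n + k0 >= len(theta):
        raise ValueError("window of width k0 around n must lie inside the data")
    total = sum((theta[n + k] - theta[n - k]) / k for k in range(1, k0 + 1))
    return total / (2 * k0 * tau)


temps = list(range(20, 71))                    # 20-70 degC in steps of tau = 1 K
linear = [3.0 * t + 1.0 for t in temps]        # slope 3 everywhere
quadratic = [0.5 * t * t for t in temps]       # derivative equals t
n = temps.index(45)
print(windowed_slope(linear, n, 4))           # 3.0
print(windowed_slope(quadratic, n, 8))        # 45.0
```

For a cubic curve the estimate acquires a bias that grows with k0, which is the discrete counterpart of the peak flattening observed for wide windows.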

Figure C.1 demonstrates the smoothing effect of the window size when the script is applied to the data shown in Figure 10.6, p. 59.

[Figure C.1 plot: slope [°/K] versus temperature [°C] (20–70 °C) for window sizes 1, 2, 4 and 8.]

FIGURE C.1: First derivative of the sample data with different window sizes. A narrow window includes too much noise and shows too many fluctuations. Larger windows preserve the actual peaks, but too wide a window results in a flat curve.

Appendix D

Article

Molecular Cell, Vol. 12, 449–460, August, 2003, Copyright 2003 by Cell Press

Crystal Structure and Functional Analysis of a Nucleosome Recognition Module of the Remodeling Factor ISWI

Tim Grüne,1,3 Jan Brzeski,2,3,5 Anton Eberharter,2 Cedric R. Clapier,1,2 Davide F.V. Corona,2,4 Peter B. Becker,2,* and Christoph W. Müller1,*
1European Molecular Biology Laboratory, Grenoble Outstation, B.P. 181, F 38042 Grenoble, Cedex 9, France
2Adolf-Butenandt-Institut Molekularbiologie, Ludwig-Maximilians-Universität, 80336 München, Germany

Summary

Energy-dependent nucleosome remodeling emerges as a key process endowing chromatin with dynamic properties. However, the principles by which remodeling ATPases interact with their nucleosome substrate to alter histone-DNA interactions are only poorly understood. We have identified a substrate recognition domain in the C-terminal half of the remodeling ATPase ISWI and determined its structure by X-ray crystallography. The structure comprises three domains: a four-helix domain with a novel fold and two α-helical domains related to the modules of c-Myb, SANT and SLIDE, which are linked by a long helix. An integrated structural and functional analysis of these domains provides insight into how ISWI interacts with the nucleosomal substrate.

Introduction

Nucleosomes, the fundamental unit of chromatin, consist of segments of eukaryotic DNA wrapped around histone octamers. The association of DNA with the basic histone proteins neutralizes the negatively charged ribose-phosphodiester backbone and bends the otherwise fairly stiff DNA into two tight gyrations of approximately 80 bp. The DNA double helix contacts the histone surface at 14 sites with clusters of hydrogen bonds and salt links (Davey et al., 2002; Luger et al., 1997). Collectively, these weak interactions render nucleosomes rather stable particles. In vitro, access to nucleosomal DNA for interacting proteins is limited due to steric occlusion and distortion of the canonical B form (Beato and Eisfeld, 1997; Kornberg and Lorch, 1999; Workman and Kingston, 1998). In vivo, however, nucleosomal DNA can be rendered accessible by the action of energy-consuming "remodeling factors," which alter the histone-DNA interactions within nucleosomes such that the DNA sequence can be recognized by regulatory factors (Becker and Hörz, 2002; Kingston and Narlikar, 1999; Peterson, 2002). The ATPases involved belong to the SWI2/SNF2 subfamily of DEAD/H-helicases (Bork and Koonin, 1993; Eisen et al., 1995), which are characterized by an ATPase domain made of seven characteristic protein motifs. Beyond their ATPase domains, enzymes in the family differ in their domain organization, their associated proteins, the remodeling complexes they reside in, and their remodeling phenomenology (Becker and Hörz, 2002). One function common to all remodeling ATPases characterized so far is their ability to translocate histone octamers on DNA (nucleosome "sliding"), suggesting that certain mechanistic aspects may be shared (Becker, 2002). However, it is unknown how nucleosome remodeling ATPases recognize their substrates and how they convert conformational changes induced by ATP hydrolysis into disruption of histone-DNA interactions.

The remodeling ATPase ISWI of Drosophila melanogaster serves as a model enzyme to elucidate basic principles of the nucleosome remodeling reaction (Längst and Becker, 2001b). ISWI is a conditional ATPase that hydrolyzes ATP maximally only in the presence of nucleosomes, its natural substrate. In contrast, free histones do not affect the low basal level of ATP hydrolysis, while free DNA triggers ATPase activity only modestly (Corona et al., 1999). ISWI interacts with DNA at the edge of model nucleosomes (Längst and Becker, 2001a). Nucleosome-induced ATP hydrolysis results in an unconstrained distortion of DNA that can be measured in sensitive assays (Havas et al., 2000). A productive ATPase cycle not only requires nucleosomal DNA but involves the histone moiety as well. Mutation of a DNA-bound patch of basic amino acids within the N terminus of histone H4 renders nucleosomes inert against remodeling by ISWI (Clapier et al., 2001, 2002; Hamiche et al., 2001). Integration of these observations into a coherent model of the nucleosome remodeling process has been hampered by the lack of structural information about the nucleosome interaction domains of ISWI and, in fact, of any remodeling ATPase.

This report describes a potent substrate recognition domain spanning 301 amino acid residues within the C-terminal half of ISWI. Determination of the crystal structure and complementary functional analyses allow us to formulate testable models for the interaction of ISWI with the nucleosome substrate. Central to nucleosome recognition and binding are two modules related to the c-Myb DNA binding modules, SANT and SLIDE. These data provide the first detailed structural information about a nucleosome remodeling ATPase.

*Correspondence: [email protected] (P.B.B.), [email protected] (C.W.M.)
3These authors contributed equally to this work.
4Present address: Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, California 95064.
5Present address: Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5A, 02-106 Warsaw, Poland.

Results

Identification of a Nucleosome Binding Domain in the C Terminus of ISWI

Drosophila melanogaster ISWI (1027 residues, 118 kDa) can be roughly divided into two parts (Figure 1A). The N-terminal half of the protein contains the SWI2/SNF2 (ATPase) domain, while the C-terminal part contains two regions related to the SANT domain (Aasland et al., 1996). According to the domain classification program SMART (Schultz et al., 1998), the first region (residues 796–845) is recognized as a SANT domain with a relatively high score (E-value of 5.5 × 10⁻⁷), whereas the second (residues 898–962) scores significantly lower (E = 3.3 × 10⁻³). This second domain contains considerable sequence insertions compared to the canonical SANT domain, and, as our crystallographic results show, these insertions correspond to additional features in the three-dimensional structure. Moreover, PSI-Blast searches with this sequence do not retrieve canonical SANT domains but only the corresponding regions in the ISWI proteins of various species (data not shown), suggesting an ISWI-specific function. To distinguish it from SANT, and because of its importance for nucleosome remodeling by ISWI, we refer to this module as "SLIDE" (SANT-like ISWI domain).

Our attempts to obtain crystals of the entire ISWI molecule expressed in bacteria (Corona et al., 1999) were unsuccessful. We therefore performed limited proteolysis with trypsin to probe the domain structure of ISWI and to define stable domains that might be suitable for crystallization and X-ray analysis. After 3 hr of digestion we observed four major bands on a denaturing polyacrylamide gel (data not shown), estimated their molecular weights using standards, and determined their N termini by sequencing. Accordingly, fragment A (≈80–85 kDa), fragment B (≈70–75 kDa), and fragments C2 and C1 (≈35 kDa) start at ISWI residues −8 (the full-length construct included the N-terminal FLAG-tag), Lys77, Lys693, and Ala713, respectively. Cleavages were deduced to occur at about residue 700 and at the N- and C-terminal ends of the full-length protein (Figure 1A). Fragments C1 and C2 lack about 40 residues from the C terminus, in agreement with secondary structure predictions that identified the C-terminal 50 residues of ISWI as a low-complexity region.

The proteolysis therefore suggested that the N-terminal ATPase domain and the C-terminal domain are each compact folding units that can be separated by cleavage around residue 700. We expressed the N-terminal part of ISWI (ISWI-N; residues 1–697; see Figure 1A) and the C-terminal part (ISWI-C; residues 691–991) in bacteria and purified them to homogeneity (Figure 1B). Both fragments contain structures that might serve to contact nucleosomal DNA. RNA/DNA helicases contact their substrates via motifs that are interspersed with other conserved features of the helicase/ATPase domain (for review see Tanner and Linder, 2001), and the structural relatedness of SANT/SLIDE domains to c-Myb DNA binding modules (Ogata et al., 1994; Tahirov et al., 2002) might also indicate involvement in DNA binding. We therefore tested both ISWI fragments separately in substrate binding assays.

The preferred DNA binding site for ISWI at the nucleosomal edge (Längst and Becker, 2001a) is also preferentially bound by other linker DNA binding proteins, such as histone H1 and HMGB1 (Nightingale et al., 1996). A common feature of these proteins is the ability to interact with four-way junction (4WJ) DNA, a synthetic, cruciform-like structure that frequently serves as a model for distorted DNA (Pöhler et al., 1998). Therefore, we tested the ability of ISWI-N and ISWI-C to interact with 4WJ DNA in a bandshift assay (Figure 1C). Full-length ISWI (ISWI-FL) efficiently bound the 4WJ DNA to form a series of complexes, indicating interaction of several ISWI molecules with cruciform DNA. ISWI-N and ISWI-C also interacted with the substrate, but each with about 50-fold reduced efficiency. ISWI-C formed distinct complexes, in contrast to ISWI-N, which aggregated the DNA nonspecifically. ISWI-FL also efficiently formed complexes with nucleosomes containing linker DNA (Figure 1D). ISWI-N and ISWI-C could also bind this substrate, but with about 20-fold reduced affinity. The agarose gel employed in these assays also allowed detection of interactions of ISWI and its derivatives with nucleosome core particles, although higher enzyme concentrations were required (Figure 1E). These results suggest that elements in both parts of ISWI contribute to interactions with nucleosomal DNA.

Figure 1. DNA and Nucleosome Binding Properties of Full-Length ISWI, ISWI-N, and ISWI-C
(A) Schematic representation of the domain structure of the ATPase ISWI. Starting residues of proteolytic fragments A, B, C1, and C2 with their estimated molecular weights and the boundaries of constructs ISWI-N and ISWI-C are depicted.
(B) Recombinant ISWI derivatives. Full-length ISWI (FL) and the N- and C-terminal fragments were expressed in E. coli, purified, resolved by SDS-PAGE, and stained with Coomassie blue.
(C) ISWI binds four-way-junction (4WJ) DNA. Binding reactions contained 4WJ DNA and ISWI-FL (31, 62, 125, 250, 500, 1000 fmols), ISWI-N, or ISWI-C (0.5, 1, 2, 4, 8, 16 pmols each). The resulting complexes were resolved by native gel electrophoresis in 4.5% polyacrylamide and stained with SYBR-GOLD.
(D) ISWI binds nucleosomes with overhanging DNA. Binding reactions contained mononucleosomes assembled on a radiolabeled 248 bp DNA fragment and ISWI-FL (16, 31, 62, 125, 250, 500, 1000 fmols), ISWI-N, or ISWI-C (0.25, 0.5, 1, 2, 4, 8, 16 pmols each). The resulting complexes were resolved by electrophoresis on a 1.4% agarose gel and visualized by autoradiography.
(E) ISWI binds nucleosome core particles. Binding reactions contained mononucleosomes assembled on a radiolabeled 146 bp DNA fragment and either ISWI-FL, ISWI-N, or ISWI-C (8, 16, 31, 62, 125 fmols; 0.25, 0.5, 1, 2, 4, 8 pmols). The resulting complexes were resolved by electrophoresis on a 1.4% agarose gel and visualized by autoradiography.

The ATPase activity of full-length ISWI is stimulated by including free DNA in the reaction, but the presence of nucleosomal DNA leads to another 10-fold increase in ATPase activity (Figure 2, upper panel; see also Corona et al., 1999). Since ISWI-N contains a functional ATPase domain and is also able to bind nucleosomal DNA, we wondered whether its ATPase activity could be modulated by nucleosomes or DNA. The basal ATPase activity could not be increased by addition of either free or nucleosomal DNA at concentrations that strongly stimulate the ATPase of intact ISWI (Figure 2, lower panel), and ISWI-N could not catalyze nucleosome sliding (data not shown). These observations suggested the presence of a DNA/nucleosome recognition function in the C-terminal part of ISWI. We therefore set out to determine the structure of ISWI-C at atomic resolution.

Figure 2. C-Terminal Deletion of ISWI Abolishes Substrate Recognition
The ATPase activity of 0.84 pmols of either ISWI-FL or ISWI-N was assayed during a time course in reactions containing no effector (blue lines), 100 ng of DNA (green lines), or the same amount of DNA assembled into nucleosomes (red lines). The numbers of ATP molecules hydrolyzed per ISWI molecule are approximate and were calculated from the percentage of hydrolysis of a known amount of ATP in the reaction.

Crystal Structure of ISWI-C

We were able to obtain crystals of ISWI-C that diffracted to better than 1.8 Å resolution. The structure was solved using selenomethionine-substituted protein and was refined to a crystallographic R factor of 21.8% (Rfree = 25.3%) using data between 20.0 and 1.9 Å (Table 1). The final 2Fo-Fc electron density is of excellent quality. Of the 301 ISWI residues present in ISWI-C, 268 residues have been included in the model (Figure 3A). Six ISWI residues at the N-terminal end and 14 residues at the C-terminal end are disordered. In addition, we observe a small number of disordered residues in two loop regions, as depicted in Figure 3B.

Overall Structure

ISWI-C is an α-helical protein with 12 helices that can be divided into four segments. The first domain (residues 697–795) contains four helices, three of which resemble an open hand, with the fourth reposing in the palm of the hand. We therefore refer to this domain as the HAND domain. It is immediately followed by the SANT domain (residues 796–850), a 33-residue spacer helix (residues 851–885), and the SLIDE domain (residues 886–977). Nearly the entire molecule can be enclosed within a cylinder roughly 100 Å in length and 20 Å in diameter, except at the two extremities, where elements of the HAND and SLIDE domains form hook-like protrusions perpendicular to each other and to the cylinder axis (Figure 3A).

HAND Domain

The sequence of the HAND domain showed no homology to other domains in an RPS-Blast search of the

Table 1. Crystallographic Data and Refinement Statistics of ISWI-C

Data statistics [a]
                          Native              Se-Met inflection   Se-Met peak         Se-Met remote
Wavelength (Å)            0.9393              0.979295            0.978776            0.918394
Resolution (Å)            19–1.9 (2.0–1.9)    Se-Met data: 20–2.4 (2.5–2.4)
Rmeas (%) [b]             4.8 (30.0)          6.9 (23.8)          6.4 (23.4)          6.9 (31.0)
I/σ(I)                    14.8 (4.1)          8.8 (3.6)           12.3 (4.7)          8.7 (3.4)
Reflections, total        141,793 (20,046)    71,295 (6,930)      117,812 (11,553)    51,303 (5,727)
Reflections, unique       38,275 (5,452)      37,774 (3,906)      37,953 (4,031)      30,219 (3,469)
Completeness (%)          98.6 (99.1)         97.2 (88.3)         97.6 (91.1)         77.5 (77.8)
ESRF beamline             ID14-4              BM14 (Se-Met data)

Phasing (Se-Met data)
Number of sites found/total                 3/4
Z-score [c]                                 18.9
Figure of merit [c]                         0.44
Figure of merit after solvent flattening    0.56

Refinement statistics
Resolution range (Å)                        19–1.9
Total number of nonhydrogen atoms           2,292
Number of protein atoms                     2,214 (266 residues)
Number of water molecules                   49
Number of glucose molecules                 2
Number of glycerol molecules                1
Rwork (%)                                   21.8 (36,396 reflections)
Rfree (%) [d]                               25.3 (1,876 reflections)
Rms deviation, bond lengths (Å)             0.019
Rms deviation, bond angles (°)              1.701
Average B factor (Å²)                       57

[a] Values in parentheses are for reflections in the highest resolution bin.
[b] $R_{\mathrm{meas}} = \sum_h \left(\frac{n_h}{n_h - 1}\right)^{1/2} \sum_i \left| I_i(h) - \langle I(h) \rangle \right| \Big/ \sum_h \sum_i I_i(h)$, where $\langle I(h) \rangle$ is the mean of the $i$ observations of reflection $h$ and $n_h$ is the multiplicity of reflection $h$.
[c] Calculated with the program SOLVE.
[d] Rfree was calculated using 4.9% of the reflections.

Conserved Domain Database (Marchler-Bauer et al., 2002), although it is well conserved within the SWI2L family. Furthermore, no significant similarity between the HAND domain and any known structure was detected when we queried the Protein Data Bank using the program DALI (Holm and Sander, 1993). Thus, the HAND domain appears to be a novel domain with a novel fold.

Three of the four helices of the HAND domain (H2, H3, and H4) form an L-like configuration. Helix H2 runs antiparallel to helices H3 and H4, packing closely against helix H4 (Figure 3A). Helix H1 reposes in the concave surface formed by these three helices and runs perpendicular to them. The residues immediately before and after helix H1 are disordered, explaining their sensitivity to proteolysis (see above; Figures 1A and 3B). Helices H1 and H2, as well as H2 and H3, are connected by two large excursions of 26 and 15 residues, respectively. Two aromatic residues in the first excursion (Phe728, Phe730) take part in a hydrophobic cluster together with SANT domain residues Trp800, Phe805, Tyr838, and Phe842. The intimacy of these interactions makes it unlikely that the HAND and SANT domains can move independently of each other.

SANT Domain

The HAND domain of ISWI-C immediately leads into the compact SANT domain, which consists of the three helices SA1, SA2, and SA3. As predicted, it closely resembles the tandem repeats R1, R2, and R3 present in the DNA binding domain of the transcription factor c-Myb (Ogata et al., 1994; Tahirov et al., 2002) as well as homeodomains of eukaryotic transcription factors (Wintjens and Rooman, 1996). The highest similarities according to the program DALI (Holm and Sander, 1993) are observed with c-Myb repeat R3 (Z-score = 5.3, rmsd = 2.2 Å over 45 Cα atoms) and the homeodomain of the transcription factor Engrailed (Z-score = 5.0, rmsd = 2.5 Å over 49 Cα atoms). The largest differences between the ISWI SANT domain and both c-Myb repeat R3 and the Engrailed homeodomain are observed in the loop between helices SA1 and SA2, which packs against the following spacer helix and helix SA2. Compared to c-Myb repeat R3, helix SA2 is shifted by about 3 Å toward its N-terminal end (Figure 4A), while the corresponding helix of the Engrailed homeodomain contains an additional turn. In addition, the conserved c-Myb tryptophans in helices 2 and 3 are replaced by smaller residues (Ile820, Asn839) in the SANT domain (Figure 4B). As a result, helix SA2 packs closer against the other two helices.

SLIDE Domain

The core of the SLIDE domain comprises helices SL1, SL2, and SL3. Differences from the SANT domain include a large loop preceding helix SL1, a four-residue insertion extending the loop between helices SL1 and SL2, a longer and differently oriented helix SL2, an extended loop and an additional short helix SLi inserted between helices SL2 and SL3, and finally a much longer third helix SL3 (Figure 3). The considerable differences from canonical SANT domains contribute to the significantly lower E-value in the program SMART. However, the three core helices SL1, SL2, and SL3 of the SLIDE domain also superimpose very well with c-Myb repeats (Figure 4A) and different homeodomains. The highest score is observed for the Pax6 N-terminal subdomain (Z-score = 6.6, rmsd = 2.2 Å over 61 Cα atoms). Indeed, the superposition is better than for the SANT domain, where helix SA2 adopts a slightly different orientation (see above).

Spacer Helix

The SANT and SLIDE domains are connected by a continuous α helix about 50 Å long with nine helical turns. The sequence and length of this helix are highly conserved across species (Figure 3). Several hydrophobic residues in the SANT domain pack against conserved isoleucines of the spacer helix, and salt bridges are formed by Arg817 and the conserved glutamates Glu860 and Glu863 (Figure 3B). In the SLIDE domain, helices SL1 and SL3 form a V-like structure that accommodates the spacer helix through various hydrophobic interactions, while SLIDE residues Glu964 and Glu920 form salt bridges with residues Arg868 and Arg869, respectively.

By analogy to other ATPases, we expect substantial domain movements of ISWI during catalysis. However, the crystal structure shows that the SANT and SLIDE domains are both tightly packed against the spacer helix. In addition, the SANT and HAND domains are also tightly connected (see above). We therefore consider ISWI-C a rather rigid entity in which the individual domains do not move significantly with respect to each other. The spacer helix thus may serve as a "molecular ruler" defining the distance between the SANT and SLIDE domains.

Charge Distribution

The calculated electrostatic surface potential of ISWI-C shows a nonuniform charge distribution (Figure 3C). One side of ISWI-C is primarily negatively charged (Figure 3C, I). The central part of this surface is formed by the SANT domain, where several aspartates and glutamates point into the solvent, assisted by three glutamates at the N-terminal end of HAND helix H4 and six glutamates at the C-terminal end of SLIDE helix SL3. The negative overall charge of this surface, and in particular of the region formed by the SANT domain, makes direct interaction with DNA unlikely. However, it makes this surface a prime candidate for interaction with positively charged protein constituents such as histone tails. On the opposite side basic residues seem to prevail, although the charge distribution is less homogeneous (Figure 3C, II). Three positively charged residues in helix H1 and loop H1-H2 of the HAND domain, together with several arginine and lysine residues in the spacer helix and the SLIDE domain, form a row of basic patches. Interestingly, most of these residues are strongly conserved across species (Figure 3B) and might be involved in binding of ISWI-C to 4WJ DNA and nucleosomes (see below).

Roles of SANT and SLIDE Domains for Substrate Recognition

Because of their similarity to the c-Myb DNA binding domain, SANT domains have been hypothesized to be DNA binding modules (Aasland et al., 1996). We evaluated this hypothesis by superimposing the SANT and SLIDE domain structures onto c-Myb modules R2 and R3 in complex with their DNA target sites (Ogata et al., 1994) (Figure 4). These modules bind to DNA within the major groove, where they recognize the phosphate backbone via a classical helix-turn-helix (HTH) motif formed by helices 2 and 3 (Figure 4A). DNA contacts are made primarily by residues from the recognition helix 3 (Lys128, Gln129, Arg131, Glu132, Arg133 in c-Myb repeat R2), with additional contacts formed by residues at or preceding the N terminus of helices 1 and 2 (e.g., Arg114, Trp115, Ser116 in repeat R2). Remarkably, most corresponding residues in the ISWI SANT domain are acidic (Glu833, Glu834, Asp818, Asp819, and Asp821; Figure 4B) and therefore incompatible with the interactions made by their c-Myb counterparts. Indeed, the SANT domain has a mainly negatively charged surface (Figure 3C) and an overall negative charge (calculated pI of 4.6), in contrast to the positively charged c-Myb repeats R2 and R3 (pI of 10.0 and 10.4, respectively). c-Myb also contacts the phosphate backbone through residue Trp95 at the N terminus of helix 1. However, in our superposition of the SANT domain, this phosphate position is occupied by the side chain of residue Phe797, which would thus block the accessibility of the corresponding Trp800 to the DNA backbone. Taken together, these observations indicate that the SANT domain is incompatible with the DNA recognition mode exhibited by c-Myb.

On the other hand, the SLIDE domain appears highly compatible with a role in DNA binding. It displays an overall positive charge (pI of 8.3), and the residues corresponding to the DNA-contacting residues of c-Myb are nearly all conserved or conservatively substituted. The only exceptions are the three residues Glu926, Leu950, and Glu951. However, Glu926 and Glu951 both point away from the DNA backbone and therefore probably do not prevent DNA binding. Leu950 protrudes from helix SL3 and corresponds to c-Myb residues Lys128 (in R2) and Asn179 (in R3), which make specific contacts to DNA bases. Although hydrophobic residues rarely occur in protein/DNA interfaces, they are nevertheless compatible with DNA binding. Indeed, in the structurally related Engrailed homeodomain a leucine residue contacting DNA bases is found at a similar position (Fraenkel et al., 1998). Less favorable interactions might also reflect the differences between sequence-specific DNA binding by c-Myb and sequence-independent interactions by ISWI.

Additionally, residues at the N-terminal end of the SLIDE domain and those protruding from the SL1-SL2 loop could conceivably also contribute to DNA binding, as they are positioned to contact the minor grooves on either side of the major groove occupied by helix 3 (Figure 4C). These regions might be particularly important for the recognition of distorted or bent ISWI substrates like 4WJ DNA or nucleosomal DNA.

We conclude that the SLIDE domain most likely contacts DNA target sites in a manner similar to c-Myb repeats or homeodomains, whereas such a binding mode seems unlikely for the SANT domain.

Roles of SANT and SLIDE Domains for Nucleosome Remodeling

To assess the relative contributions of SANT and SLIDE to ISWI function, we generated ISWI derivatives in which one (ΔSANT, ΔSLIDE) or both domains (ΔSANTΔSLIDE) were deleted in the context of full-length ISWI. To minimally affect the remainder of ISWI, we replaced the modules by flexible linker sequences designed to bridge the gap created by the deletion (Figure 3B). In the double mutant we substituted both domains but left the spacer helix region intact. Mutant proteins were expressed and purified as before (Figure 5A) and analyzed for substrate binding in bandshift assays. Deleting the SANT domain impaired neither 4WJ binding (Figure 5B) nor nucleosome binding (Figure 5C). Deleting the SLIDE domain strongly reduced the DNA binding capacity of ISWI, as did the double deletion (Figure 5B). Deletion of either SANT or SLIDE individually did not affect interaction with the nucleosome, but deleting both domains abolished all substrate recognition at the protein concentrations tested (Figure 5C). This result confirms the importance of the C-terminal structure as a substrate recognition unit and demonstrates the role of SLIDE in DNA binding. The fact that a SLIDE-deleted ISWI is unable to bind 4WJ DNA but can still interact with nucleosomes suggests that histones contribute to substrate recognition.

The deletion mutants were also analyzed for their ATPase activity. Deletion of the SANT domain did not affect the stimulation of ISWI by free DNA, consistent with the earlier conclusion that SANT does not contribute to DNA binding. However, the response of ISWI to nucleosome effectors was reduced to approximately 40 percent (Figure 6A). Deletion of SLIDE, by contrast, led to a dramatic reduction of ATPase stimulation by either DNA or nucleosomes under these conditions (Figure 6A). Not surprisingly, deleting both modules essentially abolished the ATPase activity.

We next tested whether the reduced ATPase activity of the mutant enzymes would still support nucleosome remodeling. For this we employed two established assays. The nucleosome sliding assay measures the movement of a single histone octamer on a short DNA fragment (Längst et al., 1999), and the nucleosome spacing assay monitors the repositioning of several nucleosomes with respect to each other on long, circular DNA (Clapier et al., 2001; Corona et al., 1999). In keeping with its very low ATPase activity, the ΔSLIDE derivative was unable to support either of the two reactions, as was the ΔSANTΔSLIDE double mutant (Figures 6B and 6C). The ΔSANT enzyme showed a low but detectable activity in the sliding assay (Figure 6B) and was still able to support the nucleosome spacing reaction (Figure 6C). Evidently, the SANT module is not absolutely required for ISWI function in vitro. By contrast, our data identify SLIDE as a novel module that is absolutely required for ISWI activity.

Discussion

Here, we present the detailed structure-function analysis of a substrate recognition domain of a nucleosome remodeling factor. Our analysis led to the discovery of a novel, ISWI-specific DNA binding domain, SLIDE, related to the DNA binding modules of c-Myb. SLIDE is connected to a SANT domain by a spacer helix to form a structure predicted to be rather rigid. The use of a c-Myb-like DNA binding domain for substrate recognition is a distinguishing feature of ISWI-like ATPases. The combination of our structural and functional analyses leads to a model of how ISWI may interact with the nucleosome substrate.

Modeling the Interaction of ISWI-C with the Nucleosome

Because we showed experimentally that ISWI-C can bind to nucleosomes, we attempted to model such a complex using the available nucleosome structure (data not shown; the model can be obtained from the authors upon request). Since the SLIDE domain is most important for DNA binding, ISWI-C was modeled bound to a DNA duplex by superimposing SLIDE onto c-Myb repeat R3 bound to DNA (Figure 4C). Subsequently, this complex was positioned onto the nucleosomal DNA as detailed in the Experimental Procedures. Most potential DNA binding sites on the nucleosome (Luger et al., 1997) are excluded, either because the major groove of the nucleosomal DNA is inaccessible or because steric clashes occur between ISWI-C and the nucleosome. However, at seven different positions at the top and bottom of the nucleosomal disk, the major groove is exposed and ISWI-C can be positioned without steric clashes and without assuming any deviations from the crystal structure. At these positions ISWI-C can be clamped onto the nucleosome, with the central cylinder functioning as a molecular ruler of about 50 Å spanning the width of the nucleosomal disk. One jaw of the clamp is formed by the recognition helix of the SLIDE domain inserted into the major groove of DNA, and the other

Figure 3. Structure of ISWI-C
(A) Two orthogonal views of ISWI-C (691–991). The HAND domain is depicted in blue, the SANT domain in green, the SLIDE domain in yellow, and the spacer helix connecting the SANT and SLIDE domains in red. Disordered loops are depicted in gray. Panels (A) and (C) and Figures 4A and 4C were produced with the programs Molscript (Kraulis, 1991) and Raster3D (Merritt and Bacon, 1997).
(B) Sequence alignment of ISWI homologs from Drosophila melanogaster (dmISWI), human (hSNF2H), yeast (scIsw1p), the Arabidopsis thaliana homolog At5g18620 (atISWI), and the SANT domain of yeast Ada2p. Conserved and conservatively substituted residues are highlighted in blue. The secondary structure elements of the dmISWI structure are indicated. SANT (796–845) and SLIDE (898–962) domains as predicted by the program SMART are underlined; brackets correspond to the deletion mutants ΔSANT and ΔSLIDE. Dashed lines indicate disordered regions.
(C) Electrostatic surface representation of ISWI-C. Depicted are the surfaces pointing away from the nucleosome (I) and facing it (II) in the hypothetical nucleosome/ISWI-C model described in the Discussion.

Figure 4. Comparison of SANT and SLIDE Domains with DNA Binding Modules of c-Myb
(A) Stereo diagram of the Cα backbones of the SANT (green; rmsd = 1.2 Å over 27 Cα atoms) and SLIDE (yellow; rmsd = 1.1 Å over 34 Cα atoms) domains superimposed on DNA-bound repeat R3 of c-Myb (red). For the SANT domain, only helices SA1 and SA3 were used in the superposition.
(B) Structural sequence alignment of the SANT and SLIDE domains with DNA binding modules R2 and R3 of c-Myb. Residues in SANT and SLIDE that allow contacts similar to those in c-Myb modules R2 and R3 are indicated by green dots. Residues that do not allow similar contacts are indicated by red squares. Residues in c-Myb modules R2 and R3 contacting DNA bases and the DNA backbone are highlighted in blue and yellow, respectively. Conserved and conservatively substituted residues are depicted on a light blue background.
(C) Models of hypothetical complexes of SANT (left) and SLIDE (right) bound to DNA, based on the superposition of both domains onto c-Myb repeat R3. Residues compatible and incompatible with DNA binding are depicted in green and red, respectively. In (B) the corresponding residues are marked by red squares or green spheres.

Figure 5. Roles of SANT and SLIDE Domains in Substrate Binding
(A) Full-length ISWI (ISWI-FL) and derivatives characterized by individual deletions of either the SANT or SLIDE domains or simultaneous deletion of both domains (ΔSANT, ΔSLIDE, ΔSANTΔSLIDE, respectively) were expressed in E. coli, purified, and resolved by SDS-PAGE.
(B) Binding of ISWI derivatives to 4WJ DNA. Binding reactions contained 4WJ DNA and either full-length ISWI (ISWI-FL) or the deleted proteins as indicated (62, 125, 250, 500, and 1000 fmols). Complexes were analyzed as in Figure 1C.
(C) Binding of ISWI derivatives to nucleosomes. Binding reactions contained mononucleosomes reconstituted on a 248 bp DNA fragment and either full-length ISWI (ISWI-FL) or the deleted proteins as indicated (16, 31, 62, 125, 250, 500, and 1000 fmols). Complexes were analyzed as in Figure 1D.

jaw is formed by the HAND domain protruding from the opposite side of the cylinder. In these models, conserved basic residues of the SLIDE domain, the spacer helix, and the HAND domain would point toward the nucleosome, whereas the negatively charged SANT domain surface would point away from it (Figure 3C). Further information is required to limit the number of possible binding sites of ISWI-C, but we currently favor a model whereby SLIDE interacts with the major groove closest to the H4 tail, allowing direct interactions between H4 tail residues and ISWI-C. Nucleosome remodeling by ISWI is absolutely dependent on the presence of a patch of the basic amino acids R17H18R19 in the N-terminal tail of histone H4, which are associated with nucleosomal DNA in vivo 1.5 helical turns from either side of the dyad axis (Clapier et al., 2002; Ebralidse et al., 1988). Our current functional analysis highlights the SLIDE domain as being crucial for DNA binding and ATPase activity, and in addition we observed a weak interaction of ISWI with H4 tails that is sensitive to deletion of the SLIDE domain (J.B. and P.B.B., unpublished data). In this scenario, ISWI-C and the exit point of the H3 tail through the channel in the DNA superhelix would remain at a distance of about 40 Å, which could still be spanned by the flexible H3 tail peptide.

SANT domains are crucial for the in vivo functions of the Swi3p, Ada2p, and Rsc8p subunits of the yeast SWI/SNF, GCN5, and RSC complexes (Boyer et al., 2002). Careful analyses of the role of SANT in Ada2p revealed that the domain was required for full HAT activity of the associated Gcn5p (Boyer et al., 2002; Sterner et al., 2002). The presence of the Ada2 SANT domain also had a significant impact on Gcn5's Km for histone tail peptides, supporting the idea that SANT is a histone tail binding or presentation module (Boyer et al., 2002). The SANT domains of ISWI and Ada2p show considerable sequence similarity (Figure 3B), a similar negative overall charge (SANT-Ada2p, calculated pI = 5.1; SANT-ISWI, calculated pI = 4.6), and a similar charge distribution on the surface. It is therefore conceivable that the conserved ISWI SANT domain (perhaps together with the HAND domain) is involved in H3 tail binding or presentation.

Communication between Substrate Recognition and ATPase Domains

The N-terminal ATPase domain can bind structured DNA and nucleosomes, and this interaction may contribute to nucleosome remodeling. However, by itself this binding remains nonproductive, since it does not significantly stimulate the ATPase activity and does not catalyze nucleosome sliding (data not shown). Efficient remodeling depends on the presence of the C-terminal nucleosome recognition domain identified here. Our proteolysis experiments suggest that ISWI-N and ISWI-C are flexibly linked. Such a flexible linkage is a prerequisite for the dynamic intramolecular communication between the substrate recognition and ATPase functions that we envision to occur during the nucleosome remodeling process. In our model for the ISWI-C-nucleosome complex, the N terminus of ISWI-C points away from the nucleosome, leaving sufficient space for the large ISWI-N moiety, which may interact with other parts of the nucleosome. Because a pseudo-dyad runs through the center of the nucleosomal particle, the nucleosome presents two quasiequivalent binding sites for ISWI-C, which may, in principle, be bound by two ISWI molecules.

At this point our model is hypothetical, and many alternative arrangements may be possible. Given the inherently dynamic nature of the nucleosome remodeling process, we also envision that the interactions of the remodeling machinery with its substrate will change as a function of ATP-induced conformational changes in the enzyme. Our model may thus represent a snapshot taken from a series of changing enzyme-nucleosome interactions, which collectively describe the nucleosome remodeling process.
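The calculated pI values quoted above come from the amino-acid sequence alone. One common way to obtain such a value is to sum the Henderson–Hasselbalch charges of all ionizable groups and then bisect for the pH at which the net charge is zero. The sketch below is a minimal illustration of that approach. The pKa table is an assumption: published tables differ and shift the result by a few tenths, and the text does not say which program produced 5.1 and 4.6. The function names and the toy peptides are also hypothetical. They are not the SANT sequences.

```python
# Minimal sketch: calculated isoelectric point (pI) of a peptide sequence.
# ASSUMPTION: the pKa values below are one commonly used set; other tables
# (and the program used for the values in the text) may differ slightly.

PKA_BASIC = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of `seq` at the given pH."""
    seq = seq.upper()
    # Basic groups carry +1 when protonated: fraction = 1 / (1 + 10^(pH - pKa)).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_BASIC["N_TERM"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
    # Acidic groups carry -1 when deprotonated: fraction = 1 / (1 + 10^(pKa - pH)).
    negative = 1.0 / (1.0 + 10 ** (PKA_ACIDIC["C_TERM"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero.

    The net charge falls monotonically with pH and changes sign between
    pH 0 and pH 14, so the interval always brackets exactly one root.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical toy peptides, not the SANT sequences from the text.
for peptide in ("DDDDDDDD", "ACDEKR", "KKKKKKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acid-rich sequences yield low calculated pI values, in line with the negatively charged SANT surfaces described above. Bisection was chosen because it needs only the sign of the net charge and cannot diverge.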

Figure 6. Roles of SANT and SLIDE Domains in ISWI Activity (A) The ATPase activity of the indicated ISWI derivatives was tested analogously to Figure 2 during a 40 min time course in reactions containing either no effector (blue lines), 100 ng of plasmid DNA (green lines), or the same amount of DNA assembled into nucleosomes (red lines). Note the different scale in the lower part of the figure. The figure summarizes three independent experiments. (B) Nucleosome sliding assay. As indicated, 35, 70, 140, and 280 fmols of full-length ISWI or of the ISWI derivatives were tested for their ability to move a nucleosome positioned centrally on a 248 bp fragment to a peripheral position. The bands corresponding to central and peripheral nucleosome positions are indicated to the left. The sliding reaction was stopped by the addition of competitor DNA, and samples were separated by native PAGE. The radiolabeled nucleosomal DNA was detected by autoradiography. The asterisk marks the positions of nucleosomes moved by ISWIΔSANT. (C) Nucleosome spacing assay. One pmol of each ISWI derivative was tested for nucleosome spacing activity in a NAP-1-assisted nucleosome assembly reaction. The chromatin was partially digested with three increasing amounts of micrococcal nuclease (from left to right), and the resulting DNA fragments were purified, resolved on an agarose gel, and stained with SYBR Gold. The size marker is a 123 bp ladder. Note the DNA fragment ladders indicative of regularity of the nucleosome array in the reactions containing intact ISWI or the ΔSANT derivative.
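Figure 6A reports ATPase activity as a time course. The ATPase Assay section describes separating the hydrolysis products by thin-layer chromatography and quantifying the spots on a phosphorimager. The sketch below is a minimal illustration of how such spot intensities can be turned into amounts of ATP hydrolysed and an initial rate. It assumes background-subtracted counts, equal detection of the Pi and ATP spots, and the 66 µM ATP in 15 µl with 0.8 pmol enzyme stated in that section. The lane intensities and helper names are hypothetical. The ordinary least-squares fit is one simple choice, since the text does not state how rates were derived.

```python
# Minimal sketch of ATPase time-course quantification (hypothetical numbers).
# ASSUMPTIONS: background-subtracted phosphorimager counts for the Pi and ATP
# spots of each TLC lane, equal detection efficiency for both spots.
from typing import Sequence

ATP_UM = 66.0       # ATP concentration from the ATPase Assay section
VOLUME_UL = 15.0    # reaction volume from the ATPase Assay section
ENZYME_PMOL = 0.8   # ISWI (or deletion mutant) per reaction
TOTAL_ATP_PMOL = ATP_UM * VOLUME_UL  # µM x µl = pmol, i.e. 990 pmol


def fraction_hydrolysed(pi_counts: float, atp_counts: float) -> float:
    """Fraction of the radiolabel found in the released-phosphate spot."""
    total = pi_counts + atp_counts
    if total <= 0:
        raise ValueError("lane has no signal")
    return pi_counts / total


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical lanes from a 40 min time course: (minutes, Pi counts, ATP counts)
lanes = [(0, 0.0, 1000.0), (10, 50.0, 950.0), (20, 100.0, 900.0), (40, 200.0, 800.0)]

times = [t for t, _, _ in lanes]
hydrolysed_pmol = [fraction_hydrolysed(pi, atp) * TOTAL_ATP_PMOL for _, pi, atp in lanes]
rate = slope(times, hydrolysed_pmol)  # pmol ATP hydrolysed per minute
turnover = rate / ENZYME_PMOL         # ATP hydrolysed per enzyme per minute
print(f"rate = {rate:.2f} pmol/min, turnover = {turnover:.2f} per min")
```

A linear fit is only meaningful while substrate depletion is small. The hypothetical lanes stay below 20% hydrolysis, so the time course is still close to linear.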

Experimental Procedures

Protein Expression and Purification

ISWI-C (residues 691–991) and ISWI-N (residues 1–697) were subcloned from the full-length ISWI construct into the NcoI site of the pProEx-Htb vector (Life Technologies) and transformed into Escherichia coli BL21(DE3). Cells were grown in LB media to an OD600nm = 0.6–0.8 at 37°C and induced with 0.2 mM IPTG. Subsequently, the protein was expressed overnight at 20°C. Cells were lysed by resuspension in 0.5 M NaCl, 20 mM Tris/HCl or HEPES/NaOH (pH 7.5–8.0), sonication, and DNase I treatment. The cell lysate was cleared by centrifugation for 45 min at 45,000 g. The 6× His-tagged protein was eluted from Ni-chelating agarose (Amersham Pharmacia) by an imidazole gradient from 50–500 mM. The His-tag was removed by overnight incubation with TEV protease at 4°C and a second passage over a Ni-column. Fractions containing the protein were pooled, purified by gel filtration using a Superdex-200 column (Pharmacia) equilibrated with 0.5 M NaCl, 20 mM HEPES or Tris-HCl (pH 7.0–7.5), and concentrated to about 60 mg/ml prior to crystallization. For selenomethionine incorporation, the protein was expressed in M9 minimal medium supplemented with selenomethionine and other essential amino acids as previously described (VanDuyne et al., 1993).

Construction of ISWI Deletion Mutants

ISWI deletion mutants were constructed by a two-step PCR strategy. In the first PCR reaction, sequences corresponding to the five glycine residues that replace the SANT domain were linked by PCR to the N- and C-terminal ISWI sequences adjacent to the sequence to be deleted. The two resulting fragments contained ISWI C-terminal and N-terminal sequences up to the deletion breakpoint, respectively, linked to the replacement sequence. In a second step, these two fragments were connected by annealing the complementary mutagenic sequence and a second round of amplification (details are available upon request). The SLIDE domain was replaced by a similar strategy. SfiI and XbaI fragments containing the mutagenized region were isolated from the PCR product and used to replace the wild-type fragment of ISWI cDNA in pMYB-ISWI (Corona et al., 1999); the correct product was verified by sequencing. To generate the ΔSANTΔSLIDE mutant, the SLIDE domain was deleted in the ΔSANT context according to the same strategy. This procedure resulted in substitution of the SANT domain (aa 800–839) with the sequence GGGGG and of the SLIDE domain (aa 900–948) with the sequence GGSRG (see Figure 3B). The identity of the clones was verified by sequence analysis.

Full-length ISWI and all deletion mutants were expressed from pMYB4-based vectors (Corona et al., 1999). E. coli ER2566 cells transformed with the respective plasmids were induced overnight at 18°C with 0.1 mM IPTG. Cells were lysed by sonication in lysis buffer (50 mM Tris-Cl [pH 7.6], 500 mM NaCl, 0.1 mM EDTA, 0.1% Triton X-100), lysates were spun down, and supernatants were applied to chitin affinity resin (Biolabs). The column was washed with the same buffer, and bound protein was released by overnight cleavage in 50 mM DTT. Pooled fractions containing ISWI and all deletion mutants were purified on a Superdex-200 column, where they ran as a defined peak of the predicted molecular mass. Finally, pooled gel filtration fractions containing ISWI were dialyzed against 50 mM Tris-Cl (pH 7.6), 100 mM NaCl, 50% glycerol. Protein aliquots were stored at −20°C.

Crystallization, Data Collection, and Structural Analysis

Crystals were grown at 20°C by hanging drop vapor diffusion above a reservoir containing 4% PEG 6000, 0.1 M HEPES (pH 7.0). Mixing equal volumes of reservoir and protein solutions yielded hexagonal elongated crystals with dimensions of 1000 × 100 × 100 μm, which diffracted to about 3.5 Å resolution. A different crystal form, which appeared once under the same conditions, was used for microseeding. The best seeded crystals, with dimensions of 400 × 400 × 80 μm, diffracted to better than 1.8 Å at the ESRF synchrotron. They belong to space group C2, with a = 109.5 Å, b = 66.3 Å, c = 82.8 Å, β = 124.4°. Crystals were harvested in reservoir buffer with at most 35% glucose as cryoprotectant. Diffraction data were collected on ESRF beamline ID14-4 and CRG beamline BM14 (Table 1). Data were processed and scaled with program XDS (Kabsch, 1988). The selenium sites of the MAD data set were located and refined with program SOLVE (Terwilliger and Berendzen, 1999) using all three wavelengths. The experimental electron density was further improved by solvent flattening with program RESOLVE (Terwilliger, 2000). The structure was manually built with programs O (Jones et al., 1991) and Turbo-Frodo (Roussel and Cambillau, 1989), using the positions of the selenium sites as guidance. The nearly completed model was then refined against native data using program REFMAC (CCP4, 1994). The final model, using data between 20 and 1.9 Å resolution, has a crystallographic R factor of 21.8% (Rfree = 25.3%). It contains 268 residues, 49 water molecules, 2 glucose molecules, and 1 glycerol molecule. The geometry of the model is excellent, with 94.1% of the residues in the most favored region, 5.9% in additional allowed regions, and no residues in the generously allowed or disallowed regions of the Ramachandran plot.

To construct a model of an ISWI-C/nucleosome complex, ISWI-C was superimposed onto repeat R3 of c-Myb bound to DNA (PDB code 1h88; rmsd over 45 Cα atoms = 1.4 Å). Subsequently, the ISWI-C/DNA complex was positioned onto the nucleosome at every base step by superimposing the 3 base pairs closest to SLIDE helix SL3 in the hypothetical ISWI-C/DNA complex onto 3 base pairs of the nucleosomal DNA (PDB code 1AOI).

Bandshift Assays

Nucleosomes were assembled by salt gradient dialysis on 248 or 146 bp long body-labeled DNA fragments. ISWI or its derivatives (for amounts see figure legends) were incubated with approximately 10 fmols of nucleosomes in 20 μl buffer of 50 mM Tris-HCl (pH 8.0), 50 mM NaCl, 1 mM MgCl2, 100 μg/ml chicken albumin, 0.05% NP40, 10% glycerol for 15 min at room temperature. The samples were then resolved on 1.4% agarose gels in 0.3× TBE at 4°C for 30 min. The gels were visualized by autoradiography. Binding to 4WJ oligonucleotide (100 fmols) was assayed by incubating enzyme and substrate in 50 mM Tris-HCl (pH 8.0), 1 mM MgCl2, 50 mM NaCl, 10% glycerol, 100 μg/ml chicken albumin, 0.05% NP40 at room temperature for 15 min, followed by separation on a 4% native polyacrylamide gel. The 4WJ substrate was prepared according to Teo et al. (1995).

ATPase Assay

ISWI and each deletion mutant (0.8 pmol) were incubated in 20 mM HEPES-KOH (pH 7.6), 50 mM KCl, 1 mM MgCl2 with 66 μM ATP at room temperature in a volume of 15 μl. At different times, 1 μl aliquots were spotted onto PEI-cellulose (Merck), and the hydrolysis products were separated by thin-layer chromatography. Spots were quantified using a Fuji phosphorimager and AIDA software. DNA stimulation was achieved by the addition of plasmid DNA as indicated in the figure legends. Nucleosomal substrates were generated by reconstitution of nucleosomes on plasmid DNA by standard salt-gradient dialysis protocols from purified Drosophila histones (Längst et al., 1999).

Nucleosome Remodeling Assays

The nucleosome sliding assay was performed as described (Längst and Becker, 2001a, 2001b). The nucleosome spacing assay was performed according to Clapier et al. (2001).

Acknowledgments

This work was supported by Deutsche Forschungsgemeinschaft through SFB 594. T.G. was supported by the EMBL International PhD program. J.B. gratefully acknowledges the support by a FEBS postdoctoral fellowship. D.F.V.C. was supported by EMBO and HFSP postdoctoral fellowships. We are grateful to M. Walsh and H. Belrhali for access and support at CRG beamline BM14 and to R. Ravelli for access and support at beamline ID14-4 at the ESRF. We thank Ralf Strohner for nucleosomal DNA, R. Aasland for stimulating discussions, and Carlo Petosa for critical reading of the manuscript.

Received: February 10, 2003
Revised: May 1, 2003
Accepted: May 27, 2003
Published: August 29, 2003

References

Aasland, R., Stewart, A.F., and Gibson, T. (1996). The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. Trends Biochem. Sci. 21, 87–88.

Beato, M., and Eisfeld, K. (1997). Transcription factor access to chromatin. Nucleic Acids Res. 25, 3559–3563.

Becker, P.B. (2002). Nucleosome sliding: facts and fiction. EMBO J. 21, 4749–4753.

Becker, P.B., and Hörz, W. (2002). ATP-dependent nucleosome remodeling. Annu. Rev. Biochem. 71, 247–273.

Bork, P., and Koonin, E.V. (1993). An expanding family of helicases within the 'DEAD/H' superfamily. Nucleic Acids Res. 21, 751–752.

Boyer, L.A., Langer, M.R., Crowley, K.A., Tan, S., Denu, J.M., and Peterson, C.L. (2002). Essential role for the SANT domain in the functioning of multiple chromatin remodeling enzymes. Mol. Cell 10, 935–942.

CCP4 (Collaborative Computational Project Number 4) (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–763.

Clapier, C.R., Längst, G., Corona, D.F., Becker, P.B., and Nightingale, K.P. (2001). Critical role for the histone H4 N terminus in nucleosome remodeling by ISWI. Mol. Cell. Biol. 21, 875–883.

Clapier, C.R., Nightingale, K.P., and Becker, P.B. (2002). A critical epitope for substrate recognition by the nucleosome remodeling ATPase ISWI. Nucleic Acids Res. 30, 649–655.

Corona, D.F.V., Längst, G., Clapier, C.R., Bonte, E.J., Ferrari, S., Tamkun, J.W., and Becker, P.B. (1999). ISWI is an ATP-dependent nucleosome remodeling factor. Mol. Cell 3, 239–245.

Davey, C.A., Sargent, D.F., Luger, K., Maeder, A.W., and Richmond, T.J. (2002). Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319, 1097–1113.

Ebralidse, K.K., Grachev, S.A., and Mirzabekov, A.D. (1988). A highly basic histone H4 domain bound to the sharply bent region of nucleosomal DNA. Nature 331, 365–367.

Eisen, J.A., Sweder, K.S., and Hanawalt, P.C. (1995). Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions. Nucleic Acids Res. 23, 2715–2723.

Fraenkel, E., Rould, M.A., Chambers, K.A., and Pabo, C.O. (1998). Engrailed homeodomain-DNA complex at 2.2 Å resolution: a detailed view of the interface and comparison with other engrailed structures. J. Mol. Biol. 284, 351–361.

Hamiche, A., Kang, J.G., Dennis, C., Xiao, H., and Wu, C. (2001). Histone tails modulate nucleosome mobility and regulate ATP-dependent nucleosome sliding by NURF. Proc. Natl. Acad. Sci. USA 98, 14316–14321.

Havas, K., Flaus, A., Phelan, M., Kingston, R., Wade, P.A., Lilley, D.M., and Owen-Hughes, T. (2000). Generation of superhelical torsion by ATP-dependent chromatin remodeling activities. Cell 103, 1133–1142.

Holm, L., and Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123–138.

Jones, T., Zhou, J., Cowan, S., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110–119.

Kabsch, W. (1988). Evaluation of single-crystal X-ray diffraction data from a position-sensitive detector. J. Appl. Crystallogr. 21, 916–924.

Kingston, R.E., and Narlikar, G.J. (1999). ATP-dependent remodeling and acetylation as regulators of chromatin fluidity. Genes Dev. 13, 2339–2352.

Kornberg, R.D., and Lorch, Y. (1999). Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294.

Kraulis, P.J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946–950.

Längst, G., and Becker, P.B. (2001a). ISWI induces nucleosome sliding on nicked DNA. Mol. Cell 8, 1085–1092.

Längst, G., and Becker, P.B. (2001b). Nucleosome mobilization and positioning by ISWI-containing chromatin remodeling factors. J. Cell Sci. 114, 2561–2568.

Längst, G., Bonte, E.J., Corona, D.F.V., and Becker, P.B. (1999). Nucleosome movement by CHRAC and ISWI without disruption or trans-displacement of the histone octamer. Cell 97, 843–852.

Luger, K., Mäder, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260.

Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y., and Bryant, S.H. (2002). CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281–283.

Merritt, E.A., and Bacon, D.J. (1997). Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505–524.

Nightingale, K., Dimitrov, S., Reeves, R., and Wolffe, A.P. (1996). Evidence for a shared structural role for HMG1 and linker histones B4 and H1 in organizing chromatin. EMBO J. 15, 548–561.

Ogata, K., Morikawa, S., Nakamura, H., Sekikawa, A., Inoue, T., Kanai, H., Sarai, A., Ishii, S., and Nishimura, Y. (1994). Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79, 639–648.

Peterson, C.L. (2002). Chromatin remodeling enzymes: taming the machines. Third in review series on chromatin dynamics. EMBO Rep. 3, 319–322.

Pöhler, J.R., Norman, D.G., Bramham, J., Bianchi, M.E., and Lilley, D.M. (1998). HMG box proteins bind to four-way DNA junctions in their open conformation. EMBO J. 17, 817–826.

Roussel, A., and Cambillau, C.T.-F. (1989). TURBO-Frodo. In Silicon Graphics Geometry Partners Directory (Mountain View, CA: Silicon Graphics), pp. 77–79.

Schultz, J., Milpetz, F., Bork, P., and Ponting, C.P. (1998). SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. USA 95, 5857–5864.

Sterner, D.E., Wang, X., Bloom, M.H., Simon, G.M., and Berger, S.L. (2002). The SANT domain of Ada2 is required for normal acetylation of histones by the yeast SAGA complex. J. Biol. Chem. 277, 8178–8186.

Tahirov, T.H., Sato, K., Ichikawa-Iwata, E., Sasaki, M., Inoue-Bungo, T., Shiina, M., Kimura, K., Takata, S., Fujikawa, A., Morii, H., et al. (2002). Mechanism of c-Myb-C/EBP beta cooperation from separated sites on a promoter. Cell 108, 57–70.

Tanner, N.K., and Linder, P. (2001). DExD/H box RNA helicases: from generic motors to specific dissociation functions. Mol. Cell 8, 251–262.

Teo, S.H., Grasser, K.D., Hardman, C.H., Broadhurst, R.W., Laue, E.D., and Thomas, J.O. (1995). Two mutations in the HMG-box with very different structural consequences provide insights into the nature of binding to four-way junction DNA. EMBO J. 14, 3844–3853.

Terwilliger, T.C. (2000). Maximum-likelihood density modification. Acta Crystallogr. D 56, 965–972.

Terwilliger, T.C., and Berendzen, J. (1999). Automated MAD and MIR structure solution. Acta Crystallogr. D 55, 849–861.

VanDuyne, G.D., Standaert, R.F., Karplus, P.A., Schreiber, S.L., and Clardy, J. (1993). Atomic structures of the human immunophilin FKBP-12 complexes with FK506 and rapamycin. J. Mol. Biol. 229, 105–124.

Wintjens, R., and Rooman, M. (1996). Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J. Mol. Biol. 262, 294–313.

Workman, J.L., and Kingston, R.E. (1998). Alteration of nucleosome structure as a mechanism of transcriptional regulation. Annu. Rev. Biochem. 67, 545–579.

Accession Numbers

Coordinates of the ISWI-C structure have been deposited with the Protein Data Bank under ID code 1OFC.

Bibliography

Aasland, R., Stewart, A. F., and Gibson, T. (1996). The SANT domain: a putative DNA-binding domain in the SWI-SNF and ADA complexes, the transcriptional co-repressor N-CoR and TFIIIB. TIBS, 21:87–88.

Angelov, D., Novakov, E., Khochbin, S., and Dimitrov, S. (1999). Ultraviolet laser footprinting of histone H1(0) four-way junction complexes. Biochemistry, 38(35):11333–11339.

Aoyagi, S. and Hayes, J. J. (2002). hSWI/SNF-catalysed nucleosome sliding does not occur solely via a twist-diffusion mechanism. Mol. Cell. Biol., 22:7484–7490.

Badenhorst, P., Voas, M., Rebay, I., and Wu, C. (2002). Biological functions of the ISWI chromatin remodeling complex NURF. Genes & Development, 16:3186–3198.

Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M., and Sonnhammer, E. L. L. (2002). The Pfam Protein Families Database. Nucl. Acids Res., 30(1):276–280.

Bazett-Jones, D. P., Côté, J., Landel, C. C., Peterson, C. L., and Workman, J. L. (1999). The SWI/SNF complex creates loop domains in DNA and polynucleosome arrays and can disrupt DNA-histone contacts within these domains. Mol. Cell. Biol., 19:1470–1478.

Bernstein, B. E., Humphrey, E. L., Erlich, R. L., Schneider, R., Bouman, P., Liu, J. S., Kouzarides, T., and Schreiber, S. L. (2002). Methylation of histone H3 Lys 4 in coding regions of active genes. Proc. Nat. Acad. Sci., 99(13):8695–8700.

Bork, P. and Koonin, E. V. (1993). An expanding family of helicases within the DEAD/H superfamily. Nucleic Acids Research, 21(3):751–752.

Bouazoune, K., Mitterweger, A., Längst, G., Imhof, A., Akhtar, A., Becker, P. B., and Brehm, A. (2002). The dMi-2 chromodomains are DNA binding modules important for ATP-dependent nucleosome mobilisation. EMBO J., 21(10):2430–2440.

Boyer, L., Logie, C., Bonte, E., Becker, P. B., Wade, P. A., Wolffe, A. P., Wu, C., Imbalzano, A. N., and Peterson, C. L. (2000). Functional delineation of three groups of the ATP-dependent family of chromatin remodelling complexes. J. Biol. Chem., 275(25):18864–18870.

Boyer, L. A. et al. (2002). Essential role for the SANT domain in the functioning of multiple chromatin remodeling enzymes. Mol. Cell, 10(4):935–42.

Brehm, A., Längst, G., Kehle, J., Clapier, C. R., Imhof, A., Eberharter, A., Muller, J., and Becker, P. B. (2000). dMi-2 and ISWI chromatin remodelling factors have distinct nucleosome binding and mobilisation properties. EMBO J., 19:4332–4341.

Brünger, A. T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature, 355:472–474.

Brünger, A. T. (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Cryst., D49(1):24–36.

Brünger, A. T. (1997). Patterson Correlation Searches and Refinement. In (Carter and Sweet, 1997), chapter 32, pages 558–580.

Carter, Jr., C. W. and Sweet, R. M., editors (1997). Macromolecular Crystallography, volume 276 Part A of Methods in Enzymology. Academic Press London.

Clapier, C. R., Längst, G., Corona, D. F. V., Becker, P. B., and Nightingale, K. P. (2001). Critical role for the histone H4 N terminus in nucleosome remodeling by ISWI. Mol. Cell. Biol., 21(3):875–883.


Clapier, C. R., Nightingale, K. P., and Becker, P. B. (2002). A critical epitope for substrate recognition by the nucleosome remodelling ATPase ISWI. Nucl. Acids Res., 30(3):649–655.

Collaborative Computational Project Number 4 (1994). The CCP4 Suite: Programs for Protein Crystallography. Acta Cryst., D50:760–763.

Conte, M. R., Grune, T., Ghuman, J., Kelly, G., Ladas, A., Matthews, S., and Curry, S. (2000). Structure of tandem RNA recognition motifs from polypyrimidine tract binding protein reveals novel features of the RRM fold. EMBO J., 19(12):3132–3141.

Corona, D. F. V., Eberharter, A., Budde, A., Deuring, R., Ferrari, S., Varga-Weisz, P., Wilm, M., Tamkun, J., and Becker, P. B. (2000). Two histone fold proteins, CHRAC14 and CHRAC16, are developmentally regulated subunits of chromatin accessibility complex CHRAC. EMBO J., 19(12):3049–3059.

Corona, D. F. V., Längst, G., Clapier, C. R., Bonte, E. J., Ferrari, S., Tamkun, J. W., and Becker, P. B. (1999). ISWI is an ATP-dependent nucleosome remodelling factor. Molecular Cell, 3:239–245.

Dauter, Z. (1999). Data-collection strategies. Acta Cryst., D55:1703–1717.

Davey, C. A., Sargent, D. F., Luger, K., Maeder, A. W., and Richmond, T. J. (2002). Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol., 319(5):1097–113.

Deuring, R., Fanti, L., Armstrong, J. A., Sarte, M., Papoulas, O., Prestel, M., Daubresse, G., Verardo, M., Moseley, S. L., Berloco, M., Tsukiyama, T., Wu, C., Pimpinelli, S., and Tamkun, J. W. (2000). The ISWI chromatin-remodelling protein is required for gene expression and the maintenance of higher order chromatin structure in vivo. Mol. Cell, 5:355–365.

Ebbert, R., Birkmann, A., and Schüller, H.-J. (1999). The product of the SWI2/SNF2 paralogue INO80 of Saccharomyces cerevisiae required for efficient expression of various yeast structural genes is part of a high-molecular-weight protein complex. Mol. Microbiol., 32(4):741–751.

Eisen, J. A., Sweder, K. S., and Hanawalt, P. C. (1995). Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions. Nucleic Acid Research, 23(14).

Evans, G. and Pettifer, R. F. (2001). CHOOCH: a program for deriving anomalous scattering factors from X-ray fluorescence spectra. J. Appl. Cryst., 34:82–86.

Fribourg, S., Braun, I. C., Izaurralde, E., and Conti, E. (2001). Structural basis for the recognition of a nucleoporin FG repeat by the NTF2-like domain of the TAP/p15 mRNA nuclear export factor. Mol. Cell, 8:645–656.

Gerthsen, C. and Vogel, H. (1993). Physik: Ein Lehrbuch zum Gebrauch neben Vorlesungen. Springer-Verlag Berlin, 17th edition.

Gill, S. C. and von Hippel, P. H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Analytical Biochemistry, 182:319–326.

Glykos, N. M. and Kokkinidis, M. (2000). A stochastic approach to molecular replacement. Acta Cryst., D56:169–174.

Graf, T. (1992). Myb: a transcriptional activator linking proliferation and differentiation in hematopoietic cells. Cur. Op. Gen. Dev., 2:249–255.

Greenfield, N. and Fasman, G. D. (1969). Computed Circular dichroism spectra for the evaluation of protein conformation. Biochemistry, 8(10).

Grüne, T. (1999). Kristallstrukturen der beiden Proteinkomplexe Humanes Serum Albumin mit Stearat und Humanes Serum Albumin mit Oleat. Diplomarbeit, Universität Karlsruhe (TH), Institut für Kristallographie.

Grüne, T., Brzeski, J., Eberharter, A., Clapier, C. R., Corona, D. F. V., Becker, P. B., and Muller, C. W. (2003). Crystal structure and functional analysis of the nucleosome recognition module of the remodeling factor ISWI. accepted, Mol. Cell.

Hansen, C. L., Skordalakes, E., Berger, J. M., and Quake, S. R. (2002). A robust and scalable microfluidic metering method that allows protein crystal growth by free interface diffusion. PNAS, 99:16531–16536.

Havas, K., Flaus, A., Phelan, M., Kingston, R., Wade, P. A., Lilley, D. M. J., and Owen-Hughes, T. (2000). Generation of superhelical torsion by ATP-dependent chromatin remodelling activities. Cell, 103:1133–1142.

Hayes, J. J. and Hansen, J. C. (2001). Nucleosomes and the chromatin fiber. Cur. Op. Gen. Dev., 11:124–129.

Holm, L. and Sander, C. (1993). Dali ver. 2.0. J. Mol. Biol., 233:123–138.

Iizuka, M. and Smith, M. M. (2003). Functional consequences of histone modifications. Cur. Op. Gen. Dev., 13:154–160.

Ito, T., Levenstein, M., Fyodorov, D. V., Kutach, A. K., Kobayashi, R., and Kadonaga, J. T. (1999). ACF consists of two subunits, Acf1 and ISWI, that function cooperatively in the ATP-dependent catalysis of chromatin assembly. Genes & Development, 13:1529–1539.

Jackson, J. D. (1998). Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition.

Jones, S., Barker, J. A., Nobeli, I., and Thornton, J. M. (2003). Using structural motif templates to identify proteins with DNA binding function. Nucl. Acids Res., 31(11):2811–2823.

Jones, T. A., Bergdoll, M., and Kjeldgaard, M. (1991). O: A macromolecular modelling environment. In Bugg, C. and Ealick, S., editors, Crystallographic and Modelling Methods in Molecular Design, pages 189–195. Springer Verlag.

Kabsch, W. (2002). XDS, X-ray Detector Software. Max-Planck-Institute for Medical Research, Jahnstrasse 29, D-69120 Heidelberg.

Kadonaga, J. T. (1998). Eukaryotic transcription: An interlaced network of transcription factors and chromatin modifying machines. Cell, 92:307–313.

Kassabov, S. R., Zhang, B., Persinger, J., and Bartholomew, B. (2003). SWI/SNF unwraps, slides, and rewraps the nucleosome. Mol. Cell, 11:391–403.

Kim, J. L., Nikolov, D. B., and Burley, S. K. (1993a). Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365:520–527.

Kim, Y., Sigler, P. B., et al. (1993b). Crystal structure of a yeast TBP/TATA-box complex. Nature, 365:512–520.

Kingston, R. E. and Narlikar, G. J. (1999). ATP-dependent remodelling and acetylation as regulators of chromatin fluidity. Genes & Development, 13:2339–2352. Review.

Kleywegt, G. J. (2002). Uppsala Software Factory. Uppsala University, Dept. of Cell. and Mol. Biology. http://www.xray.bmc.uu.se/usf/manuals/welcome2usf.html.

Kornberg, R. D. and Lorch, Y. (1999). Twenty-five years of the Nucleosome, Fundamental Particle of the Eukaryote Chromosome. Cell, 98:285–294.

Kornberg, R. D. and Thomas, J. O. (1974). Chromatin structure: oligomers of the histones. Science, 184:865–868.

Kraulis, P. J. (1991). Molscript: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst., 24:946–950.

Längst, G., Bonte, E. J., Corona, D. F. V., and Becker, P. B. (1999). Nucleosome movement by CHRAC and ISWI without disruption or trans-displacement of the histone octamer. Cell, 97:843–852.

Leslie, A. G. W. (1992). Recent changes to the MOSFLM package for processing film and image plate data. Joint CCP4 + ESF-EAMCB Newsletter on Protein Crystallography, 26.

Luger, K. et al. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 389:251–260.

Lüscher, B. and Eisenman, R. N. (1990). New light on Myc and Myb. Part II. Myb. Genes and Dev., 4:2235–2241.

Luscombe, N. M., Laskowski, R. A., and Thornton, J. M. (1997). NUCPLOT: a program to generate schematic diagrams of protein-DNA interactions. NAR, 25:4940–4945.

Martens, J. A. and Winston, F. (2003). Recent advances in understanding chromatin remodelling by Swi/Snf complexes. Cur. Op. Genes and Dev., 13:136–142.

Memedula, S. and Belmont, A. S. (2003). Sequential Recruitment of HAT and SWI/SNF Components to Condensed Chromatin by VP16. Current Biology, 13:241–246.

Merritt, E. A. and Bacon, D. J. (1997). Raster3D: photorealistic molecular graphics. Methods in Enzymology, 277:505–524.

Nikolov, D., Chen, H., Halay, E. D., Hoffman, A., and Burley, S. (1992). Crystal structure of TFIID TATA-box binding protein. Nature, 360:40–46.

Ogata, K., Kanei-Ishii, C., Sasaki, M., Hatanaka, H., Nagadoi, A., Enari, M., Nakamura, H., Nishimura, Y., Ishii, S., and Sarai, A. (1996). The cavity in the hydrophobic core of Myb DNA-binding domain is reserved for DNA recognition and trans-activation. Nature Structural Biology, 3(2).

Owen, D. J., Ornaghi, P., Yang, J.-C., Lowe, N., Evans, P. R., Ballario, P., Neuhaus, D., Filetici, P., and Travers, A. A. (2000). The structural basis for the recognition of acetylated histone H4 by the bromodomain of histone acetyltransferase Gcn5p. EMBO J., 19(22):6141–6149.

Panigrahi, A. K., Tomar, R. S., and Chaturvedi, M. M. (2003). A SWI/SNF-like factor from chicken liver that disrupts nucleosomes and transfers histone octamers in cis and trans. Arch. Biochem. Biophys., 414(1):24–33.

Qiagen (2001). The QIAexpressionist. QIAGEN, 5th edition.

Ravelli, R. B. G., Schrøder Leiros, H.-K., Pan, B., Caffrey, M., and McSweeney, S. (2003). Specific radiation damage can be used to solve macromolecular crystal structures. Structure, 11:217–224.

Read, R. (2001). Pushing the boundaries of molecular replacement with maximum likelihood. Acta Cryst., D57:1373–1382.

Rodgers, D. W. (1997). Practical cryocrystallography. In (Carter and Sweet, 1997), chapter 14.

Römpp, H. (1962). Chemie-Lexikon. Franck’sche Verlaggruppe, Stuttgart, 5th edition.

Sanner, M. F., Spehner, J. C., and Olson, A. J. (1996). Reduced surface: an efficient way to compute molecular surfaces. Biopolymers, 38(3):305–320.

Santoro, R., Li, J., and Grummt, I. (2002). The nucleolar remodelling complex NoRC mediates heterochromatin formation and silencing of ribosomal gene transcription. Nature Gen., 32:393–396.

Sarai, A., Uedaira, H., Morii, H., Yasukawa, T., Ogata, K., Nishimura, Y., and Ishii, S. (1993). Thermal stability of the DNA-binding domain of the Myb oncoprotein. Biochemistry, 32:7759–7764.

Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998). SMART, a simple modular architecture research tool: Identification of signalling domains. Proc. Natl. Acad. Sci. USA, 95:5857–5864.

Shen, X., Mizuguchi, G., Hamiche, A., and Wu, C. (2000). A chromatin remodelling complex involved in transcription and DNA processing. Nature, 406:541–544.

Steger, D. J., Haswell, E. S., Miller, A. L., Wente, S. R., and O’Shea, E. K. (2003). Regulation of Chromatin Remodeling by Inositol Polyphosphates. Science, 299:114–116.

Sterner, D. E., Wang, X., Bloom, M. H., Simon, G. M., and Berger, S. L. (2002). The SANT domain of Ada2 is required for normal acetylation of histones by the yeast SAGA complex. J. Biol. Chem., 277(10):8178–8186.

Strahl, B. D. and Allis, C. D. (2000). The language of covalent histone modifications. Nature, 403:41–45.

Suyama, M., Doerks, T., Braun, I. C., Sattler, M., Izaurralde, E., and Bork, P. (2000). Prediction of structural domains of TAP reveals details of its interaction with p15 and nucleoporins. EMBO Rep., 1:53–58.

Tariq, M., Saze, H., Probst, A., Lichota, J., Habu, Y., and Paszkowski, J. (2003). DNA methylation at CpG sites directs histone H3 methylation in Arabidopsis. In Chromatin and , Alan Wolffe EMBO workshop.

Terwilliger, T. (2002). SOLVE: Automated structure solution for MAD and MIR. Los Alamos National Laboratory, 2.03 edition. Version 2.02.

Terwilliger, T. C. (2000). Maximum-likelihood density modification. Acta Cryst., D56:965–972.

Tsukiyama, T., Daniel, C., Tamkun, J., and Wu, C. (1995). ISWI, a member of the SWI2/SNF2 ATPase family, encodes the 140 kDa subunit of the nucleosome remodelling factor. Cell, 83:1021–1026.

Tsukiyama, T. and Wu, C. (1995). Purification and properties of an ATP-dependent nucleosome remodeling factor. Cell, 83:1011–1020.

Voet, D. and Voet, J. G. (1995). Biochemistry. John Wiley & Sons, second edition.

Weckert, E. and Hümmer, K. (1997). Multiple-beam X-ray diffraction for physical determination of reflection phases and its application. Acta Cryst., A53:108–143.
Whitehouse, I., Stockdale, C., Flaus, A., Szczelkun, M. D., and Owen-Hughes, T. (2003). Evidence for DNA Translocation by the ISWI Chromatin Remodelling Enzyme. Mol. Cell. Biol., 23(6):1935–1945.

Structural study of ISWI, an ATPase involved in chromatin remodelling

This work presents the structure of the C-terminal third of ISWI, a protein involved in chromatin remodelling. The structure was solved at 1.9 Å resolution and has a cylindrical shape that can be divided into three domains: a HAND domain with an unknown fold, and a SANT domain connected by a 45 Å helix to a SANT-like SLIDE domain. No protein is structurally similar to the HAND domain, and its functional significance remains unknown. The SANT and SLIDE domains share homology with a number of DNA-interacting proteins. Association of the C-terminal fragment of ISWI with cruciform DNA was demonstrated, and a putative mode of interaction involving the SLIDE domain was proposed. The SANT domain proved unlikely to bind DNA. Analysis of the SLIDE domain and comparison with its DNA-binding homologues led to a model suggesting how ISWI interacts with the nucleosome.

Keywords: – chromatin remodelling – ISWI – crystal structure – SANT – nucleosome

Structural studies on ISWI, an ATP-dependent nucleosome remodelling enzyme

This work presents the structure of the C-terminal third of ISWI, an ATP-dependent chromatin remodelling enzyme. The structure was solved using data between 35 Å and 1.9 Å resolution. It revealed a cylindrical shape that can be divided into three tandem domains:

1. The “Hand” domain, which has an unknown fold and interacts directly with the SANT domain.
2. The SANT domain.
3. The SANT-like SLIDE domain, which is separated from the SANT domain by a 45 Å spacer helix.

No proteins structurally similar to the “Hand” domain were found, and little could be deduced about its function. The SANT and SLIDE domains share homology with a number of DNA-binding proteins. Binding of the ISWI fragment to cruciform DNA was demonstrated, and the putative binding mode was mapped to the SLIDE domain. The SANT domain, in contrast, was found to be unlikely to bind DNA. Analysis of the SLIDE domain and comparison with its DNA-binding homologues resulted in a model suggesting how ISWI interacts with the nucleosome.

Keywords: – Chromatin remodelling – ISWI – crystal structure – SANT – nucleosome recognition