View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Research Article 1247

The Ig fold of the core binding factor α Runt domain is a member of a family of structurally and functionally related Ig-fold DNA-binding domains Marcelo J Berardi1, Chaohong Sun1, Michael Zehr1, Frits Abildgaard2, Jeff Peng3, Nancy A Speck4 and John H Bushweller1*

Background: CBFA is the DNA-binding subunit of the Addresses: 1Department of Molecular Physiology complex called core binding factor, or CBF. Knockout of the Cbfa2 in and Biological Physics, University of Virginia, Charlottesville, VA 22906, USA, 2NMRFAM, mice leads to embryonic lethality and a profound block in hematopoietic University of Wisconsin, Madison, WI 53706, USA, development. Chromosomal disruptions of the human CBFA gene are 3Protein NMR Group, Vertex Pharmaceutical associated with a large percentage of human leukemias. Incorporated, Cambridge, MA 02139, USA and 4Department of Biochemistry, Dartmouth Medical School, Hanover, NH 03755, USA. Results: Utilizing nuclear magnetic resonance spectroscopy we have determined the three-dimensional fold of the CBFA Runt domain in its *Corresponding author. DNA-bound state, showing that it is an s-type immunoglobulin (Ig) fold. DNA E-mail: [email protected] binding by the Runt domain is shown to be mediated by loop regions located at both ends of the Runt domain Ig fold. A putative site for CBFB binding has Key words: CBF, core binding factor, hematopoiesis, leukemia, transcription been identified; the spatial location of this site provides a rationale for the ability of CBFB to modulate the affinity of the Runt domain for DNA. Received: 29 April 1999 Revisions requested: 25 May 1999 Conclusions: Structural comparisons demonstrate that the s-type Ig fold found Revisions received: 7 June 1999 Accepted: 8 July 1999 in the Runt domain is conserved in the Ig folds found in the DNA-binding domains of NF-κB, NFAT, , STAT-1, and the T-domain. Thus, these Published: 30 September 1999 form a family of structurally and functionally related DNA-binding domains. Unlike the other members of this family, the Runt domain utilizes loops at both Structure October 1999, 7:1247–1256 http://biomednet.com/elecref/0969212600701247 ends of the Ig fold for DNA recognition. 0969-2126/99/$ – see front matter © 1999 Elsevier Science Ltd. All rights reserved.

Introduction The CBFB subunit interacts with the Runt domain of the Core binding factors (CBFs) are heterodimeric transcription CBFA subunit, increasing its affinity for DNA approxi- factors consisting of a DNA-binding subunit (CBFA) and a mately sixfold [3] (BE Crute et. al., unpublished results). non-DNA-binding subunit (CBFB) [1–4]. All CBFA sub- CBFB modulates the DNA-binding affinity of the Runt units share a region of homology known as the Runt domain, domain without establishing additional contacts to the which is the DNA-binding domain of these proteins [2,4,5]. DNA [4]. Circular dichroism (CD) spectroscopy docu- CBFs are global developmental regulators in both Drosophila mented a conformational change in either or both the and mammals. The Drosophila Runt gene, the founding Runt domain and the CBFB subunit upon heterodimer- member of the CBFA family, is required for sex determina- ization (BE Crute et al., unpublished results). The data tion, segmentation, and neurogenesis. The mammalian support a model whereby heterodimerization with the CBFA1 gene is required for bone development [6]. CBFA2 CBFB subunit ‘locks in’ a high-affinity DNA-binding con- (AML1) and CBFB, which encode the CBFA and CBFB formation of the Runt domain. Genetic data clearly subunits, respectively, are essential for the emergence of all demonstrate that heterodimerization with the CBFB fetal and adult blood cell lineages in the mammalian embryo subunit is essential for CBFA function in vivo, in that [7,8]. An in vivo function for a third mammalian CBFA gene, homozygous disruption of the Cbfa2 and Cbfb in CBFA3, is currently unknown. Mutations in the CBF genes mice results in identical phenotypes. are also associated with human disease. Chromosomal rearrangements involving CBFA2 [t(8;21)(q22;q22), The primary sequence of the Runt domain shows no t(12;21)(p13;q22)], t(3;21)(q26;q22), t(16;21)(q24;q22), homology to any of the known DNA-binding motifs. t(1;21)(p36;q22), t(5;21)(q13;q22), t(12;21)(q24;q22), The mechanism by which CBFB stabilizes the t(14;21)(q22;q22), t(15;21)(q22;q22), t(17;21)(q11.2;q22)] CBFA–DNA complex is unusual in that contacts to the and CBFB [inv(16)(p13;q22)] are found in a large DNA are not substantially altered. A predicted model of number of myeloid and lymphocytic leukemias [9]. the structure of the Runt domain has been described 1248 Structure 1999, Vol 7 No 10

[10], but no structural data has yet been presented on Arg, Cys, His, Ile, Leu, Lys, Phe, Thr, Trp, Tyr, and Val this functionally important domain. We recently using auxotrophs for the various amino acids [16]. In presented the high-resolution structure of the CBFB addition, we recorded HNCA and HN(CA)CB spectra on [11]. Here we describe the fold of the a sample of 98% 2H,13C,15N-labeled Runt domain at DNA-bound CBFA2 Runt domain in solution, demon- 500 MHz, which provided additional triple-resonance strate its structural similarity to the DNA-binding correlations not obtained previously. HNCO spectra on domains of the eukaryotic transcription factors NF-κB, this sample provided CO chemical shifts for the calcula- NFAT, STAT-1, and p53, and identify the DNA- and tions with the program TALOS (see below). On the basis CBFB-binding sites. of this data we have obtained at least partial assignments for all residues except Ala1, Ser2, Pro116, Arg124, Results Gly132, Arg170, Pro173 and His174. HN, N, CA, and CB Choice of construct and backbone resonance assignments chemical-shift assignments have been obtained for 161 of The Runt domain was first identified from a 128 amino the 174 amino acids in this Runt-domain fragment. In acid region of extremely high homology (92% identity) order to identify elements of secondary structure and observed between the human CBFA2 (AML1) and the protein–DNA NOEs, a 3D 15N-edited NOESY spectrum Drosophila Runt proteins. Subsequent deletion-mutagene- was recorded on this sample at 800 MHz. Amino acid 1 of sis experiments showed that this domain is responsible for our Runt-domain fragment begins at amino acid 41 in both DNA and CBFB binding. We previously character- CBFA2/p49 (AML1b, PEBP2aB1 [12]). ized a Runt domain fragment of the murine CBFA2 protein spanning residues 41–190 (numbered from CBFA2/p49, or Collection of conformational constraints and calculation of PEBP2aB1 according to Bae et al. [12]), that included the the 3D structure Runt homology region [13] (residues 51–178). Sedimenta- It has recently been very elegantly demonstrated that, with tion-equilibrium measurements on this protein showed it the use of a very limited set of distance constraints obtained to have a pronounced tendency to aggregate (data not from a protein that is fully deuterated with the exception of shown). We therefore explored the ability of additional N- the methyl groups of Val, Leu, and Ile, a high-quality deter- and C-terminal sequences to improve the solution behavior mination of the fold of the protein can be obtained [17–19]. of the protein. A construct spanning residues 41–214 was We have labeled the Runt domain in this manner to assist found to be predominantly monomeric at modest concen- in the determination of the fold of the protein. Assignment trations (~50 µM); however, it was soluble only as a of the methyl resonances of the protein were obtained from complex with DNA. Therefore all studies have been HCCCONNH and CCCONNH total correlation spec- carried out on a complex of CBFA2(41–214) and an 18 base troscopy (TOCSY) experiments [20] recorded at 500 MHz. pair DNA molecule containing a high-affinity core site Because of the poor dispersion of the methyl resonances [14,15]. Even under these conditions, the complex has a (90% of the methyl 1H δs within 0.25 ppm and 95% of all limited solubility, and we have only been able to prepare 13C δs within 6.0 ppm), subsequent NOESY data for struc- ~0.5 mM solutions for nuclear magnetic resonance (NMR) ture determination were collected at 800 MHz. 15N- and spectroscopy. Interestingly, the 41–214 protein loses most 13C-edited 3D NOESY spectra were recorded at 800 MHz of the additional C-terminal residues after prolonged mea- with a mixing time of 180 ms for collection of distance con- surement times, yielding a protein spanning residues straints. CA, CB, and CO assignments for the 100% deuter- 41–192 according to N-terminal sequencing and matrix- ated protein were corrected for the effects of the 2H on the assisted laser desorption ionization (MALDI) mass spectral chemical shifts of CA and CB as described previously [18]. analysis (data not shown). These values were used in the program TALOS to derive dihedral-angle restraints for phi and psi angles as described All NMR data were recorded at 40°C. A sample of 50% previously [21]. 2H/98% 13C, 15N-labeled Runt domain complexed to DNA was utilized to collect three-dimensional (3D) On the basis of a total of 227 useful distance constraints HNCACB, HNCOCACB, HN(CA)CO, and 15N-edited derived from the NOESY data and 80 dihedral angle nuclear Overhauser efffect spectroscopy (NOESY) data restraints derived from TALOS (Table 1), we have cal- at 750 MHz at the National Magnetic Resonance Facility culated the fold of the Runt domain using DYANA v1.5. at Madison (NMRFAM). These spectra yielded the In addition, a total of 12 hydrogen-bond restraints were majority of the backbone assignments; however, reso- employed for residues that met the following criteria: nance assignments were not obtained for a number of TALOS predicted a β conformation, appropriate inter- residues and regions of the protein where apparent strand NOEs were observed, slowed NH exchange was exchange broadening limited the quality of triple-reso- observed, and they were not located at the nance data. In order to obtain assignments for these ends of β-strand elements. Subsequently, the conform- residues and complete the assignments, we prepared ers were subjected to energy minimization with the samples specifically 15N-labeled for the amino acids Ala, program OPAL. Research Article Ig fold of CBFA Runt domain Berardi et al. 1249

Table 1 N terminus or for residues 131–174 at the C terminus. Residues 11–14 at the N terminus did display strong Experimental constraints and structure calculations. sequential dNN NOEs indicative of a local helical or turn NMR constraints conformation for this region, but no long-range NOEs were Distance constraints (total / long range) observed for these residues. These C-terminal and N-ter- Total non-redundant NOE constraints 227 / 164 minal regions displayed decreased linewidth and a lack of NH–NH 145 / 112 NOEs, consistent with their being mobile in solution. The NH/CH3–CH3 82 / 52 Angle constraints basis for such a long, apparently flexible C-terminal tail Phi 40 contributing to increased solubility is not clear at this point. Psi 40 There are seven clearly discernible β strands in the struc- Hydrogen bonds 12 ture of the Runt domain (Figure 2) comprised from the fol- Energy minimization lowing residues: 31–33 (β1), 50–53 (β2), 61–64 (β3), 81–83 AMBER energy (Kcal mol–1) –1137 ± 411 (β4), 88–90 (β5), 108–111 (β6) and 118–121 (β7). As we Residual distance violations Sum (Å) 16.1 ± 0.6 have not obtained resonance assignments for R124, the Maximum (Å) 0.4 ± 0.01 exact length of the C-terminal strand may be longer than Residual angle violations observed thus far in our structure calculations. Strands 1, 2, Sum (°) 55.2 ± 5.7 and 5 combine to form one antiparallel β sheet and strands Maximum (°) 5.0 ± 0.03 3, 4, 6, and 7 combine to form a second antiparallel β sheet. Structure statistics* The two sheets are packed on one another at a shallow All backbone atoms (Å) 2.14 angle to yield a β-sandwich arrangement. No other regular Heavy atoms in all regular secondary structure (Å) 1.34 Backbone atoms in regular secondary structure (Å) 0.63 secondary structural elements were consistently identified Backbone atoms in β1, β2, β5 (Å) 0.43 in the structure, in agreement with previous CD measure- Backbone atoms in β3, β4, β6, β7 (Å) 0.41 ments [13] which indicated that the domain contained only extended elements of regular secondary structure. Strong *Rmsd to the mean of residues 20–135 in the ten lowest energy conformers. sequential dNN NOEs were observed for residues 69–75 of the loop connecting strand 3 to strand 4; however, neither Backbone fold of the CBFA2 Runt domain TALOS nor the subsequent structure calculations identi- Figure 1 shows a stereoview of ten conformers represent- fied this region as being α-helical. The various strands are ing the fold of the Runt domain obtained from these data. connected by a series of loops of varying length and com- Only residues 21–135 are displayed because no long-range position. Because of the antiparallel nature of the β strands, NOEs were observed in our data for residues 1–22 at the these loops alternate along the sequence between the top

Figure 1

Stereoview of the ensemble of ten energy- minimized conformers representing the solution structure of the CBFA Runt domain. Residues 20–135 are shown. The backbone atoms corresponding to the s-type Ig fold are colored in cyan, loops connecting β1–β2, β2–β3, β3–β4, β4–β5, β5–β6, β6–β7 are colored in green, magenta, yellow, blue, red and orange, respectively.

Structure 1250 Structure 1999, Vol 7 No 10

Figure 2

S-type Ig fold of the Runt domain. (a) Ribbon representation (left) of the minimum-energy conformer and topology representation (right) of the Runt-domain structure. Consistent β-sheet regions are colored in cyan and connecting loops are in gray. The short β-sheet regions mentioned in the text are indicated in their respective loops. (b) Ribbon representation (left) and topology diagram (right) of a fibronectin domain. (c) Topological representation of the s-type Ig fold. Strands are labeled as described by Bork et. al. [22]

and bottom of the protein. This results in two groups of The core β-sheet region is quite well defined, with an loops, located at the top and bottom of the structure as overall root mean square deviation (rmsd) to the mean of depicted in Figure 2. Interestingly, there are several strong 0.63 Å for the backbone of residues comprising the two inter-loop long-range interactions present in the structure β sheets (Table 1). The quality of the structure for this that are likely to be of functional significance. There is a portion of the protein compares quite favorably with struc- short two-stranded antiparallel β sheet involving residues tures obtained from much denser sets of constraints and Leu77/Arg78–Arg95/Phe96 and a short two-stranded paral- provides a meaningful framework for interpretation of the lel β sheet involving residues Ile128/Thr129–His38/Trp39, biological function of the domain. The rmsd for the entire both located at the top of the structure as depicted in backbone of all residues depicted in Figure 1 is 2.1 Å, Figure 2. demonstrating that the loop regions are not nearly as well Research Article Ig fold of CBFA Runt domain Berardi et al. 1251

defined. This is to be expected on the basis of the lack of critical cellular functions; it would therefore appear that long-range NOE or dihedral-angle restraints for these por- this family has been utilized for very specialized critical tions of the structure. The lack of definition in these regions roles in transcriptional regulation. means that their overall location is defined; however, details of their relative positions cannot be considered. The fold of the Runt domain shows a strong resemblance to that of the recently solved STAT-1 [27] protein in par- Discussion ticular (Figure 3). In addition to the shared s-type Ig fold The Runt domain displays an immunoglobulin fold of STAT-1, several structural subtleties are also con- Examination of the structure of the Runt domain served between these two proteins. As seen in Figure 3, (Figure 2) shows that it clearly has the classic immunoglob- STAT-1 has a short β-sheet interaction involving strands ulin (Ig) fold consisting of two antiparallel β sheets with a in L1 and the region C-terminal to the Ig fold that is total of seven to nine strands packed on one another at a structurally homologous to the short β-sheet interaction small angle, with the connections between the strands pro- between His38/Trp39 and Ile128/Thr129 observed in the vided by varying loop regions. Despite their structural sim- Runt domain. In addition, there is another short β-sheet ilarity, Ig domains typically do not display high identity to interaction involving strands in loops L3 and L5 in one another at the primary sequence level, explaining why STAT-1 that is structurally homologous to the β-sheet the primary sequence of the Runt domain did not exhibit interaction in the Runt domain between Leu77/Arg78 any obvious homology to other proteins. Bork et al. [22] and Arg95/Phe96. have subdivided the folds of Ig domains observed in protein structures into four classes on the basis of β-sheet Location of the DNA-binding site in the Runt domain and and loop topology. On the basis of this analysis, the Runt related proteins domain falls into the s-type Ig fold, which is perhaps best Previous mutagenesis studies [10,29] have provided identi- exemplified by the structure of the fibronectin repeat fication of a number of residues that are critical for DNA element. Indeed, a Dali [23] search yielded fibronectin as binding by the Runt domain. Figure 4 shows those amino one of the proteins with the highest degree of structural acids that have been shown by mutagenesis to be impor- homology to the Runt domain. The β-sheet and loop tant for DNA binding. We have recorded 15N-edited topology of fibronectin and the Runt domain are illustrated NOESY spectra on samples of fully deuterated Runt in Figure 2, which also gives a ribbon representation of the domain complexed to unlabeled DNA to identify back- two proteins and the s-type Ig fold, illustrating the high bone NHs with NOEs to the DNA. 15N half-filtered 3D degree of structural similarity. As mentioned by Bork et al., NOESY data have also been recorded to distinguish the outer strands in the sheets and especially the loops are protein–protein from protein–DNA NOEs. Those residues conformationally flexible and can be relocated in different displaying intermolecular NOEs are also indicated in families without perturbing the core Ig structure. Figure 4. All of the loop regions in the Runt domain are implicated by mutagenesis or NOE data to be involved in The Runt domain is a member of a family of s-type Ig-fold DNA binding. Loop L5 connecting βE and βF in the Runt DNA-binding proteins domain (Figure 3) is the location of the P-loop element A number of DNA-binding proteins possessing Ig folds, that has been shown by mutagenesis to be critical for DNA including NF-κB, NFAT, p53, STAT-1 and the T binding. We also observe NOEs to the DNA in this loop, domain, have recently been structurally characterized confirming its close proximity to the DNA. Loop L6 con- [24–28]. Strikingly, all of these proteins share the s-type necting strands βF and βG (Figure 3) shows very strong core Ig fold observed in the Runt domain. This is illus- NOEs to the DNA and also contains a novel primary trated definitively in Figure 3, where both ribbon and sequence of VFTNPPQ (single-letter amino acid code) schematic representations of the folds of all the relevant containing two sequential prolines. The intimate contact domains of these proteins are shown. The differences observed between this region of the Runt domain and among these proteins arise in the connecting loop regions DNA is consistent with the known importance of where short additional secondary structural elements have proline–aromatic interactions in protein–ligand recogni- been added that in some cases interact with the core Ig tion, for example in the recognition of proline-containing scaffold. As the edges of the β sheets have been shown to peptides by SH3 (Src-homology 3) domains. be consistent sites of alteration in the Ig fold [22] and loops are readily modified over time to alter function, this DNA binding is mediated by the loops between the similarity may imply an evolutionary relationship among β strands of the Ig scaffold, as has also been seen for these proteins. They could all have evolved readily from a NF-κB, p53, NFAT, STAT-1, and the T-domain. Unlike common ancestor having a core Ig fold by loop modifica- these other proteins, where the DNA-binding loops in tions and gene duplication. For these reasons, these pro- each of the Ig domains are located spatially at one end of teins define a family of structurally related DNA-binding each domain, loops at both ends of the Runt domain are domains. In all five cases, these proteins regulate very involved in binding to DNA (Figures 3,4). In contrast to 1252 Structure 1999, Vol 7 No 10

the other proteins of this family, the Runt domain has There is a similarity in the location of the loops utilized been shown to substantially bend DNA [30]. This for DNA binding among the proteins of this family. In all bending may require contacts at both ends of the Ig fold. cases except the C-terminal domain of NF-κB (referred to Research Article Ig fold of CBFA Runt domain Berardi et al. 1253

Figure 3 proteins (Crute et. al., unpublished results). In the context of the full-length protein CBFB has been shown to relieve Comparison of six topologically similar transcription factor DNA- binding domains from (a–g) CBFA, STAT-1, T-domain, NFAT inhibition mediated by sequences C-terminal to the (C-terminal Ig domain), NFκB N-terminal domain, p53, and NFκB domain, resulting in up to a 40-fold enhancement in DNA C-terminal domain. The ribbon representations of each protein shown binding [31] (T-L Gu, TL Goetz, BJ Graves and NAS, on the left in each panel (CBFA in stereo) were produced using the unpublished results). Mutations in the Runt domain that DNA-binding domains extracted from the corresponding protein–DNA complexes. A topological map of each protein is shown on the right in affect CBFB binding are shown in Figure 4. There is a each panel. The conserved β-sheet core of the s-type Ig fold is colored cluster of mutations involving residues 66–69, 106, and in cyan. The locations of protein–DNA contacts are indicated with 108 that have been shown to affect CBFB binding to arrows. Protein–DNA contacts were obtained from the NOESY CBFA, suggesting that this region is critical for interaction information on the CBFA complex and extracted from PDB coordinates with CBFB. The recently solved structure of CBFB [11] with database entries 1bf5, 1xbr, 1a66, 1vkx, and 1tsr for the other proteins, respectively. identified a moderate-sized interaction surface for binding with the Runt domain, in agreement with binding via a limited region of the Runt domain. A mutation in Q118, as domain 2 in Figure 3), the loops labeled L1 and L5, located at the other end of the fold, was also shown to located at the top of the Ig core, participate in DNA affect CBFB binding. Whether this is a direct contact or binding. In addition, elements C-terminal to the Ig fold, causes a perturbation in the structure cannot be deter- also located at the top of the core, are involved in DNA mined at this time. Interestingly, this putative binding in all of these cases. The Runt domain and the CBFB-binding region is spatially proximal to residues T-domain also employ the L3 loop for mediating DNA shown to be involved in DNA binding, thus providing a contacts. As mentioned above, the Runt domain employs rationale for the effect of CBFB on the DNA-binding loops at both ends of the protein for DNA binding, unlike affinity of the Runt domain. Consistent with this we have all of the other members of this family. Loops L2 and L4 observed substantial changes in the NMR spectra of the located at the bottom of the fold, as illustrated in Figure 3, Runt domain upon binding of CBFB, particularly for a are used for DNA binding by the Runt domain, as is also number of residues in close proximity to the DNA (J Yan seen for the C-terminal domain of NF-κB but for no other and JHB, unpublished results). These data agree with a member of this family. model in which CBFB induces a conformational change in CBFA that increases its affinity for the DNA. These data Binding to CBFB also provide an explanation for the ability of CBFB to CBFB has been shown to be essential for the in vivo func- release inhibited forms of full-length CBFA — the tion of CBFA2 [8]. Previous studies have shown that inhibitory domain must contact the DNA-binding region CBFB enhances the DNA-binding affinity of the isolated on the top of the structure, as illustrated in Figure 2. The Runt domain approximately sixfold and that binding is binding of CBFB would disrupt this interaction and accompanied by a conformational change in one or both expose the proximal loops for DNA binding. Interestingly,

Figure 4

Schematic representation of the intermolecular contacts of the CBFA Runt domain. The red spheres correspond to the spatial location in CBFA of intermolecular protein (NH)–DNA NOEs observed in a CBFA–DNA complex in which the DNA is protonated and all the nonexchangeable protons in CBFA are deuterated. Known point mutations that affect DNA binding and CBFB binding are shown as blue and gray spheres, respectively. 1254 Structure 1999, Vol 7 No 10

this site predicted to be involved in CBFB interaction on described previously [13], the protein was complexed to an 18 base the Runt domain is at a spatial location in NFAT where pair DNA duplex of sequence containing a high-affinity core site and further purified by size-exclusion chromatography on a Sephacryl interactions with AP-1 have been shown to occur [32]. S-100 column. The final buffer employed for NMR measurements con- sisted of 10 mM potassium phosphate (pH 6.0), 1 µM EDTA, 1 mg/ml Biological implications sodium azide, 1 µM thioredoxin, and 4 mM DTT. All samples were Core binding factor (CBF) has been shown to be critical purged with argon prior to recording of the NMR data. A final concen- tration of 0.5 mM, or less, of Runt domain–DNA complex was used in for hematopoietic development. CBF is comprised of a all measurements. DNA-binding CBFA subunit and a non-DNA-binding CBFB subunit. Both proteins have been shown to be Samples of CBFA2(41–214) specifically labeled with 15N in specific essential for function in hematopoiesis. CBFB has been amino acids were prepared by transformation into auxotrophic strains shown to modulate the DNA-binding affinity of CBFA, described elsewhere [16] and growth of cells in rich media employing specific 15N-labeled amino acids. The protein was purified as but it does not seem to contact the DNA directly. Chro- described above. Runt domain–DNA complexes with amino-acid-spe- mosomal translocations involving CBF genes have been cific labeling in the following amino acids were prepared: Ala, Arg, Cys, shown to be the most common genetic abnormality iden- His, Ile, Leu, Lys, Phe, Thr, Trp, Tyr, and Val. tified in leukemia patients. The two proteins of this novel Preparation of a 15N, 13C, 2H (1H-methyl Leu, Val, transcriptional enhancer have not been structurally 1Hδ1-methyl Ile) sample chracterized previously. We have recently solved the The procedure of Gardner and Kay [19] was followed, with some structure of the CBFB protein. In this paper we describe modifications. A fresh transformation of CBFA2(41–214) plasmid the structural characterization of the DNA and CBFB- DNA into BL21 DE3 pLysS cells was made and the cells were inocu- lated directly onto a 50% D O minimal plate containing 4 g l–1 of binding domain, the Runt domain, of the CBFA subunit. 2 glucose and 100 µgml–1 of carbenicillin. Colonies (2 mm) were obtained after 12 h incubation at 37°C. These colonies were then –1 Using NMR spectroscopy we have determined that the transferred to ~100% D2O plates containing 4 g l glucose and incu- CBFA Runt domain possesses an s-type immunoglobu- bated for 24–30 h at 37°C. The colonies adapted to the D2O media were transferred to 50 ml of 100% D O containing 0.75 g l–1 2H, lin (Ig) fold. This core fold of the Runt domain was 2 13C-glucose and the culture diluted 1:5 in fresh media when the shown to be conserved in the Ig folds found in the DNA- OD600 reached 0.1 +/– 0.01 units (about 14 h). The diluted culture κ –1 binding domains of NFAT, p53, NF- B, STAT-1, and was grown to OD600 = 0.25, at which point an additional 0.75 g l of the T-domain. The structural and functional similarity 2H, 13C-glucose was added along with 50 mg l–1 of 15N, 13C-valine and 55 mg l–1 of (3,3-2H ) 13C 2-ketobutyrate (Na+ salt). At this time, among these proteins indicates that the DNA-binding 2 the culture was also induced with 0.8 mM isopropyl-β-D-thiogalacto- domains of these proteins form a family of structurally side (IPTG) and incubated for an additional 20 h. The protein was and functionally-related proteins. The determinants for purified and the DNA complex prepared as described above. DNA binding on the Runt domain have been mapped from a combination of intermolecular NOEs and muta- NMR spectroscopy for backbone resonance assignment and genesis data showing that DNA binding is mediated via mapping of the DNA-binding site NMR measurements were carried out at 40°C on a Varian Inova the loop regions of the Ig fold, as is seen for the other 500 MHz NMR spectrometer equipped with an actively shielded triple- members of this family. Unlike the other members of resonance probe (Nalorac Corporation) and pulsed-field gradients, and this family, the Runt domain was shown to be unique in on a Bruker DRX 750 MHz NMR spectrometer at the NMRFAM. employing loops at both ends of the Ig fold for DNA Pulsed-field gradients were utilized for suppression of the water signal and undesired coherence pathways. NMR data were processed with binding. There are clear similarities in the use of the the program PROSA and the resulting NMR data were visualized and loop regions for DNA binding that support the related- analyzed using the program XEASY. ness of these proteins. Interaction with CBFB has been mapped using previous mutagenesis results to a specific For backbone resonance assignment, HNCACB, HN(CO)CACB, HN(CA)CO, and 15N-edited NOESY spectra were recorded on a location on the structure whose proximity to protein ele- sample of 50% 2H/98% 13C,15N-labeled Runt domain–DNA using the ments involved in DNA binding provides an explanation 15N–1H reverse INEPT element derived from the FHSQC experiment for CBFB’s ability to modulate the DNA-binding on a 750 MHz Bruker DRX spectrometer located at the NMRFAM. affinity of the Runt domain. These structural studies in Gradient sensitivity-enhanced 15N–1H HSQC spectra of all of the specifically labeled samples were recorded at 500 MHz on a Varian conjunction with our recent determination of the struc- Inova spectrometer. HNCA, HN(CA)CB, and HNCO spectra were ture of the CBFB protein provide a detailed molecular recorded on a sample of 98% 2H,13C,15N-labeled Runt domain–DNA characterization of the two proteins of this critical at 500 MHz on a Varian Inova spectrometer. transcription factor. For purposes of mapping of the DNA-binding site, 15N-edited NOESY spectra were recorded on the 98% 2H,13C,15N-labeled Runt ω 15 15 Materials and methods domain–DNA at 800 MHz and an 2- N-half-filtered 3D N-edited Sample preparation for backbone resonance assignment NOESY at 500 MHz. 1H and 13C assignments for the methyl sub- CBFA2(41–214) was expressed and purified according to the proce- stituents were obtained from CCCONNH and HCCCONNH TOCSY dure described by Crute et al. [13]. Celtone media (Martek, Inc.) was data recorded at 500 MHz on a Varian Unity Inova spectrometer on the employed for preparation of the 50% 2H, 98% 13C/15N sample. Bio- 15N,13C,2H-(1H-methyl Leu, Val, 1Hδ1-methyl Ile) Runt domain–DNA Express media (Cambridge Isotopes) was employed for the prepara- sample. 15N-edited and 13C-edited NOESY data were collected at tion of the 100% 2H, 98% 13C/15N sample. Following purification as 800 MHz on a Bruker DRX spectrometer with a mixing time of 180 ms. Research Article Ig fold of CBFA Runt domain Berardi et al. 1255

Structure calculations 3. Kagoshima, H., et al., & Gergen, J.P. (1993). The Runt-domain The N, CA, CB, and CO chemical shifts from the fully deuterated Runt identifies a new family of heteromeric DNA-binding transcriptional domain were corrected for deuteration as described previously [18] and regulatory proteins. Trends Genet. 9, 338-341. utilized as input for the program TALOS to generate chemical-shift-based 4. Wang, S., Wang, Q., Crute, B.E., Melnikova, I.N., Keller, S.R. & Speck, φ ψ N.A. (1993). Cloning and characterization of subunits of the T-cell dihedral-angle restraints for and angles. The fold was calculated on and murine leukemia virus enhancer core-binding factor. Mol. the basis of a total of 227 useful distance constraints derived from the Cell. Biol. 13, 3324-3339. NOESY data and 80 angle constraints derived from TALOS. The back- 5. Meyers, S., Downing, J.R. & Hiebert, S.W. (1993). Identification of bone bone φ and ψ angles angle restraints were created by using a 60° AML-1 and the (8;21) translocation protein (AML-1/ETO) as window on the predicted angles. In addition, a total of 12 hydrogen-bond sequence-specific DNA-binding proteins: the Runt homology domain restraints were employed for residues in which all the following condi- is required for DNA binding and protein–protein interactions. Mol. tions were met simultaneously: appropriate strong interstrand NOEs Cell. Biol. 13, 6336-6345. were observed, slowed NH exchange was observed, TALOS predicted 6. Komori, T., et al., & Kishimoto, T. (1997). Targeted disruption of Cbfa1 results in a complete lack of bone formation owing to maturational angles in a conformation that falls within a 60° window centered on the β arrest of osteoblasts. Cell 89, 755-764. ideal -sheet values, and they were not located at the ends of the 7. Wang, Q., Stacy, T., Binder, M., Marín-Padilla, M., Sharpe, A.H. & β-strand elements in a previously calculated structure. NH–NH and NH- Speck, N.A. (1996). Disruption of the Cbfa2 gene causes necrosis methyl NOE crosspeak intensities obtained from the deuterated samples and hemorrhaging in the central nervous system and blocks definitive were converted to distance constraints iteratively using dmin = 2.6 Å, hematopoiesis. Proc. Natl Acad. Sci. USA 93, 3444-3449. 8. Wang, Q., et al., & Speck, N.A. (1996). The CBFβ subunit is essential dave = 3.6 Å and dmax = 6 Å. Structure calculations were performed with DYANA 1.5. A total of 160 random structures were generated and sub- for CBFα2 (AML1) function in vivo. Cell 87, 697-708. jected to one cycle of 4000 steps of heating and 6000 steps of anneal- 9. Meyers, S. & Hiebert, S.W. (1995). Indirect and direct disruption of ing followed by 1000 steps of conjugate gradient minimization. The transcriptional regulation in cancer, , and AML-1. Crit. Rev. Eukaryot. Gene Expr. 5, 365-383. β2 2 2 β 2 0.5 error-tolerant target function /2((1+(D –b )/ b ) –1) was used 10. Levanon, D., Eisenstein, M. & Groner, Y. (1998) Site-directed during the distance-constraint calibration and switched to the more mutagenesis supports a three-dimensional model of the Runt domain. restrictive target function ((D2–b2)/2b)2, in which D is the actual distance, J. Mol. Biol. 277, 509-512. b is the bound and β is a parameter that weights the sensitivity of the 11. Huang, X., Peng, J.W., Speck, N.A. & Bushweller, J.H. (1999). Core target function distance violations. From 160 minimized structures, the binding factor β revealed, solution structure and map of the CBFα 20 lowest target function structures were selected. The 20 DYANA binding site. Nat. Struct. Biol. 6, 624-627. structures were refined using the program OPAL v2.6. The refinement 12. Bae, S.-C., et al., & Ito, Y. (1994). PEBP2aB/Mouse AML1 consists of was performed as follows. The structures were first subjected to multiple isoforms that possess differential transactivation potentials. Mol. Cell. Biol. 14, 3242-3252. restrained energy minimization using the hydrogen bond and experimen- 13. Crute, B.E., Lewis, A.F., Wu, Z., Bushweller, J.H. & Speck, N.A. tal NMR distance and angle restraints. The electrostatic contributions (1996). Biochemical and biophysical properties of the CBFα2 (AML1) were underweighted by using a dielectric constant of 20 D. 1–4 van der DNA-binding domain. J. Biol. Chem. 271, 26251-26260. Waals interactions were weighted to 1.0. The AMBER94 force field was 14. Thornell, A., Hallberg, B. & Grundstrom, T. (1991). Binding of SL3-3 used and terms for van der Waals, electrostatic, bond, bond angle, dihe- enhancer factor 1 transcriptional activators to viral and chromosomal dral angle, improper dihedral angle and experimental constraint energies enhancer sequences. J. Virol. 65, 42-50. were evaluated. After the initial minimization, the conformers were heated 15. Wang, S. & Speck, N.A. (1992). Purification of core-binding factor, a to 300K and subjected to a 2 ps constrained molecular dynamics simula- protein that binds the conserved core site in murine leukemia virus 12 tion. The experimental violation energies were set to 0.1 kT/2 kcal mol–1 enhancers. Mol. Cell. Biol. , 89-102. 16. Waugh, D. (1996). Genetic tools for labeling of proteins with at room temperature for a 0.25 Å distance or 5.0° angle violation during α-15N-amino acids. J. Biomol. NMR 8, 184-192. the minimization step and 2.0 Å distance or 30.0° angle violations during 17. Rosen, M.K., Gardner, K.H., Willis, R.C., Parris, W.E., Pawson, T. & the molecular dynamics simulation. The final 1000 steps of minimization Kay L.E. (1996). Selective methyl group protonation of perdeuterated were performed after energy minimization, using the same energy para- proteins. J. Mol. Biol. 263, 627-636. meters as above. The ten lowest energy conformers were selected to 18. Gardner, K.H., Rosen, M.K. & Kay, L.E. (1997). Global folds of highly represent the fold of the Runt domain. The minimized structures were deuterated, methyl-protonated proteins by multidimensional NMR. fitted to the mean structure and all figures were prepared using the Biochemistry 36, 1389-1401. 15 program MOLMOL v2.6. 19. Gardner, K.H. & Kay, L.E. (1997). Production and incorporation of N, 13C, 2H, (1H-δ1 methyl) Isoleucine into proteins for multidimensional NMR studies. J. Am. Chem. Soc. 119, 7599-7600. Accession numbers 20. Grzesiek, S., Anglister, J. & Bax, A. (1993). A correlation of backbone The coordinates for the fold of the Runt domain have been deposited in amide and aliphatic side-chain resonances in 13C/15N-enriched the under accession code 1co1. proteins by isotropic mixing of 13C magnetization. J. Magn. Reson. 101, 114-119. 21. Cornilescu, G., Delaglio, F. & Bax, A. (1999). Protein backbone angle Acknowledgements restraints from searching a database for chemical shift and sequence JHB is supported by grants from the United States Public Health Service homology. J. Biomol. NMR, 13, 289-302. R29 AI39536, K02 AI10481, and R01 AI42097 from the Institute for 22. Bork, P., Holm, L. & Sander, C. (1994). The immunoglobulin fold Allergy and Infectious Disease. NAS is a Leukemia Society of America structural classification, sequence patterns and common core. J. Mol. Scholar and is supported by grants from the United States Public Health Biol. 242, 309-320 . Service R01 CA58343 and R01 CA75611. NMRFAM is supported by 23. Holm, L. & Sander, C. (1997). Dali/FSSP classification of three- grants from the National Center for Research Resources. We wish to thank dimensional protein folds. Nucleic Acids Res. 25, 231-234. Dr David Waugh for providing the auxotrophic strains of Escherichia coli 24. Chen, F.E., Huang D.B., Chen, Y.Q. & Ghosh, G. (1998). Crystal used in this work. structure of p50/p65 heterodimer of transcription factor NF-κB bound to DNA. Nature 391, 410-413. References 25. Zhou, P., Sun, L.J., Dotsch,V., Wagner, G. & Verdine, G.L. (1998). 1. Ogawa, E., et al., & Shigesada, K. (1993). Molecular cloning and Solution structure of the core NFATC1/DNA complex. Cell characterization of PEBP2β, the heterodimeric partner of a novel 92, 687-696. Drosophila Runt-related DNA binding protein PEBP2α. Virology 26. Cho, Y., Gorina, S., Jeffrey, P.D. & Pavletich, N.P. (1994). Crystal 194, 314-331. structure of a p53 tumor suppressor-DNA complex: understanding 2. Ogawa, E., et al., & Ito, Y. (1993). PEBP2/PEA2 represents a new tumorigenic mutations. Science 265, 346-355. family of transcription factor homologous to the products of the 27. Chen, X., Vinkemeier, U., Zhao, Y., Jeruzalmi, D., Darnell, J.E., Jr. & Drosophila runt and the human AML1 gene. Proc. Natl Acad. Sci. Kuriyan, J. (1998). Crystal structure of a tyrosine phosphorylated USA 90, 6859-6863. STAT-1 dimer bound to DNA. Cell 93, 827-839. 1256 Structure 1999, Vol 7 No 10

28. Muller, C.W. & Herrmann, B.G. (1997). Crystallographic structure of the T domain-DNA complex of the transcription factor. Nature 389, 884-888. 29. Akamatsu, Y., Tsukumo, S., Kagoshima, H., Tsurushita, N. & Shigesada, K. (1996). A simple screening for mutant DNA binding proteins: application to murine transcription factor PEBP2a subunit, a founding member of the Runt domain protein family. Gene 185, 111-117. 30. Golling, G., Li, L.-H., Pepling, M., Stebbins, M. & Gergen, J.P. (1996). Drosophila homologues of proto-oncogene product PEBP2/CBFb regulate the DNA-binding properties of Runt. Mol. Cell. Biol. 16, 932-942. 31. Kanno, T., Kanno, Y., Chen, L.-F., Ogawa, E., Kim, W.Y. & Ito, Y. (1998). Intrinsic transcriptional activation-inhibition domains of the polyomavirus enhancer binding protein 2/core binding factor α subunit revealed in the presence of the β subunit. Mol. Cell. Biol. 18, 2444-2454. 32. Chen, L., Glover, J.N.M., Hogan, P.G., Rao, A. & Harrison, S.C. (1998). Structure of the DNA-binding domains from NFAT, Fos, and Jun bound specifically to DNA. Nature 392, 42-48.

Because Structure with Folding & Design operates a ‘Continuous Publication System’ for Research Papers, this paper has been published on the internet before being printed (accessed from http://biomednet.com/cbiology/str). For further information, see the explanation on the contents page.