The EMBO Journal vol.13 no.15 pp.3413-3422, 1994

Crystal structure of the DNA modifying enzyme β-glucosyltransferase in the presence and absence of the substrate uridine diphosphoglucose

Alice Vrielink1, Wolfgang Rüger2, Huub P.C.Driessen3 and Paul S.Freemont4

Protein Structure Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London WC2A 3PX, UK, 2Arbeitsgruppe Molekulare Genetik, Fakultät für Biologie, Ruhr Universität, Bochum, Germany and 3ICRF Unit of Structural Molecular Biology, Birkbeck College, Malet Street, London WC1E 7HX, UK

1Present address: Department of , McGill University, 3655 Drummond Street, Montreal, Canada

4Corresponding author

Communicated by M.Crumpton

Bacteriophage T4 β-glucosyltransferase (EC ) catalyses the transfer of glucose from uridine diphosphoglucose to hydroxymethyl groups of modified cytosine bases in T4 duplex DNA forming β-glycosidic linkages. The enzyme forms part of a phage DNA protection system. We have solved and refined the crystal structure of recombinant β-glucosyltransferase to 2.2 Å resolution in the presence and absence of the substrate, uridine diphosphoglucose. The structure comprises two domains of similar topology, each reminiscent of a nucleotide binding fold. The two domains are separated by a central cleft which generates a concave surface along one side of the molecule. The substrate-bound complex reveals only clear electron density for the uridine diphosphate portion of the substrate. The UDPG is bound in a pocket at the bottom of the cleft between the two domains and makes extensive hydrogen bonding contacts with residues of the C-terminal domain only. The domains undergo a rigid body conformational change causing the structure to adopt a more closed conformation upon binding. The movement of the domains is facilitated by a hinge region between residues 166 and 172. Electrostatic surface potential calculations reveal a large positive potential along the concave surface of the structure, suggesting a possible site for duplex DNA interaction.

Key words: DNA modification/enzyme/glucosylation/T-phage/X-ray crystal structure

Introduction

T-even bacteriophage, and in particular T4, have been the subject of extensive biochemical and genetic analyses leading to a detailed molecular understanding of T-phage infection, replication and assembly (Mosig and Eiserling, 1988). The extreme virulence of these phages is reflected in a complete cessation of host macromolecular synthesis immediately after phage infection (for a review see Rabussay, 1982). Moreover, the host DNA is degraded by phage encoded nucleases (Warner et al., 1970; Mathews et al., 1983). To protect its own genome against phage encoded and host restriction endonuclease systems, the phage has evolved a specific DNA modification system. In T-even phage this specific DNA modification process involves two steps. First, cytosine is replaced by 5-hydroxymethylcytosine which is incorporated into DNA synthesis forming hydroxymethylated DNA (HMC-DNA) (Wyatt and Cohen, 1952; Lamm et al., 1988). As a second step, in a post-replicative mechanism, the hydroxymethylated cytosines are glucosylated forming glucose-HMC-DNA (Revel, 1983):

UDP-glucose + HMC-DNA → glucosyl-HMC-DNA + UDP.

The enzymes catalysing DNA glucosylation in T4 phage are α-glucosyltransferase (AGT) and β-glucosyltransferase (BGT) (Kornberg et al., 1961; Josse and Kornberg, 1962; Zimmerman et al., 1962). In T2 and T6 phage β-glucosyltransferase is replaced by β-glucosyl-HMC-α-glucosyltransferase (Lehman and Pratt, 1960). While AGT and BGT form α- and β-glycosidic linkages directly to the hydroxymethylcytosine bases respectively, β-glucosyl-HMC-α-glucosyltransferase links a second glucose molecule in a 1,6 linkage to bases which have already been α-glucosylated (Lehman and Pratt, 1960). The glucosylation pattern occurs in a species-specific fashion for each of the three phage strains (Lehman and Pratt, 1960). The glucosylation reaction catalysed by these enzymes involves the transfer of glucose from host-synthesized uridine diphosphoglucose (UDPG) to the hydroxymethyl group of cytosine bases in double-stranded DNA.

The genes for these three enzymes have been sequenced and the proteins overexpressed and purified (Gram and Rüger 1985; Tomaschewski et al., 1985; Winkler and Rüger 1993). Sequence comparisons among the three glucosyltransferase enzymes show only limited homology (Tomaschewski et al., 1985; Winkler and Rüger 1993), suggesting that these enzymes may have different three-dimensional structures, although convergent structural evolution cannot be excluded.

Apart from the protective function, glucosylation of phage DNA has also been implicated as having a control function on phage-specific gene expression. Studies have shown that non-glucosylated T4 DNA is significantly more active in stimulating and protein synthesis (Cox and Conway, 1973; Rüger 1978) and that this control occurs during late (Dharmalingam and Goldberg, 1979). This control function may involve the specificity of the phage-induced modification of Escherichia coli RNA polymerase (Wu and Geiduschek, 1975) or the structure of the glucosylated DNA template which may result in an altered susceptibility

© Oxford University Press 3413 A.Vrielink et al.

of glucosylated phage DNA to nucleases and other enzymes. Experimental studies have shown that glucosylated DNA is unable to undergo the transition from B to A conformation, whereas non-glucosylated DNA can (Mokulskaya et al., 1966). The glucose group of the modified B-DNA would lie in the major groove and would sterically prevent major groove narrowing, an event characteristic of B to A transition. Non-glucosylated DNA therefore would have greater structural flexibility than glucosylated DNA, allowing it potentially to adopt a larger number of conformations.

For many years DNA modification by cytosine hydroxymethylation and glucosylation had been found only in T-even phage. More recently, however, a similar modification has been observed in the African trypanosome, Trypanosoma brucei, where the modified base has been identified as β-D-glucosyl-hydroxymethyluracil (Gommers-Ampt et al., 1993). The role of such a modification system in T.brucei is unclear, although it has been suggested to be directly involved in the regulation of variant surface glycoprotein (VSG) gene expression (Gommers-Ampt et al., 1993). This is of particular importance since VSG expression is the primary mechanism for parasitic surface coat replacement, a mechanism which protects the parasite from immune recognition and neutralization (Bernards et al., 1984). The identification of a glucosylated form of DNA in organisms other than phage suggests that this form of DNA modification may be more widespread than has previously been thought and leads to further speculation on mechanisms of DNA protection and control of gene expression.

In order to understand, at the molecular level, this unusual DNA modification process we have solved the crystal structure of T4-phage β-glucosyltransferase. The BGT structure represents the first example of an enzyme which glucosylates double-stranded DNA and provides the basis for studies into the mechanisms of non-specific DNA recognition combined with specific base modification. Furthermore, the BGT structure may provide clues as to the mechanism of the trypanosome-specific DNA modification system which may have important therapeutic consequences.


Fig. 1. Stereo diagram showing the regions of the final 2Fobs-Fcalc electron density maps for β-glucosyltransferase calculated using all reflections between 10 and 2.2 Å and phases from the final model. The contour level used is 1.3 times the standard deviation of each map. (A) A view of the electron density for Ile94 and Tyr95 in the substrate-free structure. (B) A view of the density for uridine diphosphate in the substrate-bound structure.


Results and discussion

Electron density map and quality of the model

The final electron density maps for both the substrate-free and substrate-bound models were calculated using the Fourier coefficients (2Fobs-Fcalc), αcalc. The maps show clear electron density for residues 1-67, 75-107 and 123-351. The substrate-free structure includes 184 water molecules and the UDPG-bound model includes positions for the uridine diphosphate portion of the UDPG substrate and 221 water molecules. Two loops within the structure, 68-74 and 108-122, have no visible electron density and thus could not be modelled. Figure 1 shows examples of the electron density map where a clear interpretation of the structure was possible. Ramachandran plots of the two models (not illustrated) show that all non-glycine residues lie inside the energetically allowed regions of φ/ψ space. For both structures Asp205, Asp263, Asp350 and His290 fall in left-handed helical regions. In addition, for the substrate-bound structure, Ser189 and Arg191 also fall in the left-handed helical region. Table I gives the refinement statistics for each of the final models.

Description of the overall structure

A ribbon representation showing the overall structure of BGT is shown in Figure 2 with the strands and helices labelled as referred to in the text. The molecule has dimensions 45 × 45 × 55 Å and consists of two domains which adopt similar topologies, as shown in Figure 3. The two domains are separated by a central cleft. The N-terminal domain comprises residues 1-165 and 338-351 and consists of a seven stranded parallel twisted β-sheet surrounded by seven α-helices. The second domain comprises residues 182-316 and consists of a six stranded parallel twisted β-sheet and five α-helices. The two domains are linked by an extended chain region from residue 166-181 and a long α-helix (α12) from residue 317-334. The fold for these topologically similar domains resembles a Rossmann nucleotide binding motif (Rossmann et al., 1975). The C-terminal domain represents a more classical nucleotide binding motif, whereas slight deviations are observed in the N-terminal domain. Although the sequential arrangement of the secondary structure elements of the two domains is similar, a superposition of the domains gives no significant three-dimensional overlap. Indeed, an analysis of the sequences of the two domains does not reveal any significant homology.

A topological comparison of the structure of BGT with the DNA modification enzymes, HhaI DNA methyltransferase (Cheng et al., 1993a), DNA methyltransferase (Moore et al., 1994) and adenine-specific DNA methyltransferase (Labahn et al., 1994) shows only limited similarity. Both the HhaI DNA methyltransferase and adenine-specific DNA methyltransferase are monomers comprising two domains, one of which contains a nucleotide binding fold that binds the substrate, S-adenosyl-L-methionine.

Fig. 2. A ribbon representation of the structure of β-glucosyltransferase drawn using the program MOLSCRIPT (Kraulis, 1991). The secondary structure elements of the protein are labelled as referred to in the text and in Figure 3. The atoms for the UDP portion of the substrate are also shown in ball and stick representation.

UDPG binding

In order to obtain a substrate-bound complex, data were collected using crystals in which UDPG had not been removed. Difference electron density maps using data collected from these crystals, and the substrate-free structure after rigid body and positional refinement, showed clear density only for the uridine diphosphate portion of the substrate. Further refinement and manual inspection

Fig. 3. A schematic representation of the topology of β-glucosyltransferase. The α-helices are depicted by cylinders and the β-strands by arrows. The secondary structure assignments have been determined as defined by Kabsch and Sander (1983). Dotted lines indicate regions of the structure which could not be modelled. The secondary structure elements from each domain are aligned to show their topological similarity, e.g. α1 and α5 are equivalent to α7 and α11 respectively.

Table I. Final crystallographic refinement statistics

                                        Substrate-free   Substrate-bound
                                        structure        complex
R factor (%)a                           19.4             19.1
Resolution (Å)                          10-2.2           10-2.2
No. of reflections                      18758            20885
No. of protein atoms                    2683             2697
No. of solvent molecules                184              221
Average B factor for the model (Å²)     29.7             21.0
R.m.s. deviation in bond lengths (Å)    0.011            0.011

aR factor = $100 \times \sum |F_o - F_c| / \sum |F_o|$.
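The R factor reported in Table I is defined in the table footnote as 100 × Σ|Fo − Fc| / Σ|Fo|. A minimal sketch of that calculation follows, assuming the observed and calculated structure-factor amplitudes are available as two equal-length lists. The function name `r_factor` and the four example amplitudes are hypothetical and are not taken from the BGT data.

```python
def r_factor(f_obs, f_calc):
    """Return the crystallographic R factor in percent.

    Implements R = 100 * sum(|Fo - Fc|) / sum(|Fo|) over paired
    structure-factor amplitudes.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return 100.0 * numerator / denominator


# Hypothetical amplitudes for four reflections (illustrative only, not BGT data).
example_obs = [100.0, 50.0, 25.0, 25.0]
example_calc = [90.0, 55.0, 30.0, 20.0]
print(f"R factor = {r_factor(example_obs, example_calc):.1f}%")  # prints "R factor = 12.5%"
```

In practice the sums run over all reflections in the stated resolution range (10-2.2 Å for both models in Table I), and the calculated amplitudes are scaled to the observed ones before the comparison; that scaling step is omitted here.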


of the electron density maps calculated using the Fourier coefficients 3Fobs-2Fcalc and Fobs-Fcalc showed some disconnected electron density near to the terminal phosphate oxygen atoms of UDP. In order to ascertain whether the crystal used for data collection had UDP or UDPG bound to the enzyme, a new crystal was soaked overnight in fresh mother liquor containing 3 mM UDPG and data were collected to 2.2 Å resolution. Inspection of the difference electron density map again showed density only for the UDP portion of the substrate. The fragmented density is situated in a pocket which is of the appropriate size to accommodate a glucose ring. It was not possible from the electron density, however, to determine whether the glucose is present in a disordered conformation or if the sugar ring had been cleaved off by the enzyme. In order to determine whether UDPG or UDP is present, a sample of crystals and mother liquor were analysed by HPLC using a DEAE-cellulose ion exchange column. The results showed only UDPG present in both the mother liquor and in the dissolved crystals. This confirms that the ligand bound to the enzyme in the crystals is UDPG rather than UDP and the glucose portion of the substrate must be present in a disordered conformation.

The UDPG substrate binds to the protein in the cleft between the two domains. The UDP portion can be divided into three regions, the phosphate groups, the ribose ring and the uracil ring, all of which make hydrogen bonding contacts to the C-terminal domain of the protein and to water molecules (Figure 4). Three phosphate oxygen atoms make hydrogen bond contacts to the guanidinium groups of three arginine residues (191, 195 and 269), three water molecules (487, 488 and 489) and the main chain nitrogen atom of Ser189. The region surrounding the phosphate oxygens of the ligand is therefore occupied by a predominance of positively charged side chains, presumably to balance the high negative charge of the phosphate groups. This is in contrast to what has been seen in more classic nucleotide binding proteins, where the negative charge of the phosphates is partially accommodated by a helix dipole (Wierenga et al., 1985, 1986). In nucleotide binding proteins, the close approach of the phosphate groups to the N-terminus of the helix is enabled by the conserved pattern of glycine residues, which is not seen in the sequence of BGT.

The uracil ring is held in position through hydrogen bonding contacts between O4 and N3 of the ring and the main chain nitrogen and oxygen atoms of Ile238 respectively. Only O2 of the uridine ring does not make any interaction with the protein. The ribose ring of the ligand adopts a C2'-endo ring pucker and is held in position through contacts between O2' and O3' and the carboxyl side chain of Glu270. Similar interactions between the ribose hydroxyl groups and the side chains of glutamate or aspartate residues have been observed in a number of structures of nucleotide binding proteins (Eklund et al., 1984; Skarzynski et al., 1987). The UDP ligand is bent slightly away from a fully extended conformation and is held in this conformation by H2O488, which makes hydrogen bonds to O4P of the terminal phosphate group and O3' of the ribose ring.

Comparison of the substrate-free structure with the UDPG complex

Comparison between the substrate-free and substrate-bound structures reveals significant changes as a result of binding uridine diphosphoglucose. The

Fig. 4. A stereo view of the hydrogen bond interactions made between the UDP portion of the substrate and the surrounding protein and water molecules. The UDP portion of the substrate is shown in open bonds and the protein is shown in closed bonds. Three water molecules are shown by double circles. Hydrogen bond interactions are shown by dotted lines.

substrate-free structure adopts an 'open' conformation in a similar fashion to the structures of citrate synthase (Remington et al., 1982; Wiegand et al., 1984), hexokinase (Anderson et al., 1978; Bennett and Steitz, 1978, 1980) and the periplasmic binding proteins, a number of which have been solved in the closed/liganded and in the open/free forms (Spurlino et al., 1991; Kang et al., 1992; Sharff et al., 1992; Oh et al., 1993). Although the structures of periplasmic binding proteins show some overall similarity to BGT in that they contain two α/β domains with a central cavity, no three-dimensional superposition can be obtained. Upon binding of the UDPG substrate in BGT, a conformational change occurs resulting in an approximate 5° rigid body rotation of the C-terminal domain to a closed conformation. Superposition of the residues making up the N-terminal domains of the two structures results in an r.m.s. deviation of 1.4 Å for all α carbon atoms and an r.m.s. deviation of 2.0 Å for the α carbon atoms of the C-terminal domain. A superposition of the two structures is shown in Figure 5.

The central region of helix α12 and residues 166-172 between the two domains form the region of the structure responsible for the domain movement, which is designated as the hinge region. The electron density of residues 166-172 is poorly defined in the substrate-free structure with temperature factors which are significantly higher than those observed in the UDPG-bound structure. Two salt bridges in the hinge region, between Asp171 and Lys28 and between Lys166 and Glu272, are observed in the UDPG-bound structure but are not present in the substrate-free structure. These salt bridges involve residues from each domain and contribute to the observed domain movement. A comparison of the main chain dihedral angles in this region does not show any significant differences in individual φ/ψ angles upon UDPG binding. Therefore the observed movement is generated by a combination of a number of small main chain dihedral angle changes along the entire hinge region of the structure rather than large changes to specific residues within the hinge region. Adjacent to the hinge region, residues 173-181 adopt a PPII (polyproline) type helical conformation (Adzhubei and Sternberg, 1993). Interestingly, a hinge region has also been observed in the structures of periplasmic binding proteins (Sack et al., 1989a,b; Sharff et al., 1992; Zou et al., 1993) and is believed to be responsible for mediating the conformational change of the molecule upon ligand binding.

It is interesting to note that derivatization with K2Pt(NO2)4 was only possible with crystals which had not had the substrate removed. However, upon soaking with the heavy metal, the cell dimensions were identical to those of the substrate-free crystals. The platinum metal binds to a methionine residue (Met169) in the hinge region and appears to induce the enzyme to dissociate from UDPG and adopt the substrate-free conformation. It may be that the more mobile nature of this hinge region in the substrate-free structure does not allow the metal to bind to Met169. However, in the presence of UDPG, the conformation of this side chain is more ordered, thus enabling K2Pt(NO2)4 to bind in an isomorphous fashion.

Apart from this hinge region, a loop (residues 188-195), situated in the C-terminal domain between strand β8 and helix α7, moves significantly as a result of substrate binding (see Figure 5). A number of residues (Arg191, Ser192 and Gly193) within this loop fall in the left-handed helical region of the Ramachandran plot. In the substrate-free structure the electron density for this loop is poorly defined and the temperature factors are high, indicating conformational flexibility. In contrast, the electron density for this loop region in the UDPG-bound structure is well defined and the temperature factors are comparatively low

Fig. 5. A stereo diagram showing the superposition of the α carbon atoms for the substrate-free and UDP-bound complex of β-glucosyltransferase. The matrix for the superposition was calculated using only residues in the N-terminal domain (1-66, 77-106, 125-169 and 338-351). After superposition, the r.m.s. separation for all α carbon atoms is 1.4 Å. The r.m.s. separation for the α carbon atoms in the UDP binding domain (182-316) is 2.0 Å. Red indicates the substrate-bound structure and green the substrate-free structure. A van der Waals representation for the UDP portion of the substrate is shown in yellow.
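The r.m.s. separations quoted in the Figure 5 caption are evaluated over chosen sets of α carbon atoms after the two models have been placed in a common frame. A minimal sketch of that evaluation step follows, assuming the superposition has already been applied and that each model is held as a dictionary from residue number to (x, y, z). The function names and the toy coordinates are hypothetical; only the residue ranges come from the caption. The least-squares fitting that produces the superposition matrix is not shown.

```python
import math

# Residue ranges taken from the Fig. 5 caption.
N_TERMINAL_FIT = [(1, 66), (77, 106), (125, 169), (338, 351)]
UDP_BINDING_DOMAIN = [(182, 316)]


def in_ranges(resnum, ranges):
    """True if resnum falls inside any inclusive (first, last) range."""
    return any(first <= resnum <= last for first, last in ranges)


def rms_separation(ca_a, ca_b, ranges):
    """R.m.s. distance between equivalent CA atoms of two superposed models.

    ca_a, ca_b: dict mapping residue number -> (x, y, z) in the same frame.
    Only residues present in both models and inside `ranges` are used.
    """
    common = [r for r in ca_a if r in ca_b and in_ranges(r, ranges)]
    if not common:
        raise ValueError("no equivalent residues in the requested ranges")
    total = 0.0
    for r in common:
        total += sum((p - q) ** 2 for p, q in zip(ca_a[r], ca_b[r]))
    return math.sqrt(total / len(common))


# Toy coordinates (hypothetical, not BGT atoms): the N-terminal residue
# coincides, the two C-terminal residues are each displaced by 2 A.
free = {10: (0.0, 0.0, 0.0), 200: (0.0, 0.0, 0.0), 250: (1.0, 1.0, 1.0)}
bound = {10: (0.0, 0.0, 0.0), 200: (2.0, 0.0, 0.0), 250: (1.0, 1.0, 3.0)}
print(rms_separation(free, bound, N_TERMINAL_FIT))      # 0.0
print(rms_separation(free, bound, UDP_BINDING_DOMAIN))  # 2.0
```

Fitting on one domain and measuring on the other, as in the caption, makes the relative domain movement show up as a larger r.m.s. separation in the domain that was left out of the fit.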


(the average temperature factors for this loop in the structure of the complex are 22.2 Å² for main chain atoms and 20.4 Å² for side chain atoms, whereas in the substrate-free structure the temperature factors are 53.3 Å² and 53.4 Å² respectively). Three salt bridge interactions, Arg191-Asp100, Arg191-Asp258 and Arg195-Asp258, are present in the UDPG-bound structure but are not observed in the substrate-free structure. The side chains of Arg191 and Arg195 are also involved in hydrogen bonding interactions with the phosphate oxygen atoms of the substrate. Thus this loop is held in a specific conformation by both salt bridge interactions with other protein side chains and by hydrogen bond interactions with the substrate.

An additional region of the structure, corresponding to residues 235-238, shows a significant change upon UDPG binding. The main chain of Ile238 in this region is also involved in hydrogen bonding interactions with the uracil ring of the substrate, as described above. The main chain electron density in this region is poorly defined in the substrate-free structure and the model is characterized by high temperature factors (45.1 Å² for main chain atoms and 46.5 Å² for side chain atoms), indicating considerable flexibility. In contrast, the temperature factors for this region in the UDPG-bound structure are significantly lower (22.2 Å² for main chain atoms and 22.8 Å² for side chain atoms) and the chain is conformationally fixed by hydrogen bond interactions with the uracil ring.

Therefore, the binding of the substrate causes a conformational change in the molecule, resulting in a domain movement. A number of salt bridge interactions are observed between residues in both domains, which presumably contribute to the domain movement, reducing the overall flexibility of the molecule. In addition, a number of regions in the structure become more ordered upon substrate binding as a result of interactions with atoms from the UDP portion of the substrate. It should be noted that the observed conformational change between the two structures may be constrained by crystal packing effects which could prevent further changes.

Proposed glucose binding site and implications for catalysis

Although no significant electron density could be observed for the glucose ring of the substrate, its approximate location in the structure can be inferred from the position of the UDP portion of the ligand. The glucose moiety must lie in a pocket near to O5P of the terminal phosphate group of the UDP ligand. The other two oxygen atoms attached to the terminal phosphate group, O4P and O6P, interact with protein residues and water molecules as shown in Figure 5. Some unconnected difference electron density is visible in this pocket. However, it was not possible to unambiguously model the glucose ring into this density. The pocket, bounded by the UDP portion of the substrate and by residues from the N-terminal domain, is exposed to the external solvent environment via a channel extending from the concave surface between the two domains of the molecule. The top of this channel is lined by the side chains of Val18, Pro19, Ser67, Arg102, Leu103, Asn215 and H2O442.

The reaction catalysed by BGT involves the transfer of glucose from the substrate, UDPG, to the 5-hydroxymethylcytosine base of phage DNA, with the release of UDP. The terminal phosphate group of UDPG, which acts as the leaving group in the transfer of glucose to the base, is covered by the side chain of Arg191, which makes salt bridge interactions with the side chains of Asp258 and Asp100. It is known that BGT requires the presence of Mg2+ for catalysis (Josse and Kornberg, 1962). Divalent cations have been observed in the structures of a number of phosphodiesterase enzymes and are thought to either activate a nucleophile and/or stabilize the phosphate oxyanion leaving group, as suggested for the mechanism of the 3'-5' exonuclease activity of the Klenow fragment (Freemont et al., 1988). In a similar fashion, a divalent metal ion in BGT could act to stabilize the negative charge on the uridine diphosphate leaving group. The crystallization conditions for BGT, however, did not contain any divalent cations, nor were any metal ions added during the purification (Tomaschewski et al., 1985). Thus,

Fig. 6. Stereo view of the active site region of the substrate-bound region of BGT. The UDP portion of the substrate is shown in open bonds and the protein is shown in closed bonds. The salt bridges between Asp100, Arg191 and Asp258 are shown as dotted lines.
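The temperature factors quoted for the 188-195 loop (about 53 Å² in the substrate-free structure against about 22 Å² in the complex) can be made more tangible by converting them to r.m.s. atomic displacements. The sketch below uses the standard isotropic relation B = 8π²⟨u²⟩, which is textbook crystallography rather than something stated in the paper; the function name is hypothetical.

```python
import math


def rms_displacement(b_factor):
    """R.m.s. displacement (in A) for an isotropic B factor (in A^2).

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Main chain B factors for loop 188-195 as quoted in the text.
for label, b in [("substrate-free", 53.3), ("UDPG-bound", 22.2)]:
    print(f"{label}: B = {b} A^2, r.m.s. displacement = {rms_displacement(b):.2f} A")
```

Under this relation the loop's main chain r.m.s. displacement falls from roughly 0.8 Å to roughly 0.5 Å on substrate binding. A refined B factor also absorbs static disorder and model error, so these values are an upper-level guide to mobility rather than a direct measurement of it.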

although the protein binds divalent metal ions, none are present in the crystal structure. The BGT structure was inspected for the presence of potential metal binding residues in the vicinity of the terminal phosphate groups of the UDP ligand. Two aspartic acid residues, Asp100 and Asp258, are located near to this phosphate group; however, in the substrate-bound structure both are involved in salt bridge interactions with Arg191, as described above and shown in Figure 6. In addition, the hydroxyl group of Tyr261 lies near to the two carboxyl groups of Asp100 and Asp258 and to the guanidinium side chain of Arg191. These residues could all provide the necessary ligands for coordinating the metal ion.

Although the mechanism of glucose transfer to the 5-hydroxymethylcytosine base is at present unknown, the structure of the substrate-bound form of BGT allows us to speculate on possible mechanisms. The transfer reaction could involve a nucleophilic attack on C1 of the glucose ring by the 5-hydroxymethyl group from the modified cytosine base. Two acidic residues, Asp100 and Glu22, are located in the region of the active site and are positioned such that their carboxylate side chains are exposed on either side of the channel extending from the concave surface of the structure. Interestingly, the channel is of the appropriate dimensions to accommodate a cytosine nucleotide and is lined by a number of hydrophobic residues which could provide van der Waals contacts with the aromatic ring of the base (see Figure 7). The carboxylate group of Glu22 lies in a position which would be accessible to the proposed position of the nucleotide base and thus may activate the 5-hydroxymethyl group for nucleophilic attack on the glucose ring. If such an arrangement of nucleotide base, UDPG and metal ion were correct, it would suggest that the 5-hydroxymethylcytosine base loops out of the double helix of DNA in a similar manner to that which has been observed in the structure

Fig. 7. Stereo view towards the concave surface showing the van der Waals surface for the protein in green and the van der Waals surface for the UDP portion of the substrate in pink. A channel can be seen extending from the base of the concave surface into the substrate binding pocket.

Fig. 8. A view of the electrostatic potential surface for β-glucosyltransferase. Red contours correspond to -2kT/e and blue contours correspond to +2kT/e. The calculation was carried out with the program FDCALC (Warwicker and Watson, 1982) using an ionic strength of 0.1 M, pH 6.5 and dielectric constants of 3.0 for the protein and 80.0 for the solvent. The structure of the substrate-bound complex was used for the calculation. The unobserved loop regions and the UDP ligand were omitted from the calculation. A ribbon representation of the molecule is shown in yellow and the UDP ligand is represented by a dot surface.

of the HhaI DNA methyltransferase complex with DNA (Cheng et al., 1993b).

Table II. Data collection statistics for BGT

Data                     Soak concn  Soak time  Method of data  Resolution  Total        Independent  %         Rmerge  Ranom  Rderiv
                         (mM)        (days)     collection      (Å)         reflections  reflections  complete  (%)a    (%)    (%)b
Substrate-free                                  FASTc           2.2         52 070       19 074       84.3      8.4
UDPG complex                                    FASTc           2.2         74 613       21 380       94.5      8.6
K2Pt(NO2)4               1.0         1          Xentronicsd     2.8         21 762       8435 (3σ)    76.7      4.0     6.7    18.1
K2AuCl4e                 1.0         1          FASTc           3.0         20 972       9088         98.9      8.1     5.4    19.1
K2HgI4                   0.1         3          FASTc           3.0         16 751       8940         97.3      4.7     4.9    16.7
K2Pt(NO2)4               1.0         1          FASTc           2.9         28 025       9691         95.5      7.9     6.0    20.8
K2Pt(NO2)4 + K2AuCl4e    1 + 1       1 + 3 h    FAST            3.0         20 274       8790         95.2      11.1    6.4    19.4

aRmerge = $\sum_h \sum_i |I_{h,i} - \langle I_h \rangle| / \sum_h \sum_i I_{h,i}$ (summed over all intensities).
bRderiv = $\sum_h |F_{\mathrm{deriv},h} - F_{\mathrm{nat},h}| / \sum_h F_{\mathrm{nat},h}$ (in the resolution range 10-3.0 Å).
cData collected at Glaxo Research Laboratory, Oxford.
dData collected in the Biophysics Department, University of Leeds.
eSoak carried out in 100 mM acetate buffer (pH 5.6).

Implications for DNA binding

An inspection of the positions of charged residues along the surface of the protein shows a predominance of lysines and arginines along the concave surface. Eleven positively charged residues are positioned along the surface, Lys16, Lys43, Arg102, Lys149 and Lys150 from the N-terminal domain and Arg217, Lys219, Lys222, Lys225, Lys237 and Lys259 from the C-terminal domain. The side chains of residues 217, 219, 222 and 225 lie along one edge of the concave surface and are not involved in crystal contacts. In contrast, only three negatively charged residues are located along this surface, Asp100, Asp258 and Glu196. The side chains of Asp100 and Asp258 are both involved in salt bridge interactions with Arg191 and therefore do not contribute fully to the overall positive charge along this surface. The positions of this large number of positively charged residues provide strong evidence that the DNA double helix will lie along this concave surface. To further illustrate this, an electrostatic potential surface has been calculated for the molecule using the program FDCALC (Warwicker and Watson, 1982) at pH 6.5 (Figure 8). As expected, the surface shows a significant positively charged region of the molecule along the concave surface, suggesting this to be the position of the DNA. The dipole moments of helices α1, α7, α9 and α3 may also contribute to the positive electrostatic potential surface.

β-Glucosyltransferase does not recognize any specific nucleotide sequence, and it may only be necessary for the enzyme to recognize the modified base, 5-hydroxymethylcytosine. It is therefore likely that the protein contacts the DNA through interactions with the phosphate backbone and thus a large positively charged surface along the protein would provide a suitable contact surface. An attempt to model a double helical DNA structure along this surface produced a large number of bad contacts and the modified nucleotide base in a helical conformation was not able to access the active site of the enzyme. One cannot, however, rule out the possibility that conformational changes in the structures of the protein and/or the DNA may occur upon complex formation, as has been observed in a number of structures of complexes of DNA binding protein with bound DNA (Schultz et al., 1991; Winkler et al., 1993). The inferred position of the glucose ring, deeply buried in the structure, and the presence of a channel extending from the surface to the glucose binding pocket suggest that significant conformational changes must occur in order for the glucose phosphate bond to be accessible to the hydroxymethyl group of the modified cytosine base. As has been suggested above, these changes might occur to the DNA in the form of the reactive base flipping out of the helical structure into the channel or the protein may undergo further changes. Further analysis of the specific interactions and the mechanism await a structure of the complex of BGT with DNA and UDPG in the presence of a divalent cation.

We have determined the structure of the DNA modification enzyme β-glucosyltransferase both in the presence and absence of the substrate, uridine diphosphoglucose. The enzyme represents a novel structure for DNA binding proteins. From a central cleft, a channel extends into the molecule to form the active site where the substrate binds. Upon binding of the substrate a movement of the two domains relative to each other is observed, resulting in a more closed conformation of the structure and increasing the interactions between the two domains. The cleft between the two domains is lined by positively charged residues providing a surface for DNA interaction.

Materials and methods

Crystallization, heavy metal derivatives and data collection

The enzyme β-glucosyltransferase was crystallized as described by Freemont and Rüger (1988). The crystals were grown in the presence of the substrate, uridine diphosphoglucose. The density of the crystals was measured using a Ficoll 400 (Pharmacia) gradient based on the method described by Westbrook (1985). The density of the gradient was assessed using droplets of toluene and carbon tetrachloride, together with a crystal of known density. The density of BGT was found to be 1.14 g/ml, corresponding to one molecule in the asymmetric unit and a solvent content of 55%.

For the heavy atom derivative soaks using K2AuCl4 and K2HgI4 the substrate was removed by a stepwise procedure where the concentration of the substrate was gradually reduced. The cell dimensions of the substrate-bound crystals were a = 151.92 Å, b = 52.26 Å, c = 52.74 Å, while the desoaked crystals had cell dimensions of a = 152.88 Å, b = 52.25 Å, c = 53.66 Å. The observed change in cell dimensions upon removal of the substrate suggests a possible conformational change in the molecule. Derivatization of the desoaked crystals with K2Pt(NO2)4 was unsuccessful.

Table III. Heavy atom refinement statistics for BGT

Derivative               No. of heavy atom sites   Rcullis   Phasing power
K2Pt(NO2)4               5                         0.68      2.1
K2AuCl4                  3                         0.65      1.6
K2HgI4                   2                         0.95      0.5
K2Pt(NO2)4               4                         0.71      1.5
K2Pt(NO2)4 + K2AuCl4     7                         0.7       1.7

Heavy atom refinement was carried out using the program MLPHARE. The overall figure of merit after heavy atom refinement and phasing was 0.59 on 8605 reflections from 20 to 2.8 Å. The phases were modified by applying three cycles of solvent flattening using a solvent content of 50%.

coordinates of the substrate-free structure have been deposited in the Protein Data Bank, Brookhaven.

Crystallographic refinement of the substrate-bound complex

The coordinates for the substrate-free complex were initially used for the refinement of the UDPG complex. Rigid body refinement was applied to the substrate-free structure using ligand-bound data to 2.8 Å resolution. The structure was divided into three rigid bodies consisting of residues 1-170, 171-317 and 318-351. The starting R factor was 38.1% on all reflections between 10 and 2.8 Å. Rigid body refinement followed by conjugate-gradient minimization and individual B factor refinement was carried out and the model rebuilt, incorporating the UDP portion of the substrate, by examining the difference electron density maps. Coordinates for ligands were obtained by modelling UDP using QUANTA. Subsequent cycles of conjugate-gradient minimization on all reflections in the resolution range 10-2.2 Å (20 885 reflections) followed by manual rebuilding gave a final R factor for the UDP-bound complex of 19.1%. The r.m.s. bond length deviation is 0.01 Å. The coordinates for the UDP ligand complex have been deposited in the Protein Data Bank, Brookhaven.
However, soaking substrate-bound crystals in K2Pt(NO2)4 gave a suitable derivative with cell dimensions similar to Acknowledgements the native desoaked crystals. Similarly, a double derivative was obtained by firstly soaking crystals in K2Pt(NO2)4 followed by transferring the We would like to thank Simon Phillips and Nobutoshi Ito of the crystals to a solution containing K2AuCI4. The substrate-free crystals University of Leeds and Alan Wonacott of Glaxo Research for providing were stored in the mother liquor solution containing 65% saturated us with data collection facilities. We would also like to thank Ursula ammonium sulfate, 100 mM MES, pH 5.6, and 0.02% sodium azide. Aschke for isolating and purifying the protein, Michael Gorman for Native data were collected using both a Xentronics and FAST area measuring the crystal density, Jim Warwicker for calculating the electro- detector and a rotating anode generator using graphite monochromatized static potential surface and Suhail Islam for assistance with his graphics CuKa radiation. The derivative data sets were collected on an Enraf program, PREPI. W.R. wishes to thank the D.F.G. for funding and A.V. Nonius FAST television area detector with no crystal cooling. Frames wishes to thank the European Community for support in the form of a of 0.10 were collected with the crystal-detector distance set at 90 mm. postdoctoral research fellowship. Images from the area detector were evaluated using the program MADNES (Messerschmidt and Pflugrath, 1987). Further processing and References scaling were carried out using the CCP4 program suite (Daresbury, UK). Details of the data collection are given in Table II. Adzhubei,A.A. and Stemberg,M.J.E. (1993) J. MoI. Biol., 229, 472 - 493. Anderson,C.M., Stenkamp,R.E. and Steitz,T.A. (1978) J. Mol. Biol., Phasing and model building 123, 15-33. Two heavy atom sites from the Pt(NO2)4 derivative were obtained Bennett,W.S. and Steitz,T.A. (1978) Proc. Natl Acad. Sci. 
USA, 75, from a three-dimensional difference Patterson map. The positions and 4848-4852. occupancies of these sites as well as the scale and temperature factors Bennett,W.S. and Steitz,T.A. (1980) J. Moi. Biol., 140, 211-230. relating the derivative data to the native data were refined using the Bernards,A., VanHarten-Loosbroek,N. and Borst,P. (1984) Nucleic Acids phase refinement program MLPHARE (CCP4 program suite, Daresbury, Res., 12, 4153-4170. UK). The anomalous scattering data were used to determine the absolute Brunger,A.T., Kuriyan,J. and Karplus,M. (1987) Science, 235, 458-460. configuration of the structure. The remaining derivatives were located Cheng,X., Kumar,S., Posfai,J., Pflugrath,J.W. and Roberts,R.J. (1993a) using difference Fourier maps calculated with the single isomorphous Cell, 74, 299-307. replacement (s.i.r.) phases. Heavy atom refinement and phase calculations Cheng,X., Kumar,S., Sha,M. and Roberts,R.J. (1993b) Acta Crystallogr, were carried out using all reflections. The final multiple isomorphous A49 (Suppl.), 61. replacement (m.i.r.) phases used to calculate a 'best' native Fourier map Cox,G.S. and Conway,T.W. (1973) J. Virol., 12, 1279-1287. were obtained using the refined parameters from 21 sites for five heavy Dharmalingam,K. and Goldberg,E.B. (1979) Virology, 96, 393-403. atom derivatives. Table III gives the final parameters for each of the Eklund,H., Samama,J.P. and Jones,T.A. (1984) Biochemistry, 23, heavy atom derivatives. A mean figure of merit of 0.59 was calculated 5982-5996. for the final m.i.r. phases. Density modification using the solvent- Freemont,P.S. and Ruger,W. (1988) J. Mo. Biol., 203, 525-526. flattening procedure of Wang (1985) was used to improve the quality of Freemont,P.S., Friedman,J.M., Beese,L.S., Sanderson,M.R. and the electron density map. Steitz,T.A. (1988) Proc. Natl Acad. Sci. USA, 85, 8924-8928. 
A 2.8 A resolution electron density map was calculated using the Gommers-Ampt,J.H., VanLeeuwen,F., DeBeer,A.L.J., Vliegenthart, combined phases, weighted by the figure of merit. The polypeptide chain J.F.G., Dizdaroglu,M., Kowalak,J.A., Crain,P.F. and Borst,P. (1993) was modelled into the electron density map using the graphics program Cell, 75, 1129-1136. O (Jones et al., 1991). The initial model consisted of 282 amino acid Gram,H. and Ruger,W. (1985) EMBO J., 4, 257-264. residues (80%), of which 219 were modelled with side chain atoms Jones,T.A., Zou,J.-Y., Cowan,S. and Kjeldgaard,M. (1991) Acta (62% of the total residues in the structure). Crvstallogr., A47, 110-119. Josse,J. and Kornberg,A. (1962) J. Biol. Chem., 237, 1968-1976. Crystallographic refinement of the substrate-free structure Kabsch,W. and Sander,S. (1983) Biopolymers, 22, 2577-2637 The initial model was refined with the molecular dynamics program Kang,C.H., Shin,W.-C., Yamagata,Y., Gokcen,S., Ames,G.F.-L. and XPLOR (Brunger et al., 1989). The starting crystallographic R factor Kim,S.-H. (1992) J. Biol. Chem., 266, 23893-23899. for the structure was 45.3% on all reflections from 10.0 to 2.8 A. After Kornberg,S.R., Zimmerman,S.B. and Kornberg,A. (1961) J. Biol. Chem., applying simulated annealing, using the slow cooling protocol, and 236, 1487-1493. refining the overall temperature factor, the R factor was reduced to 27.7%. Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946-950. Electron density maps were calculated using the Fourier coefficients Labahn,J., Granzin,J., Schluckebier,G., Robinson,D.P., Jack,W.E., 3Fobs-2FcaIc and Fobs-Fcalc and the model rebuilt by examining the Schildkraut,I. and Saenger,W. (1994) Proc. Natl Acad. Sci. USA, difference electron density. Additional side chains and some segments in press. of the main chain were added to the model. Subsequent rounds of Lamm,N., Wang,Y., Mathews,C.K. and Ruger,W. (1988) Eur J. 
Biochem., refinement were carried out, replacing the overall B factor by restrained 172, 553-563. individual B factors and extending the resolution to 2.2 A. The current Lehman,I.R. and Pratt,E.A. (1960) J. Biol. Chem., 235, 32543259. model has an R factor of 19.4% using all reflections in the resolution Mathews,C.K., Kutter,E.M., Mosig,G. and Berget,P.B. (eds) (1983) range 10.0-2.2 A, with an r.m.s. bond length deviation of 0.01 A. The Bacteriophage T4. American Society for Microbiology, Washington. 3421 A.Vrielink et aL

Messerschmidt,A. and Pflugrath,J.W. (1987) J. Appl. Crystallogr., 20, 306-315.
Mokulskaya,T.D., Gorlenko,A.M., Zumchuk,L.A., Bogdanova,E.S., Mokulskii,M.A., Goldfarb,D.M. and Khesin,R.B. (1966) Biokhimiya, 31, 749-759.
Moore,M.H., Gulbis,J.M., Dodson,E.J., Demple,B. and Moody,P.C.E. (1994) EMBO J., 13, 1495-1501.
Mosig,G. and Eiserling,F. (1988) In Calendar,R. (ed.), The Bacteriophages. Plenum Publishing Corp., New York, pp. 521-606.
Oh,B.-H., Pandit,J., Kang,C.-H., Nikaido,K., Gokcen,S., Ames,G.F.-L. and Kim,S.-H. (1993) J. Biol. Chem., 268, 11348-11355.
Rabussay,D. (1982) In Cohen,P. and van Heyningen,S. (eds), Molecular Action of Toxins and Viruses. Elsevier Biomedical Press, pp. 219-331.
Remington,S., Wiegand,G. and Huber,R. (1982) J. Mol. Biol., 158, 111-152.
Revel,H.R. (1983) In Mathews,C.K., Kutter,E.M., Mosig,G. and Berget,P.B. (eds), Bacteriophage T4. American Society for Microbiology, Washington, DC, pp. 156-165.
Rossmann,M.G., Liljas,A., Branden,C.-I. and Banaszak,L.J. (1975) In Boyer,P.D. (ed.), The Enzymes. Academic Press, New York, Vol. 11, pp. 61-102.
Ruger,W. (1978) Eur. J. Biochem., 88, 109-117.
Sack,J.S., Saper,M.A. and Quiocho,F.A. (1989a) J. Mol. Biol., 206, 171-191.
Sack,J.S., Trakhanov,S.D., Tsigannik,I.H. and Quiocho,F.A. (1989b) J. Mol. Biol., 206, 193-207.
Sharff,A.J., Rodseth,L.E., Spurlino,J.C. and Quiocho,F.A. (1992) Biochemistry, 31, 10657-10663.
Schultz,S.C., Shields,G.C. and Steitz,T.A. (1991) Science, 253, 1001-1007.
Skarzynski,T., Moody,P.C.E. and Wonacott,A.J. (1987) J. Mol. Biol., 193, 171-187.
Spurlino,J.C., Lu,G.-Y. and Quiocho,F.A. (1991) J. Biol. Chem., 266, 5202-5219.
Tomaschewski,J., Gram,H., Crabb,J.W. and Ruger,W. (1985) Nucleic Acids Res., 13, 7551-7568.
Wang,B.C. (1985) Methods Enzymol., 115, 90-112.
Warner,H.R., Snustad,D.P., Jorgensen,S.E. and Koerner,J.F. (1970) J. Virol., 5, 700-708.
Warwicker,J. and Watson,H.C. (1982) J. Mol. Biol., 157, 671-679.
Westbrook,E.M. (1985) Methods Enzymol., 114, 187-196.
Wiegand,G., Remington,S., Deisenhofer,J. and Huber,R. (1984) J. Mol. Biol., 174, 205-219.
Wierenga,R.K., De Maeyer,M.C.H. and Hol,W.G.J. (1985) Biochemistry, 24, 1346-1357.
Wierenga,R.K., Terpstra,P. and Hol,W.G.J. (1986) J. Mol. Biol., 187, 101-107.
Winkler,M. and Ruger,W. (1993) Nucleic Acids Res., 21, 1500.
Winkler,F.K., Banner,D.W., Oefner,C., Tsernoglou,D., Brown,R.S., Heathman,S.P., Bryan,R.K., Martin,P.D., Petratos,K. and Wilson,K.S. (1993) EMBO J., 12, 1781-1795.
Wu,R. and Geiduschek,E.P. (1975) J. Mol. Biol., 96, 539-562.
Wyatt,G.R. and Cohen,S.S. (1952) Nature, 170, 1072-1073.
Zimmerman,S.B., Kornberg,S.R. and Kornberg,A. (1962) J. Biol. Chem., 237, 512-518.
Zou,J., Flocco,M.M. and Mowbray,S.L. (1993) J. Mol. Biol., 233, 739-752.

Received on March 2, 1994; revised on May 20, 1994