doi:10.1006/jmbi.2000.3844, available online at http://www.idealibrary.com on J. Mol. Biol. (2000) 300, 353-362

A TOPRIM Domain in the Crystal Structure of the Catalytic Core of Escherichia coli Primase Confirms a Structural Link to DNA Topoisomerases

Marjetka Podobnik1,3, Peter McInerney2, Mike O'Donnell2 and John Kuriyan1*

1Laboratories of Molecular Biophysics and 2Laboratory of DNA Replication, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10021, USA

3Department of Biochemistry and Molecular Biology, Jozef Stefan Institute, Slovenia

Primases synthesize short RNA strands on single-stranded DNA templates, thereby generating the hybrid duplexes required for the initiation of synthesis by DNA polymerases. We present the crystal structure of the catalytic unit of a primase, that of a 320 residue fragment of Escherichia coli primase, determined at 2.9 Å resolution. Central to the catalytic unit is a TOPRIM domain that is strikingly similar in its structure to that of corresponding domains in DNA topoisomerases, but is unrelated to the catalytic centers of other DNA or RNA polymerases. The catalytic domain of primase is crescent-shaped, and the concave face of the crescent is predicted to accommodate about 10 base-pairs of RNA-DNA duplex in a loose interaction, thereby limiting processivity.

© 2000 Academic Press

Keywords: DNA replication; RNA-; primase; TOPRIM;

*Corresponding author

Introduction

None of the known DNA polymerases is able to initiate DNA synthesis by using single-stranded DNA as templates. Instead, DNA replication relies on a variety of mechanisms to generate short primer regions, typically RNA-DNA hybrid duplexes, that then serve as initiation sites for DNA synthesis by DNA polymerases. During chromosomal replication, in particular, the repeated generation of RNA primers required for the replication of lagging strands is brought about by DNA-dependent RNA polymerases, known as primases.

The prototypical bacterial primase is the enzyme encoded by the dnaG gene of Escherichia coli (Kornberg & Baker, 1991). All known bacterial primases, as well as primases from several bacteriophages, are homologous to E. coli primase. Referred to collectively as DnaG-type primases, these are often closely associated with DNA helicases in mobile assemblies known as primosomes, which track the moving replication fork (McMacken & Kornberg, 1978). E. coli primase is a 65.5 kDa protein (581 residues) that contains three functionally distinct regions that are separable by proteolysis. The N-terminal region (12 kDa, residues 1-110) contains a zinc-binding domain that is required for primase function, potentially because of a role in recognizing single-stranded DNA. The crystal structure of this domain has been determined (Pan & Wigley, 2000). A central catalytic domain (36 kDa, residues 111-433) is responsible for the catalysis of RNA synthesis, and a C-terminal fragment (residues 434-581) is involved in interactions between the primase and the helicase (Tougu & Marians, 1996; Tougu et al., 1994). Primase activity requires both the N-terminal zinc-binding domain and the central catalytic domain.

Abbreviations used: MAD, multiwavelength anomalous dispersion; TEV, tobacco etch virus; NCS, non-crystallographic symmetry.

E-mail address of the corresponding author: [email protected]

Bacterial and bacteriophage primases of the DnaG family have no apparent relationship to eukaryotic and archaebacterial primases, or to any other RNA or DNA polymerases (Aravind et al., 1998; Leipe et al., 1999). Unexpectedly, iterative sequence profile searches using the program PSI-BLAST and the E. coli primase sequence as a starting input revealed a potential structural relationship between a central region (100 residues) of the catalytic domain of DnaG-type primases and otherwise unrelated proteins such as certain DNA topoisomerases, several nucleases and proteins involved in recombinational repair (Aravind et al.,

0022-2836/00/020353-10 $35.00/0 © 2000 Academic Press

Structure of E. coli Primase Catalytic Core

1998). This shared region was named the TOPRIM domain (for topoisomerase/primase). The level of sequence similarity upon which the presence of TOPRIM domains in bacterial primases was postulated is relatively low. The TOPRIM domains in primases are smaller than those in topoisomerases (80 residues instead of 120 residues), and only 10 % of the residues in the TOPRIM domains of known structure (all of which are in topoisomerases) are preserved identically in any one of the primase sequences.

We have used X-ray crystallography to determine the crystal structure of the catalytic core of E. coli primase. Embedded within the catalytic domain is a TOPRIM fold that is similar to that seen in the topoisomerases, confirming the striking conservation of this domain among functionally different replication proteins. The structural similarity between the TOPRIM domains of primase and topoisomerase VI (Nichols et al., 1999) allows us to identify a metal-binding site in the primase catalytic domain, and thereby to locate the likely site for nucleotide addition to a growing primer chain. Combined with analysis of surface features of the primase, this has enabled a hypothesis to be generated regarding the interaction of the single-stranded DNA template and the nascent RNA-DNA duplex with the primase. Our conclusions are in general agreement with an independent report of the structure of the catalytic core of E. coli primase that was published at the same time as the submission of this paper (Keck et al., 2000).

Results and Discussion

General features of the structure

The boundaries of the catalytic core of E. coli primase were defined by subjecting the full-length protein to limited proteolytic digestion with trypsin, as described (Tougu et al., 1994), and characterizing the resulting fragments by N-terminal sequencing and mass spectrometry (Figure 1(a); data not shown). A stable fragment comprising residues 111-429 was identified, corresponding to the central catalytic region of primase (Mustaev & Godson, 1995). The structure of the E. coli primase catalytic core was determined at 2.9 Å resolution using multiwavelength anomalous dispersion (MAD) and a single crystal of selenomethionyl-substituted protein (Table 1).

The crystallographic asymmetric unit contains five molecules of primase, which are arranged in a

Figure 1. The structure of the E. coli primase catalytic domain. (a) A diagram of the domain organization of full-length primase. The residue numbers corresponding to the domain boundaries in the E. coli protein are indicated. (b) Ribbon representation of the structure of the primase catalytic domain. The orientation shown here is similar to that used in most of the Figures in the paper. The β-strands are marked with red numbers (1-12) and the α-helices with black letters (A-N). The circles indicate the boundaries of the TOPRIM sub-domain.

Table 1. Data collection and refinement statistics

A. MAD phasing

Wavelength   Energy (eV)   No. of reflections (total/unique)   Completeness (%)   Rsym (%)
λ1           12,570        273,234/40,874                      97.6 (90.8)        5.8 (20.7)
λ2           12,660        276,321/40,801                      97.9 (93.2)        5.8 (21.2)
λ3           12,664        279,034/40,836                      98.0 (93.7)        7.5 (23.4)
λ4           12,845        279,399/40,704                      98.5 (97.4)        6.2 (22.2)

Mean overall figure of merit (15.0-2.9 Å) (centric/acentric) = 0.51/0.58

B. Refinement and stereochemical statistics

R-value (%)                  21.9
Free R-value (%)             27.6
Average B-factors(a) (Å2)
  Main-chain                 46.81
  Side-chains                50.95
  Whole                      48.82
RMS deviations
  Bonds (Å)                  0.0114
  Angles (deg.)              1.414

$R_{\mathrm{sym}} = 100 \times \sum |I - \langle I \rangle| / \sum I$, where $I$ is the integrated intensity of a given reflection. For Rsym and completeness, numbers in parentheses refer to data in the highest resolution shell. Figure of merit $= \langle | \sum P(\alpha) e^{i\alpha} / \sum P(\alpha) | \rangle$, where $\alpha$ is the phase and $P(\alpha)$ is the phase probability distribution.

R-value $= \sum |F_p - F_p(\mathrm{calc})| / \sum F_p$, where $F_p$ is the structure factor amplitude. The free R-value is the R-value for a 10 % subset of the data that was not included in the crystallographic refinement.

(a) Average B-factors for the five molecules in the asymmetric unit.
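The residual definitions in the Table 1 footnotes, and the solvent content quoted in Materials and Methods, can be checked with a few lines of arithmetic. The following is a minimal sketch, not the software used for this work: the function names are hypothetical, the input lists are toy values, the 36 kDa fragment mass is the rounded figure given in the text, and the factor 1.23 is the usual Matthews approximation for the protein volume fraction.

```python
def r_sym(groups):
    """Rsym (%) = 100 * sum|I - <I>| / sum(I) over repeated measurements.

    `groups` holds one list of intensity measurements per unique reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups:
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


def r_value(f_obs, f_calc):
    """R = sum|Fp - Fp(calc)| / sum(Fp), returned as a fraction."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists differ in length")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def matthews(a, b, c, asu_per_cell, molecules_per_asu, mass_da):
    """Matthews coefficient (A^3/Da) and solvent fraction, orthorhombic cell."""
    cell_volume = a * b * c  # all cell angles are 90 degrees
    vm = cell_volume / (asu_per_cell * molecules_per_asu * mass_da)
    solvent_fraction = 1.0 - 1.23 / vm  # assumed Matthews approximation
    return vm, solvent_fraction


# Cell from Materials and Methods; P2(1)2(1)2(1) has four asymmetric units.
vm, solvent = matthews(62.7, 107.6, 263.1, 4, 5, 36000.0)
print(f"Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f} %")
```

Under these assumptions the five-molecule asymmetric unit gives a solvent fraction close to one half, in line with the approximately 50 % stated in the crystallization section.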

helical spiral. The five molecules differ slightly in their inter-domain orientation, but the structures of the individual domains are virtually unchanged. This assembly is unlikely to be of physiological significance, but it is of interest, since it originates from the interaction of an arginine residue side-chain (residue 299) in one protein with the putative metal-binding site of another. This arrangement positions the guanidinium group of the arginine residue such that it mimics the metal observed in the TOPRIM domain of topoisomerase VI (not shown). Our attempts to soak metal ions into this crystal form have been unsuccessful, almost certainly because the addition of divalent metal ions will disrupt this interaction between molecules in the crystal.

The polypeptide chain of the catalytic domain folds into three consecutive sub-domains, with no cross-back connections. The TOPRIM sub-domain is at the center of the structure, and is flanked on one side by an α/β N-terminal domain that contains an anti-parallel β-sheet and several helices, and on the other side by a helical C-terminal sub-domain (Figure 1(b)). The overall shape of the molecule is that of a crescent, with the conserved acidic residues of the TOPRIM domain being located at the base of a concave depression on the inner surface of the crescent. The high degree of sequence conservation that is seen between various DnaG-type primases indicates that the structure described here will be a good model for the catalytic domains of the DnaG family members (Figure 2).

The TOPRIM domain

Type I and type II DNA topoisomerases are fundamentally different from each other in terms of their molecular architecture and their enzymatic mechanisms (Lima et al., 1994; Berger et al., 1996; Nichols et al., 1999). Nevertheless, the three-dimensional structures of both type IA and type II topoisomerases have a small core region in common, which corresponds to the TOPRIM domain (Aravind et al., 1998; Berger et al., 1998). The fold of the TOPRIM domains in these proteins resembles a Rossmann-like nucleotide-binding fold, with a central β-sheet formed by four parallel β-strands, flanked by three α-helices. Both type IA and type II topoisomerases contain a tyrosine residue that forms a covalent linkage with DNA during one stage of the topoisomerase reaction cycle. This tyrosine residue is not part of the TOPRIM domain, but is positioned near it during one or more of the conformational intermediates that occur during DNA strand exchange.

As in the comparison between the structures of type IA and type II topoisomerases, the architecture of the catalytic domain of primase is similar to that of the topoisomerases only in this restricted region (Figure 3(a)). Based on the structural elements that are common to these proteins, the TOPRIM domain in primase is defined to extend from residues 259 to 341. The TOPRIM fold in primases is more appropriately referred to as a sub-domain that is embedded within the larger catalytic domain, since it has extensive hydrophobic interfaces with the N and C-terminal sub-domains. The structural elements that make up the TOPRIM sub-domain include β-strands 8-11, which form a parallel β-sheet, and helices αG, αH and αI, which pack against this sheet in a manner that resembles the Rossmann fold. The inferred location of the active site of the primase, which is within the TOPRIM domain (see below), overlaps closely with the binding site for dinucleotide phosphates in

Figure 2. Sequence alignment of the catalytic domains of bacterial primases of the DnaG family. Ec, E. coli; St, Salmonella typhimurium; Hi, Haemophilus influenzae; Bs, Bacillus subtilis; Lm, Listeria monocytogenes; Rp, Rickettsia prowazekii. The residues are highlighted according to their similarity, from white (40 % or lower sequence similarity) to dark green for 100 % similarity. The numbering is of the E. coli protein. Several regions of sequence conservation that had been noted previously (but not discussed further here) are indicated by boxes. Cyan boxes designate the conserved motifs in bacterial and bacteriophage primases (Ilyina et al., 1992). The so-called RNAP (RNA-polymerase) region is marked by a pink box, but note that the structure presented here reveals no similarity to other RNA polymerases. The bacterial signature sequence (Versalovic & Lupski, 1993) is indicated by a magenta box. The conserved acidic residues are marked by black stars. The secondary structure elements for E. coli primase are represented by cylinders for helices and arrows for strands. The elements colored in red represent the TOPRIM region. Alignments were obtained using CLUSTALX (Thompson et al., 1997) and edited by hand to match the alignment of Ilyina et al. (1992). The sequence similarity levels used for coloring the alignment were calculated by averaging the similarity scores at each position of all possible pairs of sequences (David Jeruzalmi, unpublished software). The level of equivalence of non-identical residues was established by use of the BLOSUM62 amino acid substitution matrix (Henikoff & Henikoff, 1993).
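The per-position averaging described in the legend to Figure 2 can be illustrated as follows. This is only a sketch under stated assumptions: the software actually used is unpublished, so the function names are hypothetical, the three aligned sequences are invented toy strings, and a simple identity score stands in for BLOSUM62-based equivalence.

```python
from itertools import combinations


def identity_score(x, y):
    """Stand-in scoring function: 1 for identical residues, 0 otherwise.

    A BLOSUM62-derived equivalence score could be passed in its place.
    """
    return 1.0 if x == y and x != "-" else 0.0


def column_similarity(column, score=identity_score):
    """Average the score over all possible pairs of residues in one column."""
    pairs = list(combinations(column, 2))
    return sum(score(x, y) for x, y in pairs) / len(pairs)


def alignment_profile(sequences, score=identity_score):
    """Return one averaged similarity value per alignment position."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    return [column_similarity(col, score) for col in zip(*sequences)]


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = ["GDA", "GEA", "GDC"]
profile = alignment_profile(toy_alignment)
print([round(100 * p) for p in profile])
```

A fully conserved column scores 1.0 (100 %), and the values could then be mapped onto a color ramp such as the white-to-green scale used in the Figure.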

Rossmann fold-containing proteins, such as alcohol dehydrogenase (not shown).

When the TOPRIM sub-domain of primase is aligned structurally with the corresponding regions of topoisomerases IA, II and VI, the rms deviation in Cα positions is 2.0 Å within the α-helices and β-strands. This is a relatively high degree of structural conservation, given that the level of sequence identity between the primase and topoisomerase TOPRIM domains is low. While only 10 % of the residues within individual members of the larger topoisomerase TOPRIM domains are identical in primase, the fractional sequence identity is higher when the smaller primase sequence is used as a reference, with 18 %, 14 % and 16 % of the residues in the primase TOPRIM domain being conserved identically in topoisomerase IA, topoisomerase II and topoisomerase VI, respectively.

Only five residues are strictly conserved across all TOPRIM sequences (Figure 3(c)). Two of these are glycine residues that are likely to play a structural role. The other three are acidic residues

Figure 3. Structural alignment of the TOPRIM domains. (a) Backbone structure of the TOPRIM domains. The individual domains are indicated as follows: primase, E. coli primase; TopoI, E. coli topoisomerase I (Lima et al., 1994); TopoII, yeast topoisomerase II (Berger et al., 1996); TopoVI, M. jannaschii topoisomerase VI (Nichols et al., 1999). The alignment was performed by the DALI server (Holm & Sander, 1993). (b) The conserved acidic side-chains in the TOPRIM domains, colored as in (a). (c) Sequence alignment based on the structural alignment of TOPRIM domains.
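The dependence of fractional sequence identity on the choice of reference sequence, noted in the text for the TOPRIM domains, is simple arithmetic. The sketch below uses the round domain sizes quoted in the Introduction (about 120 residues in topoisomerases, about 80 in primases); the function names and the toy aligned strings are hypothetical and are not taken from the alignment in Figure 3(c).

```python
def identical_positions(seq_a, seq_b):
    """Count aligned positions holding the same residue (gaps excluded)."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")


def fractional_identity(n_identical, reference_length):
    """Percentage identity relative to the length of a chosen reference."""
    return 100.0 * n_identical / reference_length


# 10 % of a 120-residue topoisomerase TOPRIM domain is 12 identical residues.
n_identical = 12
print(fractional_identity(n_identical, 120))  # larger domain as reference
print(fractional_identity(n_identical, 80))   # smaller primase domain as reference

# Counting identities in a toy gapped alignment (illustrative strings only).
print(identical_positions("GD-A", "GDKA"))
```

With these round numbers the same set of identical residues corresponds to 10 % of the larger domain but 15 % of the smaller one, which is of the same order as the 14-18 % values reported for the primase TOPRIM domain.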

(Glu265, Asp309 and Asp311 in E. coli primase) that are present in two conserved sequence motifs. The importance of these acidic residues has been documented for the primases (Dracheva et al., 1995; Strack et al., 1992), as well as for topoisomerases (Chen & Wang, 1998; Zhu et al., 1998).

The manner in which TOPRIM domains interact with substrates is unknown. One clue is provided by the structure of topoisomerase VI, a type II topoisomerase, in which Mg2+ is seen to be coordinated by the three conserved acidic residues of the TOPRIM domain (Nichols et al., 1999). The acidic side-chains of all the TOPRIM domains of known structure superimpose closely (Figure 3(b)), and it is likely that they all share a conserved metal-binding function. However, the Rossmann-fold architecture of the TOPRIM domain, and the manner in which it coordinates metal ions, do not resemble any feature of DNA or RNA polymerases of known structure.

The primase catalytic site and implications for DNA/RNA binding

The structure of the primase catalytic core has been determined in the absence of any substrates. Nevertheless, several features of the structure reveal the likely location of the polymerase active site. A large groove is formed at the center of the concave surface of the catalytic core (Figure 4). This groove is encircled by residues that are very highly conserved among DnaG-type primases, and these constitute the only such patch of extremely high sequence conservation on the surface of the molecule (Figure 4(a)). In addition to the sequence conservation, electrostatic features of this surface groove are strongly suggestive of how the DNA template and the RNA-DNA hybrid duplex might be bound by the primase catalytic core.

One corner of this groove contains several conserved acidic residues, including the three that are characteristic of the TOPRIM domain (Figures 1 and 3). While our structure of primase contains no bound metal ions, we infer that these three residues will coordinate divalent metal ions based upon the similarity to the structure of topoisomerase VI, which does have a metal ion bound to its active site (Nichols et al., 1999) (Figure 3(b)). It is known that primases require Mg2+ for their activity (Urlacher & Griep, 1995), and it is likely that the polymerization reaction is catalyzed in a metal-facilitated manner, as is the case for other DNA and RNA polymerases (Steitz, 1998) as well as for the topoisomerases IA and topoisomerases II (Wang, 1996; Zhu et al., 1998). The location of the three conserved acidic residues thus allows us to localize the site where the phosphate transfer reaction is likely to occur in primase (Figure 4(b)).

It might be expected that the catalytic center of primase would contain binding sites for one or more nucleotides that act as substrates for the formation of the first phosphodiester linkage in the nascent RNA strand. A notable feature of the primase surface is a relatively deep and largely hydrophobic pocket that is adjacent to the putative metal-binding site and that is ringed by very highly conserved side-chains (Figure 4(a)). This pocket is large enough to accommodate a purine base (Figure 5(b)). The location of the pocket is

Figure 4. Molecular surface of the primase catalytic domain. (a) Distribution of conserved residues among the DnaG-family of primases. The surface is colored in a ramp from white to green according to the sequence similarity (40-100 %) calculated as described in the legend to Figure 2, with white representing sequence identity of 40 % or lower, and darkest green corresponding to regions with 100 % sequence identity. (b) The electrostatic potential mapped onto the molecular surface, calculated using GRASP (Nicholls et al., 1991). The green sphere represents the location of the Mg2+ in the structure of the TOPRIM domain of topoisomerase VI, which was superimposed onto the TOPRIM domain of primase in order to position the metal ion (Nichols et al., 1999). The black circle marks the potential site of catalysis and the yellow ellipses mark potential binding sites for the phosphate backbone of DNA or RNA (see Figure 5(a)). Both surfaces are in the standard view (shown in Figure 1).

Figure 5. Potential modes of interaction between the primase and nucleic acids. (a) A speculative model for how a newly synthesized RNA-DNA hybrid might interact with the primase. The molecular surface of the primase catalytic domain is shown, colored according to electrostatic potential as in Figure 4(b). An A-form double helix has been placed so as to locate the end of one strand near the metal-binding site, indicated by the yellow ellipse. The shape of the central groove is such that the double helix can extend only up or down along the surface. The groove narrows towards the top, and so the helix was extended downwards. Note that the phosphate backbone of the template strand (blue) runs close to surface regions with positive electrostatic potential (blue). The single-stranded template is extended upwards (broken line) along a region of positive electrostatic potential, generated by highly conserved residues (see Figure 4(a)). The model shown here is not based on any kind of energetic analysis, but represents instead a simple hypothesis that awaits experimental testing. (b) A potential binding site for nucleotides in the TOPRIM domain. A cross-sectional view of the primase is shown, rotated approximately 90° with respect to the view in (a). A deep pocket, lined by conserved surface residues (see Figure 4(a)), can be seen. This pocket is adjacent to the metal-binding site, which is directly behind it in this view. The pocket is large enough to accommodate a purine base, and an adenine residue is shown here merely to illustrate this fact. If such an interaction were to correspond to an actual binding mode for an adenine nucleotide in the initiation of primer synthesis, it would have to swing out of this pocket in order to base-pair with the complementary base in the template strand.

such that the phosphate groups of the nucleotide bound at the site might be able to coordinate metal ions at the adjacent acidic site. Such an interaction would require that the base swing out of the pocket in order to base-pair with the complementary nucleotide in the template strand.

In order to visualize how DNA and RNA strands might interact with the surface of primase, we carried out the following simple modeling exercise (Figure 5(a)). An A-form double helix, corresponding to an RNA-DNA hybrid, was placed on the surface of the primase such that one of the phosphate linkages in one strand was close to the metal-binding site (this marks the site of nucleotide addition to the growing RNA strand). It was immediately apparent that the surface groove is so constructed that the path of the duplex is constrained to move away from the active site essentially in one direction only. Upward extension of the double helix (in the orientation of Figure 5(a)) is prevented by a narrowing of the surface groove due to the juxtaposition of elements from the N-terminal and TOPRIM sub-domains. Lateral movements of the helix are restricted by the anti-parallel β-sheet of the N-terminal domain on one side, and elements of the TOPRIM domain on the other. These form a cup-shaped depression into which the A-form double helix can be placed nicely.

The side of the groove opposite to the metal-binding site is lined by several conserved basic residues, and these appear to be positioned for favorable interaction with the phosphate backbone of the template DNA strand in the model (Figure 5(a)). Residues in the upper patch could interact with single-stranded DNA, which is proposed to extend upwards, away from the active site. The N-terminal residue of the catalytic core is on the surface that is distal to the proposed active-site groove (Figure 1(b)). This suggests that the zinc-binding domain, which is thought to interact with single-stranded DNA (Pan & Wigley, 2000), might be placed behind the catalytic core (with reference to the standard orientation in Figure 5(a)). It is not known how the domain interacts with DNA, and whether its interaction with the catalytic domain places any particular constraints on the direction of approach of the template.

Primases usually synthesize short RNA primers that are often fewer than ten nucleotides long (Kornberg & Baker, 1991). Thus, one may expect that certain features of the structure of the catalytic domain of the primase may have a bearing on the length of primer synthesis. The proposed interaction between the DNA-RNA duplex and the surface of the catalytic domain of the primase involves a small area, and therefore does not seem likely to confer a high level of processivity upon the primase. The interaction between the primase and the newly synthesized primer is limited to less than about ten nucleotides, and there are no apparent mechanisms for trapping the duplex on the surface. As a consequence, it is likely that the primer may disengage from the template after the synthesis of just a few nucleotides. The interaction of primase with the DnaB helicase may help primase remain associated with DNA, but such an interaction could have consequences for the lengths of the primers that are synthesized, as discussed below.

One intriguing consequence of the predicted interaction between the DNA-RNA duplex and the primase concerns potential restrictions on the continued synthesis of RNA by primase. Our model suggests that the hybrid DNA-RNA duplex is extruded by the catalytic domain along the path indicated in Figure 5(a) such that it emerges from the surface of the primase catalytic domain more or less perpendicular to a rather flat edge of the domain. This flat surface might abut other proteins of the replication complex, and thereby affect the ability of primase to continue synthesis. In many cases, primases are recruited to single-stranded DNA by helicases, such as the DnaB helicase of E. coli, and the primases interact with the helicases either covalently or non-covalently via the C-terminal region of the primase catalytic domain. The positioning of a helicase below the primase (with reference to the orientation shown in Figure 5) might enforce a length restriction on the newly synthesized primer that is set by the extent to which the linkage between the helicase and the primase can flex. The geometry of the predicted interaction between the RNA-DNA duplex and the surface of the primase catalytic domain is such that up to ten nucleotides of the primer could potentially emerge from the primase active site without running into a protein that would abut the lower edge of the primase.

Conclusions

Primases are DNA-dependent RNA polymerases, and it seems remarkable to us that their catalytic centers are closely related to those of topoisomerases and not to those of RNA or DNA polymerases. Almost all DNA polymerases (as well as some RNA polymerases) share a common architecture at their catalytic centers, which involves the presentation of metal-coordinating acidic residues by a β-sheet platform known as the palm (Steitz, 1998). The structure of a bacterial RNA polymerase has been determined, and while this has a metal-coordination site at its catalytic center, the organization of the protein is unrelated to either the DNA polymerases or to primase (Zhang et al., 1999). The connection between primases and topoisomerases, which is established by the identification of TOPRIM domains in these two crucial sets of replication proteins, suggests that diverse proteins involved in replication and repair, including the primase RNA polymerases, have originated from a common ancestor that is unrelated to DNA polymerases and the transcriptional RNA polymerases. The impressive accuracy of the sequence comparisons by Koonin and co-workers (Aravind et al., 1998), which first showed the commonality of the TOPRIM domain, bodes well for a continued flow of insights from the further elucidation of highly divergent patterns of protein evolution.

In conclusion, the structure of the catalytic core of E. coli primase shows a novel fold for a nucleic acid polymerase. A centrally placed Rossmann fold, the TOPRIM domain, helps form a surface groove that we have tentatively identified as the channel for binding double-helical RNA-DNA hybrid. The groove includes a putative active-site cleft that contains a conserved cluster of acidic residues, suggesting that metal ions play a crucial role in the phosphotransfer reaction. It is anticipated that the structural results reported here will enable rapid progress toward a complete understanding of the mechanism of primase action.

Materials and Methods

Purification and crystallization

The catalytic core of E. coli primase (residues 111 to 429) was sub-cloned into the pPROEXHTa expression vector, using the SfoI and NotI restriction sites, and expressed in E. coli BL21 cells. The resulting protein contains a histidine tag connected to the N terminus of the primase domain by a cleavage site for the tobacco etch virus (TEV) protease (Parks et al., 1994). The cells were lysed in 20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 10 mM imidazole, containing 5 mM β-mercaptoethanol and 1 mM PMSF, using an Emulsiflex French press (Avestin). The soluble lysate was applied to a Ni-NTA (Qiagen) column to isolate the histidine-tagged fragment. After treatment with TEV protease, the tag-free protein was passed over the Ni-NTA column again to remove uncleaved fusion protein, and was further purified by gel filtration over a Superdex S75 column (Pharmacia) equilibrated in 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, and 1 mM DTT.

Crystals of the catalytic core were prepared using hanging drop vapor diffusion by mixing equal volumes of a 15 mg/ml protein solution and a reservoir solution containing 16-19 % PEG 4000 (Fluka), 0.053-0.063 mM Tris (pH 7.9-8.5) and 0.11-0.13 M sodium acetate, and then equilibrating over 0.5 ml reservoir at 20 °C for about one week. Crystals form in space group P2₁2₁2₁ with unit cell dimensions of a = 62.7 Å, b = 107.6 Å, c = 263.1 Å. There are five molecules of p36 fragment per asymmetric unit, with a solvent content of approximately 50 %. A selenomethionine-substituted primase fragment (with nine methionine residues per molecule) was also produced and crystallized as for the native protein, except for an additional 5 mM DTT in the crystallization solution. Complete incorporation of selenomethionine was confirmed by mass spectrometry (data not shown).

Structure determination

Crystals of the catalytic core were transferred to stabilization solution containing 22.35 % PEG 4000, 0.075 M Tris (pH 8.3), 0.15 M sodium acetate, 5 mM DTT and 20 % (v/v) ethylene glycol. Multiwavelength anomalous dispersion (MAD) data were collected on one selenomethionyl-derivatized crystal at four wavelengths at beamline X4A of the National Synchrotron Light Source (Brookhaven, NY) (Table 1). All data were integrated and reduced with the DENZO/SCALEPACK programs (Otwinowski & Minor, 1997). The scaled and reduced intensity data were converted to amplitudes using TRUNCATE (CCP4, 1994), and cross-wavelength scaling was performed using FHSCALE (CCP4, 1994), treating λ2 as native (Table 1). Out of 45 selenium sites, 22 were found by SOLVE (Terwilliger & Berendzen, 1995) and the rest by difference Fourier maps. Phase calculation and heavy-atom refinement were performed in MLPHARE (CCP4, 1994), and map modifications were carried out using DM (CCP4, 1994) and MAMA (Kleywegt & Read, 1998). Automated solvent flattening resulted in an electron density map of exceptional quality into which a nearly complete model could be built using the program O (Jones et al., 1991).

The primase catalytic core consists of three domains (residues 120-241, 242-367, 368-415), and the inter-domain orientations are slightly, but significantly, different in each of the five molecules in the asymmetric unit. Refinement of the model by CNS (Brünger et al., 1998) proceeded by utilizing non-crystallographic symmetry (NCS) restraints for each of the three domains separately. The structure factors measured to 2.9 Å, at λ1, for the selenomethionine crystals were used as the native data set for refinement. The final refined model has a crystallographic R-value of 21.9 %, a free R-value of 27.6 % and is well defined for the residues 113-421. No electron density could be seen for the residues 111, 112 and 429, and for the residues 177, 193, 237, 287, 288, 364 and 422-428, only the main chain could be traced. The geometry was analyzed using PROCHECK (Laskowski et al., 1993), with 88 % of the residues lying in the most favored regions of the Ramachandran plot, and no residues in disallowed regions.

Protein Data Bank accession number

The coordinates have been deposited with the RCSB Protein Data Bank and have the accession code 1EQN.

Acknowledgments

We thank Jeffrey Bonanno, David Jeruzalmi and Hiroto Yamaguchi for valuable assistance and discussions. We thank the staff at the X4A beamline at the National Synchrotron Light Source for assistance with synchrotron data collection. This work was supported in part by a grant from National Institutes of Health (GM38839; MO).

References

Aravind, L., Leipe, D. D. & Koonin, E. V. (1998). Toprim - a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucl. Acids Res. 26, 4205-4213.

Berger, J. M., Fass, D., Wang, J. C. & Harrison, S. C. (1998). Structural similarities between topoisomerases that cleave one or both DNA strands. Proc. Natl Acad. Sci. USA, 95, 7876-7881.

Berger, J. M., Gamblin, S. J., Harrison, S. C. & Wang, J. C. (1996). Structure and mechanism of DNA topoisomerase II. Nature, 379, 225-232.

Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallog. sect. D, 54, 905-921.

Collaborative Computational Project No. 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallog. sect. D, 50, 760-763.

Chen, S. J. & Wang, J. C. (1998). Identification of active-site residues in Escherichia coli DNA topoisomerase I. J. Biol. Chem. 273, 6050-6056.

Dracheva, S., Koonin, E. V. & Crute, J. J. (1995). Identification of the primase active site of the herpes simplex virus type 1 helicase-primase. J. Biol. Chem. 270, 14148-14153.

Henikoff, S. & Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins: Struct. Funct. Genet. 17, 49-61.

Holm, L. & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123-138.

Ilyina, T. V., Gorbalenya, A. E. & Koonin, E. V. (1992). Organization and evolution of bacterial and bacteriophage primase-helicase systems. J. Mol. Evol. 34, 351-357.

Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallog. sect. A, 47, 110-119.

Keck, J. L., Roche, D. D., Lynch, A. S. & Berger, J. M. (2000). Structure of the RNA polymerase domain of E. coli primase. Science, 287, 2482-2486.

Kleywegt, G. J. & Read, R. J. (1998). Not your average density. Structure, 5, 1557-1569.

Kornberg, A. & Baker, T. A. (1991). DNA Replication, 2nd edit., W.H. Freeman and Company, New York.

Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291.
Leipe, D. D., Aravind, L. & Koonin, E. V. (1999). Did DNA replication evolve twice independently? Nucl. Acids Res. 27, 3389-3401.
Lima, C. D., Wang, J. C. & Mondragon, A. (1994). Three-dimensional structure of the 67 K N-terminal fragment of E. coli DNA topoisomerase I. Nature, 367, 138-146.
McMacken, R. & Kornberg, A. (1978). A multienzyme system for priming the replication of phiX174 viral DNA. J. Biol. Chem. 253, 3313-3319.
Mustaev, A. A. & Godson, G. N. (1995). Studies of the functional topography of the catalytic center of Escherichia coli primase. J. Biol. Chem. 270, 15711-15718.
Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296.
Nichols, M. D., DeAngelis, K., Keck, J. L. & Berger, J. M. (1999). Structure and function of an archaeal topoisomerase VI subunit with homology to the meiotic recombination factor Spo11. EMBO J. 18, 6177-6188.
Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326.
Pan, H. & Wigley, D. B. (2000). Structure of the zinc-binding domain of Bacillus stearothermophilus DNA primase. Structure, 8, 231-239.
Parks, T. D., Leuther, K. K., Howard, E. D., Johnston, S. A. & Dougherty, W. G. (1994). Release of proteins and peptides from fusion proteins using a recombinant plant virus proteinase. Anal. Biochem. 216, 413-417.
Steitz, T. A. (1998). A mechanism for all polymerases. Nature, 391, 231-232.
Strack, B., Lessl, M., Calendar, R. & Lanka, E. (1992). A common sequence motif, -E-G-Y-A-T-A-, identified within the primase domains of plasmid-encoded I- and P-type DNA primases and the alpha protein of the Escherichia coli satellite phage P4. J. Biol. Chem. 267, 13062-13072.
Terwilliger, T. C. & Berendzen, J. (1995). Difference refinement: obtaining differences between two related structures. Acta Crystallog. sect. D, 51, 609-618.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25, 4876-4882.
Tougu, K. & Marians, K. J. (1996). The interaction between helicase and primase sets the replication fork clock. J. Biol. Chem. 271, 21398-21405.
Tougu, K., Peng, H. & Marians, K. J. (1994). Identification of a domain of Escherichia coli primase required for functional interaction with the DnaB helicase at the replication fork. J. Biol. Chem. 269, 4675-4682.
Urlacher, T. M. & Griep, M. A. (1995). Magnesium acetate induces a conformational change in Escherichia coli primase. Biochemistry, 34, 16708-16714.
Versalovic, J. & Lupski, J. R. (1993). The Haemophilus influenzae dnaG sequence and conserved bacterial primase motifs. Gene, 136, 281-286.
Wang, J. C. (1996). DNA topoisomerases. Annu. Rev. Biochem. 65, 635-692.
Zhang, G., Campbell, E. A., Minakhin, L., Richter, C., Severinov, K. & Darst, S. A. (1999). Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 Å resolution. Cell, 98, 811-824.
Zhu, C. X., Roche, C. J., Papanicolaou, N., DiPietrantonio, A. & Tse-Dinh, Y. C. (1998). Site-directed mutagenesis of conserved aspartates, glutamates and arginines in the active site region of Escherichia coli DNA topoisomerase I. J. Biol. Chem. 273, 8783-8789.

Edited by P. E. Wright

(Received 3 April 2000; received in revised form 28 April 2000; accepted 28 April 2000)
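
The solvent content of approximately 50 % quoted for the crystals (five molecules of the p36 fragment per asymmetric unit in an orthorhombic cell with a = 62.7 Å, b = 107.6 Å, c = 263.1 Å) can be checked with a Matthews-coefficient estimate. The sketch below is illustrative only and is not the authors' calculation. It rests on three assumptions that the text does not state: four asymmetric units per unit cell (the usual count for the reported space group), a molecular mass of 36 kDa taken from the fragment's name, and the conventional factor of 1.23 Å³/Da for a typical protein partial specific volume. The function name `matthews` is hypothetical.

```python
def matthews(a, b, c, asu_per_cell, mols_per_asu, mass_da):
    """Return (V_M in A^3/Da, estimated solvent fraction) for an orthorhombic cell."""
    # Orthorhombic cell: all angles are 90 degrees, so the volume is a * b * c.
    volume = a * b * c
    # Matthews coefficient: cell volume per dalton of protein in the cell.
    vm = volume / (asu_per_cell * mols_per_asu * mass_da)
    # Conventional approximation; 1.23 A^3/Da is an assumed typical value.
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


# Cell dimensions and copy number from the text; 4 and 36 kDa are assumptions.
vm, solvent = matthews(62.7, 107.6, 263.1, 4, 5, 36_000.0)
print(f"V_M = {vm:.2f} A^3/Da, solvent fraction = {solvent:.2f}")
```

Under these assumptions the estimate comes out close to 2.5 Å³/Da and a solvent fraction near one half, in line with the value reported in the text.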