Structure, Vol. 10, 215–224, February, 2002, ©2002 Elsevier Science Ltd. All rights reserved. PII S0969-2126(02)00698-6

Crystal Structure of a Novel Carboxypeptidase from the Hyperthermophilic Archaeon Pyrococcus furiosus

Joseph W. Arndt,1 Bing Hao,1 Vijay Ramakrishnan,2 Timothy Cheng,2 Sunney I. Chan,2,3 and Michael K. Chan1,3
1 Departments of Chemistry and Biochemistry, The Ohio State University, 484 West 12th Avenue, Columbus, Ohio 43210
2 Noyes Laboratories 127-72, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125

Summary

The structure of Pyrococcus furiosus carboxypeptidase (PfuCP) has been determined to 2.2 Å resolution using multiwavelength anomalous diffraction (MAD) methods. PfuCP represents the first structure of the new M32 family of carboxypeptidases. The overall structure is a homodimer. Each subunit is mostly helical, with its most pronounced feature being a deep substrate binding groove. The active site lies at the bottom of this groove and contains an HEXXH motif that coordinates the metal ion required for catalysis. Surprisingly, the structure is similar to the recently reported rat neurolysin. Comparison of these structures, as well as sequence analyses with other homologous proteins, reveals several conserved residues. The roles of these conserved residues in the catalytic mechanism are inferred based on modeling and their location.

Introduction

Carboxypeptidases hydrolyze peptide bonds from the C terminus of peptides and proteins. Metallocarboxypeptidases, which comprise the largest class of carboxypeptidase, are ubiquitous in nature and typically have a single zinc ion bound to the active site. Different metallocarboxypeptidases can be classified by their substrate specificities. For instance, carboxypeptidase A (CPA) is specific for neutral, preferably hydrophobic amino acids, carboxypeptidase B (CPB) is specific for basic amino acids, and carboxypeptidase T (CPT) is specific for basic and hydrophobic amino acids [1, 2]. Most of the carboxypeptidases studied to date are mesophilic enzymes, but several thermostable carboxypeptidases have been purified and characterized [3–8]. In the biotechnological industry, there is growing interest in developing applications for hyperthermophilic enzymes, since they are often more stable in the presence of severe environmental factors, including elevated temperatures, denaturing agents, and organic solvents. One such application is the use of a thermostable carboxypeptidase for the C-terminal amino acid ladder sequencing of peptides and proteins by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) [6].

Early studies of the thermophilic Thermus aquaticus carboxypeptidase (TaqCP) revealed distinct differences from any known carboxypeptidase. Biochemical characterization of TaqCP demonstrated that it had a novel size and sequence and an unexpected metal binding motif [3]. TaqCP was found to be missing the HXXE(X)123–132H motif characteristic of classical metallocarboxypeptidases (CPA, CPB, and CPT) and instead contained the HEXXH motif [4, 5] common to other families of metalloproteases (thermolysin and deformylase) [9, 10] but not previously observed in a carboxypeptidase. These features were also later found in the hyperthermophilic P. furiosus carboxypeptidase, which shares 37% sequence identity with TaqCP.

Pyrococcus furiosus carboxypeptidase (PfuCP) has one of the highest optimal activity temperatures (90°C–100°C) studied to date. It was purified to homogeneity by Cheng et al. and shown to be an unusual metalloprotease in that the zinc-bound form is inactive, while binding of other metals, such as cobalt, promotes its catalytic activity [6]. PfuCP has a broad substrate specificity that includes basic, aromatic, neutral, and polar amino acids but is unable to digest peptides with C-terminal glycine, proline, or acid residues (aspartate and glutamate). Based on SDS-PAGE, gel filtration, and MALDI-TOF MS, PfuCP was found to be a homodimer of 59 kDa subunits [6].

A structural investigation of PfuCP was undertaken to classify this novel type of carboxypeptidase and to elucidate the role of various structural elements in promoting its catalytic activity. Here, we present structures of three different forms of PfuCP that provide a clear picture of the overall aggregation state and fold and that aid in identifying the metal binding active site. We propose a mechanism based on a computational model of substrate bound to the active site and on the locations of critical residues conserved in the overall family.

Results and Discussion

Structure Description
Three different PfuCP structures have been determined from two different crystal forms. In all, the overall protein fold is primarily helical, except for a three-stranded β sheet near each active site (α helix: 68%, β strand: 3.2%; see Experimental Procedures). In each case, as predicted from the biochemical studies [6], the structure appears as a dimer with an overall shape reminiscent of a butterfly (Figure 1). The dimensions of the dimer are 100 Å × 65 Å × 55 Å with an overall surface area of 34,800 Å². A total of 4400 Å² of surface area is buried between

3 Correspondence: [email protected] (M.K.C.), chans@its.caltech.edu (S.I.C.)
Key words: carboxypeptidase; crystal structure; HEXXH motif; M32 family; metallopeptidase

Figure 1. Ribbons Diagram of PfuCP Homodimer in Stereo
This figure was prepared using the programs MOLSCRIPT and Raster-3D [29, 35]. β strands, red; α helices, cyan; loops, green; active site metals, pink.

the dimer interface [11]. Data collection, phasing, and refinement statistics are summarized in Table 1.

The dimer interface is primarily formed between the two N-terminal helices (α1 and α2) from each subunit that come together to form a stack of four helices. The residues that form this interface are primarily hydrophobic (A22, I23, A26, V29, L30, P39, G42, V48, A49, L53, and F286), although two pairs of salt bridges, between R19 and E59 and between D33 and R250, contribute to the dimer interaction. Some support for the importance of these residues in dimerization is provided by the sequence alignment with TaqCP (Figure 2). Unlike PfuCP, TaqCP exists as a monomer in solution [3]. The alignment reveals several sequence differences at the dimer interface that likely contribute to the differences in the aggregation state, such as the substitution of some of the hydrophobic residues with more hydrophilic counterparts and loss of one of the salt bridges in TaqCP.

The two PfuCP subunits that form the dimer are essentially identical (root-mean-square deviation of Cα atoms = 0.473 Å), having dimensions of 75 Å × 55 Å × 55 Å. Their distinguishing feature is a deep groove that traverses the length of the protein (Figure 3). This groove is approximately 40 Å long and 30 Å deep at its lowest point. The groove is widest at the open end, which measures 15 Å (Figure 3, bottom). The other end is closed, with a helical cap formed by helix α4 (Figure 3, top). Each subunit can be divided into two domains, a left domain (L) and a larger right domain (R; Figure 3), based on its position relative to this substrate groove.

The first two N-terminal residues were not found and are likely disordered. The first secondary structure be-

Table 1. Crystallographic Data for PfuCP

Data Collection
                          Yb-PfuCP                                Pb-PfuCP            apo-PfuCP
Protein Data Bank ID      1K9X                                    1KA4                1KA2
Space group               P1                                      C2                  C2
Unit cell a, b, c (Å)     66.9, 88.3, 109.7                       132.3, 67.6, 67.2   130.5, 67.1, 67.9
Unit cell angles (°)      α, β, γ = 88.2, 78.2, 69.3              β = 95.2            β = 95.9
Wavelength (Å)            1.000 (λ1)          1.386 (λ2)          1.000               1.000
Resolution (Å)            2.3                 3.1                 3.0                 2.2
Unique reflections        98,094              40,069              14,993              29,834
Total reflections         353,793             196,056             164,344             147,392
Completeness (%) a        93.8 (80.5)         91.6 (72.5)         95.0 (89.4)         94.8 (95.1)
Rsym (%) a,b              3.6 (8.7)           3.6 (4.9)           5.1 (21.7)          8.9 (27.8)

Phasing Statistics
Resolution (Å)            3.1
Number of sites           11
Phasing power c           2.9 (λ1), 2.79 (λ2)
Mean FOM d                0.62
RKraut e                  0.036 (λ1), 0.010 (λ2)

Refinement Statistics
                          Yb-PfuCP            Pb-PfuCP            apo-PfuCP
Resolution (Å)            2.3                 3.0                 2.2
Rcryst f                  21.2                23.6                23.9
Rfree g                   26.7                27.6                28.4
Rmsd bond (Å)             0.007               0.008               0.009
Rmsd angle (°)            1.237               1.329               1.630
Protein atoms             16,652              4163                4163
Water atoms               610                 83                  109
Metal atoms               11 (Yb)             1 (Pb)              1 (Mg)
Mean B, protein (Å²)      29.6                54.5                42.5
Mean B, solvent (Å²)      31.6                38.3                38.0

a Values for the highest resolution shell are in parentheses.
b $R_{sym} = \sum |I - \langle I \rangle| / \sum \langle I \rangle$, where $I$ is the measured intensity of each reflection and $\langle I \rangle$ is the mean value for the reflection.
c Phasing power (anomalous) $= \langle F_H / \epsilon \rangle$, where $\epsilon$ is the lack of closure.
d Mean FOM $= \langle \cos \Delta\alpha_j \rangle$, where $\Delta\alpha_j$ is the phase angle error for phase angle $j$.
e $R_{Kraut} = \sum \epsilon_{iso} / \sum \Delta_{iso}$ for isomorphous differences and $\sum \epsilon_{ano} / \sum \Delta_{Bijvoet}$ for anomalous differences, where $\epsilon_{iso}$ and $\epsilon_{ano}$ are the isomorphous and anomalous lack of closure, respectively, $\Delta_{iso}$ is the isomorphous difference, and $\Delta_{Bijvoet}$ is the Bijvoet difference. The summation is taken over all acentric reflections.
f $R_{cryst} = \sum ||F_{obs}| - |F_{calc}|| / \sum |F_{obs}|$, where $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factor amplitudes, respectively.
g Rfree was calculated on 10% of the reflections randomly omitted from the refinement.

Figure 2. Sequence Alignment of the M32 Family of Metallocarboxypeptidases
Sequences of PfuCP and 8 other carboxypeptidases of the M32 family were aligned with the program ClustalW [36]. This is followed by the structure-based alignment of rat neurolysin with the residues used in the superposition [15]. The numbering and secondary structure above the alignment correspond to those of PfuCP. Residues that are conserved between all the homologs are in bold and highlighted in yellow. Motifs are indicated with colored lettering as follows: HPF, gray; DXRXT, green; HEXXH, blue; HESQ, brown; IRXXAD, violet; and GXXQDXHW, red. Accession numbers with percent identity to PfuCP are as follows: B. subtilis, p50848, 36%; T. aquaticus, p42663, 37%; Halobacterium sp., q9hmb7, 34%; D. radiodurans, q9rrr3, 33%; R. rickettsii, cac33677, 30%; L. major, cac37108, 33%; V. cholerae, q9ks46, 31%; R. loti, bab50714, 32%.
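As an illustration only, the motif names listed in the Figure 2 caption can be written as regular expressions and located in a protein sequence. The following is a minimal Python sketch and is not part of the original analysis, which relied on ClustalW alignments. The names `M32_MOTIFS` and `find_motifs` and the toy sequence are hypothetical; the sketch assumes that "X" in a motif name stands for any single residue.

```python
import re

# Motif names from the Figure 2 caption; "X" (any residue) becomes "." in regex syntax.
M32_MOTIFS = {
    "HPF": "HPF",
    "DXRXT": "D.R.T",
    "HEXXH": "HE..H",
    "HESQ": "HESQ",
    "IRXXAD": "IR..AD",
    "GXXQDXHW": "G..QD.HW",
}

def find_motifs(sequence, motifs=M32_MOTIFS):
    """Return {motif name: [1-based start positions]}, overlapping matches included."""
    hits = {}
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead lets finditer report overlapping matches.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [m.start() + 1 for m in regex.finditer(sequence)]
    return hits

# Hypothetical toy fragment, NOT the PfuCP sequence. It only embeds the two
# HEXXH instances named in the text (HEFGH in PfuCP, HEMGH in TaqCP).
toy = "AAHEFGHAAAHEMGHAA"
print(find_motifs(toy)["HEXXH"])  # [3, 11]
```

A pattern scan of this kind only flags candidate positions; whether a match is a true member of the conserved motif still depends on the alignment context shown in Figure 2.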

Figure 3. PfuCP Subunit Structure
(A) Ribbons diagram of a single PfuCP subunit as viewed along the substrate groove. Drawn in stereo (active site metal, pink).
(B) Surface diagram of PfuCP subunit in stereo (negatively charged residues, red; positively charged residues, blue) modeled with 10-mer polyalanine substrate.
(C) Rainbow stereo plot of the Cα trace of PfuCP subunit (N terminus, blue; C terminus, red; active site metal, pink). Every twentieth residue is labeled.
Figures were prepared using the programs MOLSCRIPT, Raster-3D, and GRASP [11, 29, 35].

gins with a collection of three α helices (α1–α3, residues 7–75), which compose approximately one-half of the groove wall in the R domain. The following α helix (α4, residues 81–99) crosses over from the R domain to the L domain toward the top of the molecule, acting as a cap to the top end of the groove. The groove wall of the L domain is made up primarily of the next three helices (α5–α7, residues 102–163). A long α helix (α8, residues 169–192) forms the underside of the groove wall of the L domain before crossing back to the R domain. The rest of the right groove wall is composed of a helix followed by a three-stranded β sheet (α10, β1–β3, residues 209–253). The next set of α helices (α11–α13) forms the active site. The first (α11, residues 261–278) contains the HEXXH motif with its two histidines (H269 and H273) coordinated to the active site metal, and the last (α13, residues 295–306) contains the glutamate residue (E299) that serves as the third protein ligand. Following these active site α helices, the chain passes across the groove bottom in the form of three α helices (α14–α16, residues 313–343) and then back again to the left wall of the groove, which is completed by five α helices (α17–α21,

Figure 4. Electron Density of Active Site
A 2.3 Å 2Fo − Fc electron density map (1.0σ, blue; 4σ, magenta) superimposed on the final active site model of PfuCP (Yb-phased crystal). Carbon atoms, yellow; nitrogen, cyan; oxygen, red; active site metal, green (modeled as Mg2+). Figure was prepared with XtalView and Raster-3D [28, 29].

residues 352–414). The C-terminal end of the polypeptide finishes with a long α helix (α22, residues 419–441) that extends from the left wall to the bottom of the groove, at which point the remaining α helices (α23–α26, residues 443–499) complete the groove floor below the active site. Based on the distance between the two active sites (~50 Å), the two subunits are believed to act independently.

The Protein Active Site
The HEXXH motif is a common motif found in numerous metalloproteases, including thermolysin, astacin, deformylase, and the M1 aminopeptidases [1, 9]. The first identification of this motif in the PfuCP family of carboxypeptidases was in TaqCP from a HEMGH sequence located at positions 276–280 [4]. This subsequently led to the placement of TaqCP in the newly formed M32 family of metallopeptidases [1]. It was later demonstrated by mutagenesis studies that the two histidines of the HEXXH motif (H276 and H280) are two of the ligands that bind the catalytic metal ion [5].

PfuCP has an HEFGH sequence at positions 269–273. The location of this motif places the active site at the bottom of the groove near the right wall just underneath the β sheet of the loop connecting strands β1 and β2. Unexpectedly, this places the active site midway along the groove rather than at its back, a feature more reminiscent of an endopeptidase than an exopeptidase. While current data appear convincing that PfuCP is a carboxypeptidase, an additional endopeptidase activity to a specific polypeptide cannot be ruled out.

Consistent with the role of the HEXXH motif in metal ion binding, the structure of Pb-PfuCP reveals a Pb ion bound to the two histidines of the HEXXH motif (H269 and H273). The third metal ligand, based on the structure, is glutamate E299. Based on the sequence alignment, this residue is conserved for the entire PfuCP family of carboxypeptidases, placing them in the MA clan of HEXXH-containing metalloproteases (Figure 2) [1]. In TaqCP, this would correspond to E306. Interestingly, a weaker binding ligand is found in the structures of unmetallated PfuCP and the Yb-PfuCP derivative. This ion is presumed to be Mg2+, most likely from the crystallization condition. In each of the three structures we have determined, the protein ligands (H269, H273, and E299), the catalytic water, and the "proton shuttle" E270 adopt conformations similar to those found in other peptidases containing the HEXXH motif [12] (Figure 4).

While the PfuCP structures clearly determine the location of the metal active site, the identity of the native metal itself remains elusive. While most metallopeptidases belonging to the HEXXH superfamily, including TaqCP, use a single Zn2+ ion as the active site metal, Cheng et al. found that PfuCP was activated by Co2+ ion and not zinc [6]. More recently, PfuCP has been reconstituted with Mn2+ with good activity (T.C., unpublished data). Interestingly, these features are reminiscent of another HEXXH metalloprotease, peptide deformylase, which has been shown to contain ferrous ion as its native metal. Here, the zinc-bound form was shown to be 100 times less active than the native protein containing iron [13], while the cobalt and nickel forms were found to have similar activities to those of wild-type. The origin of the metal-dependent differences in the activities of these enzymes remains unresolved.

Structural Similarity with Neurolysin
Structure homology searches were performed with TOP, VAST, FUGUE, and DALI [14–17]. One hit was found with DALI and FUGUE [16, 17], revealing that PfuCP has a similar fold to Rattus norvegicus neurolysin (Protein Data Bank code 1I1I) [18]. This zinc-dependent endopeptidase cleaves neurotensin, a 13-residue neuropeptide, between residues 10 and 11 [18]. PfuCP can be superimposed on neurolysin using DALI with an rmsd of 2.1 Å for the 251 Cα atoms used in the superposition.
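For readers unfamiliar with the rmsd figure quoted for the superposition, the quantity can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the two coordinate sets are already paired and superimposed (the fitting step that DALI performs is not reproduced), and the function name `rmsd` and the sample coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the points are already paired and superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates in Å, for illustration only (not taken from 1K9X or 1I1I).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(a, b))  # 2.0
```

Because the value depends on which atom pairs are included, an rmsd is only meaningful together with the number of atoms used, here the 251 Cα atoms selected by the superposition.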

Figure 5. Structural Comparison with Neurolysin (A) Backbone trace of the overall structures of PfuCP in blue superimposed on neurolysin in red with conserved residues and active site metals (Protein Data Bank code 1I1I). (B) Active site comparison of PfuCP and neurolysin with proposed catalytic residues. Figures were prepared using the programs MOLSCRIPT and Raster-3D [29, 35].

This structure-based sequence alignment is provided in Figure 2. Neurolysin (681 residues) is larger than PfuCP (499 residues) and has a number of extra structural elements. These include an additional α helix at its N terminus (α1) and a cork-like subdomain (α7–α8 and β1–β2) that has the overall effect of narrowing the bottom end of the substrate groove as well as elongating the left groove wall (Figure 5A). The last major difference between the two structures is that the β sheet near the active site is composed of five strands in neurolysin but only three strands in PfuCP. In light of the clear structural similarity of PfuCP and neurolysin, it seems likely that PfuCP and neurolysin evolved from a common ancestor by divergent evolution, much like that predicted for leucine aminopeptidase and CPA [19].

Despite this high structural similarity, however, the amino acid identity between PfuCP and neurolysin is low (<16%), even based on the amino acids used in the structure superposition. Indeed, BLAST showed no sequence similarity between these two proteins [20]. Using the structure alignment, we find that eleven of the conserved residues of the PfuCP family of carboxypeptidases are also conserved in neurolysin (PfuCP/neurolysin: H238/425, H269/474, E270/475, G272/477, H273/478, E299/503, E306/510, H411/601, G418/608, Y423/613, and G468/650; Figure 5A). Six of these residues are proposed to play an important role in catalysis (Figures 5B and 6).

Sequence Homologs of PfuCP
A BLAST search [20] using the PfuCP sequence identified 14 other members that span all three kingdoms of

life: archaea, bacteria, and eucarya (see Supplemental Data [www.developmentalcell.com/cgi/content/full/2/ 2/*bxs/DC1]). The homology between PfuCP and each of the proteins is fairly high (31%–87% identity) and extends over the entire protein (Figure 2). This suggests that these sequence homologs share a common fold for which PfuCP provides the structural prototype. The paucity of these enzymes compared to the much larger family of CPA/CPB homologs is notable. This may suggest a very specific role for these enzymes in biology. In support of this, in many of the organisms that contain a PfuCP homolog, a gene appearing to belong to the much larger family of carboxypeptidases was also identified (organism, PIR code: P. horikoshii, G71097 and C71119; D. radiodurans, A75364 and C75531; B. halodurans, E83851; P. abyssi, A75042; B. subtilis, B70076, H69817, and E69640) [8]. These genes are homologs of the carboxypeptidase from S. solfataricus (CPS), which is itself a homolog of CPB. For those organisms where no homolog was found, the genome was found to be incomplete.

The sequence alignment of PfuCP and its homologs (Figure 2) reveals five conserved amino acid sequences in addition to the characteristic metalloprotease HEXXH motif. The sequences are as follows (the PfuCP sequence is used as a reference): an HPF sequence (residues 238–240) mostly on strand β2, a DXRXT sequence (residues 248–252) on strand β3, an HESQ sequence (residues 298–301) on helix α13, an IRXXAD sequence (residues 350–355) on helix α17, and a GXXQDXHW sequence (residues 405–412) on the loop between helices α20 and α21. Roles for these motifs could be inferred from their location in the PfuCP structure by modeling a 10-mer polyalanine substrate to the active site, based on the crystal structures of astacin and thermolysin in complex with transition state analogs [10, 21]. The putative functions of all these sequences in catalysis will be discussed in our proposed mechanism, with the exception of the DXRXT sequence, which is far removed from the active site and whose catalytic role, if one exists, is unclear.

Proposed Catalytic Mechanism
The mechanism of catalysis of this family of carboxypeptidase is unclear at this time. The fundamental issue is the direction of substrate approach. One possibility is based on the shape of the groove. The groove has open (Figure 3, bottom) and closed (Figure 3, top) ends. One would therefore predict that, based on the greater ease in accessing the active site, the C-terminal end of peptide substrates should approach from the open end. Unfortunately, this possibility disagrees with the direction suggested by homology modeling studies with other HEXXH metalloproteases. In all structures of HEXXH metalloproteases bound to peptides, the peptide binds to the metal site in the opposite orientation. Based on the orientation suggested by this homology model, the C terminus would feed into the short gap between the closed end of the groove (helical cap α4) and the metal active site. We currently favor the latter model and present this below, but, clearly, additional studies will be required to confirm this.

Figure 6. Proposed Catalytic Mechanism
Schematic diagram of a possible catalytic mechanism for peptide binding and hydrolysis.

All HEXXH proteins reported to date have a similar binding pocket around the active site with the substrate binding in a similar orientation relative to the HEXXH motif. Indeed, the protein ligands H269 and H273 of the HEXXH motif and E299 of the conserved HESQ signature sequence adopt conformations similar to those found in other HEXXH-containing enzymes. Based on this homology, a 10-mer polyalanine oligopeptide was modeled to the PfuCP active site using the locations of inhibitors bound in structures of astacin and thermolysin transition state inhibitor complexes as a guide [10, 21]. The oligopeptide was modeled as an extended β strand that interacts with the β2 strand of the PfuCP protein in an antiparallel orientation, forming a ladder of four hydrogen bonds. This expands the β sheet near the active site by an additional strand, as has been observed in astacin and thermolysin [10, 21].

From this orientation, a mechanism can be proposed for PfuCP that is similar to other HEXXH proteins (Figure 6) [10, 21]. Here, the metal serves to stabilize a bound water/hydroxide and/or activate the scissile carbonyl of the substrate by serving as a Lewis acid. The active site glutamate, E270 of the HEXXH motif, assists in the nucleophilic attack of the activated water/hydroxide on the carbonyl to form the tetrahedral intermediate, by acting as a general acid/base that can shuttle the hydrogen atom from the activated water to the scissile amide nitrogen of the substrate. Protonation of this amide nitrogen makes it a better leaving group, thereby facilitating cleavage of the amide bond.

The H238 and P239 of the HPF signature sequence may play a role in orienting the amide nitrogen to accept this hydrogen and in stabilizing the proper transition state. The carbonyl oxygen of P239 on strand β2 could hydrogen bond the scissile N-H of the substrate to position it for protonation by the proton shuttle E270, as do its counterparts in other HEXXH metalloproteases

[9, 10, 21]. Upon protonation, H238 could serve a role in transition state stabilization. Modeling studies reveal that H238 can be positioned to serve as a hydrogen bond acceptor that would stabilize the tetrahedral transition state. H238 is conserved for all members of the PfuCP family of carboxypeptidases as well as neurolysin (H425; Figure 5). Consistent with such a role, an asparagine (N112) in thermolysin is suggested to serve a similar role and is located at nearly the same position [10]. It should be noted that modeling studies also reveal that H238 can be positioned to hydrogen bond the C-terminal carboxylate. Its presence in the endopeptidase neurolysin, however, argues against the importance of this interaction.

Y423 is another residue that could play a role in both positioning the C-terminal carboxylate and stabilizing the tetrahedral intermediate. This residue is conserved for the entire family of PfuCP carboxypeptidases as well as neurolysin (Y613; Figure 5) and is located at a similar position in the active site as residues Y149 and H231 of astacin and thermolysin, respectively. These residues have been suggested to stabilize the negative charge of the oxyanion intermediate by hydrogen bonding one of the coordinating oxygens [10, 21]. In PfuCP, this can occur by reorientation of the χ1 torsional angle of Y423 (Figure 6). The OH hydrogen of the phenolic side chain can also be positioned to hydrogen bond the C-terminal carboxylate of the modeled substrate.

The IRXXAD and GXXQDXHW signature sequences are proposed to work in concert to bind the substrate C-terminal carboxylate and promote release of the cleaved amino acid product. In the absence of substrate, R351 and D409 from the IRXXAD and GXXQDXHW motifs, respectively, form an intrasubunit salt bridge. Modeling studies indicate, however, that R351 can be repositioned to form a salt bridge with the C-terminal carboxylate of the bound substrate by a minor adjustment of its χ1 and χ2 torsional angles. This brings R351 within 3.1 Å of the C-terminal carboxylate. We propose that R351 serves a role in stabilizing substrate binding, but that, following hydrolysis, it returns to reform its salt bridge with D409, releasing the C-terminal amino acid (Figure 6).

Biological Implications

Carboxypeptidase enzymes that are thermostable and tolerant to organic solvents would be particularly well suited for C-terminal sequencing and other biotechnological applications. Thus, Pyrococcus furiosus carboxypeptidase (PfuCP) is of interest due to its high optimum temperature of 90°C–100°C [6]. Knowledge of the tertiary structure of PfuCP should facilitate the design of engineered enzymes with altered specificities for use in C-terminal sequencing or other biotechnological processes.

The PfuCP crystal structure reported here represents the first structural characterization of a new family of carboxypeptidase that is distinct from all other carboxypeptidases. Unexpectedly, it has structural similarity to the rat neurolysin, suggesting the existence of a common structural fold that can be used for both carboxypeptidases and endopeptidases. The sequence alignment reported here demonstrates that carboxypeptidases span all kingdoms of life: archaea, bacteria, and eucarya. Based on the likely divergent evolution of rat neurolysin and PfuCP, it seems plausible that other members of this structural family could be identified by various structural genomics initiatives.

Experimental Procedures

Purification and Crystallization
PfuCP was purified from P. furiosus as previously described [6]. For the crystallization, PfuCP was concentrated to 19 mg/ml in 50 mM Tris buffer (pH 8.0) with 10% glycerol. Several different crystal forms were obtained using the hanging drop method at 4°C with a single crystallization condition originally identified from the Hampton Screen Kit I [22]. The conditions were subsequently improved to the final solution consisting of 25%–30% polyethylene glycol 4000, 100 mM Tris (pH 8.5), 20–40 mM MgCl2. Of the crystals, two forms were the easiest to reproduce and of sufficient quality for X-ray characterization. These crystals grew within 10 days (form I) and 6 weeks (form II). Form I crystals are triclinic and belong to the space group P1, with four molecules per asymmetric unit. Form II crystals are monoclinic and belong to the space group C2, with one molecule per asymmetric unit.

Data Collection and Cryoprotection
The structure was solved from MAD data collected on a single Yb-derivatized crystal [23]. The crystal was prepared by soaking a triclinic crystal for 24 hr in an artificial mother liquor containing 20 mM YbCl3 in place of MgCl2. The crystal was transferred to paratone-N (Hampton Research) and cryocooled in liquid nitrogen [24]. Data were collected at 100 K at two wavelengths: the peak of the anomalous absorption and a remote wavelength. The wavelengths were chosen based on an X-ray fluorescence spectrum of the crystal, close to the ytterbium LIII edge. The Pb-derivatized crystal was prepared by soaking a monoclinic crystal for 3 days in an artificial mother liquor containing 1 mM PbBr2. Both the apo-PfuCP and Pb-PfuCP crystals were transferred to mother liquor solutions containing increasing concentrations of glycerol up to 25% prior to flash freezing in liquid nitrogen. All diffraction data (Table 1) were collected in oscillation mode using synchrotron radiation with an ADSC-Q4 CCD detector. Diffraction images were indexed, integrated, and scaled using the HKL program suite [25].

Structure Solution
Determination of the Yb binding sites was made by inspecting Harker sections from a Bijvoet difference Patterson map calculated from the 1.386 Å wavelength data set, using the program package XtalView [26, 27]. The initial MAD phases from 20 to 3.1 Å were obtained using the program PHASES (Table 1) [28]. The electron density map calculated from the initial phases was used to identify the locations of several long α helices, whose positions, in turn, could be utilized to identify the noncrystallographic symmetry elements for the four different molecules in the cell [29]. Following noncrystallographic symmetry averaging using the software package RAVE from the Uppsala Software Factory [30], the quality of the maps was sufficient to build a polyalanine model containing about 80% of the protein residues. After several cycles of phase combination and rebuilding, the side chains could be assigned [28]. The apo-PfuCP and Pb-PfuCP structures were solved by molecular replacement using the program CNS [31].

Model Building and Refinement
Model building was performed using the program O from both the original and averaged density maps [32]. The entire polypeptide chain could be traced except for the two N-terminal residues. The program package CNS was used to perform the structure refinement, originally using data from 20 to 3.1 Å resolution and then extending to include all the data from 20 to 2.3 Å resolution while applying an overall bulk solvent correction [31]. The refinement consisted of several rounds of least square minimization and annealing (at 2500 K), followed by manual rebuilding with the program O [31,

32]. Ten percent of the data were excluded for calculating the Rfree [33]. Individual B factors were initially set to 15 Å² for the protein atoms and were refined by applying an overall anisotropic B factor scaling. Water molecules were excluded from this early refinement. Final refinement included water molecules and individual anisotropic B factor refinement. The final Yb-derived structure contains 16,652 nonhydrogen atoms and 610 water molecules in the asymmetric unit. The final R factor is 21.2 with an Rfree factor of 26.7. All of the residues lie within allowed regions of a Ramachandran plot. The deviation from ideal of the bond lengths is 0.007 Å, and the deviation of bond angles is 1.237°. The geometry was checked by the program PROCHECK, and all parameters were in the acceptable range with the majority being better than acceptable [34]. The secondary structure was assigned by PROCHECK and by visual inspection of the model: α1, residues 7–36; α2, residues 39–61; α3, residues 64–75; α4, residues 81–99; α5, residues 102–124; α6, residues 128–149; α7, residues 155–163; α8, residues 169–192; α9, residues 201–204; α10, residues 209–223; β1, residues 230–235; β2, residues 239–245; β3, residues 248–253; α11, residues 261–278; α12, residues 282–285; α13, residues 295–306; α14, residues 313–320; α15, residues 323–332; α16, residues 335–343; α17, residues 352–354; α18, residues 359–375; α19, residues 380–395; α20, residues 401–404; α21, residues 411–414; α22, residues 419–441; α23, residues 443–450; α24, residues 454–466; α25, residues 473–481; α26, residues 488–499.

Substrate Modeling
The probable binding site for the peptide substrate was modeled by comparing the structure of PfuCP with those of transition state analogs bound to astacin and thermolysin (Protein Data Bank codes 1QJI and 4TMN, respectively) [10, 21]. The modeling strategy involved superimposing the active site HEXXH motifs of these proteins with that of PfuCP using the program O [32], followed by the placement of a 10-mer polyalanine substrate in the active site groove with the scissile carbonyl oxygen of the penultimate alanine in proper distance and orientation to coordinate with the active site metal. The N terminus of the polyalanine model was extended in an antiparallel orientation to strand β2 of the β sheet near the active site in a similar orientation to that of the reference structures.

Acknowledgments

We thank Robert Sweet and the organizers of the RapiData 2000 course, particularly Craig Ogata for his insight into the use of whitelines for MAD phasing. We thank Mike Soltis of SSRL for providing us with additional time to complete the Yb data set that was critical to solving the structure. Diffraction data were collected at the 9-1 beamline at SSRL, which is operated by the Department of Energy, Office of Basic Energy Sciences, and the X4A beamline at

characterization of a thermostable carboxypeptidase (carboxypeptidase Taq) from Thermus aquaticus YT-1. Biosci. Biotechnol. Biochem. 56, 1839–1844.
4. Lee, S.H., Taguchi, H., Yoshimura, E., Minagawa, E., Kaminogawa, S., Ohta, T., and Matsuzawa, H. (1994). Carboxypeptidase Taq, a thermostable zinc enzyme, from Thermus aquaticus YT-1: molecular-cloning, sequencing, and expression of the encoding gene in Escherichia coli. Biosci. Biotechnol. Biochem. 58, 1490–1495.
5. Lee, S.H., Taguchi, H., Yoshimura, E., Minagawa, E., Kaminogawa, S., Ohta, T., and Matsuzawa, H. (1996). The active site of carboxypeptidase Taq possesses the active-site motif His-Glu-X-X-His of zinc-dependent endopeptidases and aminopeptidases. Protein Eng. 9, 467–469.
6. Cheng, T.C., Ramakrishnan, V., and Chan, S.I. (1999). Purification and characterization of a cobalt-activated carboxypeptidase from the hyperthermophilic archaeon Pyrococcus furiosus. Protein Sci. 8, 2474–2486.
7. Ishikawa, K., Ishida, H., Matsui, I., Kawarabayasi, Y., and Kikuchi, H. (2001). Novel bifunctional hyperthermostable carboxypeptidase/aminoacylase from Pyrococcus horikoshii OT3. Appl. Environ. Microbiol. 67, 673–679.
8. Colombo, S., Dauria, S., Fusi, P., Zecca, L., Raia, C.A., and Tortora, P. (1992). Purification and characterization of a thermostable carboxypeptidase from the extreme thermophilic archaebacterium Sulfolobus solfataricus. Eur. J. Biochem. 206, 349–357.
9. Chan, M.K., Gong, W.M., Rajagopalan, P.T.R., Hao, B., Tsai, C.M., and Pei, D.H. (1997). Crystal structure of the Escherichia coli peptide deformylase. Biochemistry 36, 13904–13909.
10. Holden, H.M., Tronrud, D.E., Monzingo, A.F., Weaver, L.H., and Matthews, B.W. (1987). Slow-binding and fast-binding inhibitors of thermolysin display different modes of binding: crystallographic analysis of extended phosphonamidate transition-state analogs. Biochemistry 26, 8542–8553.
11. Nicholls, A., Bharadwaj, R., and Honig, B. (1993). GRASP: graphical representation and analysis of surface-properties. Biophys. J. 64, 166–170.
12. Monzingo, A.F., and Mathews, B.W. (1984). Binding of N-carboxymethyl dipeptide inhibitors to thermolysin determined by X-ray crystallography: a novel class of transition-state analogues for zinc peptidases. Biochemistry 23, 5724–5729.
13. Rajagopalan, P.T.R., Yu, X.C., and Pei, D.H. (1997). Peptide deformylase: a new type of mononuclear iron protein. J. Am. Chem. Soc. 119, 12418–12419.
14. Lu, G. (2000). TOP: a new method for protein structure comparisons and similarity searches. J. Appl. Crystallogr. 33, 176–183.
15. Gibrat, J.F., Madej, T., and Bryant, S.H. (1996). Surprising similarities in structure comparison. Curr. Opin. Struct. Biol.
6, the National Synchrotron Light Source, a Department of Energy 377–385. facility, which is supported by the Howard Hughes Medical Institute. 16. Shi, J.Y., Blundell, T.L., and Mizuguchi, K. (2001). FUGUE: se- Use of the Advanced Photon Source was supported by the U.S. quence-structure homology recognition using environment- Department of Energy, Basic Energy Sciences, Office of Science, specific substitution tables and structure-dependent gap penal- under contract number W-31-109-Eng-38. Use of the BioCARS Sec- ties. J. Mol. Biol. 310, 243–257. tor 14 was supported by the National Institutes of Health (NIH), 17. Holm, L., and Sander, C. (1998). Touring protein fold space with National Center for Research Resources, under grant number DALI/FSSP. Nucleic Acids Res. 26, 316–319. RR07707. J.A. has been supported by a Chemistry/Biology Interface 18. Brown, C.K., Madauss, K., Lian, W., Beck, M.R., Tolbert, W.D., Training Grant from NIH (T32 GM 08512). This work was supported and Rodgers, D.W. (2001). Structure of neurolysin reveals a by funds from NIH (GM 61796 to M.K.C. and GM 22432 to S.I.C.) deep channel that limits substrate access. Proc. Natl. Acad. and from a fellowship from the Alfred P. Sloan Foundation (M.K.C.). Sci. USA 98, 3127–3132. 19. Artymiuk, P.J., Grindley, H.M., Park, J.E., Rice, D.W., and Willett, Received: August 21, 2001 P. (1992). 3-Dimensional structural resemblance between leu- Revised: December 15, 2001 cine aminopeptidase and carboxypeptidase-A revealed by Accepted: December 17, 2001 graph-theoretical techniques. FEBS Lett. 303, 48–52. 20. Madden, T.L., Tatusov, R.L., and Zhang, J.H. (1996). Applica- References tions of network BLAST server. Methods Enzymol. 266, 131–141. 21. Grams, F., Dive, V., Yiotakis, A., and Sto¨ cker, W. (1996). Struc- 1. Rawlings, N.D., and Barrett, A.J. (1995). Evolutionary families ture of astacin with a transition-state analogue inhibitor. Nat. of metalloproteases. Methods Enzymol. 248, 183–228. Struct. Biol. 
8, 671–675. 2. Stepanov, V.M. (1995). Carboxypeptidase-T. Methods Enzymol. 22. Jancarik, J., and Kim, S.H. (1991). Sparse-matrix sampling—a 248, 675–683. screening method for crystallization of proteins. J. Appl. Crys- 3. Lee, S.H., Minagawa, E., Taguchi, H., Matsuzawa, H., Ohta, tallogr. 24, 409–411. T., Kaminogawa, S., and Yamauchi, K. (1992). Purification and 23. McDonald, N.Q., Panayotatos, N., and Hendrickson, W.A. Structure 224

(1995). Crystal-structure of dimeric human ciliary neurotrophic factor determined by MAD phasing. EMBO J. 14, 2689–2699. 24. Teng, T.Y. (1990). Mounting of crystals for macromolecular crys- tallography in a freestanding thin-film. J. Appl. Crystallogr. 23, 387–391. 25. Otwinowski, Z., and Minor, W. (1997). Processing of X-ray dif- fraction data collected in oscillation mode. Methods Enzymol. 276, 307–326. 26. McRee, D.E. (1999). XtalView/Xfit—a versatile program for ma- nipulating atomic coordinates and electron density. J. Struct. Biol. 125, 156–165. 27. Merritt, E.A., and Murphy, M.E.P. (1994). Raster3d version- 2.0—a program for photorealistic molecular graphics. Acta Crystallogr. D 50, 869–873. 28. Furey, W., and Swaminathan, S. (1997). PHASES-95: a program package for processing and analyzing diffraction data from macromolecules. Methods Enzymol. 277, 590–620. 29. Kleywegt, G.J., and Jones, T.A. (1996). xdlMAPMAN and xdl- DATAMAN—programs for reformatting, analysis and manipula- tion of biomacromolecular electron-density maps and reflection data sets. Acta Crystallogr. D 52, 826–828. 30. Kleywegt, G.J., and Jones, T.A. (1994). Halloween…masks and bones. In From First Map to Final Model, S. Bailey, R. Hubbard, and D. Waller, eds. (Warrington, United Kingdom: SERC Daresbury Laboratory), pp. 59–66. 31. Brunger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. (1998). Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D 54, 905–921. 32. Jones, T.A., Zou, J.Y., Cowan, S.W., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron-den- sity maps and the location of errors in these models. Acta Crys- tallogr. A 47, 110–119. 33. Brunger, A.T. (1993). Assessment of phase accuracy by cross validation—the free R-value—methods and applications. Acta Crystallogr. D 49, 24–36. 34. 
Laskowski, R.A., Macarthur, M.W., Moss, D.S., and Thornton, J.M. (1993). PROCHECK—a program to check the stereochemi- cal quality of protein structures. J. Appl. Crystallogr. 26, 283–291. 35. Kraulis, P.J. (1991). MOLSCRIPT—a program to produce both detailed and schematic plots of protein structures. J. Appl. Crys- tallogr. 24, 946–950. 36. Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). CLUS- TAL-W: improving the sensitivity of progressive multiple se- quence alignment through sequence weighting, position-spe- cific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Accession Numbers

The coordinates have been deposited in the Protein Data Bank (file names 1K9X, 1KA2, and 1KA4).