Proc. Natl. Acad. Sci. USA Vol. 96, pp. 10968–10975, September 1999 Colloquium Paper

This paper was presented at the National Academy of Sciences colloquium ‘‘Proteolytic Processing and Physiological Regulation,’’ held February 20–21, 1999, at the Arnold and Mabel Beckman Center in Irvine, CA.

Structural aspects of activation pathways of aspartic zymogens and viral 3C protease precursors

AMIR R. KHAN*, NINA KHAZANOVICH-BERNSTEIN,ERNST M. BERGMANN, AND MICHAEL N. G. JAMES†

Medical Research Council Group in Structure and Function, Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G 2H7, Canada

ABSTRACT The three-dimensional structures of the in- about the conversion (5). In a similar fashion, the zymogens of active protein precursors (zymogens) of the serine, cysteine, the papain-like cysteine are converted to the active aspartic, and metalloprotease classes of proteolytic enzymes in a pH-regulated fashion. The in vitro activation of are known. Comparisons of these structures with those of the propapain is consistent with an initial intramolecular cleavage mature, active proteases reveal that, in general, the pre- event (6). The conversion of procarboxypeptidase is initiated formed, active conformations of the residues involved in by trypsin cleavage of the Arg-99p–Ala-1 bond at the proseg- catalysis are rendered sterically inaccessible to substrates by ment to mature junction (7). Prostromelysin-1 can be the residues of the zymogens’ N-terminal extensions or pro- converted to the active form by other proteolytic enzymes, segments. The prosegments interact in nonsubstrate-like heat, or the presence of organomercurial agents (8). There are some generalities regarding zymogen conversion fashions with the residues of the active sites in most of the that one can make in light of the three-dimensional structures cases. The gastric aspartic proteases have a well-character- of both the zymogens and the respective active enzymes (3). ized zymogen conversion pathway. Structures of human pro- First, the residues that constitute the active sites of the gastricsin, the inactive intermediate 2, and active human protease portions of the zymogens have virtually identical are known and have been used to define the conversion conformations to those of the mature, active proteases. The pathway. The structure of the zymogen precursor of plasmep- major exceptions are the serine proteases of the chymotrypsin sin II, the malarial , shows a new twist on the family. The activation process involves the formation of an ion mode of inactivation used by the gastric zymogens. The pair between the newly formed N-terminal residue Ile-16 ϩ ␤ prosegment of proplasmepsin disrupts the active conforma- NH3 and the -carboxylate of Asp-194 (9), which triggers the tion of the two catalytic residues by inducing a conformational changes that form the oxyanion binding major reorientation of the two domains of the mature pro- pocket and the active conformation of the S1 specificity pocket tease. The picornaviral 2A and 3C proteases have a chymo- [the nomenclature of Schechter and Berger (10) is used trypsin-like tertiary structure but with a cysteine nucleophile. throughout this manuscript]. These enzymes cleave themselves from the viral polyprotein in Second, the preformed active sites of the protease portions cis (intramolecular cleavage) and carry out trans cleavages of of zymogens are generally not accessible to substrates because other scissile peptides important for the virus life cycle. residues of the prosegments sterically block the approach to the active sites. This statement does not hold for the chymo- Although the structure of the precursor viral polyprotein is trypsin-like serine proteases as the active sites of these zymo- unknown, it probably resembles the organization of the proen- ␣ gens are able to bind protein inhibitors that induce confor- zymes of the bacterial serine proteases, subtilisin, and -lytic mational changes that form the in spite of the protease. Cleavage of the prosegment is known to occur in cis ion pair involving Asp-194 and Ile-16 being absent (11). for these precursor molecules. Proteolysis of the portion of the prosegments that interact with the residues is prevented in several different Zymogens of proteolytic enzymes consist of the intact protease ways. In prostromelysin the prosegment passes through the with an N-terminal extension. Conversion of the inactive active site in the reverse polypeptide direction (N3C) relative zymogen to the mature, active protease requires limited pro- to substrates or transition state mimics (12). A reverse orien- teolysis usually of a single peptide bond (1). Molecular rear- tation of the prosegment blocking the active site in the cysteine rangements accompany the proteolytic removal of the proseg- protease zymogens also has been observed in the structures of ment of the zymogen, eventually leading to the mature pro- rat procathepsin B (13) and human procathepsin L (14). The region of the prosegment of the gastric aspartic proteases tease. The prosegments of the zymogens range in size from two interacting with the catalytic residues Asp-32 and Asp-215 residues for some of the granzymes to more than 150 residues ␣ (porcine pepsin numbering) most intimately, includes a highly for -lytic protease, a bacterial serine protease (2). conserved lysine at position 36p (the residue numbers of the The conversion of zymogens to the respective active en- ␧ ϩ prosegment are followed by p). The NH3 group of Lys-36p zymes is achieved by several different mechanisms (3). The forms an ion pair with each of the two active site carboxylate active serine proteases of the chymotrypsin family result from groups (15). limited proteolysis of the zymogens by convertases. For ex- ample, the cascade of the blood-clotting enzymes (4) involves Conversion of Gastric Aspartic Protease Zymogens the conversion of inactive forms (e.g., prothrombin) to active forms of the enzyme (thrombin) by a highly specific catalytic The molecular structures of human (16), activa- cleavage by another of the clotting enzymes (factor Xa). On the tion intermediate 2 of human gastricsin (17), and a structural other hand, simply changing the pH of the solution in which the Ϸ gastric aspartic protease zymogens are dissolved from 6.5 to Abbreviation: HPV, human polio virus. ϩ 3.0 (an increase in [H ]ofϷ3,100-fold) is sufficient to bring *Present address: Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138. †To whom reprint requests should be addressed. E-mail: michael. PNAS is available online at www.pnas.org. [email protected].

10968 Downloaded by guest on September 26, 2021 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999) 10969

homolog to mature, human gastricsin, human pepsin 3 (18) each of these three molecular structures. Fig. 2 shows a allow one to construct a reasonably detailed view of the diagrammatic view of the conversion pathway. This pathway is pathway followed in the conversion of the inactive zymogen to a general pathway for the gastric aspartic proteases but the the active protease. Fig. 1 shows stereo ribbon diagrams of individual enzymes differ in detail.

A

FIG. 1. Structures on the conver- sion pathway of the aspartic protease B zymogen progastricsin. The structure of human gastricsin is not known; the human pepsin structure therefore has been used as a model for gastricsin. This figure, as well as Figs. 3–6 have been prepared with BOBSCRIPT (19) and RASTER 3D (20). (A) The structure of human progastricsin (16) repre- sented in stereo. The residues of the prosegment (Ala-1p to Leu-43p) are in green, those of the gastricsin portion of the zymogen are in blue except for those regions that undergo large con- formational changes, Ser-1 to Ala-13, Phe-71 to Thr-81 and Tyr-125 to Ala- 136, which are represented in mauve. The promature junction is Leu-43p– Ser-1, the peptide bond cleaved in- tramolecularly is Phe-26p to Leu-27p. The side chains of Asp-32 and Asp-217 are represented in red. (B) Stereo view of the molecular structure of interme- diate 2 on the activation pathway of human gastricsin (17). The color C scheme used is the same as in A. The residues missing on this figure, Leu- 22p to Phe-26p and Ser-1, are disor- dered in the structure, and there is no interpretable electron density for them on the maps. The water molecule bound between the two carboxyl groups of Asp-32 and Asp-217 is shown as a red sphere. The final step in the conversion involves the dissocia- tion of the peptide Ala-1p to Phe-26p from gastricsin with the N-terminal residues of gastricsin, Ser-1 (N-ter) to Ala-13, replacing the N-terminal ␤-strand of the prosegment. (C) The structure of human pepsin (18) shown as a model of human gastricsin. The regions of gastricsin that undergo large conformational changes from their po- sitions in progastricsin are shown in pink, and the active site aspartates with the bound catalytic H2O molecule are colored red. Reproduced with permis- sion from ref. 3. Downloaded by guest on September 26, 2021 10970 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999)

FIG. 2. A diagrammatic representation of the conversion pathway of progastricsin to gastricsin. The prosegment of progastricsin (I), A1p to L43p, has three helical segments and a net positive charge forming several ion pair interactions and electrostatic stabilization with the mature portion of the zymogen. The highly conserved K37p interacts directly with the catalytic aspartates Asp-32 and Asp-217. Lowering the pH of the solution below 4.0 converts progastricsin into intermediate 1 (26) represented by II. Refolding of the prosegment in the vicinity of the active gastricsin brings the first scissile bond Phe-26p–Leu-27p to the exposed active site aspartates (III). Cleavage at the promature junction, Leu-43p–Ser-1, (IV) is likely an intermolecular cleavage (28) and results in intermediate 2 (V), a molecular species that consists of Ala-1p to Phe-26p noncovalently associated with mature gastricsin (17). The final step in the conversion results from the dissociation of the N-terminal peptide 1–26 of the prosegment and refolding the residues Ser-1 to Ala-13 to replace the region of the prosegment in the six-stranded ␤-sheet of gastricsin (VI).

Progastricsin consists of a single polypeptide chain of 372 aa pepsinogen into pepsin was an autocatalytic process (5, 23). In (21). The N-terminal extension or prosegment is 43 aa in addition, the fact that the loss of pepsinogen was not accom- length and comprises residues Ala-1p to Leu-43p. The pro- panied by an equivalent increase in the appearance of pepsin segment is folded into a compact domain having an initial implied the presence of intermediate species on the pathway extended ␤-strand (Val-3p to Lys-8p) followed by three helical (5). Spectroscopic studies of this conversion process estab- segments: Ile-13p to Lys-20p, Leu-22p to Arg-28p, Pro-34p to lished that there are conformational changes (24) in the 5-ms Arg-39p (16). The third helical segment (a 310 helix) packs to 2-s time scale (25). Rapidly changing the pH back to ␧ ϩ against the active site residues and the NH3 group of a neutrality can reverse these conformational changes. conserved lysine residue (Lys-37p in progastricsin) forms ion Biochemical studies of the conversion of human progastric- pair interactions with the carboxyl groups of the two catalytic sin to gastricsin showed the presence of at least two interme- aspartates, Asp-32 and Asp-217. Two tyrosine side chains diates (26). Intermediate 1 is the species formed rapidly after Tyr-38p and Tyr-9 form symmetric H-bonded interactions with the pH was dropped below 4.0. The prosegment is unfolded in the carboxylates of Asp-217 and Asp-32, respectively, further intermediate 1 and the active site of gastricsin is exposed and restricting access to the active site. The phenolic side chain of accessible to substrates. The first hydrolytic event detected Ј Tyr-9 occupies the S1 binding pocket; Tyr-38p is in the S1 during the activation of progastricsin is the intramolecular binding pocket. The tertiary structure of the prosegment cleavage of the Phe-26p to Leu-27p peptide bond (26). Sub- (Leu-1p to Tyr-37p) in porcine pepsinogen (15) is virtually sequently, an intermolecular cleavage at the Leu-43p–Ser-1 identical to that described above for progastricsin (16). peptide bond (the promature junction) results in the formation The polypeptide chain from Tyr-38p to Tyr-9 in progastric- of transient intermediate 2 that can be stabilized by transfer- sin adopts a conformation that is different from the equivalent ring the pH to neutrality (Ͼ6.5). The resulting molecular segment of chain in the pepsinogens (15, 22). As well, a portion species has been characterized biochemically (26) and com- of the polypeptide chain in gastricsin Tyr-125 to Ala-136 (Fig. prises residues Ala-1p to Phe-26p noncovalently associated 1A) is displaced from the position that this chain segment with mature gastricsin (Ser-1 to Ala-329). occupies in all other aspartic protease zymogens and active Intermediate 2 recently has been characterized structurally enzymes (16). (17), and its structure is depicted in Fig. 1B. The ␤-strand The trigger for initiating the conversion of the gastric (Val-3p to Lys-8p) is in the same position as observed in the aspartic protease zymogens is a drop in pH (5). At neutral pH, structure of progastricsin. In addition, the first helix (Ile-13p to the structures of the zymogens are stabilized by the electro- Lys-20p) is intact and is very similarly oriented as it is in the static interactions of the ion pairs and the inactive conforma- zymogen structure. The two catalytic aspartates, Asp-32 and tion is maintained (16). However, when the zymogens reach Asp-217, have a water molecule bound between them in the the acid pH (Ϸ2.0) of the lumen of the stomach, the carbox- same position as the nucleophilic water observed in the native ylate groups become protonated and the repulsive interactions structures of all mature aspartic proteases (27). The S1 binding among the net positive charges of the prosegment destabilize site is occluded, however, as the side chain of Tyr-9 still forms its interactions in the active site of the protease. Kinetic studies a hydrogen bond with the carboxylate of Asp-32. The segment in the late 1930s showed that the conversion of porcine Tyr-125 to Ala-136 has moved from its position in progastricsin Downloaded by guest on September 26, 2021 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999) 10971

(Fig. 1A) to the position and conformation common among zymogens. Proplasmepsin prosegments contain approximately the mature enzymes whose structures have been solved (Fig. 125 aa and lack sequence similarity with the archetypal gastric 1 B and C). zymogen prosegments, which are typically about 45 residues The ion pair Arg-14p to Asp-11 in intermediate 2 at pH Ϸ6.5 long (35–37). Proplasmepsin prosegments also contain a trans- stabilizes the N-terminal peptide, Ala-1p to Phe-26p, in its membrane helix that anchors these zymogens to the membrane original location in the zymogen preventing the N-terminal during delivery from the endoplasmic reticulum to the diges- residues of gastricsin (Ser-1 to Ala-12) from adopting their tive vacuole where activation and digestion occur final position in the mature enzyme. On the other hand, (38). Activation of proplasmepsin in vivo is carried out by a intermediate 2 would be relatively short-lived at acid pH maturase at acidic pH (38). Additionally, proplasmepsin II and values Ͻ3.0. At these low pH values the carboxylate of Asp-11 P. vivax proplasmepsin can be activated autocatalytically at low would be protonated, and therefore the ion pair with Arg-14p pH, with the cleavage occurring upstream of the wild-type would be severely weakened. Prolonged exposure of interme- mature N terminus (39). diate 2 to an acid environment favors dissociation of the The crystal structure of proplasmepsin II from P. falciparum ␤-strand of the zymogen and its replacement by the N terminus revealed some surprising contrasts with the gastric aspartic of mature gastricsin (Fig. 1C). protease zymogens (40). Instead of blocking a preformed Progastricsin (Fig. 2I) has stabilizing electrostatic interac- active site, as in the gastric zymogens, the prosegment in tions between the positively charged residues of the proseg- proplasmepsin causes a major distortion of the molecule, ment and the negatively charged groups of the gastricsin preventing the formation of a functional active site. portion of the zymogen (16). These electrostatic interactions The recombinant proplasmepsin II used in the crystallo- are weakened by the drop in pH that results in the protonation graphic studies had the prosegment truncated by the first 76 of the carboxylate groups of aspartic and glutamic residues residues to facilitate expression (39). Almost the entire length (29–31). In particular, protonation of the catalytic aspartate of this shortened prosegment interacts with the mature portion groups, Asp-32 and Asp-217, weakens the interactions with of proplasmepsin II. The prosegment has a well-defined sec- ␤ Lys-37p, allowing the uncoiling of the 310 helix and permitting ondary structure, consisting of an initial -strand, followed by the diffusion of this segment away from the active site. The two ␣-helices and a coil connection to the mature N terminus residues surrounding Lys-37p in progastricsin (Thr-29p to (Fig. 3). As in the gastric zymogens (15, 16, 22), the proseg- Asp-33p and Phe-40p to Leu-43p) all have substantially higher ment ␤-strand participates in the six-stranded ␤-sheet, the than average B factors, indicating that they are highly mobile central motif of aspartic proteases, and becomes replaced by and would easily undergo conformational changes that would the mature N terminus upon activation (Fig. 3). Although the expose the preformed active site (Fig. 2II). position of the prosegment ␤-strand is similar to that seen in In contrast, the ␤-strand at the N terminus of the proseg- gastric zymogens, the remainder of the prosegment adopts a ment (Val-3p to Lys-8p) associates with the mature gastricsin very different disposition. Instead of running through the through hydrogen bonding and hydrophobic interactions. substrate-binding cleft, the two helices interact exclusively with These forces are not pH dependent. With the helical regions the C domain of the molecule. The promature junction is of the prosegment uncoiled (Fig. 2II) and the polypeptide located in a tight loop comprised of residues Tyr-122p to from roughly Lys-11p to Ala-13 in a dynamic state of flux, Asp-4, the Tyr-Asp loop, where Asp-4 plays a key role in eventually a sensitive peptide (e.g., Phe-26p–Leu-27p) would maintaining the structure of the loop (with hydrogen bonds to diffuse to the preformed active site and intramolecular cleav- Tyr-122p and Ser-1) and anchoring it to the C domain (with age would occur (Fig. 2III). The resulting cleaved form of the hydrogen bonds to Lys-238 and Phe-241) (Fig. 4a). zymogen (Fig. 2IV) is enzymatically active and also would be The N terminus of the mature sequence differs free to catalyze intermolecular cleavages that have been in conformation between proplasmepsin II and plasmepsin II detected kinetically with pepsinogen (32). This is the likely fate (40, 41). Upon activation, residues 1–14 undergo a large of the bond at the prosegment to mature junction (Leu-43p– conformational change, placing Asp-4 to Phe-11 into the Ser-1); it is cleaved intermolecularly and the peptide Leu-27p central ␤-sheet. Residues 15–29 make a more subtle rearrange- to Leu-43p dissociates from the complex. The noncovalent ment that alters their interactions with the active site Psi loops complex of Ala-1p to Phe-26p bound to gastricsin (interme- (Fig. 4 b and c). When the central ␤-sheet motif and C domain diate 2 or Fig. 2V) can be stabilized by returning the pH to (residues 138–329) of plasmepsin II and proplasmepsin II are neutrality. The final step (Fig. 2 V to VI) in the conversion superimposed, their N domains (residues 30–129) are related process involves a dissociation of the ␤-strand and helical by a rotation of 14° about an axis running roughly in the plane regions (Ala-1p to Phe-26p) of the prosegment from gastricsin of the central ␤-sheet and perpendicular to the strands. This and its replacement by the N-terminal residues of gastricsin. domain movement observed in proplasmepsin II is novel in The prosegments of the pepsinogens and the progastricsins terms of its division into rigid bodies, magnitude, direction, have very similar sequences and three-dimensional structures. and effect on activity (42). It renders the active site cleft more The sequences of prosegments of other aspartic proteases are open in the zymogen than in the enzyme, severely distorting also similar to those of the gastric enzymes, suggesting that the the geometry from that of the active site in plasmepsin II. In general features of the conversion process are shared among proplasmepsin II, the active site Psi loops are farther apart the chymosins and D. Differences in the sites of relative to plasmepsin II (Fig. 4b). Asp-34 and Asp-214 are too internal cleavage and the kinetics of the activation process (33) far apart in the zymogen to carry out the general base are explained partly by the positions of the cleavage sites in the activation of a nucleophilic water molecule. The so-called different prosegments (34). immature active site is therefore catalytically inactive, and upon activation must collapse to the fireman’s grip configu- Conversion of Proplasmepsin II ration that defines the active site in all aspartic proteases of known structure (43) (Fig. 4c). The plasmepsin system presents a different view of aspartic The method of inactivation in proplasmepsin II is different protease activation than do the gastric proteases. Plasmepsin from that observed in the gastric aspartic protease zymogens. is the aspartic protease used by the parasite Plasmo- In proplasmepsin there is no positively charged moiety (such dium to degrade hemoglobin in red blood cells. The plas- as Lys-36p in pepsinogen) to neutralize the charge repulsion mepsins are synthesized as inactive zymogens, the proplas- between the catalytic Asps at neutral pH (44). Instead, the two mepsins, having N-terminal prosegments that differ both in Asp residues are kept apart from each other and are engaged sequence and in size from other known aspartic protease in a network of hydrogen bonds both within and between the Downloaded by guest on September 26, 2021 10972 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999)

FIG. 3. A structural view of the conversion of proplasmepsin II (Left) (40) to plasmepsin II (Right) (41). The N and C domains are colored yellow, the central motif is green, the prosegment (with its helices labeled) is magenta, and the N-terminal 30 aa of the mature sequence are cyan. The catalytic aspartic acid residues (34 and 214) are colored red. The tip of the flap in proplasmepsin II, which is disordered in the crystal structure, is shown as a dotted line. The peptide bonds cleaved in autoactivation (112p-113p) and in the maturase-assisted activation (124p-1) are marked by asterisks in proplasmepsin II.

Psi loops (Fig. 4b). The function of the prosegment, together (Fig. 4a). Protonation of the Asp-4 side chain should disrupt with the rearranged mature N terminus, is to maintain the the interactions of its carboxylate oxygens (with Tyr-122p, molecule in the open conformation, leaving the active site Lys-238, and Phe-241), opening up the Tyr-Asp loop and accessible but greatly distorted. introducing a slack of five residues into the prosegment The structure of proplasmepsin II suggests that disruption of harness. With this region of the prosegment loosened, the three salt bridges (Glu-87p with Arg-92p, Asp-91p with His- molecule may adopt the domain-closed form with a functional 164, and Glu-108p with Lys-107p) at low pH plays a key role active site. It should be noted that the bond cleaved in in autoactivation. Dissociation of these interactions at low pH autoactivation of proplasmepsin II, Phe-112p to Leu-113p, is should destabilize the prosegment structure and weaken the located at the C terminus of the prosegment helix 2, which association between the prosegment and the C domain. The must be one of the early locations to lose its secondary most dramatic effect of acidification, however, should occur at structure upon acidification. Once the active site is formed, the Asp-4. This residue keeps the promature junction locked in the scissile bond, now in an extended conformation, then can be compact Tyr-Asp loop and tethers this loop to the C domain presented for cleavage either in cis or in trans.

FIG.4. (a) The promature junction in proplasmepsin II. The hydrogen bonding network of Asp-4, within the Tyr-Asp loop (Tyr-122p–Asp-4) and to the C-domain residues Lys-238 and Phe-241, is shown. (b) The immature active site in proplasmepsin II. The Psi loops (31–41 and 211–220) interact with each other through direct and water-mediated hydrogen bonds. In addition, both Psi loops form hydrogen bonds with the N-terminal residues 11–17. (c) The active site of plasmepsin II, showing the symmetrical arrangement of hydrogen bonds around the catalytic aspartates known as the fireman’s grip. The sphere between Asp-34 and Asp-214 is the oxygen atom from pepstatin that was present in the crystal structure (41). The N-terminal residues 15–18 and their interactions with the Psi loops also are shown. Downloaded by guest on September 26, 2021 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999) 10973

FIG. 5. A structural model for the N-terminal, autocatalytic excision of the enteroviral 3C proteases. The model is based on the crystal structure of the poliovirus 3C protease (HPV 3C) (63). The secondary structure of HPV 3C is shown in a ribbon representation. The N-terminal ␤-barrel domain is blue and the C-terminal ␤-barrel domain is mauve. (I) Model of the precursor of the 3C protease with the 3BԽ3C cleavage site sequence bound in the active site in the conformation of a cognate substrate of the 3C protease. The model was derived from the P5 to P2Ј residues of OMTKY3 bound in the active site of Streptomyces griseus protease B (SGPB) (65) after the optimal superposition of HPV 3C and SGPB (63). Included in this figure are the residues starting at the P5 (Thr) position of the 3B protein. The P5 to P1 residues and residues 1–5 of HPV 3C (P1Ј to P5Ј) are colored gray; residues 6–13 are dark gray. The side chains of three active site residues, the nucleophile Cys-147 (yellow), the general acid-base catalyst His-40 (blue), and the S1 specificity determinant His 161 (light blue), are included. Residues 1–11 of HPV 3C reach into the active site of the protease and are in a mostly extended conformation. After the intramolecular cleavage the new N terminus Gly-1 dissociates from the active site while the P5 to P1 residues are still bound (II). Subsequently residues 6–13 of HPV 3C fold into a stable ␣-helix (colored black), which prevents the new N terminus from binding again to the active site and renders the conformational change irreversible. Arg-13 of the conserved sequence motif K R L ͞R ͞KN ͞I, which forms the last turn of the N-terminal helix in HPV 3C, anchors the N terminus to the structure. (III) The crystal structure of HPV 3C is shown with the N-terminal ␣-helix in black. The rearrangement of the N terminus in this model is accompanied by small conformational changes of ␤-strands aI and bI of the N-terminal domain and the loop (yellow) that connects ␤-strands aII and bII of the C-terminal domain.

Autoactivation of proplasmepsin II takes place readily be- Acidification may be necessary to induce the Tyr-Asp loop tween pH 3.8 and 4.7 (45). The lower pH range covers the pKas opening for the Gly-124p to Ser-1 scissile bond to assume an of Asp and Glu carboxylates in (46), even taking into extended conformation suitable for proteolytic cleavage. Al- account some pKa depression that may be expected because of ternatively, low pH may be required if the maturase itself has these residues’ participation in salt bridges and hydrogen an acidic pH optimum. Further studies of the maturase will be bonds. For instance, the involvement of Asp-4 in a number of needed to resolve this issue. hydrogen bonds and a salt bridge (Fig. 4a) is likely to lower its side-chain pKa relative to that of a solvent-exposed carboxy- Autocatalytic Excision of Picornaviral 3C Proteases late. The requirement for low pH for activation by a maturase is Picornaviruses constitute a large family of positive-sense, less conclusive based on the proplasmepsin II structure. The single-stranded RNA viruses (47). An early and important step promature junction is located at the surface of the molecule in the picornaviral lifecycle is the translation of the single- and therefore should be accessible to the external maturase. stranded viral RNA genome into a single large polyprotein (48, Downloaded by guest on September 26, 2021 10974 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999)

49). The viral polyprotein is processed into the individual viral accommodate this. Several residues from ␤-strand aI also are gene products by the viral 3C protease, itself a part of the slightly moved away from their positions in the structure of the polyprotein (50). The picornaviral 3C protease is the prototype native 3C protease to widen the cleft between the N- and of the new class of chymotrypsin-like cysteine proteases (49, C-terminal domains through which the N terminus passes. 51, 52). It can cleave itself out of the viral polyprotein in cis and After the autocatalytic cleavage at Gly-1 of HPV 3C the new in trans, when it is expressed as part of the polyprotein or N terminus dissociates out of the active site (Fig. 5II). The separately (53, 54). folding of residues 5–13 into a stable helix, which packs tightly The autocatalytic excision is not correlated with concomi- onto the surface of the molecule, subsequently would render tant development of the proteolytic activity. The precursors of this conformational change irreversible. It is necessary to the 3C gene product already have proteolytic activity (55–58). remove the new N terminus from the protease active site to In the viruses of the genus enterovirus the precursor 3CD (the prevent intramolecular, competitive product inhibition of the 3D gene product constitutes the RNA-dependent RNA poly- protease. K R I merase) shows a proteolytic activity that is distinct from that The conserved sequence motif ͞R ͞KN ͞L that eventually of 3C. In poliovirus, and presumably in most other enterovi- forms the last turn of the N-terminal helix anchors the residues ruses, the proteolytic activity of the precursor 3CD is required of the N-terminal helix to the core structure of the protease. for the efficient processing of the capsid precursor proteins The side chains of Arg-13 and Asn-14 interact with the highly (58). In hepatitis A virus the 3ABC gene product appears to conserved sequence motif KFRDI of the RNA- of be an important, proteolytically active intermediate of the the 3C protease. The residues that will become the N-terminal polyprotein processing (57). helix are in an extended conformation in the precursor (Fig. Palmenberg and Rueckert (59) examined the kinetics of the 5I). The up-down side-chain pattern in this extended confor- polyprotein processing in the picornavirus, encephalomyocar- mation places the small side chains of Ala-7, Ala-9, and Ala-11 ditis virus. Their data suggest that the autocatalytic excision of (P7Ј,P9Ј, and P11Ј) into the cleft between the two domains of the 3C gene product can be a truly intramolecular event. the proteases and the larger side chains of residues Tyr-6, Further evidence for an intramolecular excision of the 3C Val-8, and Met-10 point to the surface. Larger side chains than protease from poliovirus was provided by Hanecak et al. (60). alanine in positions 7, 9, and 11 would not have fitted easily Taken together, these data suggest that the autocatalytic into the surface of the cleft. We suggest therefore that the excision of 3C from the polyprotein at both the N and C three alanine residues are important for the conformation of termini can be either intramolecular or intermolecular. the N terminus in the precursor as well as for the formation of Once atomic resolution structures of 3C proteases became the N-terminal helix. available (61–63), it was possible to develop structural models It is much more difficult to envision an intramolecular for the autocatalytic excision of the picornaviral 3C proteases. cleavage of the picornaviral 3C protease at its own C terminus. The crystal structures confirmed that the picornaviral 3C The crystal structure of the core proteins from Sindbis and proteases are structurally related to the chymotrypsin family of Semlicki forest viruses (66) show how an additional ␤-strand serine proteases. Based on some of the unique structural can reach from the C terminus to the active site of a chymo- details, the authors of the crystal structure papers (61–63) trypsin-like protease; however, the unique antiparallel ␤-rib- proposed similar models for an autocatalytic, intramolecular bon of the picornaviral 3C proteases that extends from cleavage at the N terminus of 3C. They also agreed that it is ␤-strands bII and cII and interacts with the N-terminal domain much less obvious how an intramolecular cleavage could occur would prevent this (Fig. 5III). at the C terminus of 3C. The N-terminal residues of the picornaviral 3C proteases We thank Perry d’Obrennan for help in making Fig. 2. Mae Wylie form a stable ␣-helix that precedes the first strand of the has been very helpful in getting the manuscript into its final polished N-terminal ␤-barrel domain. This helix packs against the form. A.R.K. was supported by a Medical Research Council of Canada surface of the C-terminal domain of 3C. The last turn of this Studentship; N.K.-B. was the holder of an Alberta Heritage Founda- ␣-helix is formed by the residues of a highly conserved tion for Medical Research Studentship. This work has been supported sequence motif K͞ R͞ NI͞ (48). by the Medical Research Council of Canada and by Grant R K L UO1AI38249 from the National Institute of Allergy and Infectious Another unusual feature of the picornaviral 3C proteases is Diseases of the National Institutes of Health. an antiparallel ␤-ribbon that extends from the C-terminal ␤-barrel (49). It forms an extension of the second and third ␤ 1. Neurath, H. (1957) in Advances in Protein Chemistry XII, eds. -strands of the C-terminal domain and corresponds topolog- Anfinsen, C. B., Jr., Anson, M. L., Bailey, K. & Edsall, J. T. ically to the methionine loop of the chymotrypsin-like serine (Academic, New York), pp. 319–386. proteases. This feature is also present in the bacterial serine 2. Sauter, N. K., Mau, T., Rader, S. D. & Agard, D. A. (1998) Nat. proteases such as ␣-lytic protease and Streptomyces griseus Struct. Biol. 5, 945–950. protease B (64, 65). The recent crystal structure of ␣-lytic 3. Khan, A. R. & James, M. N. G. (1998) Protein Sci. 7, 815–836. protease complexed with its prosegment (2) revealed that this 4. Davie, E. W., Fujikawa, K. & Kisiel, W. (1991) Biochemistry 30, feature plays an important role in the folding of the protease 10363–10370. 22, and in the autocatalytic, intramolecular processing of the 5. Herriott, R. M. (1939) J. Gen. Physiol. 65–78. ␣ 6. Vernet, T., Khouri, H. E., Laflamme, P., Tessier, D. C., Gour- precursor of -lytic protease. Salin, B., Storer, A. C. & Thomas, D. Y. (1991) J. Biol. Chem. 266, The structural model of an intramolecular cleavage at the N 21451–21457. terminus of the picornaviral 3C proteases (Fig. 5) predicts that 7. Aviles, F. X., Vendrell, J., Guasch, A., Coll, M. & Huber, R. the N-terminal ␣-helix folds to its final conformation only after (1993) Eur. J. Biochem. 211, 381–389. the 3C protease has cleaved its own N terminus (61–63). 8. Nagase, H., Enghild, J. J., Suzuki, K. & Salvesen, G. (1990) Before the intramolecular cleavage at the 3BԽ3C site, the Biochemistry 29, 5783–5789. corresponding residues [Gly-1 to Lys-12 in human polio virus 9. Huber, R. & Bode, W. (1978) Acc. Chem. Res. 11, 114–122. (HPV) 3C] must be in an extended conformation (Fig. 5I)to 10. Schechter, I. & Berger, A. (1967) Biochem. Biophys. Res. Com- reach into the active site through the cleft between ␤-strand bI mun. 27, 157–162. 11. Bode, W., Schwager, P. & Huber, R. (1978) J. Mol. Biol. 118, from the N-terminal domain and the loop connecting 99–112. ␤-strands aII and bII from the C-terminal domain. The loop ␤ 12. Becker, J. W., Marcy, A. I., Rokosz, L. L., Axel, M. G., Burbaum, that connects -strands aII and bII had to be moved in the J. J., Fitzgerald, P. M. D., Cameron, P. M., Esser, C. K., model of the precursor molecule (Fig. 5I), with respect to its Hagmann, W. K., Hermies, J. D. & Springer, J. P. (1995) Protein position in the native HPV 3C protease structure (Fig. 5III)to Sci. 4, 1966–1976. Downloaded by guest on September 26, 2021 Colloquium Paper: Khan et al. Proc. Natl. Acad. Sci. USA 96 (1999) 10975

13. Turk, D., Podobnik, M., Kuhelj, R., Dolinar, M. & Turk, V. 43. Fusek, M. & Vetvicka, V. (1995) Aspartic Proteases: Physiology (1996) FEBS Lett. 384, 211–214. and Pathology (CRC, New York), pp. 22–24. 14. Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, 44. Sielecki, A. R., Fujinaga, M., Read, R. J. & James, M. N. G. J. S. & Cygler, M. (1996) EMBO J. 15, 5492–5503. (1991) J. Mol. Biol. 219, 671–692. 15. James, M. N. G. & Sielecki, A. R. (1986) Nature (London) 319, 45. Moon, R. P., Bur, D., Loetscher, H., D’Arcy, A., Tyas, L., Oefner, 33–38. C., Grueninger-Leitch, F., Mona, D., Rupp, K., Dorn, A., et al. 16. Moore, S. A., Sielecki, A. R., Chernaia, M. M., Tarasova, N. I. & (1997) Eur. J. Biochem. 244, 552–560. James, M. N. G. (1995) J. Mol. Biol. 247, 466–485. 46. Tanford, C. (1962) Adv. Protein Chem. 17, 69–165. 17. Khan, A. R., Cherney, M. M., Tarasova, N. I. & James, M. N. G. 47. Rueckert, R. R. (1996) in Fields Virology, eds. Fields, B. N., (1997) Nat. Struct. Biol. 4, 1010–1015. Knipe, D. M., Howley, P. M., Channock, R. M., Melnick, J. L., 18. Fujinaga, M., Chernaia, M. M., Tarasova, N. I., Mosimann, S. C. Monath, T. P., Roizmann, B. & Straus, S. E. (Lippincott–Raven, 4, & James, M. N. G. (1995) Protein Sci. 960–972. Philadelphia), pp. 609–654. 19. Esnouf, R. M. (1997) J. Mol. Graphics 15, 133–138. 48. Bergmann, E. M. & James, M. N. G. (1999) in Proteases of 20. Merritt, E. A. & Murphy, M. E. P. (1994) Acta Crystallogr. D 50, Infectious Agents, ed. Dunn, B. (Academic, San Diego), pp. 869–873. 139–163. 21. Taggart, R. T., Cass, L. G., Mohandas, T. K., Derby, P., Barr, P. J., Pals, G. & Bell, G. I. (1989) J. Biol. Chem. 264, 375–379. 49. Bergmann, E. M. & James, M. N. G. (1999) in Handbook of 22. Bateman, K. S., Cherney, M. M., Tarasova, N. I. & James, Experimental Pharmacology, eds. von der Helm, K. & Korant, B. M. N. G. (1998) in The Aspartic Proteases: Retroviral and Cellular (Springer, Heidelberg), in press. Enzymes, ed. James, M. N. G. (Plenum, New York), pp. 259–263. 50. Palmenberg, A. C. (1990) Annu. Rev. Microbiol. 44, 602–623. 23. Herriott, R. M. (1938) J. Gen. Physiol. 21, 501–540. 51. Gorbalenya, A. E. & Snijder, E. J. (1996) Perspect. Drug Discovery 24. McPhie, P. (1972) J. Biol. Chem. 247, 4277–4281. Des. 6, 64–86. 25. Auer, H. E. & Glick, D. M. (1984) Biochemistry 23, 2735–2739. 52. Ryan, M. D. & Flint, M. (1997) J. Gen. Virol. 78, 699–723. 26. Foltmann, B. & Jensen, A. L. (1982) Eur. J. Biochem. 128, 63–70. 53. Harmon, S. A., Updike, W., Jia, X.-Y., Summers, D. F. & 27. Davies, D. R. (1990) Annu. Rev. Biophys. Chem. 19, 189–215. Ehrenfeld, E. (1992) J. Virol. 66, 5242–5247. 28. Al-Janabi, J., Hartsuck, J. & Tang, J. (1971) J. Biol. Chem. 247, 54. Richards, O. C., Ivanoff, L. A., Bienkowska-Szewczyk, K., Butt, 4628–4632. B., Petteway, S. R., Jr., Rothstein, M. A. & Ehrenfeld, E. (1987) 29. Foltmann, B. (1981) Essays Biochem. 17, 52–84. Virology 161, 348–356. 30. Perlmann, G. E. (1963) J. Mol. Biol. 6, 452–464. 55. Davis, G. J., Wang, Q. M., Cox, G. A., Johnson, R. B., Wakulchik, 31. Glick, D. M., Shalitin, Y. & Hitt, C. R. (1989) Biochemistry 28, M., Datson, C. A. & Villarreal, E. C. (1997) Arch. Biochem. 2626–2630. Biophys. 346, 125–130. 32. Marciniszyn, J., Huang, J. S., Hartsuck, J. A. & Tang, J. (1976) 56. Ju¨rgensen, D., Kusov, Y. Y., Facke, M., Kra¨usslich, H. G. & J. Biol. Chem. 251, 7095–7102. Gauss-Mu¨ller, V. (1993) J. Gen. Virol. 74, 677–683. 33. Kageyama, T., Ichinose, M., Miki, K., Athauda, S. B., Tanji, M. 57. Probst, C., Jecht, M. & Gauss-Mu¨ller, V. (1998) J. Virol. 72, & Takahashi, K. (1989) J. Biochem. (Tokyo) 105, 15–22. 8013–8020. 34. Dunn, B. (1997) Nat. Struct. Biol. 4, 969–972. 58. Ypma-Wong, M. F., Dewalt, P. G., Johnson, V. H., Lamb, J. G. 35. Dame, J. B., Reddy, R. G., Yowell, C. A., Dunn, B. M., Kay, J. & Semler, B. L. (1988) Virology 166, 265–270. & Berry, C. (1994) Mol. Biochem. Parasitol. 64, 177–190. 59. Palmenberg, A. C. & Rueckert, R. R. (1982) J. Virol. 41, 244–249. 36. Berry, C., Dame, J. B., Dunn, B. M. & Kay, J. (1995) in Aspartic 60. Hanecak, R., Semler, B. L., Ariga, H., Anderson, C. W. & Proteases: Structure, Function, Biology, and Biomedical Implica- Wimmer, E. (1984) Cell 37, 1063–1073. tions, ed. Takahashi, K. (Plenum, New York), pp. 511–518. 61. Bergmann, E. M., Mosimann, S. C., Chernaia, M. M., Malcolm, 37. Francis, S. E., Gluzman, I. Y., Oksman, A., Knickerbocker, A., B. A. & James, M. N. G. (1997) J. Virol. 71, 2436–2448. Mueller, R., Bryant, M. L., Sherman, D. R., Russell, D. G. & Goldberg, D. E. (1994) EMBO J. 13, 306–317. 62. Matthews, D. A., Smith, W. W., Ferre, R. A., Condon, B., 38. Francis, S. E., Banerjee, R. & Goldberg, D. E. (1997) J. Biol. Budahazi, G., Sisson, W., Villafranca, J. E., Janson, C. A., Chem. 272, 14961–14968. McElroy, H. E., Gribskov, C. L. & Worland, S. (1994) Cell 77, 39. Hill, J., Tyas, L., Phylip, L., Kay, J., Dunn, B. M. & Berry, C. 761–771. (1994) FEBS Lett. 352, 155–158. 63. Mosimann, S. C., Chernaia, M. M., Sia, S., Plotch, S. & James, 40. Khazanovich Bernstein, N., Cherney, M. M., Loetscher, H., M. N. G. (1997) J. Mol. Biol. 273, 1032–1047. Ridley, R. G. & James, M. N. G. (1999) Nat. Struct. Biol. 6, 32–37. 64. Fujinaga, M., Delbaere, L. T. J., Brayer, G. D. & James, M. N. G. 41. Silva, A. M., Lee, A. Y., Gulnik, S. V., Maier, P., Collins, J., Bhat, (1985) J. Mol. Biol. 184, 479–502. T. N., Collins, P. J., Cachau, R. E., Luker, K. E., Gluzman, I. Y., 65. Huang, K., Lu, W., Anderson, S., Laskowski, M., Jr. & James, et al. (1996) Proc. Natl. Acad. Sci. USA 93, 10034–10039. M. N. G. (1995) Protein Sci. 4, 1985–1997. 42. Sali, A., Veerapandian, B., Cooper, J. B., Moss, D. D., Hofmann, 66. Tong, L., Wengler, G. & Rossmann, M. G. (1993) J. Mol. Biol. T. & Blundell, T. L. (1992) Proteins 12, 158–170. 230, 228–247. Downloaded by guest on September 26, 2021