The EMBO Journal vol.15 no.20 pp.5492-5503, 1996

Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment

Rene Coulombe1,2,3, Pawel Grochulski1,3,4, J.Sivaraman1,2,3, Robert Menard1,2, John S.Mort2,5 and Miroslaw Cygler1,2,3,6

1Biotechnology Research Institute, National Research Council of Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, 5Shriners Hospital for Crippled Children and Department of Surgery, McGill University, Montreal, Quebec H3G 1A6, 2Protein Engineering Network of Centers of Excellence, Canada and 3Montreal Joint Centre for Structural Biology, Quebec, Canada
4Permanent address: Institute of Physics, Technical University of Lodz, Poland
6Corresponding author

Cathepsin L is a member of the papain superfamily of cysteine proteases and, like many other proteases, it is synthesized as an inactive proenzyme. Its prosegment shows little homology to that of procathepsin B, whose structure, the first for a cysteine protease proenzyme, has been determined recently. We report here the 3-D structure of a mutant of human procathepsin L determined at 2.2 Å resolution, describe the mode of binding employed by the prosegment and discuss the molecular basis for other possible roles of the prosegment. The N-terminal part of the prosegment is globular and contains three α-helices with a small hydrophobic core built around aromatic side chains. This domain packs against a loop on the enzyme's surface, with the aromatic side chain from the prosegment being located in the center of this loop and providing a large contact area. The C-terminal portion of the prosegment assumes an extended conformation and follows along the substrate binding cleft toward the N-terminus of the mature enzyme. The direction of the prosegment in the substrate binding cleft is opposite to that of substrates. The previously described role of the prosegment in the interactions with membranes is supported by the structure of its N-terminal domain. The fold of the prosegment and the mechanism by which it inhibits the enzymatic activity of procathepsin L are similar to those observed in procathepsin B despite differences in length and sequence, suggesting that this mode of inhibition is common to all enzymes from the papain superfamily.
Keywords: cathepsin L/[...]/proenzyme/propeptide/3-D structure

Introduction

The largest family of cysteine proteases identified to date by [...] is the papain superfamily. Its members include a wide range of enzymes from both prokaryotes and eukaryotes, encompassing bacteria, plants, invertebrates and vertebrates (Berti and Storer, 1995). In mammals, proteases from this superfamily represent a major component of the lysosomal degradation system (Kirschke and Barrett, 1987). While new members have been discovered recently which appear to have selective (cathepsin S; Kirschke et al., 1989) or specific ([...]; Tezuka et al., 1994) cellular distributions, the best characterized enzymes, cathepsins B and L, are generally present in all cells. In addition to their intracellular role in protein recycling, evidence has been accumulating that implicates cathepsins B and L in various physiological and pathological processes where they play an extracellular role. Cathepsin L is of particular interest because of evidence for its action in tumor invasion and metastasis (Yagel et al., 1989) and in bone resorption (Kakegawa et al., 1993). In addition, it is present at high levels in the synovium of patients with rheumatoid arthritis, suggesting a role in the progress of this inflammatory disease (Trabandt et al., 1990).

To protect the cell from the potentially disastrous consequences of uncontrolled degradative activity, essentially all proteases are synthesized as inactive proenzymes due to the presence of an N-terminal propeptide extension. In addition to its role as a potent inhibitor of proteolytic activity (Carmona et al., 1996), the prosegment of cathepsin L has been shown to be crucial for the correct folding of the newly synthesized protease and to stabilize the protein to the denaturing effects of neutral to alkaline pH, which rapidly inactivates the enzyme (Tao et al., 1994). Cleavage and dissociation of the propeptide with concomitant activation of the protease occur as the result of autoprocessing under acidic conditions (Mason et al., 1987). In vivo this takes place as the newly synthesized proenzyme is routed from the Golgi apparatus to the lysosomes. Whether processing in vivo is an intra- or intermolecular event is not yet clear, although the processing in vitro can be achieved by intermolecular events. In addition, recent studies clearly indicate that the processing of procathepsin L is enhanced dramatically in the presence of polyanions such as dextran sulfate (Mason and Massey, 1992), suggesting that interactions with membrane components may play an important role in the activation process in vivo.

3-D structures of several enzymes from the papain superfamily have been determined (papain, Drenth et al., 1968; actinidin, Baker, 1977; calotropin DI, Heinemann et al., 1982; [...], Musil et al., 1991; [...], Pickersgill et al., 1991; [...], McGrath et al., 1995; [...], O'Hara et al., 1995), and their overall similarity corresponds to the level expected from sequence homologies. A more detailed analysis of the sequences has allowed the identification of two subfamilies within this family (Karrer et al., 1993). In terms of the mammalian members of the papain superfamily, these subfamilies can be designated cathepsin L-like and

© Oxford University Press

cathepsin B-like enzymes, with the majority of the enzymes (amongst others cathepsins L, S and K) belonging to the former group. The main differences between the cathepsin L and B subfamilies are restricted to a few short stretches where they have distinct fingerprint sequences. The cathepsin B group also has a long insertion at around residue 95 (cathepsin L numbering), termed the occluding loop (Musil et al., 1991), which is responsible for the exopeptidase activity unique to this subfamily. While the proteases of the two subfamilies are still highly homologous, their proregions show very little homology across the two groups and differ significantly in length: those of the cathepsin L subfamily are 30-40 residues longer than those of the cathepsin B subfamily. Of the mammalian cysteine proteases in the first group, cathepsin L represents one of the best biochemically characterized members. Recent studies indicate that the prosegment of cathepsin L has other functions in addition to its role as an inhibitor and stabilizer. It has been shown that a specific region of the prosegment provides at least part of the recognition site for modification with mannose-6-phosphate, a signal for targeting to the lysosome (Cuozzo et al., 1995), and that the N-terminal basic region may play a role in microsomal membrane binding (McIntyre and Erickson, 1993; McIntyre et al., 1994).

X-ray crystallographic studies of serine (Huber and Bode, 1978; Gallagher et al., 1995), aspartic (James and Sielecki, 1986) and metalloprotease (Coll et al., 1991; Becker et al., 1995) proenzymes have demonstrated the structural basis for the inactivation of the enzymes by their propeptides, but no such data were available for the members of the cysteine protease family. This situation changed recently with our and others' determinations of the 3-D structures of rat (Cygler et al., 1996) and human procathepsin B (Turk et al., 1996). These structures delineated the principles by which the proregion inhibits enzymatic activity. However, the lack of a convincing sequence homology between the proregions of enzymes from the cathepsin L and B subfamilies leaves open the question of the proregion fold and the mechanism of inhibition for the former, the major subfamily of papain-like proteases. The considerable interest in the proregions also stems from the fact that their synthetic peptidyl equivalents are not only extremely good but also rather selective inhibitors of the parent enzymes (Fox et al., 1992; Carmona et al., 1996).

We report here the 2.2 Å resolution structure of human procathepsin L. Our results show that the proregion fold and its mode of interaction with the enzyme are highly reminiscent of those observed in procathepsin B, thus strongly suggesting that the general mechanism of inhibition of enzymatic activity by the propeptide is conserved across the whole papain superfamily.

Fig. 1. Stereoview of the Cα tracing of procathepsin L. The prosegment is shown in thick lines. Its bulk is located on one side of catL contacting the 140-155 (PBL) loop. Catalytic residues are shown in full, as well as the Phe56p from the prosegment. The dotted line indicates the location of the disordered loop. Selected residues are numbered.

Results

Description of the structure

The molecule of procathepsin L is globular, with approximate dimensions of 40 × 55 × 65 Å. The part that corresponds to the mature cathepsin L (we will refer to it as catL) closely resembles other papain-like cysteine proteases. The bulk of the prosegment lies on one side of catL, near the 140-155 loop (Figure 1).

Fig. 2. Stereo drawing of the Cα tracing of catL and actinidin after their superposition. CatL is shown in thick lines, actinidin in thin lines. Selected residues of cathepsin L are numbered to help orientate the reader.

Cathepsin L. CatL has a fold typical for cysteine proteases belonging to the papain superfamily (see, for example, Baker and Drenth, 1987). Briefly, it is made up of two domains delimiting an active-site cleft containing the catalytic Cys25 and His163 residues. Cys25 is in the first, mostly α-helical domain, while His163 is in the second, mostly β-sheet domain. Comparison with actinidin (Figure 2) shows that 182 Cα atoms (out of 218 residues of actinidin) superimpose with a r.m.s. deviation of 0.53 Å. The backbone differences between these two molecules and also papain are due to insertions/deletions in loops that are almost exclusively on the front side of the


molecule (when looking at the active-site cleft; Figure 2). While the overall shape of the active-site cleft is the same in all these molecules, it differs in some of the side chains lining the cleft (see sequence alignment in Berti and Storer, 1995), which accounts for the differences in their substrate specificities (Brocklehurst et al., 1987).

The prosegment. The prosegment has two components: a globular domain formed by the first ~75 residues, and a ~20 residue-long, extended C-terminal portion. This extended tether links the globular domain to residue Ala1 of catL. The globular domain is highly structured; it contains three α-helices and connecting loops (Figure 3). There is no electron density for the first four residues, indicating that they are disordered in the crystal. The well-ordered structure starts with Asp5p, at the beginning of a four-turn α-helix (α1p: 6p-19p). A short loop leads to the central, long, amphipathic α-helix containing seven-and-a-half turns (α2p: 25p-51p). These two helices are inclined to each other at an angle of ~65°. From the end of helix α2p the chain turns back and follows along this helix in an extended conformation (β1p: 56p-59p), forming a hairpin structure. The propeptide chain follows toward the [...] and folds on the way into a third, short α-helix made of only two turns (α3p: 68p-75p). This α-helix anchors the following part of the prosegment which dips into the substrate binding cleft (75p-79p). From there, the prosegment emerges onto the surface of the protein and follows along the groove between the two domains of catL towards the N-terminal Ala1 of catL. A short, two-stranded antiparallel β-sheet is formed there between Lys87p-Phe89p and Phe112-Asp114 of catL.

Fig. 3. Stereoview of the N-terminal globular domain of the prosegment with the secondary structure elements marked. Side chains forming the hydrophobic core and those involved in highly conserved salt bridges are shown in full. Positively charged residues are blue, negatively charged are red, aromatic are orange and aliphatic are brown. Phe56p, which fills the center of PBL, is shown in green. Hydrogen bonds are shown as thin black lines. Prepared by Molscript (Kraulis, 1991).

The globular domain of the prosegment has a hydrophobic core formed by side chains extending from helices α1p and α2p. They include three interdigitating tryptophans (12p, 15p and 35p), His19p, Tyr23p, the aliphatic part of the Arg21p side chain and two methionines, 39p and 60p. In addition, the interface between helix α2p and strand β1p is formed from hydrophobic portions of the side chains of Ile42p, Asn46p, Thr57p and Met58p. These interactions are supplemented by a stacking between

Table I. Salt bridges within the prosegment

Atom          Distance (Å)   Atom          Distance (Å)   Atom          Distance (Å)   Atom

OE1 Glu70p    3.0            NH1 Arg31p
OE2 Glu70p    2.9            NH2 Arg31p    3.2            OE2 Glu27p
NE Arg31p     3.0            OE2 Glu27p
OE1 Glu9p     2.8            NH1 Arg32p
OE2 Glu9p     3.3            NH1 Arg32p
NE Arg32p     3.2            OE2 Glu36p
OD1 Asp65p    3.1            NH2 Arg21p    2.9            O Ala62p
OH Tyr23p     2.8            OD2 Asp65p    3.0            NE Arg21p
NH1 Arg21p    3.1            O His19p
OE1 Glu28p    2.9            NZ Lys16p
OE2 Glu28p    3.2            NZ Lys16p
OE2 Glu48p    2.9            NE2 His54p
NZ Lys53p     2.9            OE2 Glu5?p

His45p and His54p and by three side-chain-main-chain hydrogen bonds (NE2 His45p...O Phe56p = 2.8 Å, ND2 Asn46p...O Met58p = 3.0 Å, OD1 Asn46p...N Met58p = 2.9 Å). Electrostatic interactions in the form of salt bridges are likely to play a significant role in the stabilization of the globular structure of the prosegment's N-terminal domain. There are six salt-bridged clusters in close proximity to each other within the prosegment. Two of them, containing Arg31p and Arg32p, join three side chains, while the side chain of Arg21p hydrogen bonds not only to Asp65p but also to the carbonyl groups of His19p and Ala62p (Table I and Figure 3). Two of these salt bridges are between helices α1p and α2p, and one is between helices α2p and α3p.

Interactions between the prosegment and catL

The prosegment contacts catL in two main areas: (i) along the substrate binding cleft and (ii) along the surface formed by the His140-Asp155 loop. This loop will be referred to as the prosegment binding loop (PBL).

PBL. The prosegment contacts PBL mainly through three residues of the β1p strand, the backbone of which forms a short, two-stranded β-sheet with the backbone of catL, residues Phe147-Ile150. In addition to the hydrogen bonds formed between the main-chain atoms of these two strands, there are also many hydrophobic contacts between their side chains (Figure 4). The side chain of Met58p is in close contact with Phe145 and Ser142 and contributes to the van der Waals interactions with catL. Of particular importance are the interactions of Phe56p with Tyr146 and Tyr151. These tyrosines are part of an aromatic cluster extending from the active site and containing, in addition, Phe143, His163, Trp189 and Trp193. The prosegment also interacts with the other side of this aromatic cluster through residues from the loop between strand β1p and helix α3p; two side chains, Phe63p and Phe71p, extend this cluster in the vicinity of the active site. The side chains of two other residues of the prosegment, Met41p and Asn61p, contact PBL through Glu141 and Phe145, respectively. The contact surface area between PBL and the prosegment is in excess of 400 Å².

Active-site region. The contacts between catL and the prosegment within the substrate binding cleft are also extensive and involve residues Phe71p and Met75p to Gln79p (Figure 5). The contact surface area created by the interactions in the cleft is nearly 500 Å². A striking feature of this association is the orientation of the prosegment in the cleft, which is opposite to that of a natural substrate. The prosegment lines the cleft along the S and S' subsites (defined by analogy to papain and cathepsin B; Baker and Drenth, 1987; Turk et al., 1995), but the S' subsites are occupied by the N-terminal residues Phe71p, Met75p and Asn76p, while the S subsites are occupied by the C-terminal residues Gly77p-Gln79p. Helix α3p is lodged on the S' side of the active site, between the loop Gln20-Ser24 (leading to the active-site cysteine), residues Ala138-Leu144, on the edge of PBL, and an indole ring of Trp189. The main chain of the proregion between Met75p and Gln79p forms numerous hydrogen bonds to catL (Figure 5). Met75p reaches into the S' sites and extends above the indole ring of Trp189. The amino acid in position 77p plays an important role in the way the proregion fills the cleft. A glycine allows for a close approach to the active-site nucleophile and permits the polypeptide to reach the bottom of the cleft. The oxyanion hole becomes occupied by the carbonyl of the preceding residue, Asn76p. This oxygen forms three hydrogen bonds: to NH and OG of Ser25 (Cys in WT) and to NE2 of Gln19. Other important contacts are formed by the side chain of residue 78p. In the wild-type enzyme this residue is a phenylalanine, but in the mutant studied here it is substituted by a leucine. This Leu78p occupies the S2 subsite and is close to Met70, Ala135 and Gly164. Gly68 is highly conserved in proteases from the papain superfamily and interacts with the peptidyl substrates and inhibitors through the formation of two hydrogen bonds. These bonds have been observed in complexes with substrate-like inhibitors as well as with the reverse inhibitor E-64 (see, for example, Gour-Salin et al., 1994). The same hydrogen bonds are also formed between Gly68 and the prosegment, and involve the carbonyl oxygen of Gly77p and the NH of Gln79p.

C-terminal segment. The prosegment emerges from the substrate binding cleft and follows in an extended conformation towards the N-terminus of catL, with its side chains spread on both sides of the backbone along the surface of the enzyme. A short β-strand is formed by Lys87p-Phe89p as mentioned above. Only two of the side chains, Phe89p and Tyr95p, point directly towards the surface of catL and occupy small hydrophobic pockets.


Fig. 4. Contacts between the prosegment and the PBL of catL. (A) Final 3Fo-2Fc electron density around residues of the prosegment interacting with PBL displayed at 1σ level. (B) Residues in the interface (4.2 Å cut-off distance) are shown in full. The prosegment is shown as shaded lines and the PBL as open lines. Hydrogen bonds are shown as dashed lines.
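The interface in Fig. 4B is defined by a distance cut-off of 4.2 Å. As a minimal sketch of how such a residue list could be derived, the Python below reports every residue pair with at least one interatomic distance within the cut-off. The function name `interface_pairs`, the record layout and all coordinates are assumptions made for illustration; the coordinates are invented and are not taken from the procathepsin L model, and the source does not say which program the authors used.

```python
import math

# Hypothetical atom records: (residue label, atom name, (x, y, z) in angstroms).
# Residue labels follow the text; the coordinates are invented for illustration
# and are NOT taken from the procathepsin L model.
PROSEGMENT_ATOMS = [
    ("Phe56p", "CZ", (10.0, 4.0, 2.0)),
    ("Met58p", "SD", (16.0, 9.0, 0.0)),
    ("Asp5p", "CG", (40.0, 40.0, 40.0)),
]
PBL_ATOMS = [
    ("Tyr146", "CE1", (12.0, 6.5, 3.0)),
    ("Phe145", "CZ", (17.0, 7.0, 2.0)),
    ("Tyr151", "OH", (9.0, 0.5, 1.0)),
]


def interface_pairs(group_a, group_b, cutoff=4.2):
    """Map (residue_a, residue_b) to the shortest interatomic distance <= cutoff."""
    pairs = {}
    for res_a, _atom_a, xyz_a in group_a:
        for res_b, _atom_b, xyz_b in group_b:
            distance = math.dist(xyz_a, xyz_b)
            if distance <= cutoff:
                key = (res_a, res_b)
                # Keep only the closest approach for each residue pair.
                pairs[key] = min(distance, pairs.get(key, distance))
    return pairs


if __name__ == "__main__":
    contacts = interface_pairs(PROSEGMENT_ATOMS, PBL_ATOMS)
    for (res_a, res_b), distance in sorted(contacts.items()):
        print(f"{res_a:8s} {res_b:8s} {distance:4.2f} A")
```

With real coordinates the two groups would be all atoms of the prosegment and of the PBL; the all-against-all loop is adequate for a few hundred atoms.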

These intimate contacts with catL result in their temperature factors being significantly lower than those of other residues in this part of the proregion and are of the same order as residues in the vicinity of the active site. Of the four basic residues found there, three form hydrogen bonds, mostly to the backbone atoms of catL. Yet the temperature factors indicate that their mobility is somewhat higher than the rest of the structure.

Discussion

Although cathepsins L and B share ~25% amino acid sequence identity within their mature portions, there is little homology in their proregion sequences. Yet the recently elucidated structure of procathepsin B (rat, Cygler et al., 1996; human, Turk et al., 1996) shows a surprising overall structural similarity of its proregion to the proregion of cathepsin L (Figure 6A). Their structure-based sequence alignment is shown in Figure 6B. The main difference is at the N-terminus: the first helix, α1p, and the N-terminal part of α2p, which do not interact with catL, are missing in procathepsin B, which makes this proregion shorter by 34 residues. The key interactions between the proregion and the protease observed in procathepsin B are also present in procathepsin L. The PBL plays an important role in both enzymes and provides an extensive auxiliary binding site for the proregion. Both proregions utilize an aromatic side chain, positioned in the center of this loop, to maximize contacts with the enzyme (Figure 6). While helix α2p and strand β1p overlap well between the two proenzymes, helix α3p differs in orientation by ~30°. This difference is in part caused by the presence of the occluding loop, a feature characteristic only of the cathepsin B subfamily.
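Helix orientations are compared in the text by angle (~30° between the α3p helices of the two proenzymes, ~65° between α1p and α2p). The sketch below shows one simple way such an angle can be estimated from Cα coordinates. The axis approximation (centroid of the first few Cα atoms to centroid of the last few) and the names `helix_axis` and `angle_deg` are assumptions for illustration, not the authors' procedure.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def helix_axis(ca_coords, window=4):
    """Approximate a helix axis as the vector from the centroid of the first
    `window` CA atoms to the centroid of the last `window` CA atoms."""
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Hypothetical, idealized points lying on two straight axes 65 degrees apart.
    tilt = math.radians(65.0)
    helix_a = [(0.0, 0.0, 1.5 * i) for i in range(8)]
    helix_b = [(1.5 * i * math.sin(tilt), 0.0, 1.5 * i * math.cos(tilt)) for i in range(8)]
    print(round(angle_deg(helix_axis(helix_a), helix_axis(helix_b)), 1))
```

For real helices the centroids of four consecutive Cα atoms lie slightly off the true axis, so the result is an estimate; a least-squares line fit through all Cα atoms would be the more careful choice.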



Fig. 5. Interactions of prosegment with catL within the substrate binding cleft. The prosegment is shown as shaded lines. Hydrogen bonds are shown as dashed lines.

[Fig. 6. Panel A, labelled "Procathepsin L" and "Procathepsin B"; panel B, sequence alignment. Figure content and caption are not recoverable.]


Fig. 7. Superposition of the substrate binding site region of the actinidin-E-64 complex and procathepsin L. Procathepsin L is shown as shaded lines; the proregion and the E-64 inhibitor are shown as thick lines.
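Superpositions such as those in Figures 2 and 7 are usually summarized by a root-mean-square deviation over paired atoms; the text reports 0.53 Å over 182 Cα atoms for catL versus actinidin. The sketch below only evaluates that statistic for two coordinate lists that are assumed to be already superimposed and paired one-to-one. It does not perform the least-squares fit itself, and the function name `rmsd` and the sample coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of paired
    (x, y, z) coordinates that have already been superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical atom pairs, each displaced by 0.5 angstrom.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 0.5), (1.0, 0.5, 0.0)]
    print(rmsd(model_a, model_b))
```

Which atoms are paired matters as much as the number itself: the 182 Cα atoms quoted in the text are the subset that superimposes well, out of 218 residues of actinidin.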

Mode of activity inhibition

The inhibition of the enzymatic activity by binding in the substrate binding cleft in a reverse, nonproductive direction is common to cathepsins B and L and is most probably a feature of all papain-like cysteine proteases. The utilization of the reverse binding mode has also been found recently in the metalloprotease stromelysin (Becker et al., 1995; although the mechanism of inhibition differs somewhat because of the presence of Zn2+ in the active site and its coordination by the prosegment), suggesting that this is a general mechanism for utilizing a peptide to provide a good fit to the substrate binding site and, at the same time, immunity to proteolysis. The binding of the proregion in the substrate binding cleft can be compared with that of a potent but nonspecific natural inhibitor of cysteine proteases, E-64, which is also known to bind in a reverse direction. While there is no structure of the catL-E-64 complex, we can compare procathepsin L with the actinidin-E-64 complex. There are only small differences between catL and actinidin in this region. The superposition of the two structures is shown in Figure 7. It is clear that E-64 mimics extremely well the binding of the proregion. The covalent bond formation to the active-site cysteine allows the N-terminal end of E-64 to insert deeper than the prosegment into the cleft and to position the carbonyl oxygen more deeply in the oxyanion hole. Leucines occupy nearly identical positions in the S2 subsite, but utilize different rotamers, possibly because of local differences between the enzymes. Finally, the hydrogen bonds to Gly68, also shown to be important for substrate hydrolysis (Berti et al., 1991), are identical in both structures.

Cathepsin L structure

Although we have determined the structure of the proenzyme, we can speculate that the catL portion represents very well mature cathepsin L and that activation of the proenzyme does not lead to significant conformational changes. There are two lines of evidence to support this view. First, comparison of procathepsin B with mature cathepsin B showed that the only significant difference is in the conformation of the large occluding loop. No rearrangement of the N-terminus was noticed (Cygler et al., 1996). The occluding loop is absent in cathepsin L and thus no large change is anticipated. Second, the trace of the N-terminal segment of catL in the current structure overlaps very well with those of other cysteine proteases. Similarly, superposition of the Cα tracing of catL and, for example, actinidin (Figure 2) shows that there are no significant differences along the proregion binding path. Such differences occur in loops remote from the proregion. The conservation of areas that are in contact with the proregion in procathepsin L at the same time supports the notion that the proregion binding mode is well conserved throughout the whole family.

The short segment encompassing residues 175-179 is disordered in the crystal. This segment, which in the wild-type protein has the Thr-Glu-Ser-Asp-Asn sequence, contains in the protein studied here two mutations and has the sequence Thr-Gly-Ser-Gly-Asn. The introduction of two glycines in this surface turn between two β-strands is the likely cause of the observed flexibility, and it is expected that in the wild-type cathepsin L this turn is well ordered.

Conservation of amino acids in the prosegment

The aligned sequences for the proregions of 39 proteases are presented in Figure 8. Even though the overall level of sequence identity is lower than that observed for the mature enzyme region, there are clear similarities between the sequences. The central region, comprising residues 21p-77p, displays a higher level of homology (32%) and contains the previously reported ERFNIN (Glu27p-X3-Arg-X3-Phe-X2-Asn-X3-Ile-X3-Asn46p) and GNFD (Gly59p-X1-Asn-X1-Phe-X1-Asp65p) conserved motifs (Ishidoh et al., 1987; Karrer et al., 1993; Vernet et al., 1995). The ERFNIN motif, containing conserved residues separated by three amino acids, was predicted to be part of an α-helix (Karrer et al., 1993), and this is confirmed by the crystal structure (α2p). The present, more extensive

1p 10p 20p 30p 40p 50p 60p 70p 80p 90p 96p
CATL_HUMAN TLTFDHSLEAQWTKWKAMHN--RLYG-MNEEGWRRAVWEKNMKMIELHNQEYREGKHSFTMAMNAFGDMTSEEFRQ-VMNGFQNRKPRKGKVFQEPLFYE
CATL_MOUSE TPKFDQTFSAEWHQWKSTHR--RLYG-TNEEEWRRAIWEKNMRMIQLHNGEYSNGQHGFSMEMNAFGDMTNEEFRQ-VVNGYRHQKHKKGRLFQEPLMLK
CATL_RAT   TPKFDQTFNAQWHQWKSTHR--RLYG-TNEEEWRRAVWEKNMRMIQLHNGEYSNGKHGFTMEMNAFGDMTNEEFRQ-IVNGYRHQKHKKGRLFQEPLMLQ
CATS_HUMAN QLHKDPTLDHHWHLWKKTYG--KQYKEKNEEAVRRLIWEKNLKFVMLHNLEHSMGMHSYDLGMNHLGDMTSEEVMS-LMSSLRVPSQWQRNITYKSNPNRI
CATS_RAT   ERPTLDHHWDLWKKTRM--RRNTDQNEEDVRRLIWEKNLKFIMLHNLEHSMGMHSYSVGMNHMGDMTPEEVIG-YMGSLRIPRPWNRSGTLKSSSNQT
CATK_HUMAN LYPEEILDTHWELWKKTHR--KQYNNKVDEISRRLIWEKNLKYISIHNLEASLGVHTYELAMNHLGDMTSEEVVQ-KMTGLKVPLSHSRSNDTLYIPEWEGR
CATK_RABIT LHPEEILDTQWELWKKTYS--KQYNSKVDEISRRLIWEKNLKHISIHNLEASLGVHTYELAMNHLGDMTSEEVVQ-KMTGLKVPPSRSHSNDTLYIPDWEGR
CATH_HUMAN ELSVNSLEKFHFKSWMSKHR--KTYSTE-EYHHRLQTFASNWRKINAHN-NGN-- -HTFKMALNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGP
CATH_RAT   ELTVNAIEKFHFTSWMKQHQ--KTYSSR-EYSHRLQVFANNWRKIQAHN-QRN---HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGP
CYS1_HOMAM NPSWEEFKGKFG--RKYVDLEEERYRLNVFLDNLQYIEEFNKKYERGEVTYNLAINQFSDMTNEKFNA-VMKGYKKGPR--PAAVFTSTDA
CYS2_HOMAM SPSWEHFKGKYG--RQYVDAEEDSYRRVIFEQNQKYIEEFNKKYENGEVTFNLAMNKFGDMTLEEFNA-VMKG-NIPRRSAPVSVFYPKKET
CYS3_HOMAM SPSWDHFKTQYG--RKYGDAKEELYRQRVFQQNEQLIEDFNKKFENGEVTFKVAMNQFGDMTNEEFNA-VMKGYKKGSRGEPKAVFTAEA
PAPA_CARPA DLTSTERLIQLFESWMLKHN--KIYKNIDEKIYRFEIFKDNLKYIDETN-KKN-- -NSYWLGLNVFADMSNDEFKE-KYTGSIAGNYTTTELSY-EEVLNDGDVN
PAP3_CARPA DLTSTERLIQLFNSWMLNHN--KFYENVDEKLYRFEIFKDNLNYIDETN-KKN-- -NSYWLGLNEFADLSNDEFNE-KYVGSLIDA--TIEQSYDEEFINEDTVN
PAP4_CARPA DLTSTERLIQLFNSWMLKHN--KNYKNVDEKLYRFEIFKDNLKYIDERN-KMI ---NGYWLGLNEFSDLSNDEFKE-KYVGSLPED--YTNQPYDEEFVNEDIVD
ACTN_ACTH  TQRTNDEVKAMYESWLIKYG--KSYNSLGEWERRFEIFKETLRFIDEHNADTN--- RSYKVGLNQFADLTDEEFRS-TYLGFTSGSNKTKVSNRYEPRFGQV
ALEU_HORVU GALGRTRHALRFARFAVRYG--KSYESAAEVRRRFRIFSESLEEVRSTNRK ---- GLPYRLGINRFSDMSWEEFQA-TRLGAAQTCSATLAGNHLMRDAAA
ORYA_ORYSA GERSEEEARRLYAEWKAEHG--KSYNAVGEEERRYAAFRDNLRYIDEHNAAADAGVHSFRLGLNRFADLTNEEYRD-TYLGLRNKPRRERKVSDRYLAADNEA
ORYB_ORYSA EGPTEAEARAAYDLWLAENGGGSPNALGGEHERRFLVFWDNLKFVDAHNARADEGGG-FRLGMNRFADLTNEEFRA-TFLGAKVAERSRAAGERYRHDGVEE
ORYC_ORYSA AALGRTRGALRFARFAVRHG--KRYGDAAEVQRRFRIFSESLELVRSTNRR ---- GLPYRLGINRFADMSWEEFQA-SRLGAAQNCSATLAGNHRMRDAPA
P34_SOYBN  KFTTQKQVSSLFQLWKSEHG--RVYHNHEEEAKRLEIFKNNSNYIRDMNANRK-SPHSHRLGLNKFADITPQEFSKKYLQAPKDVSQQIKMANKKMKKEQYSCDH
CYS1_HORVU DLESEEALWDLYERWQSAH--- RVRRHHAEKHRRFGTFKSNAHFIHSHNKRGD--- HPYRLHLNRFGDMDQAEFRA-TFVGDLRRDTPAKPPSVPGFMYAALNVSD
CYS2_HORVU DLESEEALWDLYERWQSAH--- RVRRHHAEKHRRFGTFKSNAHFIHSHNKRGD--- HPYRLHLNRFGDMDQAEFRA-TFVGDLRRDTPSKPPSVPGFMYAALNVSD
CYSP_PEA   EEDHLLNAEHHFTSFKSKFS--KSYATKEEHDYRFGVFKSNLIKAKLHQNR ---- DPTAEHGITKFSDLTASEFRR-QFLGLKKRLRLPAHAQKAPILPTTN
CYSP_VIGMU DLESEESLWDLYERWRSHH--- TVSRSLGEKHKRFNVFKANVMHVHNTN-KMD--- KPYKLKLNKFADMTNHEFRS-TYAGSKVNHHKMFRGSQHGSGTFMYEKVGS
CYS1_DICDI SRGIPPEEQSQFLEFQDKFN--KKYS-HEEYLERFEIFKSNLGKIEELNLIAINHKADTKFGVNKFADLSSDEFKN-YYLN-NKEAIFTDDLPVADYLDDEFINS
CYS2_DICDI RRFSESQYRTAFTEWTLKFN--RQYSSS-EFSNRYSIFKSNMDYVDNWNSKGD---SQTVLGLNNFADITNEEYRK-TYLGTRVNAHSYNGYDGREVLNVEDLQT
CYS2_LEIPI IHVGTPAAALFEEFKRTYG--RAYETLAEEQQRLANFERNLELMREHQAR--NPHAQF--GITKFFDLSEAEFAARYLNGAAYFAAAKRHAAQHYRKARADLSA
LCPA_LEIME PPVDNFVASAHYGSFKKRHG--KAFGGDAEEGHRFNAFKQNMQTAYFLNTQN--PHAHY-DVSGKFADLTPQEFAKLYLNP-DYYARHLKNHKEDVHVDDS
CYSP_TRYBB SLHVEESLEMRFAAFKKKYG--KVYKDAKEEAFRFRAFEENMEQAKIQAAA--NPYATF--GVTPFSDMTREEFRARYRNGASYFAAAQKRLRKTVNVTTGR
CYSP_TRYCR SLHAEETLTSQFAEFKQKHG--RVYESAAEEAFRLSVFRENLFLARLHAAA--NPHATF--GVTPFSDLTREEFRSRYHNGAAHFAAAQERARVPVKVEVVG
CYSP_PLAFA DPINNIKYASKFFKFMKEHN--KVYKNIDEQMRKFEIFKINYISIKNHNKLN--KNAMYKKKVNQFSDYSEEELKE-YFKTLLHVPNHMIEKYSKPFENHLKDNILIS
CYSP_PLAVI GLFVNLKYASKFFNFMNKYK--RSYKDINEQMEKYKNFKMNYLKIKKHNET----NQMYKMKVNQFSDYSKKDFES-YFRKLVPIPDHLKKKYVVPFSSMNNGKGKNV
CYSP_PLAVN SMQDNIKYASKFFKYMKENN--KKYENMDEQLQRFENFKIRYMKTQKHNEMVGKNGLTYVQKVNQYSDFSKEEFDN-YFKKLLSVPMDLKSKYIVPLKKHLANTNLIS
CYSP_THEAN DAESELDMLIEFDAFVEKYK--KVHRSFDQRVQRFLTFRKNYHIVKTHKPT-----EPYSLDLNKFSDLSDEEFKA-LYPVITPPKTYTSLSKHLEFKKMSHKNPIYI
CYSP_THEPA DPKLEYEVYREFEEFNSKYN--RRHATQQERLNRLVTFRSNYLEVKEQKGD-----EPYVKGINRFSDLTEREFYK-LFPVMKPPKATYSNGYYLLS-- -HMANKTYL
CATV_NPVAC YDLLKAPNYFEEFVHRFN--KDYGSEVEKLRRFKIFQHNLNEIINKNQN--DS-AKY--EINKFSDLSKDETIA-KYTGLSLPIQTQNFCKVIVLDQPPGK
CATV_NPVBM YDPLKAPNYFEEFVHRFN--KNYSSEVEKLRRFKIFQHNLNEIINKNQN--DS-AKY--EINKFSDLSKDETIA-KYTGLSLPTQTQNFCKVILLDQPPGK
CATV_NPVCF YDVLKAPNYFEDFLHKFN--KSYSSESEKLRRFQIFRHNLEEIINKNHN--DSTAQY--_-INKFADLSKDETIS-KYTGLSLPLQTQNFCEVVVLDRPPDK
1p 10p 20p 30p 40p 50p 60p 70p 80p 90p 96p

Fig. 8. Sequence alignment of the proregions of cathepsin L-like cysteine proteases. Numbering is based on the cathepsin L proregion sequence. The proteases are identified by their ID name in the Swiss-Prot database. Many sequences contain additional N-terminal residues not shown in the alignment. CYSP_PLAFA, CYSP_PLAVI, CYSP_PLAVN, CYSP_THEAN and CYSP_THEPA also contain additional C-terminal residues. Sequence homologies are based on the mutation data matrix and reflect the similarity of amino acid functions in their interactions with other amino acids in a protein (Schwartz and Dayhoff, 1979; George et al., 1990). The amino acids are arranged in the following groups: aromatic (Phe, Tyr, Trp), hydrophobic (Val, Leu, Met, Ile), basic (Lys, Arg, His), acidic (Asp, Glu), amide (Asn, Gln), small (Gly, Ala, Pro, Thr, Ser) and Cys. Bold residues in sequences represent ≥80% homology (by group). Positions showing >95% homology are marked with an asterisk. A gap was treated as a residue for determining the percentage homology.

alignment also shows that Gly in the GNFD motif is not conserved (only two-thirds of the sequences contain Gly or Ala at position 59p). In addition, Asp65p, which is present in all sequences in Figure 8, is not structurally conserved between procathepsins L and B. This is in agreement with the previously reported alignments containing the cathepsin B proregion (Ishidoh et al., 1987; Vernet et al., 1995). The C-terminal region of the propeptides varies greatly in the different species. This lack of sequence conservation reflects the rather loose contacts between the proregion and the enzyme in this region and the lack of structural constraints imposed on this segment of the proregion.

Residues that are highly conserved (Figure 8) play important roles in maintaining the globular fold of the N-terminal domain of the proregion, and most of them have been mentioned previously. To summarize (Figure 3), aromatic residues 12p, 15p, 19p, 23p and 35p and aliphatic 60p form the hydrophobic core between helices α1p and α2p; charged residues 21p (Arg/Lys), 27p (Glu), 31p (Arg), 65p (Asp) and 70p (Glu) form salt bridges; residues 38p (Asn), 42p (Ile/Val) and 46p (Asn) are on one side of helix α2p and face the strand β1p; residue 56p (aromatic) provides contact with PBL; residues 61p (Asn), 63p (Phe), 66p (Met/Leu), 71p (hydrophobic) and 75p (hydrophobic) are within the loop leading to helix α3p and face the catL forming a hydrophobic minicore. Reasons for the conservation of 64p (Gly/Ala/Ser) and 67p (Thr/Ser), on the other hand, are not obvious from the structure. It is clear that only some and not all of the salt bridges observed in the proregion of procathepsin L are conserved throughout the whole subfamily.

This alignment shows that the proregion residues occupying the substrate binding cleft vary significantly with regard to size and hydrophobicity. Even the Gly in position 77p, which from the present structure seems to be crucial for the prosegment to reach the bottom of the cleft, may not be essential, because some sequences have residues with small side chains in this position. On the other hand, the aromatic character of residue 71p in subsite S' is largely maintained, being substituted in only a few sequences by isoleucine or valine.

Functional roles of prosegment sections

Data that indicate the importance of various structural features for the inhibitory properties of the prosegment come mainly from studies of cathepsin L inhibition by its propeptide (Carmona et al., 1996). Even though there are no structural data for a noncovalent cathepsin L-propeptide complex, it can be reasonably assumed that the binding mode and the nature of the interactions are the same as observed in the proenzyme. Four variants of the cathepsin L propeptide have been produced and their inhibitory activity measured (Figure 9). Peptide phcl-1, corresponding


to residues 4p-90p of the proregion of procathepsin L, is a potent inhibitor of the mature enzyme, with a Ki of 0.088 nM at pH 5.5. From the use of truncated propeptides it was found that the C-terminal region contributes little to overall inhibition because the peptide phcl-2, which lacks 15 C-terminal residues, is still a good inhibitor (Ki = 0.66 nM). This is corroborated by the structural data, which show that this part of the prosegment has higher mobility and makes fewer interactions with catL. Removal of the first 20 residues in the propeptide (peptide phcl-3) causes a 130-fold increase in Ki, indicating that helix α1p contributes to overall inhibition. However, peptide phcl-3 is still a good inhibitor of cathepsin L (Ki = 11.5 nM), and the presence of helix α1p is not essential for the inhibitory activity. The removal of both α1p and α2p helices, however, causes a dramatic increase in Ki (33 000-fold). The 3-D structure shows that these α-helices, but in particular α2p, stabilize the conformation of the extended β1p peptide stretch and the α3p helix which provide extensive interactions with catL.

Fig. 9. Cartoon representation of the synthetic peptides corresponding to fragments of the prosegment and their inhibition constant, Ki, towards the catL.

One of the functions of the proregion is to regulate the pH-dependent activation or processing of the proenzyme to the mature form. The pH dependency of processing for both procathepsins L and B (Smith and Gottesman, 1989; Rowan et al., 1992) is similar to that observed for inhibition of the mature enzymes by their propeptides (Fox et al., 1992; Carmona et al., 1996) and is probably governed by the same mechanism. There are a number of ionizable groups on the proregion and on the enzyme that could modulate the proregion-enzyme interactions. To positively identify such residues will require more studies using proenzyme and/or propeptide variants. Data obtained with the cathepsin L propeptides indicate that more than one residue of the prosegment is involved in this process and that they are located within positions 21p-81p because inhibition of cathepsin L by phcl-2 (1p-81p) and phcl-3 (21p-95p) showed a pH dependence similar to that of phcl-1 (4p-90p) and of proenzyme activation (Carmona et al., 1996). From the crystal structure of procathepsin L and in conjunction with previous data on papain (Vernet et al., 1995) two residues likely to participate in this process are Asp65p and Glu70p. These two acidic residues participate in well-conserved salt bridges (Figure 8 and Table I). In a previous study on the processing of propapain (Vernet et al., 1995), random mutagenesis experiments revealed that residue Asp65p is important for proper folding of the proenzyme, confirming that this residue contributes to the structural integrity of the proregion. In the same study it was found that the replacement of Phe63p by histidine could trigger processing when this residue was in the protonated state. Even though direct involvement of Asp65p in pH-dependent processing could not be observed, these results indicate that alteration of the charge state in the GNFD motif of propapain, which encompasses Asp65p, can trigger processing and that the GNFD motif could participate in the pH regulation of this process (Vernet et al., 1995). It must be noted that the interactions involving residues Asp65p and Glu70p are absent in procathepsin B. Therefore, one must consider that either the cathepsin B proregion, which in this respect is distinct from that of cathepsin L-like enzymes, uses a different mechanism to mediate the pH regulation of processing/inhibition, as suggested previously (Carmona et al., 1996), or a different part on the prosegment provides the pH sensing.

Finally, because the propeptides of cathepsins L and B have been shown to inhibit selectively their parent enzymes, propeptides must also contain features ensuring that inhibition is highly selective for the protease from which they originate. Because of the large sequence differences in the proregions of cathepsins B and L, it was not surprising to find that the propeptide of cathepsin B does not significantly inhibit members of the cathepsin L subfamily (i.e. papain) and is selective for cathepsin B (Fox et al., 1992). However, inhibition by the cathepsin L propeptide was found to be selective for cathepsin L over other enzymes of this subfamily, including papain and a closely related . The molecular basis for the selectivity of propeptides for their corresponding cysteine proteases remains unclear.

Other functions of the prosegment

The prosegment appears to play a dual role in the intracellular sorting of cathepsin L. The classic targeting pathway for the routing of lysosomal enzymes operates by way of the mannose-6-phosphate receptor. Proteins destined for lysosome are identified by modification of the mannose residues on their asparagine-linked oligosaccharides through the combined actions of a specific UDP-N-acetylglucosamine-1-phosphotransferase and a phosphodiesterase. The recognition signal, whereby only lysosomal proteins are so modified, is still not understood. In cathepsin L, one such carbohydrate unit is present at residue 108 in the mature protein; however, the work of Cuozzo et al. (1995) demonstrated that the proregion is required for mannose phosphorylation. Evidence is accumulating that basic residues are a determining factor in the recognition motif, and an alanine scan of all lysine residues in mouse procathepsin L pointed to Lys37p and Lys82p as critical residues. On the proenzyme structure these two lysines are on the opposite face of the enzyme to that occupied by the carbohydrate, and it is not obvious how they contribute to the recognition element.

In addition to the mannose-6-phosphate targeting system, the presence of specific receptors for some lysosomal proteinases, which bind the proenzymes at low pH and may play a role in the processing reaction, has been proposed. McIntyre et al. (1994) have shown that basic peptides representing residues Lys16p-Gly24p of the cathepsin L propeptide inhibit receptor binding. This region forms an elbow between helices α1p and α2p and is well exposed to the solvent. Binding to such a receptor may also stimulate procathepsin L processing, as mimicked in vitro by the enhancement of activation seen in the presence of polyanions (Mason and Massey, 1992; Ishidoh and Kominami, 1995).

Conclusions

The papain superfamily now contains >100 known members. Within each of the two subfamilies, cathepsin L- and cathepsin B-like enzymes, sequence homology also extends to the proregion. Now that the 3-D structure of a representative proenzyme from each subfamily has been determined, one can generalize these structural findings and predict that the observed fold and path of the proregion along the enzyme's surface will be similar in all enzymes of the papain superfamily. The PBL will provide an auxiliary binding site for the prosegment, and the interactions within the substrate binding cleft will be similar. Significant differences will most probably occur at the N- or C-terminal parts of the prosegment. A future task is to identify the specificity determinants that allow the propeptides to differentiate between closely related members of the papain superfamily.

Materials and methods

Protein expression and purification

A human cathepsin L cDNA containing the proregion was inserted into the pPIC9 vector (Invitrogen) and expressed in the methylotrophic yeast Pichia pastoris as described previously (Coulombe et al., 1996). The wild-type enzyme readily undergoes self-processing at pH values <5.0. To increase the stability of the proenzyme and to extend the pH range for the crystallization trials, we constructed an inactive Cys25Ser mutant that was incapable of self-processing. At the same time, the glycosylation site AsnAspThr110 was modified to AsnAspAla110 to avoid heterogeneity resulting from glycosylation by the yeast system. As we discovered later, three additional mutations, Phe78pLeu, Glu176Gly and Asp178Gly, were unknowingly introduced into the cDNA. Recombinant procathepsin L was produced following the protocol recommended by Invitrogen. The clarified supernatant from the culture medium was concentrated using an Amicon spiral concentrator with an S1Y10 membrane followed by a stirred cell with an Amicon PM10 membrane. The concentrate was dialyzed overnight against 50 mM sodium acetate, pH 5.0, 0.02% sodium azide and 75 mM NaCl. The solution was applied to an SP-Sepharose column and the fractions containing procathepsin L were collected and pooled.

For comparative studies we also prepared a second construct with the wild-type sequence containing only a single mutation (Thr110Ala) to abolish glycosylation. This protein was purified in a similar way to the inactive mutant, with the exception that it was reversibly inhibited by HgCl2 or methyl methanethiosulfonate prior to loading onto the SP-Sepharose column at pH 5.0.

Crystallization and structure solution

The Cys25Ser mutant protein has been crystallized (Coulombe et al., 1996) from 1.4 M (Na,K)PO4 buffer, pH 7.8. The crystals belong to the orthorhombic space group P2₁2₁2₁ with cell dimensions a = 40.1, b = 88.1, c = 94.9 Å and contain one molecule in the asymmetric unit. A native data set to 2.2 Å resolution was collected at the station BL6A2, Photon Factory synchrotron facility (Tsukuba, Japan), using a Weissenberg camera (Sakabe, 1991) and a wavelength (λ) of 1.00 Å. The frames were processed using the program WEIS (Higashi, 1989) and yielded a total of 46 120 observations that merged to 13 872 unique reflections with an Rmerge value of 0.069. This data set was 87% complete to 2.5 Å resolution and 78% complete to 2.2 Å resolution (51% in the 2.3-2.2 Å resolution shell). Mature cathepsin L showed ~41% amino acid sequence identity to papain, actinidin and caricain, whose 3-D structures are known (PDB codes 9pap, 1aec and 1ppo, respectively). Initial molecular replacement calculations, carried out using the program AMoRe (Navaza, 1994), indicated that actinidin is the best model for cathepsin L. A sliding window of 10 residues was removed from this model and the impact on the peak:noise ratio of the rotation function was followed. It was found that by removing the regions that differ significantly between actinidin, papain and caricain, the signal:noise ratio in the rotation and translation functions increased. The model used for calculations contained 78% of the atoms of cathepsin L but only 62% of the procathepsin L atoms. The best molecular replacement solution was significantly higher than the noise, and had a correlation coefficient of 0.391 and an R-factor of 0.461 for the 10.0-3.5 Å resolution shell. Despite these encouraging indicators, the electron density in maps calculated from this model was scattered and disconnected beyond the model itself; no interpretation of the proregion was possible. Density modification [solvent flattening and histogram matching, programs dm (Cowtan, 1994) and SQUASH (Zang, 1993)], automatic interpretation of unassigned density (program ARP; Lamzin and Wilson, 1993), refinement of the partial model and various other attempts failed to improve the interpretability of the electron density map beyond the cathepsin L portion, possibly because of the low solvent content of only 48%.

While the work on the Cys25Ser mutant was in progress, we succeeded in crystallizing the second, enzymatically competent, recombinant variant of human procathepsin L. This protein was concentrated after purification to 10 mg/ml. Before setting up crystallization trials the reversible inhibitor was removed from the enzyme by adding 2 mM dithiothreitol. Crystals were obtained by the hanging drop method under conditions different from those used for the previously described mutant. The reservoir solution contained 0.7 M (Na,K)PO4, pH 7.8. The drop was prepared by mixing 5 µl protein solution with 5 µl reservoir solution. The crystals grew to a maximum size of 0.3 × 0.3 × 0.4 mm³ in ~2 weeks. They belong to the trigonal system, space group P3₂21, a = 103.9 and c = 86.8 Å, and diffract to only 3.0 Å resolution on a rotating anode generator at room temperature. These crystals are more loosely packed than the previously described form (Vm = 3.77 Å³/Da) and contain 67% solvent. The diffraction data were collected on a San Diego MultiWire area detector. A total of 45 966 observations were recorded, which merged into 19 232 unique reflections with Rmerge = 0.078. This data set was 91.5% complete to 3.0 Å resolution. Molecular replacement using the above-mentioned truncated model of actinidin led to a correctly oriented and positioned model and was followed by density modification (dm program; Cowtan, 1994). The 3.0 Å electron density map was of much better quality than for the previous crystal form, and a number of helices could be recognized in the proregion. The model was extended in several stages, each time applying solvent flattening, which additionally improved the maps. When nearly all of the residues had been located, the model was transferred to the orthorhombic crystal form for refinement against the higher resolution data. The R-factor for 8.0-2.2 Å resolution data was 0.271 (Rfree = 0.296 for 10% of randomly chosen reflections). Several cycles of refinement, which at this stage consisted of only minimization and B-factor optimization, followed by refitting on a graphics workstation using the program O (Jones et al., 1991), resulted in a model that contained nearly all residues of the protein: residues 5p-96p of the prosegment (marked with the suffix p), residues 1-174 and 180-220 of the mature cathepsin L and 71 solvent molecules. The attempts to speed up the refinement by using a molecular dynamics protocol were abandoned in this case because they often resulted in an increase in Rfree associated with a decrease in the standard R-factor. The final refinement statistics were R-factor = 0.186, Rfree = 0.244 for reflections with I > σ(I) and within 8.0-2.2 Å resolution and deviations of bonds and bond angles of 0.007 Å and 1.20°, respectively. Because the orthorhombic crystals diffract to a much higher resolution we based our description on this crystal form. The coordinates have been deposited in Brookhaven Data Bank with entry code 1CJL.

Prosegment sequence alignment

An initial sequence alignment was produced using the program CLUSTALW (Thompson et al., 1994) with the following parameters: gap opening penalty 10.00, gap extension penalty 0.05, delay divergent sequences 40%, protein weight matrix: BLOSUM series. The alignment was satisfactory for the major part of the sequences, where the highly conserved motifs between residues 12p and 70p are present. Further adjustment of the alignment was performed manually based on the structural data for procathepsin L. The alignment in the region 47p-58p, which corresponds to the loop between helix α2p and strand β1p in procathepsin L, is less well defined. Within this region, gaps were introduced into the sequences for CYS2_LEIPI, LCPA_LEIME, CYSP_TRYBB, CYSP_TRYCR, CATV_NPVAC, CATV_NPVBM and CATV_NPVCF to place either Tyr or Phe at position 56p. A one-residue gap was introduced between residues 73p and 74p so as to have a small residue (Gly, Ser, Ala or Pro) at position 77p for CATH_HUMAN, CATH_RAT, P34_SOYBN, CYS2_LEIPI, LCPA_LEIME, CYSP_TRYBB and CYSP_TRYCR. This residue is positioned in the active site of procathepsin L, and the crystal structure indicates that only very small residues can be accommodated at this position. The alignment program did not perform well in the C-terminal region (i.e. residues >77p) because of the very poor homology in this portion of the sequence. Therefore, this region is presented without any alignment except for the introduction of a few gaps to align obviously similar regions for sequences originating from closely related sources (e.g. papain, caricain and glycyl endopeptidase).

Acknowledgements

The authors thank Ms Sachiko Takebe and Patrizia Mason for help in purification of the mutant procathepsin L. This is NRCC publication No. 39933.

References

Baker,E.N. (1977) Structure of actinidin. Details of the polypeptide chain conformation and active site from an electron density map at 2.8 angstroms resolution. J. Mol. Biol., 115, 263.
Baker,E.N. and Drenth,J. (1987) The thiol proteases: structure and mechanism. In Jurnak,F.A. and McPherson,A. (eds), Biological Macromolecules and Assemblies. John Wiley & Sons, New York, Vol. 3, pp. 313-368.
Becker,J.W. et al. (1995) Stromelysin-1: three-dimensional structure of the inhibited catalytic domain and of the C-truncated proenzyme. Protein Sci., 4, 1966-1976.
Berti,P.J. and Storer,A.C. (1995) Alignment/phylogeny of the papain superfamily of cysteine proteases. J. Mol. Biol., 246, 273-283.
Berti,P.J., Faerman,C.H. and Storer,A.C. (1991) of papain-substrate interaction energies in the S2 to S2' subsites. Biochemistry, 30, 1394-1402.
Brocklehurst,K., Willenbrock,F. and Salih,E. (1987) Cysteine proteinases. In Neuberger,A. and Brocklehurst,K. (eds), Hydrolytic Enzymes. Elsevier Biomedical Press, Amsterdam, The Netherlands, pp. 39-158.
Carmona,E., Dufour,E., Plouffe,C., Takebe,S., Mason,P., Mort,J.S. and Menard,R. (1996) Potency and selectivity of the cathepsin L propeptide as an inhibitor of cysteine proteases. Biochemistry, 35, 8149-8157.
Coll,M., Guasch,A., Aviles,F.X. and Huber,R. (1991) Three-dimensional structure of porcine procarboxypeptidase B: a structural basis of its activity. EMBO J., 10, 1-9.
Coulombe,R., Li,Y., Takebe,S., Menard,R., Mason,P., Mort,J.S. and Cygler,M. (1996) Crystallization and preliminary X-ray diffraction studies of human procathepsin L. Proteins: Struct. Funct. Genet., 25, 398-400.
Cowtan,K.D. (1994) Joint CCP4 ESF-EACBM Newslett. Protein Cryst., 31, 34-38.
Cuozzo,J.W., Tao,K., Wu,Q., Young,W. and Sahagian,G.G. (1995) Lysine-based structure in the proregion of procathepsin L is the recognition site for mannose phosphorylation. J. Biol. Chem., 270, 15611-15619.
Cygler,M., Sivaraman,J., Grochulski,P., Coulombe,R., Storer,A.C. and Mort,J.S. (1996) Structure of rat procathepsin B. Model for inhibition of cysteine protease activity by the proregion. Structure, 4, 405-416.
Drenth,J., Jansonius,J.N., Koekoek,R., Swen,H.M. and Wolthers,B.G. (1968) Structure of papain. Nature, 218, 929-933.
Fox,T., de Miguel,E., Mort,J.S. and Storer,A.C. (1992) Potent slow-binding inhibition of cathepsin B by its propeptide. Biochemistry, 31, 12571-12576.
Gallagher,T., Gilliland,G., Wang,L. and Bryan,P. (1995) The prosegment-subtilisin BPN' complex: crystal structure of a specific 'foldase'. Structure, 3, 907-914.
George,D.G., Barker,W.C. and Hunt,L.T. (1990) Methods Enzymol., 183, 333-351.
Gour-Salin,B.J., Lachance,P., Magny,M.C., Plouffe,C., Menard,R. and Storer,A.C. (1994) E64 analogues as inhibitors of cysteine proteinases: investigation of S2 subsite interactions. Biochem. J., 299, 389-392.
Heinemann,U., Pal,G.P., Hilgenfeld,R. and Saenger,W. (1982) Crystal and molecular structure of the sulfhydryl protease calotropin DI at 3.2 Å resolution. J. Mol. Biol., 161, 591-606.
Higashi,T. (1989) The processing of diffraction data taken on a screenless Weissenberg camera for macromolecular crystallography. J. Appl. Crystallogr., 22, 9-18.
Huber,R. and Bode,W. (1978) Structural basis of the activation and action of trypsin. Acc. Chem. Res., 11, 114-122.
Ishidoh,K. and Kominami,E. (1995) Procathepsin L degrades proteins in the presence of glycosaminoglycans in vitro. Biochem. Biophys. Res. Commun., 217, 624-631.
Ishidoh,K., Towatari,T., Imajoh,S., Kawasaki,H., Kominami,E., Katunuma,N. and Suzuki,K. (1987) Molecular cloning and sequencing of cDNA for rat cathepsin L. FEBS Lett., 223, 69-73.
James,M.N.G. and Sielecki,A.R. (1986) Molecular structure of an aspartic proteinase zymogen, porcine pepsinogen, at 1.8 Å resolution. Nature, 319, 33-38.
Jones,T.A., Zou,J.Y., Cowan,S.W. and Kjeldgaard,M. (1991) Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr., A47, 110-119.
Kakegawa,H. et al. (1993) Participation of cathepsin L on bone resorption. FEBS Lett., 321, 247-250.
Karrer,K.M., Peiffer,S.L. and Ditomas,M.E. (1993) Two distinct subfamilies within the family of cysteine protease . Proc. Natl Acad. Sci. USA, 90, 3063-3067.
Kirschke,H. and Barrett,A.J. (1987) Chemistry of lysosomal proteases. In Glaumann,H. and Ballard,F.J. (eds), : their Role in Protein Breakdown. Academic Press, London, UK, pp. 193-238.
Kirschke,H., Wiederanders,B., Bromme,D. and Rinne,A. (1989) Cathepsin S from bovine spleen. Purification, distribution, intracellular localization and action on proteins. Biochem. J., 264, 467-473.
Kraulis,P.J. (1991) Molscript: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr., 24, 946-950.
Lamzin,V.S. and Wilson,K.S. (1993) Automated refinement of protein models. Acta Crystallogr., D49, 129-147.
Mason,R.W. and Massey,S.D. (1992) Surface activation of pro-cathepsin L. Biochem. Biophys. Res. Commun., 189, 1659-1666.
Mason,R.W., Gal,S. and Gottesman,M.M. (1987) The identification of the major excreted protein (MEP) from a transformed mouse fibroblast cell line as a catalytically active precursor of cathepsin L. Biochem. J., 248, 449-454.
McGrath,M.E., Eakin,A.E., Engel,J.C., McKerrow,J.H., Craik,C.S. and Fletterick,R.J. (1995) The crystal structure of cruzain: a therapeutic target for Chagas' disease. J. Mol. Biol., 247, 251-259.
McIntyre,G.F. and Erickson,A.H. (1993) The lysosomal proenzyme receptor that binds procathepsin L to microsomal membranes at pH 5 is a 43-kDa integral membrane protein. Proc. Natl Acad. Sci. USA, 90, 10588-10592.
McIntyre,G.F., Godbold,G.D. and Erickson,A.H. (1994) The pH-dependent membrane association of procathepsin L is mediated by
a 9-residue sequence within the propeptide. J. Biol. Chem., 269, 567-572.
Musil,D. et al. (1991) The refined 2.15 Å X-ray crystal structure of human liver cathepsin B: the structural basis for its specificity. EMBO J., 10, 2321-2330.
Navaza,J. (1994) AMoRe: an automated package for molecular replacement. Acta Crystallogr., A50, 157-163.
O'Hara,B.P., Hemmings,A.M., Buttle,D.J. and Pearl,L.H. (1995) Crystal structure of glycyl endopeptidase from Carica papaya: a cysteine endopeptidase of unusual substrate specificity. Biochemistry, 34, 13190-13195.
Pickersgill,R.W., Rizkallah,P., Harris,G.W. and Goodenough,P.W. (1991) Determination of the structure of papaya protease omega. Acta Crystallogr., B47, 766-771.
Rowan,A.D., Mason,P., Mach,L. and Mort,J.S. (1992) Rat procathepsin B. Proteolytic processing to the mature form in vitro. J. Biol. Chem., 267, 15993-15999.
Sakabe,N. (1991) X-ray diffraction data collection system for modern protein crystallography with a Weissenberg camera and an imaging plate using synchrotron radiation. Nucl. Instrum. Methods Phys. Res., A303, 448-463.
Schwartz,R.M. and Dayhoff,M.O. (1979) In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, DC, Vol 5, Suppl. 3, p. 353.
Smith,S.M. and Gottesman,M.M. (1989) Activity and deletion analysis of recombinant human cathepsin L expressed in Escherichia coli. J. Biol. Chem., 264, 20487-20495.
Tao,K., Stearns,N.A., Dong,J., Wu,Q. and Sahagian,G.G. (1994) The proregion of cathepsin L is required for proper folding, stability, and ER exit. Arch. Biochem. Biophys., 311, 19-27.
Tezuka,K., Tezuta,Y., Maejima,A., Sato,T., Nemoto,K., Kamioka,H., Hakeda,Y. and Kumegawa,M. (1994) Molecular cloning of a possible cysteine proteinase predominantly expressed in osteoclasts. J. Biol. Chem., 269, 1106-1109.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.
Trabandt,A., Aicher,W.K., Gay,R.E., Sukhatme,V.P., Nilson-Hamilton,M., Hamilton,R.T., McGhee,J.R., Fassbender,H.G. and Gay,S. (1990) Expression of the collagenolytic and Ras-induced cysteine proteinase cathepsin L and proliferation-associated oncogenes in synovial cells of MRL/I mice and patients with rheumatoid arthritis. Matrix, 10, 349-361.
Turk,D., Podobnik,M., Popovic,T., Katunuma,N., Bode,W., Huber,R. and Turk,V. (1995) Crystal structure of cathepsin B inhibited with CA030 at 2.0 Å resolution: A basis for the design of specific epoxysuccinyl inhibitors. Biochemistry, 34, 4791-4797.
Turk,D., Podobnik,M., Kuhelj,R., Dolinar,M. and Turk,V. (1996) Crystal structures of human procathepsin B at 3.2 and 3.3 Å resolution reveal an interaction motif between a papain-like cysteine protease and its propeptide. FEBS Lett., 384, 211-214.
Vernet,T., Berti,P.J., de Montigny,C., Musil,R., Tessier,D.C., Menard,R., Magny,M.C., Storer,A.C. and Thomas,D.Y. (1995) Processing of the papain precursor. The ionization state of a conserved amino acid motif within the Pro region participates in the regulation of intramolecular processing. J. Biol. Chem., 270, 10838-10846.
Yagel,S., Warner,A.H., Nellans,H.N., Lala,P.K., Waghorne,C. and Denhardt,D.T. (1989) Suppression by cathepsin L inhibitors of the invasion of amnion membranes by murine cancer cells. Cancer Res., 49, 3553-3557.
Zang,K.Y.J. (1993) SQUASH - combining constraints for macromolecular phase refinement and extension. Acta Crystallogr., D49, 213-222.

Received on June 4, 1996; revised on July 5, 1996
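The packing figures quoted under Crystallization and structure solution (Vm = 3.77 Å³/Da and 67% solvent for the trigonal form, 48% solvent for the orthorhombic form) can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions that are not stated in the text: six molecules per trigonal cell (one per asymmetric unit in P3₂21), the usual Matthews approximation that the solvent fraction is about 1 - 1.23/Vm, and a molecular mass that is back-calculated from the quoted Vm rather than taken from the paper.

```python
import math

# Unit-cell parameters quoted in the text (angstroms).
ORTHO_CELL = (40.1, 88.1, 94.9)   # orthorhombic form: a, b, c
TRIG_CELL = (103.9, 86.8)         # trigonal form on hexagonal axes: a, c

Z_ORTHO = 4  # 4 general positions x 1 molecule per asymmetric unit (stated in the text)
Z_TRIG = 6   # 6 general positions x 1 molecule per asymmetric unit (ASSUMPTION)

VM_TRIG_QUOTED = 3.77  # A^3/Da, quoted in the text


def orthorhombic_volume(a, b, c):
    """Cell volume in A^3 for an orthorhombic cell."""
    return a * b * c


def hexagonal_volume(a, c):
    """Cell volume in A^3 for a cell with gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def solvent_fraction(vm):
    """Matthews approximation (ASSUMPTION: typical protein density)."""
    return 1.0 - 1.23 / vm


# Back-calculate the molecular mass implied by the quoted trigonal Vm.
implied_mass_da = hexagonal_volume(*TRIG_CELL) / (Z_TRIG * VM_TRIG_QUOTED)

# Use that mass to estimate Vm and solvent content of the orthorhombic form.
vm_ortho = orthorhombic_volume(*ORTHO_CELL) / (Z_ORTHO * implied_mass_da)

if __name__ == "__main__":
    print(f"implied mass        : {implied_mass_da:.0f} Da")
    print(f"trigonal solvent    : {solvent_fraction(VM_TRIG_QUOTED):.3f}")
    print(f"orthorhombic Vm     : {vm_ortho:.2f} A^3/Da")
    print(f"orthorhombic solvent: {solvent_fraction(vm_ortho):.3f}")
```

Under these assumptions the trigonal form reproduces the quoted 67% solvent, and the orthorhombic estimate falls within about one percentage point of the quoted 48%. The small difference would be expected from the approximate mass and density used here.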

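The inhibition constants reported for the propeptide variants (Figure 9) can be restated as fold changes relative to phcl-1, and, as a rough guide, as differences in binding free energy. The following sketch uses only the Ki values and the 33 000-fold factor quoted in the text. The temperature of 298.15 K is an assumption made for illustration, and the names `KI_NM`, `fold_change` and `ddg_kj_per_mol` are hypothetical.

```python
import math

# Ki values quoted in the text, in nM (measured at pH 5.5).
KI_NM = {"phcl-1": 0.088, "phcl-2": 0.66, "phcl-3": 11.5}

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)
TEMP_K = 298.15                  # ASSUMPTION: temperature is not given in the text


def fold_change(ki_nm, reference_nm):
    """Ratio of a Ki to the reference Ki (both in the same units)."""
    return ki_nm / reference_nm


def ddg_kj_per_mol(fold, temp_k=TEMP_K):
    """Loss of binding free energy implied by a fold increase in Ki."""
    return R_KJ_PER_MOL_K * temp_k * math.log(fold)


REFERENCE = KI_NM["phcl-1"]
folds = {name: fold_change(ki, REFERENCE) for name, ki in KI_NM.items()}

# The text gives only a fold change (33 000) for the peptide lacking both
# the alpha-1p and alpha-2p helices; the Ki below is derived from it.
FOLD_NO_HELICES = 33_000
implied_ki_no_helices_nm = REFERENCE * FOLD_NO_HELICES

if __name__ == "__main__":
    for name, fold in folds.items():
        print(f"{name}: {fold:7.1f}-fold, ddG = {ddg_kj_per_mol(fold):5.1f} kJ/mol")
    print(f"no helices: Ki ~ {implied_ki_no_helices_nm:.0f} nM, "
          f"ddG = {ddg_kj_per_mol(FOLD_NO_HELICES):.1f} kJ/mol")
```

The phcl-3 ratio comes out at about 131, in line with the 130-fold increase stated in the text. The free-energy values depend on the assumed temperature and should be read as order-of-magnitude estimates.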