Proc. Natl. Acad. Sci. USA Vol. 81, pp. 3703-3707, June 1984 Biochemistry

Amino acid sequence of porcine spleen cathepsin D
(primary structure/aspartic /lysosomal /homology with )
JAIPRAKASH G. SHEWALE* AND JORDAN TANG*†
*Laboratory of Protein Studies, Oklahoma Medical Research Foundation, and †The Department of Biochemistry and Molecular Biology, The University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104
Communicated by Stuart Kornfeld, March 12, 1984

ABSTRACT The amino acid sequence of porcine spleen cathepsin D heavy chain has been determined and, hence, the complete structure of this enzyme is now known. The sequence of heavy chain was constructed by aligning the structures of peptides generated by cyanogen bromide, trypsin, and endoproteinase Lys C cleavages. The structure of the light chain has been published previously. The cathepsin D molecule contains 339 amino acid residues in two polypeptide chains: a 97-residue light chain and a 242-residue heavy chain, with a combined Mr of 36,779 (without carbohydrate). There are two carbohydrate units linked to asparagine residues 70 and 192. The disulfide bond arrangement in cathepsin D is probably similar to that of pepsin, because the positions of six half-cystine residues are conserved. The active-site aspartyl residues, corresponding to aspartic acid-32 and -215 of pepsin, are located at residues 33 and 224 in the cathepsin D molecule. The amino acid sequence around these aspartyl residues is strongly conserved. Cathepsin D shows a strong homology with other acid proteases. When the sequences of cathepsin D, renin, and pepsin are aligned, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that three-dimensional structures of all three molecules are similar.

Cathepsin D (EC 3.4.23.5) is a lysosomal protease present in all mammalian cells. The physiological function of this enzyme is the breakdown of tissue proteins under normal and pathological conditions (1-4). The structural studies on cathepsin D were initiated in this laboratory because, in contrast to other well studied aspartic proteases, cathepsin D is an intracellular enzyme. A structure-function comparison of cathepsin D with other aspartic proteases may generate some insight into the function and regulation of this enzyme. Also, cathepsin D can serve as a model for the lysosomal enzymes, whose synthesis and targeting have generated considerable interest recently (5, 6). For the structural studies, cathepsin D from porcine spleen was purified to homogeneity in our laboratory (7). This enzyme comprises two chains, each of which contains an asparagine-linked oligosaccharide. Previously, we have published structural studies on the light chain (8) and on the oligosaccharides (9) of cathepsin D.

We report here the evidence for the structure of heavy chain and, hence, the complete amino acid sequence of cathepsin D. The sequence is compared with the structure of pepsin (10) and renin (11, 12). The location and possible functional significance of conserved residues are discussed.

Abbreviations: CDN, cathepsin D residue numbers; PN, pepsin residue numbers; CN1-CN8, peptides generated by cleavage with CNBr; A1-A9, tryptic (arginine) cleavage peptides; KC1, endoproteinase Lys C cleavage peptide; S1 and S2, peptides generated by cleavage with Staphylococcus aureus protease.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Materials. Cathepsin D was purified from porcine spleen as described (7). Proteases: trypsin, Staphylococcus aureus protease V8, and endoproteinase Lys C were obtained from Millipore, Miles, and Boehringer Mannheim, respectively. Sequencer reagents were from Beckman and from Burdick and Jackson (Muskegon, MI). All other chemicals used in this study were of analytical grade.

Separation of Peptides. Peptides from CNBr cleavage, tryptic and endoproteinase Lys C digests were initially fractionated by gel filtration on Sephadex G-50 (Pharmacia) in either 10% formic acid or 0.1 M ammonium bicarbonate. Individual peptides were further purified using ion-exchange chromatography with DEAE-cellulose (DE 52, Whatman) and HPLC (C8 or C18) individually or in combination. S. aureus protease digests were directly fractionated on HPLC. The details of the isolation of peptides will be discussed elsewhere.

Reaction of the seven methionines in reduced carboxymethylated heavy chain with CNBr (70% HCOOH, 24 hr, room temperature) yielded eight fragments, CN1-CN8 (Fig. 1). The dipeptide Gly-Hse [cathepsin D residue number (CDN) 301-302; pepsin residue number (PN) 289-290] was not isolated. Instead, an eighth fragment (CN8) was generated because of oxidative cleavage of a Trp-Ile (CDN 312-313; PN 300-301) bond. Fragments CN3 and CN5 were further cleaved with the S. aureus protease (0.1 M ammonium bicarbonate; 37°C; 4 or 20 hr; protease/protein ratio, 5:100), and peptides CN3 S1 (CDN 149-159; PN 142-152), CN3 S2 (CDN 160-173; PN 153-164), and CN5 S1 (CDN 240-253; PN 232-245) were isolated.

Cleavage at arginine residues was accomplished by modification of the ε-amino groups of lysine residues with maleic anhydride and subsequent proteolytic digestion with trypsin (0.1 M ammonium bicarbonate; 37°C; 2 hr; protease/protein ratio, 5:100). Peptides A1-A9 were isolated as described above. COOH-terminal peptide (CDN 333-339; PN 321-327) was not isolated. Peptide A7 (CDN 300-321; PN 288-309) was generated because of secondary cleavage at a Phe-Met (CDN 299-300; PN 287-288) bond. Peptides A2 S1 (CDN 106-110; PN 99-103) and A2 S2 (CDN 111-134; PN 104-127) were isolated by digestion of the peptide A2 with S. aureus protease (0.1 M ammonium bicarbonate; 37°C; 4 hr; protease/peptide ratio, 5:100).

Peptide KC1 (CDN 287-339; PN 279-327), required for sequence overlap between three CNBr and two tryptic peptides, was produced by endoproteinase Lys C digestion (0.1 M ammonium bicarbonate; 37°C; 6 hr; protease/protein ratio, 5:100) of heavy chain.

Other Methods. The amino acid analyses of acid hydrolysates were performed on a Durrum (Palo Alto, CA) D500 amino acid analyzer as described by Spackman et al. (13). Automatic sequence analysis was performed with a Beckman sequencer, model 890C, equipped with a Sequemat model P-6 autoconverter using a dilute quadrol program in combination with polybrene (14). The phenylthiohydantoin amino acids were identified by HPLC (15) and occasionally by thin-layer chromatography (16).

RESULTS AND DISCUSSION

The complete amino acid sequence of porcine spleen cathepsin D is given in Fig. 2. A detailed study of the structure of light chain was published earlier by Takahashi and Tang (8). The discussion here, therefore, is restricted to the assembly of heavy-chain structure. The sequence is constructed from the structures of the peptides generated from the reduced carboxymethylated heavy chain by CNBr, trypsin, and endoproteinase Lys C cleavages. The structural evidence is summarized in Fig. 1. Sequence analysis of CNBr and trypsin peptides placed and overlapped most of the heavy-chain sequence except for residues 288-299, which was obtained by sequence analysis of peptide KC1. There are only two single-residue overlaps. One is located at residue 287. The connectivity at this position is substantiated by identification of alanine (CDN 290) in the sequence analysis of peptide CN6. The second one-residue overlap is found at residue 300. During the Edman degradation of peptide KC1, residues 301-311 were not identified because of a malfunction of the sequencer. However, the sequence from residues 312-325 of peptide KC1 confirmed the correctness of this overlap.

FIG. 1. Schematic presentation of the evidence for sequence assembly of cathepsin D heavy chain (HC). Bars represent isolated peptides; shaded areas indicate sequenced regions. Residue numbers indicate the position in cathepsin D molecule (see Fig. 2). Peptides were generated by cleavage with cyanogen bromide (CN), arginine cleavage by trypsin after lysine modification (A), and protease digestion with S. aureus protease (S) and endoproteinase Lys C (KC).

The cathepsin D molecule contains 339 residues in two polypeptide chains and has a molecular weight of 36,779 (without carbohydrate). Light chain contains 97 residues (molecular weight, 10,548) and heavy chain contains 242 residues (molecular weight, 26,231). The light chain is known to occupy the NH2-terminal part of the single-chain molecule (7, 8). The two previously identified glycosylation sites of cathepsin D (9) are located at cathepsin residues 70 and 192 (PN 67 and 183). There is one additional potential N-glycosylation site with a sequence of Asn-Tyr-Thr from residues 282-284 (PN 274-276). However, no carbohydrate was observed at this position. Since the positions of six out of seven cysteines are the same as in pepsin and renin (Fig. 3), it is probable that in the cathepsin D molecule three disulfide bridges are located in the equivalent pairing as in pepsin (10, 17), i.e., between cysteine-45 and cysteine-50, cysteine-206 and cysteine-210, and cysteine-250 and cysteine-283. The free sulfhydryl nature of cysteine-27 (PN 26) in cathepsin D has been established previously (8). In the entire sequence of cathepsin D, we observed structural microheterogeneity at two positions. Lysine and serine were both found at position 228 (PN 219) and glycine and glutamine were at position 241 (PN 233). It is interesting to note that the microheterogeneities of cathepsin D, as well as those in the pepsin molecule at residues 230 and 255 (17) and in the chymosin molecule at residue PN 244 (18), all occur in a region between residues PN 219 and PN 255.

1                                   10                                      20
    (1)                                 (10)
Gly-Pro-Ile-Pro-Glu-Val-Leu-Lys-Asn-Tyr-Met-Asp-Ala-Gln-Tyr-Tyr-Gly-Glu-Ile-Gly-
                                    30                                      40
(20)                                    (30)
Ile-Gly-Thr-Pro-Pro-Gln-Cys-Phe-Thr-Val-Val-Phe-Asp-Thr-Gly-Ser-Ser-Asn-Leu-Trp-
                                    50                                      60
(40)                                            (50)
Val-Pro-Ser-Ile-His-Cys-Lys-Leu-Leu-Asp-Ile-Ala-Cys-Trp-Ile-His-His-Lys-Tyr-Asn-
                                    70*                                     80
        (60)                                    (70)
Ser-Gly-Lys-Ser-Ser-Thr-Tyr-Val-Lys-Asn-Gly-Thr-Thr-Phe-Ala-Ile-His-Tyr-Gly-Ser-
                                    90                                      100
        (80)                                    (90)
Gly-Ser-Leu-Ser-Gly-Tyr-Leu-Ser-Gln-Asp-Thr-Val-Ser-Val-Pro-Ser-Asn Val-Gly-Gly-
                                                              LC <-|-> HC
                                    110                                     120
                        (100)                                   (110)
Ile-Lys-Val-Glu-Arg-Gln-Thr-Phe-Gly-Glu-Ala-Thr-Lys-Gln-Pro-Gly-Leu-Thr-Phe-Ile-
                                    130                                     140
                        (120)                                   (130)
Ala-Ala-Lys-Phe-Asp-Gly-Ile-Leu-Gly-Met-Ala-Tyr-Pro-Arg-Ile-Ser-Val-Asn-Asn-Val-
                                    150                                     160
                        (140)                                   (150)
Val-Pro-Val-Phe-Asp-Asn-Leu-Met-Gln-Gln-Lys-Leu-Val-Asp-Lys-Asp-Ile-Phe-Ser-Phe-
                                    170                                     180
                                (160)                                   (170)
Tyr-Leu-Asn-Arg-Asp-Pro-Gly-Ala-Gln-Pro-Gly-Gly-Glu-Leu-Met-Leu-Gly-Gly-Ile-Asp-
                                    190     *                               200
                                (180)                                   (190)
Ser-Lys-Tyr-Tyr-Lys-Gly-Ser-Leu-Asp-Tyr-His-Asn-Val-Thr-Arg-Lys-Ala-Tyr-Trp-Gln-
                                    210                                     220
                                (200)                                   (210)
Ile-His-Met-Asn-Gln-Val-Ala-Val-Gly-Ser-Ser-Leu-Thr-Leu-Cys-Lys-Gly-Gly-Cys-Glu-
                            Lys     230                                     240
                                (220)
Ala-Ile-Val-Asp-Thr-Gly-Thr-Ser-Leu-Ile-Val-Gly-Gln-Pro-Glu-Glu-Val-Arg-Glu-Leu-
Gln                                 250                                     260
                            (240)                                   (250)
Gly-Lys-Ala-Ile-Gly-Ala-Val-Pro-Leu-Ile-Gln-Gly-Glu-Tyr-Met-Ile-Pro-Cys-Glu-Lys-
                                    270                                     280
                            (260)                                   (270)
Val-Pro-Ser-Leu-Pro-Asp-Val-Thr-Val-Thr-Leu-Gly-Gly-Lys-Lys-Tyr-Lys-Leu-Ser-Ser-
                                    290                                     300
                            (280)
Glu-Asn-Tyr-Thr-Leu-Lys-Val-Ser-Gln-Ala-Gly-Gln-Thr-Ile-Cys-Leu-Ser-Gly-Phe-Met-
                                    310                                     320
    (290)                                   (300)
Gly-Met-Asp-Ile-Pro-Pro-Pro-Gly-Gly-Pro-Leu-Trp-Ile-Leu-Gly-Asp-Val-Phe-Ile-G
                                    330                                 339
    (310)                                   (320)                       (327)
Arg-Tyr-Tyr-Thr-Val-Phe-Asp-Arg-Asp-Leu-Asn-Arg-Val-Gly-Leu-Ala-Glu-Ala-Ala

FIG. 2. Amino acid sequence of porcine spleen cathepsin D. Glycosylated asparagines are indicated by an asterisk. Numbers in parentheses indicate the pepsin residue numbers. Positions 228 and 241 are structurally heterogeneous.

Cathepsin D shows a strong homology when compared with the other aspartic proteases, pepsin and renin (Fig. 3). Among all three proteases, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that their three-dimensional structures are very similar. The major structural change unique to cathepsin D is the addition of four residues at positions 91A-91D (Fig. 3). Since the proteolytic processing of the single chain to two-chain cathepsin D takes place between positions 91C (asparagine) and 91D (valine)‡ (Fig. 3), this additional structure could have a processing function that is a common phenomenon for all lysosomal enzymes (19-23). Another four-residue addition shared by cathepsin D and renin is located at residues 280A-280D (Fig. 3), which is the region for proteolytic processing for single-chain renin (11, 12). It is interesting that the known processing recognition Arg-Arg sequence in renin has been changed to Ala-Gly in cathepsin D, which is not cleaved in this region by proteolytic processing. In the alignment of Fig. 3, two-residue additions shared by cathepsin D and renin are also found at positions 47A-47B and 155A-155B. These common additions appear to suggest that cathepsin D and renin are closest in structural similarity. When compared individually, cathepsin D shows 48.9% identical residues with renin (11, 12), 47.8% with pepsin (10, 17), 44.5% with chymosin (18), and 25.9% with penicillopepsin (24). Furthermore, it should be noted that in the alignment of peptide chains, all the active proteins start at different positions depending on activation from their zymogens. However, they terminate at the same position.

‡Since the single-chain cathepsin D has not been sequenced, it is not known whether asparagine-91C and valine-91D are directly linked.

The inactivation of cathepsin D by active-site directed reagents of pepsin, diazo (25-27) and epoxide (28) compounds, has implied that the catalytic apparatus of cathepsin D is similar to that of pepsin. Thus, it is not surprising that the sequence around these aspartates, residues 31-35 and 214-217, is strongly conserved in cathepsin D (Fig. 3). The two aspartyl groups at the active site are aspartic acid-32 and aspartic acid-215 (CDN 33 and 224). Other active center residues, including serine-35 (CDN 36), Thr-Gly at residues 216-217 (CDN 225-226), and tyrosine-75 (CDN 78), are all conserved in cathepsin D (2, 24, 29).
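Several of the counts quoted above can be cross-checked directly against the sequence of Fig. 2. The following is a minimal Python sketch, not part of the original analysis. It assumes a one-letter transcription of Fig. 2 in which residue 320, truncated in the figure as reproduced here, is entered as the placeholder "X". The helper names `cnbr_fragments` and `sequons` are hypothetical. Cleavage is idealized as a cut after every methionine, and the Asn-X-Ser/Thr rule (X not Pro) is the usual consensus for N-glycosylation rather than something stated in the text.

```python
import re

# One-letter transcription of Fig. 2, 20 residues per row.
# Assumption: residue 320 is truncated in the figure, so it is written as "X".
FIG2_ROWS = [
    "GPIPEVLKNYMDAQYYGEIG",  # 1-20
    "IGTPPQCFTVVFDTGSSNLW",  # 21-40
    "VPSIHCKLLDIACWIHHKYN",  # 41-60
    "SGKSSTYVKNGTTFAIHYGS",  # 61-80
    "GSLSGYLSQDTVSVPSNVGG",  # 81-100
    "IKVERQTFGEATKQPGLTFI",  # 101-120
    "AAKFDGILGMAYPRISVNNV",  # 121-140
    "VPVFDNLMQQKLVDKDIFSF",  # 141-160
    "YLNRDPGAQPGGELMLGGID",  # 161-180
    "SKYYKGSLDYHNVTRKAYWQ",  # 181-200
    "IHMNQVAVGSSLTLCKGGCE",  # 201-220
    "AIVDTGTSLIVGQPEEVREL",  # 221-240
    "GKAIGAVPLIQGEYMIPCEK",  # 241-260
    "VPSLPDVTVTLGGKKYKLSS",  # 261-280
    "ENYTLKVSQAGQTICLSGFM",  # 281-300
    "GMDIPPPGGPLWILGDVFIX",  # 301-320 (residue 320 unknown here)
    "RYYTVFDRDLNRVGLAEAA",   # 321-339
]
CATHEPSIN_D = "".join(FIG2_ROWS)
LIGHT_CHAIN = CATHEPSIN_D[:97]   # residues 1-97
HEAVY_CHAIN = CATHEPSIN_D[97:]   # residues 98-339


def cnbr_fragments(seq, first_residue=1):
    """Idealized CNBr cleavage: cut after every Met.

    Returns a list of (start, end, peptide) with 1-based residue numbers
    offset by first_residue.
    """
    fragments, begin = [], 0
    for i, aa in enumerate(seq):
        if aa == "M":
            fragments.append((first_residue + begin, first_residue + i, seq[begin:i + 1]))
            begin = i + 1
    if begin < len(seq):
        fragments.append((first_residue + begin, first_residue + len(seq) - 1, seq[begin:]))
    return fragments


def sequons(seq):
    """1-based positions of Asn in Asn-X-Ser/Thr, with X not Pro."""
    # The lookahead keeps overlapping candidates from hiding each other.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


if __name__ == "__main__":
    print(len(LIGHT_CHAIN), len(HEAVY_CHAIN), len(CATHEPSIN_D))
    for start, end, peptide in cnbr_fragments(HEAVY_CHAIN, first_residue=98):
        print(f"{start}-{end} ({len(peptide)} residues)")
    print("Asn-X-Ser/Thr sites:", sequons(CATHEPSIN_D))
```

Under these assumptions the heavy chain contains seven methionines and gives eight ideal fragments, one of which is the dipeptide at residues 301-302, and the only Asn-X-Ser/Thr sites are at residues 70, 192, and 282. The sketch does not model the oxidative Trp-Ile cleavage that produced the isolated CN8.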

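The identity percentages quoted in the text come from counting alignment positions at which the compared sequences carry the same residue. A minimal Python sketch of that count follows. The function names are hypothetical, the three short sequences are a toy alignment rather than rows taken from Fig. 3, and the denominator (all alignment columns) is an assumption, since the text does not state how the percentage was normalized.

```python
def identical_columns(aligned):
    """Count columns in which every sequence has the same residue.

    A gap ("-") never counts as a match.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for col in zip(*aligned) if col[0] != "-" and len(set(col)) == 1)


def percent_identity(aligned):
    """Identical columns as a percentage of all alignment columns (assumed denominator)."""
    return 100.0 * identical_columns(aligned) / len(aligned[0])


# Hypothetical toy alignment, for illustration only.
toy = [
    "VFDTGSSNLW",
    "IFDTGSANLW",
    "IFDTGS-NLW",
]
print(identical_columns(toy), percent_identity(toy))
```

For the toy alignment, 8 of 10 columns are identical in all three rows. The same function applied to two rows gives a pairwise figure of the kind quoted for renin, pepsin, chymosin, and penicillopepsin.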
FIG. 3. Comparison of amino acid sequence of cathepsin D, renin, and pepsin. Gaps have been placed to maximize the homology. Boxed areas contain residues that are identical in all three proteins. Residues are numbered according to the numbering system of pepsin (10). Amino acids are represented by standard one-letter abbreviations.

We thank Azar Fesmire and Abdul Dehdarani for their skilled technical assistance. This study was supported by Research Grant GM 20212 from the National Institutes of Health.

1. Barrett, A. J. (1977) in Proteinases in Mammalian Cells and Tissues, ed. Barrett, A. J. (North-Holland, Amsterdam), pp. 209-248.
2. Tang, J. (1979) Mol. Cell. Biochem. 26, 93-109.
3. Poole, B. (1975) in Intracellular Protein Turnover, eds. Schimke, R. & Katunuma, N. (Academic, New York), pp. 249-264.
4. Tang, J. (1977) Acid Proteases: Structure, Function and Biology (Plenum, New York).
5. Neufeld, E. F. & Ashwell, G. (1980) in The Biochemistry of Glycoproteins and Proteoglycans, ed. Lennarz, W. J. (Plenum, New York), pp. 241-246.
6. Sly, W. S. & Fisher, D. (1982) J. Cell. Biochem. 18, 67-85.
7. Huang, J. S., Huang, S. S. & Tang, J. (1979) J. Biol. Chem. 254, 11405-11417.
8. Takahashi, T. & Tang, J. (1983) J. Biol. Chem. 258, 6435-6443.
9. Takahashi, T., Schmidt, P. G. & Tang, J. (1983) J. Biol. Chem. 258, 2819-2830.

10. Tang, J., Sepulveda, P., Marciniszyn, J., Jr., Chen, K. C. S., Huang, W.-Y., Tao, N., Liu, D. & Lanier, J. P. (1973) Proc. Natl. Acad. Sci. USA 70, 3437-3439.
11. Misono, K. S., Chang, J.-J. & Inagami, T. (1982) Proc. Natl. Acad. Sci. USA 79, 4863-4867.
12. Panthier, J. J., Foote, S., Chambraud, B., Strosberg, A. D., Corvol, P. & Rougeon, F. (1982) Nature (London) 298, 90-92.
13. Spackman, D. H., Stein, W. H. & Moore, S. (1958) Anal. Chem. 30, 1190-1206.
14. Brauer, A. W., Margolies, M. N. & Haber, E. (1975) Biochemistry 14, 3029-3035.
15. Henderson, L. E., Copeland, T. D. & Oroszlan, S. (1980) Anal. Biochem. 102, 1-7.
16. Summers, M. R., Smythers, G. W. & Oroszlan, S. (1973) Anal. Biochem. 53, 624-628.
17. Sepulveda, P., Marciniszyn, J., Jr., Liu, D. & Tang, J. (1975) J. Biol. Chem. 250, 5082-5088.
18. Foltmann, B., Pedersen, V. B., Jacobsen, H., Kauffman, D. & Wybrandt, G. (1977) Proc. Natl. Acad. Sci. USA 74, 2321-2324.
19. Takio, K., Towatari, T., Katunuma, N., Teller, D. C. & Titani, K. (1983) Proc. Natl. Acad. Sci. USA 80, 3666-3670.
20. Hasilik, A. & Neufeld, E. F. (1980) J. Biol. Chem. 255, 4937-4945.
21. Erickson, A. H., Conner, G. E. & Blobel, G. (1981) J. Biol. Chem. 256, 11224-11231.
22. Skudlarek, M. D. & Swank, R. T. (1979) J. Biol. Chem. 254, 9939-9942.
23. Myerowitz, R. & Neufeld, E. F. (1981) J. Biol. Chem. 256, 3044-3048.
24. Hsu, I.-N., Delbaere, L. T. J., James, M. N. G. & Hofmann, T. (1977) Nature (London) 266, 140-145.
25. Keilova, H. (1970) FEBS Lett. 6, 312-314.
26. Ferguson, J. B., Andrews, J. R., Voynick, I. M. & Fruton, J. S. (1973) J. Biol. Chem. 248, 6701-6708.
27. Fruton, J. S. (1976) Adv. Enzymol. 44, 1-36.
28. Cunningham, M. & Tang, J. (1976) J. Biol. Chem. 251, 4528-4536.
29. James, M. N. G., Hsu, I.-N. & Delbaere, L. T. J. (1977) Nature (London) 267, 808-813.