Proc. Natl. Acad. Sci. USA
Vol. 74, No. 6, pp. 2321-2324, June 1977
Biochemistry

The complete amino acid sequence of prochymosin
(/primary structure/homology)

BENT FOLTMANN, VIBEKE BARKHOLT PEDERSEN, HENNING JACOBSEN*, DOROTHY KAUFFMAN†, AND GRITH WYBRANDT‡

Institute of Biochemical Genetics, University of Copenhagen, Ø. Farimagsgade 2A, DK-1353 Copenhagen K, Denmark

Communicated by Hans Neurath, March 18, 1977

Present addresses: *Novo Research Institute, Bagsvaerd, Denmark; †Dept. of Oral Biology, University of Washington, Seattle; and ‡Dept. of Zoophysiology, University of Copenhagen.

ABSTRACT The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three. The three belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor.

Chymosin (EC 3.4.23.4) is the major proteolytic enzyme in the stomach of the preruminant calf. Like pepsin, chymosin belongs to a group of acidic proteases in which two aspartate residues participate in the catalytic mechanism (1, 2). In analogy with the term serine proteases, the term aspartate proteases has been suggested (3).

Chymosin is secreted as an inactive precursor, prochymosin, consisting of a single peptide chain with 365 amino acid residues. The zymogen is irreversibly converted into active enzyme by limited proteolysis during which a total of 42 amino acid residues are released from the amino-terminal part of the peptide chain.

Prochymosin contains six half-cystine residues forming three disulfide bridges, all located in the enzyme part of the molecule. Investigations on the primary structure began with determination of the amino acid sequences around the disulfide bridges (4). In subsequent papers (5, 6) we reported the sequence of the activation segment and the sequence of the first 61 residues of the active enzyme.

This communication presents the complete amino acid sequence of prochymosin B. It is shown that this zymogen is highly homologous to porcine pepsinogen A (7-10). To facilitate the comparison, we have chosen to number the amino acid residues from the NH2-terminus of prochymosin and then continue by counting gaps where such occur in prochymosin relative to pepsinogen. The comparison is further extended to penicillopepsin, the only other aspartate protease of which the sequence is almost completely known (11).

METHODS

Prochymosin and chymosin used in these investigations were prepared in our laboratory according to the methods described previously (12).

The sequence was obtained after a series of degradation experiments carried out in parallel. In most cases the first steps were enzymatic cleavage with trypsin or chemical cleavage with cyanogen bromide. To improve solubility of the digest and to reduce the number of fragments, we used maleylated or citraconylated preparations for the digestions with trypsin. The digestions were carried out at 12° (pH 8, for 15-30 min) in order to avoid unspecific, chymotrypsin-like cleavages (13). After such treatment the large fragments were purified by gel filtration on Sephadex G-100 in 0.05 M NH4HCO3, pH 8, with 8 M urea. After cleavage of chymosin with cyanogen bromide the fragments were purified by gel filtration on Sephadex G-100 in 25% acetic acid. The best results were obtained if cleavage was performed on chymosin with intact disulfide bridges. By such treatment two of the large fragments, CB(211-302) and CB(314-373), are held together and separated from the fragment next in size, CB(45-126). After reduction and aminoethylation the two former fragments were separated by repeated gel filtration.

Each of the large fragments was digested either with chymotrypsin, elastase, thermolysin, or staphylococcal protease, or, after deblocking of amino groups, with trypsin or Armillaria mellea protease and then with one of the other proteases.

Low-molecular-weight peptides were mainly purified by high-voltage paper electrophoresis/paper chromatography. Amino acid analyses were performed with BioCal 201 or Durrum D-500 amino acid analyzers. The sequences of the purified peptides were determined by Edman degradation (14) followed either by dansylation (15) or by identification of liberated phenylthiohydantoin-derivatives (16, 17). Locations of amides were obtained from electrophoretic mobilities of peptides (18) or by thin-layer chromatography of phenylthiohydantoin-derivatives (16).

RESULTS

Fig. 1 shows in schematic form the relative location of fragments obtained after cyanogen bromide cleavage or tryptic digestion of chymosin with blocked amino groups. The final sequence of prochymosin is presented in Fig. 2.

FIG. 1. Diagrammatic presentation of fragments of chymosin obtained by cyanogen bromide cleavage (CB) and by tryptic digestion of chymosin with modified amino groups (TM). The fragments are characterized by the number of first and last residues (for zymogen numbering, see Fig. 2). Residue no. 187 is methionine. It was recovered as free homoserine after cleavage with cyanogen bromide but is not marked as a separate fragment. NH2-termini of the fragments are shown. All CB fragments except CB(314-373) have COOH-terminal homoserine; all TM fragments except TM(363-373) have COOH-terminal arginine.

Throughout the structure most of the amino acid residues have been identified in two or more independent peptides. Exceptions are some residues from no. 230 to no. 250. Due to the apolar character of this region the peptides have been difficult to isolate. Except for phenylalanine no. 163, all residues have been located in peptides with overlaps of at least two residues. However, overlaps around phenylalanine no. 158 and the amino acid composition of CB(127-169) make no other sequence possible.

If we add up the molecular weights of all residues in prochymosin and chymosin, molecular weights of 40,777 and 35,652 are found. These are higher than the molecular weights published previously (19); but if the original amino acid compositions (20) are recalculated on the basis of the molecular weights found by determination of the sequence, there is a satisfactory agreement (Table 1).

Table 1. Amino acid compositions of prochymosin B and chymosin B*

            Prochymosin B†      Chymosin B‡
            AAA      Seq.       AAA      Seq.
Asx         37.6     37         34.8     36
Thr         23.0     24         21.4     23
Ser         34.4     35         31.0     31
Glx         40.9     39         33.6     33
Pro         15.7     16         14.0     15
Gly         32.1     32         28.4     28
Ala         16.9     17         14.9     15
½Cys         5.7      6          5.7      6
Val         25.7     26         24.3     26
Met          7.6      8          7.7      8
Ile         21.4     22         17.6     19
Leu         28.8     29         21.5     23
Tyr         20.7     22         17.9     19
Phe         18.6     19         16.0     17
His          5.6      6          4.8      5
Lys         14.6     15          9.4      9
Arg          7.9      8          5.6      6
Trp          4.6      4          4.8      4
Amide       38.3     39         35.6     36

* Comparison between amino acid analyses (AAA) (20) and compositions determined from the sequences (Seq.).
† Molecular weight: 40,777.
‡ Molecular weight: 35,652.

DISCUSSION

Crystalline chymosin is resolved into three different components by chromatography on columns of DEAE-cellulose (21). Of these, the component designated chymosin C is partly autolyzed enzyme, while separate zymogens exist for chymosins A and B. Prochymosin B is the more abundant of the two, and all residues have been identified in this protein. The fragments obtained by tryptic digestion and by cyanogen bromide cleavage of chymosin A separate on gel filtration in a manner that is indistinguishable from fragments of chymosin B. Furthermore, the individual fragments from chymosins A and B have very similar amino acid compositions. A total of 119 residues have been identified in chymosin A, and among these we have found only one that is different from the corresponding residue in chymosin B: at position no. 290 chymosin A has aspartic acid, while chymosin B has glycine. There may be more differences, but we can conclude that the two structures are very similar.

However, more important features of an investigation like this appear when the primary structure is compared with primary structures of different but related enzymes. In Fig. 2 we indicate residues that are common to bovine prochymosin B, porcine pepsinogen A, and penicillopepsin. Out of 373 positions for amino acid residues in the two gastric zymogens, 204 are occupied by common residues. In chymosin B, 24 out of 28 glycines, 12 out of 15 prolines, and all three disulfide bridges are located in the same positions as in pepsin. The similarity goes even further than demonstrated in Fig. 2. Of the remaining residues a large part represents conservative substitutions in which the polarity of the side chains is maintained, and two gaps in the peptide chain of prochymosin (41-42 and 338-340) correspond to proline sequences in pepsinogen. All this implies that the tertiary structures of the two proteins are very similar, and that the gaps occur in bendings of the peptide chain which in prochymosin are shorter by a few residues.

As indicated in Fig. 2, the comparison moreover shows that penicillopepsin has 66 residues identical to those in the two gastric proteases. These identities are not scattered at random along the peptide chain. They are mainly clustered in six short sections with extensive homology: Phe77 to Ser88; Ser118 to Ser125; Asp164 to Ala170; Gly214 to Leu225; Ile259 to Thr264; and Gly349 to Asp361. All together these six sections account for about 20% of the peptide chain, but they include two-thirds of the identical residues.

If we compile the information from determination of sequence, activation of zymogens, and inactivation of the enzymes, we can begin to discern outlines of a relationship between structure and function of the aspartate proteases and their zymogens. A zymogen has not been found for penicillopepsin (22). With regard to the zymogens for the gastric proteases, we shall briefly draw attention to some points raised in other papers (7, 21, 23, 24). In the amino-terminal part of the peptide chain (the so-called activation segment or propart) a pattern of basic
amino acid residues with spacers of apolar residues is common for all zymogens of this group analyzed so far. This part of the structure is essential to anchor the zymogen molecule in inactive conformation at neutral pH. The conformation is mainly stabilized by electrostatic interactions between positive charges of basic amino acids in the propart of the peptide chain and negative charges in the enzyme part of the molecule. When the pH is lowered, the electrostatic interactions are weakened, the zymogen molecules rearrange into an active conformation (21, 25, 26), and irreversible activation takes place by limited proteolysis.

  1  Ala Glu Ile Thr Arg Ile Pro Leu Tyr Lys Gly Lys Ser Leu Arg Lys Ala Leu Lys Glu
 21  His Gly Leu Leu Glu Asp Phe Leu Gln Lys Gln Gln Tyr Gly Ile Ser Ser Lys Tyr Ser
 41  --- --- Gly Phe Gly Glu Val Ala Ser Val Pro Leu Thr Asn Tyr Leu Asp Ser Gln Tyr
 61  Phe Gly Lys Ile Tyr Leu Gly Thr Pro Pro Gln Glu Phe Thr Val Leu Phe Asp Thr Gly
 81  Ser Ser Asp Phe Trp Val Pro Ser Ile Tyr Cys Lys Ser Asn Ala Cys Lys Asn His Gln
101  Arg Phe Asp Pro Arg Lys Ser Ser Thr Phe Gln Asn Leu Gly Lys Pro Leu Ser Ile His
121  Tyr Gly Thr Gly Ser Met Gln Gly Ile Leu Gly Tyr Asp Thr Val Thr Val Ser Asn Ile
141  Val Asp Ile Gln Gln Thr Val Gly Leu Ser Thr Gln Glu Pro Gly Asp Val Phe Thr Thr
161  Ala Glu Phe Asp Gly Ile Leu Gly Met Ala Tyr Pro Ser Leu Ala Ser Glu Tyr Ser Ile
181  Pro Val Phe Asp Asn Met Met Asn Arg His Leu Val Ala Gln Asp Leu Phe Ser Val Tyr
201  Met Asp Arg Asp Gly Gln Glu --- Ser Met Leu Thr Leu Gly Ala Ile Asp Pro Ser Tyr
221  Tyr Thr Gly Ser Leu His Trp Val Pro Val Thr Val Gln Gln Tyr Trp Gln Phe Tyr Val
241  Asp Ser Val Thr Ile Ser Gly Val Val Val Ala Cys Glu Gly Gly Cys Gln Ala Ile Leu
261  Asp Thr Gly Thr Ser Lys Leu Val Gly Pro Ser Ser Asp Ile Leu --- Asn Ile Gln Gln
281  Ala Ile Gly Ala Thr Gln Asn Gln Tyr Gly Glu Phe Asp Ile Asp Cys Asp Asn Leu Ser
301  Tyr Met Pro Thr Val Val Phe Glu Ile Asn Gly Lys Met Tyr Pro Leu Thr Pro Ser Ala
321  Tyr Thr Ser Gln Asp Gln Gly Phe Cys Thr Ser Gly Phe Gln Ser Glu Asn --- --- ---
341  His Ser --- Gln Lys Trp Ile Leu Gly Asp Val Phe Ile Arg Glu Tyr Tyr Ser Val Phe
361  Asp Arg Ala Asn Asn Leu Val Gly Leu Ala Lys Ala Ile

FIG. 2. The amino acid sequence of prochymosin. To facilitate the comparison with porcine pepsinogen A, gaps are marked and counted where such occur relative to porcine pepsinogen A. The NH2-terminus of active chymosin is Gly45. Disulfide bridges connect Cys91 and Cys96, Cys252 and Cys256, and Cys296 and Cys329. Residues in italics are common to prochymosin and porcine pepsinogen. Residues in boldface are common to the gastric enzymes and to penicillopepsin. (Gaps or insertions in the latter alignment are not marked.)

In the alignment of peptide chains the NH2-termini of the active enzymes start at different positions: penicillopepsin at residue no. 43, chymosin at no. 45, and porcine pepsin at position no. 47. There is only little homology in this region of the primary structure. This indicates that the NH2-terminus of the active enzyme does not participate directly in formation of the active center.

The enzymes are inhibited when aspartate no. 78 is esterified by reaction with substrate-like epoxides (27). Inhibition also occurs if aspartate no. 261 is esterified by reaction with active-site-directed diazo compounds (28). For this reaction diazoacetyl-norleucine-methylester has been used in many experiments (29). In all three enzymes these aspartate residues are located in sections of highly homologous sequences. This is consistent with current concepts of protein evolution, that residues that carry a special function are conserved in homologous surroundings throughout the evolution.

If we extend this reasoning to the four other sections of extensive homology, we may expect that they, in some way, are essential for the tertiary structure and thereby for the biological activity. X-ray crystallography on the aspartate proteases is in progress (30-32) and will undoubtedly shed light on these problems.

In addition to the three enzymes considered in this paper, we have fragmentary information about sequences of human (33), bovine (34), horse (35), seal (24), and dogfish (24) pepsinogens or pepsins; all results show a high degree of homology. Furthermore, proteases like cathepsin D (36), renin (37), and several microbial proteases (22) are inhibited by diazoacetyl-norleucine-methylester, and where the sequences of peptides containing the reactive aspartate have been determined, the results indicate homology.

It is premature to discuss the evolutionary relationship in detail, but we may conclude that all these enzymes are derived from a common ancestor, and we may add the aspartate proteases to the list of protein superfamilies.

Armillaria mellea protease was obtained as a gift from Dr. D. Smyth; staphylococcal protease was a gift from Dr. G. Drapeau. Both are gratefully acknowledged. The investigations have been supported by the Carlsberg Foundation and by the Danish Natural Science Research Council.

The costs of publication of this article were defrayed in part by the payment of page charges from funds made available to support the research which is the subject of the article. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

1. Knowles, J. R. (1970) Phil. Trans. Roy. Soc. (London) B 257, 135-146.
2. Fruton, J. S. (1976) Adv. Enzymol. 44, 1-44.
3. Kovaleva, G. G., Shimanskaya, M. P. & Stepanov, V. M. (1972) Biochem. Biophys. Res. Commun. 49, 1075-1081.
4. Foltmann, B. & Hartley, B. S. (1967) Biochem. J. 104, 1064-1074.
5. Pedersen, V. B. & Foltmann, B. (1975) Eur. J. Biochem. 55, 95-103.
6. Pedersen, V. B. & Foltmann, B. (1973) FEBS Lett. 35, 250-254.
7. Ong, E. B. & Perlmann, G. E. (1968) J. Biol. Chem. 243, 6104-6109.
8. Pedersen, V. B. & Foltmann, B. (1973) FEBS Lett. 35, 255-256.
9. Sepulveda, P., Marciniszyn, J., Liu, D. & Tang, J. (1975) J. Biol. Chem. 250, 5082-5088.
10. Morávek, L. & Kostka, V. (1974) FEBS Lett. 43, 207-211.
11. Cunningham, A., Wang, H-M., Jones, S. R., Kurosky, A., Rao, L., Harris, C. I., Rhee, S. H. & Hofmann, T. (1976) Can. J. Biochem. 54, 902-914.
12. Foltmann, B. (1970) in Methods in Enzymology, eds. Perlmann, G. E. & Lorand, L. (Academic Press, New York), Vol. 19, pp. 421-436.
13. Hapner, K. D. & Wilcox, P. E. (1970) Biochemistry 9, 4470-4480.
14. Gray, W. R. (1967) in Methods in Enzymology, ed. Hirs, C. H. W. (Academic Press, New York), Vol. 11, pp. 469-475.
15. Hartley, B. S. (1970) Biochem. J. 119, 805-822.
16. Kulbe, K. D. (1974) Anal. Biochem. 59, 564-573.
17. Smithies, O., Gibson, D., Fanning, E. M., Goodfliesh, R. M., Gilman, J. G. & Ballantyne, D. L. (1971) Biochemistry 10, 4912-4921.
18. Offord, R. E. (1966) Nature 211, 591-593.
19. Djurtoft, R., Foltmann, B. & Johansen, A. (1964) C. R. Trav. Lab. Carlsberg 34, 287-298.
20. Foltmann, B. (1964) C. R. Trav. Lab. Carlsberg 34, 275-286.
21. Foltmann, B. (1966) C. R. Trav. Lab. Carlsberg 35, 143-231.
22. Hofmann, Th. (1974) Adv. Chem. Ser. 136, 146-185.
23. Harboe, M., Andersen, P. M., Foltmann, B., Kay, J. & Kassell, B. (1974) J. Biol. Chem. 249, 4487-4494.
24. Klemm, P., Poulsen, F., Harboe, M. K. & Foltmann, B. (1976) Acta Chem. Scand. Ser. B 30, 979-984.
25. McPhie, P. (1972) J. Biol. Chem. 247, 4277-4281.
26. Al-Janabi, J., Hartsuck, J. A. & Tang, J. (1972) J. Biol. Chem. 247, 4628-4632.
27. Chen, K. C. S. & Tang, J. (1972) J. Biol. Chem. 247, 2566-2574.
28. Knowles, J. R. & Wybrandt, G. B. (1968) FEBS Lett. 1, 211-212.
29. Rajagopalan, T. G., Stein, W. H. & Moore, S. (1966) J. Biol. Chem. 241, 4295-4297.
30. Subramanian, E., Swan, I. D. A. & Davies, D. R. (1976) Biochem. Biophys. Res. Commun. 68, 875-880.
31. Jenkins, J. A., Blundell, T. L., Tickle, I. J. & Ungaretti, L. (1975) J. Mol. Biol. 99, 583-590.
32. Andreeva, N. S., Fedorov, A. A., Gushchina, A. E., Shutskever, N. E., Riskulov, R. R. & Vol'nova, T. V. (1976) Dokl. Akad. Nauk SSSR 228, 480-483.
33. Sepulveda, P., Jackson, K. W. & Tang, J. (1975) Biochem. Biophys. Res. Commun. 63, 1106-1112.
34. Harboe, M. K. & Foltmann, B. (1975) FEBS Lett. 60, 133-136.
35. Stepanov, V. M., Lavrenova, G. I., Rudenshaya, G., Gouschar, M. V., Lobareva, L. C., Kotolova, E. K., Strongin, A. Ya., Baratova, L. A. & Belyanova, L. P. (1976) Biokhimiya 41, 1285-1290.
36. Keilova, H. (1970) FEBS Lett. 6, 312-314.
37. Inagami, T., Misono, K. & Michelakis, A. M. (1974) Biochem. Biophys. Res. Commun. 56, 503-509.
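
The numbering convention used for Fig. 2 (residues are numbered from the NH2-terminus of prochymosin, and gaps relative to porcine pepsinogen A are counted as positions) can be illustrated with a small sketch. It is not part of the original paper. The function names are hypothetical, and the gap positions are an assumption: 41-42 and 338-340 are stated in the text, while 208, 276 and 343 are read off Fig. 2 as extracted here.

```python
# Minimal sketch of the gapped numbering of Fig. 2 (illustrative only).
# Assumption: gap positions 41-42 and 338-340 come from the text; 208, 276
# and 343 are read off Fig. 2 and may need checking against the printed figure.
GAP_POSITIONS = frozenset({41, 42, 208, 276, 338, 339, 340, 343})
ALIGNMENT_LENGTH = 373  # positions in the alignment of the two gastric zymogens


def residue_index(position: int) -> int:
    """Return the 1-based ordinal of the prochymosin residue at an alignment position."""
    if not 1 <= position <= ALIGNMENT_LENGTH:
        raise ValueError(f"position {position} is outside 1..{ALIGNMENT_LENGTH}")
    if position in GAP_POSITIONS:
        raise ValueError(f"position {position} is a gap in prochymosin")
    gaps_before = sum(1 for gap in GAP_POSITIONS if gap < position)
    return position - gaps_before


def residues_in_range(first: int, last: int) -> int:
    """Count the actual residues between two alignment positions, inclusive."""
    return sum(1 for p in range(first, last + 1) if p not in GAP_POSITIONS)


if __name__ == "__main__":
    print(residue_index(373))          # 365: total residues in prochymosin
    print(residue_index(45))           # 43: Gly45 is the 43rd residue of the chain
    print(residues_in_range(1, 44))    # 42: residues released on activation
    print(residues_in_range(45, 373))  # 323: residues in active chymosin
```

Under these assumptions the sketch agrees with the figures quoted in the text: 373 alignment positions hold 365 residues, and the 42 residues that precede Gly45 leave 323 residues in chymosin, which is also the sum of the Seq. column for chymosin B in Table 1.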