Proc. Nati. Acad. Sci. USA Vol. 77, No. 6, pp. 3388-3392, June 1980 Biochemistry Covalent structure of human haptoglobin: A protease homolog ( sequence/plasminogen/prothrombin/protein evolution) ALEXANDER KUROSKY*, DON R. BARNETT*, TONG-HO LEE*t, BILLY TOUCHSTONE*, REGINE E. HAY*, MARILYN S. ARNOTT**, BARBARA H. BOWMAN*, AND WALTER M. FITCH§ *Division of Human Genetics, Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch, Galveston, Texas 77550; and §Department of Physiological Chemistry, University of Wisconsin Medical Center, Madison, Wisconsin 53706 Communicated by Alexander G. Beam, February 11, 1980

ABSTRACT The complete amino acid sequences and the arrangements. We have repeated the sequence analysis of disulfide arrangements of the two chains of human haptoglobin human hp a1 chain reported by others (21, 22) because our 1-1 were established. The al and P chains of haptoglobin con- tain 83 and 245 residues, respectively. Comparison of the pri- analysis revealed a number of differences in the amino-terminal mary structure of haptoglobin with that of the chymotrypsino- region. The completed a' and ,3 chain sequences of Hp were gen family of serine proteases revealed a significant degree of extensively compared with the chymotrypsinogen family of chemical similarity. ITe probability was less than 10-5that the serine proteases to establish their evolutionary relationship. chemical similarity of the P chain of haptoglobin to the pro- teases was due to chance. The amino acid sequence of the P METHODS chain of haptoglobin is 29-33% identical to bovine , bovine , porcine elastase, human thrombin, or Hp and its component chains were purified from human ascites human plasmin. Comparison of haptoglobin al chain to acti- fluid and from plasma as described (23). Purification and vation regions of the zymogens revealed an identity of characterization of the CNBr were reported by Ku- 25% to the fifth "kringle" region of the activation peptide of rosky et al. (24). Purified CNBr fragments II, III, IV, and V plasminogen. The probability was less than 0.014 that this were subjected to hydrolysis* with chymotrypsin, trypsin, similarity was due to chance. These results strongly indicate haptoglo in to be a homolog of the chymotrypsinogen family staphylococcal protease, and thermolysin. Tryptic and chy- of serine proteases. Alignment of the ,Bchain sequence of hap- motryptic hydrolyses were also performed on whole ,B chain. toglobin to the serine proteases is remarkably consistent except The primary structure of the al chain of Hp was established for an insertion of 16 residues in the region corresponding to the by automated sequence analysis of intact a1 chain and by methionyl loop of the serine proteases. The active-site residues characterization of tryptic and staphylococcal protease peptides. typical of the serine proteases,-histidine-57 and serine-195, are In addition, fragments obtained by arginyl hydrolysis by trypsin replaced in haptoglobin by lysine and alanine, respectively; however, aspartic acid-102 and the trypsin specificity residue, of succinylated a1 chain and by prolonged treatment of a1 aspartic acid-189, do occur in haptoglobin. Haptoglobin and the chain with CNBr (1000-fold molar excess for 72 hr) were also serine proteases represent a striking example of homologous characterized. Methods of peptide purification and dansyl(5- proteins with different biological functions. dimethylaminonaphthalene-l-sulfonyl)-Edman were con- ducted as described (25). Automated sequence analyses were Human haptoglobin (Hp) is a plasma glycoprotein composed performed on the Beckman Sequencer (updated model 890B) of two types of polypeptide chains, a and ,3, that are covalently as reported (23). Limited proteolytic cleavage of Hp 1-1 by associated by disulfide bonds. In humans, three major pheno- plasmin was done as described (26). types, designated Hp 1-1, Hp 2-1, and Hp 2-2 (1-3), involve Comparison of the primary structures of the a1 and ,B chains considerable molecular variation due to an almost complete of Hp with the chymotrypsinogen family of serine proteases duplication of the Hp' allele that resulted in a Hp2 allele (4, 5). was according to the procedures of Fitch (27, 28) or Needleman During gel electrophoresis Hp 2-1 and 2-2 both exhibit a and Wunsch (29). polymeric series mediated by disulfide bonding (6-9) that can be formulated as (a23)on (n = 3, 4, 5, . . .) for Hp 2-2 and RESULTS (a'f3)2-(a2f3). (n = 0, 1, 2,. . .) for Hp 2-1 (10-13). The structure The amino acid sequences of theal and ,3 chains of human Hp of Hp 1-1 is represented simply as (al'f)2. Comprehensive re- are given in Fig. 1. Limited proteolytic cleavage of intact views on Hp are available (14-16). human Hp 1-1 by human plasmin gave a major cleavage be- Interest in Hp is usually related to its unique ability to bind tween residues 130 and 131 (90% yield). A minor cleavage be- hemoglobin (Hb). An af3 subunit of Hb binds to each of two tween 161 and 162 (10% yield) was also elicited with use of independent binding sites on a molecule of Hp 1-1 with an af- higher ratios of plasmnin to Hp. The disulfide linkages in Hp 1-1 finity that is too high to measure precisely (17). Binding of Hb are described in the legend to Fig. 1. A summar representation af3 subunits is reported to occur on the ,B chain of Hp (18). of the tetrachain arrangement of human Hp 1-1, including the Partial amino acid sequence analysis of the haptoglobin 3 disulfide assignments, is illustrated in Fig. 2. polypeptide (hp ,B) chain indicated it to be chemically similar Comparison of the sequence of the hp ,B chain with sequences to the chymotrypsinogen family of serine proteases (19, 20). of representative serine proteases from the chymotrypsinogen This suggests that Hp is evolutionarily related to the serine family is shown in Fig. 3. Computer analysis by the method of proteases as a result of gene duplication of a serine protease gene Fitch (27, 28) revealed that the probability was <10-5 that the (or a precursor thereof) and subsequent divergence to its present chemical similarity between the ,B chain of Hp and the plas- structure. This report presents the complete primary structure of human Hp 1-1 (a' and ,3 chains), including the disulfide Abbreviations: Hp, intact haptoglobin; hp, used in references to hap- toglobin polypeptide chaini; Hb, hemoglobin. The publication costs of this article were defrayed in part by page t Present address: Department of Biochemistry, Catholic Medical charge payment. This article must therefore be hereby marked "ad- College, Seoul, Korea. vertisement" in accordance with 18 U. S. C. §1734 solely to indicate t Present address: Department of Biology, Environmental Biology this fact. Section, M.D. Anderson, Houston, Texas 77030. 3388 Downloaded by guest on October 1, 2021 Biochemistry: Kurosky et al. Proc. Nati. Acad. Sci. USA 77 (1980) 3389

I 1 5 10 15 20 25 a CHAIN: *Val-Asp-Ser-Gly-Asn-Asp-Val-Thr-Asp-Ile-Ala-Asp-Asp-Gly-Cys-Pro-Lys-Pro-Pro-Glu-Ile-Ala-His-Gly-Tyr- 26 31 35 40 45 50 -Lys- 55 Val -Gl u-Hi s-Ser-Val -Arg-Tyr-Gl n-Cys-Lys-Asn-Tyr-Tyr-Lys-Leu-Arg-Thr-G1 u-Gly-Asp-Gly-Val -Tyr-Thr-Leu-Asn-Asn-Glu-Lys-Gl n- 56 61 65 70 75 80 Trp- Il1e-Asn-Lys-Al a-Val -Gly-Asp- Lys- Leu-Pro-Gl u-Cys- G u -Al a-Val1-Cys-Gl y- Lys-Pro-Lys-Asn-Pro-Al a-Asn- Pro-Val1-Gl n

1 5 10 15 20 CHO 25 k CHAIN: I1e-Leu-Gly-Gly-His-Leu-Asp-Ala-Lys-Gly-Ser-Phe-Pro-Trp-Gln-Ala-Lys-Met-Va1-Ser-His-His-Asn-Leu-Thr- 26 31 35 40 45 CH0 CHO 55 Thr-Gly-Ala-Thr-Leu- Ile-Asn-Gl u-Gl n-Trp-Leu-Leu-Thr-Thr-Al a-Lys-Asn-Leu-Phe-Leu-Asn-Hi s-Ser-Gl u-Asn-Ala-Thr-Al a-Lys-Asp- 56 61 65 70 75 CHO 85 Ile-Ala-Pro-Thr-Leu-Thr-Leu-Tyr-Val -Gly-Lys-Lys-Gln-Leu-Val -Glu-Ile-Glu-Lys-Val -Val -Leu-His-Pro-Asn-Tyr-Ser-Gln-Val-Asp- 86 91 95 100 105 110 115 Il e-Gly-Leu-Ile-Lys-Leu-Lys-G1 n-Lys-Val -Ser-Val -Asn-Gl u-Arg-Val -Met-Pro-Ile-Cys-Leu-Pro-Ser-Lys-Asp-Tyr-Ala-Glu-Val -Gly- 116 121 125 130 135 140 145 Arg-VaI -Gly-Tyr-Va I -Ser-Gly-Trp-Gly-Arg -As n-Al a -Asn-Phe-Lys-Phe-Thr-Asp-His-Leu-Lys-Tyr-Val -Met-Leu-Pro-Val -Ala -Asp-Gl n- 146 151 155 160 165 170 175 Asp-Gln-Cys-Ile-Arg-His-Tyr-Glu-Gly-Ser-Thr-Va I -Pro-GI u-Lys-Lys-Thr-Pro-Lys-Ser-Pro-Val -Gly-Val-Gln-Pro-fle-Leu-Asn-Glu- 176 181 185 190 195 200 205 His-Thr-Phe-Cys-Ala-Gly-Met-Ser-Lys-Tyr-Gln-Glu-Asp-Thr-Cys-Tyr-Gly-Asp-Ala-Gly-Ser-Ala-Phe-Ala-Val-His-Asp-Leu-Glu-Glu- 206 211 215 220 225 230 235 Asn6-Thr-Trp-Tyr-Al a-Thr-Gly-Il e-Leu-Ser-Phe-Asp-Lys-Cys-Ser-Al a-Val -Al a-Gl u-Tyr-Gly-Val -Tyr-Val -Lys-Val -Thr-Ser-Il e-Gl n- 236 241 245 Asn-Trp-Val -Gln-Lys-Thr-Ile-Ala-Glu-Asn* FIG. 1. Amino acid sequences ofthe a' and (3 chains ofhuman Hp 1-1. Carbohydrate attachment is indicated by CHO. The interchain disulfide linkages are a'15-a115 and a172-(3105 and the intrachain linkages are a134-a168, fl148-#179, and /3190-(3219 (8, 9, 24). A polymorphism of the al chain has either Glu or Lys at position 53 (5).

minogen B chain was due to chance (length of subsequence Our sequence studies establish a molecular weight of 9189 examined = 15, average minimal base difference = 1.52, and for hp a1 chain and 27,259 for the peptide portion of hp ( number of sequences compared was >12,750). A similar cal- chain. Because the carbohydrate content of the (3 chain is 19.4% culation of the comparison of hp (3 chain and thrombin B chain (38), the total molecular weights of the # chain and intact Hp gave a probability of <10-8. Similarly, comparison of hp al 1-1 were calculated to be 33,820 and 86,018, respectively. These chain with activation peptides of serine proteases is given in Fig. molecular weight values fall within the ranges reported by ul- 4. Computer analysis according to the method of Needleman tracentrifugational analysis (reviewed in ref. 24). and Wunsch (29), using maximum base matches from the ge- Comparison of the hp (3 chain to the chymotrypsinogen netic code and a gap penalty of 1.999, identified a significant family of serine proteases (Fig. 3) strongly indicates Hp to be degree of similarity between hp al chain and the fifth kringlel a homolog of this family of proteins. The probability is <10-5 region of plasminogen (P < 0.014 that the similarity was due that the similarity is due to chance. The hp al chain also shows to chance). The relationship of Hp to the chymotrypsinogen significant chemical similarity to the activation peptides of the family of serine proteases is represented by the evolutionary serine proteases and especially to the fifth kringle region of tree given in Fig. 5. plasminogen. The probability is <0.014 that this similarity is due to chance. A charge relay system, as has been described for DISCUSSION the serine proteases, could not occur in Hp because the positions Amino acid sequence analysis of hp al chain revealed a number corresponding to the proteolytic active site residues, histidine-57 of differences when compared to the previously reported se- and serine-195, are replaced in Hp by lysine and alanine, re- quence (21, 22). Our results indicated one less residue; aspar- spectively. This is consistent with the fact that Hp has no known agine at position 2 was not confirmed. Moreover, we have de- proteolytic function. Aspartic acid-102 and the trypsin speci- termined residues 15-20 to be Cys-Pro-Lys-Pro-Pro-Glu rather ficity residue aspartic acid-189 are found in Hp as well as as- than Gln-Pro-Pro-Pro-Lys-Cys as previously assigned. In ad- partic acid-194, which forms an internal ion pair with the dition, we find position 43 to be Glu rather than Gln. Repeated a-amino group of the amino-terminal residue in a number of sequence analysis using a1 chain alkylated with iodo[1-14C]- serine proteases (34). Of the three intrachain disulfide loops acetamide clearly identified radiolabeled Cys at position 15 common to all serine proteases (histidyl, 42-58; methionyl, rather than at 20. These same sequence differences were also 168-182; and seryl, 191-220), only the methionyl and the seryl confirmed in partial sequence analysis of the hp a2 chain. loops are found in the hp ( chain. However, the interchain disulfide between the attached activation peptide portions and the portions in the serine proteases so far characterized CHOD I PLASMIN| COCHOI H aligns precisely with the disulfide attaching hp a and ( chains. N C CH0 1105 148 179 190 219 C i CHAIN Taken together, these data further support our earlier proposal 1 23 46 50 80 I 130 sL s Ls-s5 J 245 S (39) that Hp is initially synthesized as a single chain and sub- I sequently cleaved. Typically in eukaryotic multicellular or- 15 34 N1I 88" ganisms, the serine proteases are synthesized as single-chain LS-sJ72 83 precursors and subsequently activated by proteolysis. In some s aI CHAINS cases, as for example Factor XI (40), the precursor chain SlI S-S-1 83C structure is further elaborated by disulfide bonding. Strikingly, S the tetrachain chain structure of Factor XIa, which is a serine S 182 protease, is identical to that of Hp and, like has two func- 18 55- Hp, N1N 1021 139r-s-s r 245 [CHAIN tional sites per tetrachain. It is highly possible that the precursor I I MET MET MET MET structure of Hp resembles Factor XI-i.e., two identical chains FIG. 2. Summary representation of the major features of the covalently attached by disulfide bonding. This hypothesis does human Hp 1-1 tetrachain structure. CHO indicates carbohydrate attachment to Asn. Plasmin cleaved at Lys-130 of both P chains. I Defined in ref. 37. Downloaded by guest on October 1, 2021 3390 Biochemistry: Kurosky et al. Proc. Nati. Acad. Sci. USA 77 (1980)

a 16 a 63 6~~~** * * 0 t * * .. . . HuHp o chain .I L GGH]DAKi PTWQAKMV- 0HH N[DT T G A T L I N E Q WiLL TT K NWTN H[m

BoTr .I V G G Y T CGA,]TgIP Y Q V S L N - S G Y H F C G G S5L1 N JO WVVSAAHC K SGI BoCh A L GGGSLG S L INENI N E N WV V TA'A'H C TG .IVNGEEAVPIGSWPWQV S QD K-TGFHFC T8 HuThr B chain .IVE G SN A EI IP PWOQVM L FR KEP ELC G A S L I S N R 1WV L TA A H CL VYP P W HuPI B Chain .VV AHP SWPWQ'V S LRT R F GMH F C G G T L I S P W V L TA H CE K S P R

64. b c d * f g h a a 103 t *00 t . . HuHp p chain E N A ------T AND I L LKMLV - EKVVLHPN[1S QJ----- *BoTr V L G Q D EV E FIS Sl KvEHPIS|Y|NSIVHPSYNSN- T L N NID I D - BoCh A V V A G E F D GS S STJIIQIKILIK - K V|F K N S K|Y|N SE T I N N|D I t HuThr B chain N K N F T E N D L L VHI G K Hj1REjR3JE R N IEKISMLE K IVIHPR N W R EN DRDI HuPI B chain P S S Y K - -- - - V I L Gp H Q E V N L E P H V EI *VRi F -PT K.R

104 a b c a 147b

HuHp P chain LGilI KKLKOK~~VSVNL KQ K ERVMLM~7L P - - KD VU¶VW~~V S G W GN---[N BoTr ML I K L K AA S L N R V AS IB[L P- -- -T S C A G T Q C 1 S G W G N T K S S- BoCh A T L L K LI T A A F SO T V S A V C L P - - -S AS DDFA A - G T T C V T G W G L T R TT - HuThr B chain LFl K K P VAFSD Y I HVCLPIN R E TA AS iGA - G YKG TGYG NL KSTVT HuPI B chain L LLJSSPAVEJTL I A - SP N YV [ I R T E C F I T G WGIE T -

147c d a a b c d a f g h j k I I80m HuHp P chain FKIT HKVVMETLM ADOD I [HY E G TV P K KEP K S P V G V Q P I L N

BoTr .G T SYNP D V L K CLK JP II -N0S1CKS Y G T SNM - BoCh A E ANTPDRiLOOA SlPLLS T NCKEKKI-KGAMT------HuThr B chain A D V G K G P S ViLO VEIN LE L VO R P VCK D -ST RI RI TDNM-M HuPI B chain G T F G A G LI KE A QI P VIK V NE-1FLNG RVQ E

I SOn o p a b c a b a b 220

HuHp B chain E H T F C AG- M S K Y 0 E . -ID T C Y GD A S A A V H D LjE TE A T G I L S F D K C

BoTr - C A GIY - L E G G K S CI|G DlSIG|G P VV CSGK QIG I V S WIG S G C

BoCh A C A - - A S G V S S C M G D S G G P L V C K K N G VG I V S W G S S T C

HuThr B chain - FC A GGI-Y K P D EGCGG K R G D C E G D S GG PEIV M K S P F N N ATiLR WV0 M|G I V S W|G E G - HuPI B chain -LCAGIH- LA GG T- -D S CIO| DI SIGG P L V - C FKK IL Q G L G-

221 a a 245b

HuHpochain E A VEE G V Y V K V T S[Q N VQKTIA E W. BoTr A G K N K -GV Y K V N I I S N BoChA Fj1- T S T G V Y A R V A L V WV 0 T L A A N. . HuThr B chain D R D K G EY T V R L K KW I Q K VI D 0 F G E- HuPIBchain A R P N K P|G V Y V R V SIR F V TLW I E V M R N FIG. 3. Comparison ofthe ft chain ofhuman Hp (HuHp) to bovine trypsin (BoTr) (30), bovine chymotrypsin A (BoCh A) (31), human thrombin B chain (HuThr B) (32), and human plasmin B chain (HuPl B) (33). Numbering is that of bovine chymotrypsinogen A with insertions indicated by letters as 62A, 62B, etc. Residues in the serine proteases that are chemically similar to residues in hp f chain are boxed in. Single-letter amino acid abbreviations are: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr. Chemically similar residues are: I = L = V, G = A, T = S, Y = F = W, D = E, D = N, Q = N, E = Q, and K = R. *Residues determined to be internal and inaccessible to water molecules in the three-dimensional structures of a-chymotrypsin, trypsin, and elastase (34). tCovalent attachment of carbohydrate.

not agree with the conclusions from a single family study of Hp structures of trypsin, elastase, and a-chymotrypsin by Greer Bellevue by Javid (41), which suggested that the genes coding (42) was in excellent agreement and was significantly better for the a and ,B chains are not linked. Javid's results, however, than a similar comparison between microbial and mammalian can also be explained by other genetic mechanisms such as serine proteases. crossing-over. Comparison of the hp chain to various serine proteases (Fig. Homology of Hp to the serine proteases is also supported by 3) revealed a fairly consistent degree of identity (29-3%) with comparative model building of Hp to known three-dimensional no clear indication of a preferred comparison. Comparison of structures of the serine proteases. Computer fitting of the pri- the hp a1 chain to activation peptides of the serine proteases mary structure of the hp chain to the three-dimensional demonstrated the greatest degree of identity (excluding the Downloaded by guest on October 1, 2021 Biochemistry: Kurosky et al. Proc. Natl. Acad. Sci. USA 77 (1980) 3391

a a a b c d 39

HuProThr A kr /C A E G L G T N Y R G N V S I T R S G I E CO L W R S R YP H K P E I N S T T H P G A D L 0 E N F C HuProThr S kr /C V P D R G 0 0 Y 0 G R L A V T T H G L P C L A W A S A QA K A L S K - HO3 D F N S A V O L V E N F C R HuPigen I kr /C K T G D G K N Y R G T M S K T K N G I T C OJJW SS T S P H R P R F- S P A T H P S E G L E E N Y C R HuPlgen 2 kr /C M H C S G E N Y D G K I S K T M S G L E C O A W D S a S P H A G Y - I P SK F P N K N L K N Y C R HuPlgen 3 kr /C L K G T G E N Y R G N V A V TE[S G H T C O H W S A 0 T P H T HN R - T P EN F PJK N L DE N Y C HuPlgen 4 kr /C Y H G D G 0 S Y R G T S S T T T T G K K C O S W S S M T P H R H O K - T P E N P N A G L T M N Y C R R HuPlgen 5 kr /C M F G D G K G Y R G K R A T TV T G T P C 0 D W A A[] E PH R H S I F T P E T N P R A G LE K YNY C K HuHp a chain .V D S G D V T DUA D D G C P~g P El A H Y V E - HS E3]OQ - - - - K NY V

40 a a b a b c a 83

HuProThr A kr N PD!SS IT PWC THP TAR R -[jE C ---S PVC D V TV MV T P R/ HuProThr S kr N P D G D E GW C Y A G K P G D F - G Y C - - - D Y C E E A V EEE T G D G L rDJE D S D EE/

HuPigen 1 kr N P D N D P OSPWC Y T D P7R Y VY C - -D ECE EE/

HuPlgen 2 kr N P D R E L R P W C F T T ID PN KR W -E L C - -D I P R C T PP P S S[3P T Y Q/

HuPlgen 3 kr N P D G K RAP W C HT TNSVR W- E Y C K I PS|C DS S P V S TEE L A P T A PP E T/

HuPlgen 4 kr N P D A KGPW CTTIDP S V R W E Y C - -L CSGT E A SVVP PPV V L LfNV/ HuPlgen 5 kr N P D G N V JGKW C|Y T T|N P .[Y C DV P Q CAAP S F DCG K P0 VE K K C P G/

HuHp a chain L R -GfDG V--T L N NE K W I N K A V G DK L P E C E A- - -V C G K P K-P A N P VO HuThr A chain .T A TSE T f F N PR T f G S G .- .NC GIL R P L F E K K SL BoCh A chain C6VHA I EEJV L S G LI FIG. 4. Comparison of the a' chain of human Hp to the activation peptide regions of human prothrombin (HuProThr) (35), human plas- minogen (HuPIgen) (36), and bovine chymotrypsinogen A (BoCh A) (31). A kr and S kr are the kringle regions of prothrombin and 1 kr-5 kr are the five kringle regions ofplasminogen (36). Residues in the activation peptide regions chemically similar to residues in hp a1 chain are boxed (see Fig. 3). The numbering is that of human hp a' chain with insertions indicated by letters. *Demarcates the actual kringle structures (see ref. 37). tCovalent attachment of carbohydrate.

short A chain of chymotrypsin) with the fifth kringle region of partial gene reiteration in their activation peptide region. If one plasminogen (25% identity). This is interesting in view of the assumes a single-chain structure for Hp, the duplication of fact that, like Hp, plasminogen and prothrombin both reflect human hp al to give rise to a2 is similar to the partial gene

...... 1 A ...... aWt-l z .....j |k k .... k Haptoglobin Prothrombin Factor X Chymotryp- Plasminogen Plasminogen sinogen FIG. 5. Two most parsimonious trees (A and B) relating Hp to the chymotrypsinogen family of serine proteases. Only the kringle region of plasminogen is shown for tree B because the trees were otherwise virtually identical. The sequences comprising the kringle regions are as defined in Fig. 4 and are represented by the smaller boxes. The enzyme regions are the sequences given in Fig. 3 and are represented by the larger boxes. Intervening and additional sequences are denoted by a dashed line. Empty solid boxes represent the sequences used for computation. Filled boxes are ancestral elements that undergo a partial gene duplication at that point. The numbers on the legs are the minimal number of nucleotide substitutions during the interval between two solid boxes (empty or filled). Because they are uncorrected for multiple substitutions at the same site, no estimates oftime or rate are warranted. Accordingly, node positions are arbitrary. Question mark denotes that the presence of a linkage between two gene fragments is uncertain. Tree A required a total of 1222 nucleotide substitutions; tree B required 1221. Both trees required 776 substitutions in the enzyme region. Downloaded by guest on October 1, 2021 3392 Biochemistry: Kurosky et al. PProc. Natl. Acad. Sci. USA 77 (1980) duplication that resulted in the kringle regions in both pro- 6. Smithies, O., Connell, G. E. & Dixon, G. H. (1962) Am. J. Hum. Genet. 14, 14-21. thrombin and plasminogen. Moreover, there is also evidence 7. Beam, A. G. & Franklin, E. C. (1958) Science 128,596-597. of a triplication of hp a chain (Hp Johnson) in the human 8. Malchy, B. & Dixon, G. H. (1973) Can. J. Biochem. 51, 249- population (4, 43). 264. Two evolutionary trees (A and B) relating Hp to the serine 9. Malchy, B., Rorstad, 0. & Dixon, G. H. (1973) Can. J. Biochem. 51,265-273. proteases based upon computer comparison of nucleotide base 10. Ogawa, A. & Kawamura, K. (1966) Proc. Jpn. Acad. 42,413- sequences (Fig. 5) reveal several interesting associations. The 417. most parsimonious phylogenies for the activation peptide region 11. Ogawa, A., Kagiyama, S., Kawamura, K. & Yanase, T. (1970) (Fig. 4) and the enzyme region (Fig. 3) were constructed sep- Proc. Jpn. Acad. 46,814-819. 12. Fuller, G. M., Rasco, M. A., McCombs, M. L., Barnett, D. R. & arately, principally because not all sequences have portions Bowman, B. H. (1973) Biochemistry 12,253-258. belonging to each region, to determine whether the two resul- 13. Pastewka, J. V., Ness, A. T. & Peacock, A. C. (1975) Biochim. tant best trees are mutually consistent. With one exception to Biophys. Acta 386,530-537. were consistent, and they are 14. Sutton, H. E. (1970) in Progress in Medical Genetics, eds. be discussed later, the two trees Steinberg, A. G. & Beam, A. G. (Grune & Stratton, New York), presented jointly in Fig. 5. The order of divergence of trypsi- Vol. 7, pp. 163-216. nogen and chymotrypsinogen from the plasminogen line is not 15. Putnam, F. W. (1975) in The Plasma Proteins, ed. Putnam, F. shown because the orders differed by only one nucleotide W. (Academic, New York), Vol. 2, 2nd Ed., pp. 1-50. 16. Giblett, E. R. (1974) in Structure and Function of the Plasma substitution; however, this tree presents the possibility that the Proteins, ed. Allison, A. C. (Plenum, New York), Vol. 1, pp. divergence of trypsinogen and chymotrypsinogen (also pro- 55-72. elastase) included a deletion of a kringle-like segment. Among 17. Chiancone, E., Alfsen, A., Toppolo, C., Vecchini, P., Finazzi Agr, the observations of particular note are the partial internal gene A., Wyman, J. & Antonini, E. (1968) J. Mol. Bid. 34,347-356. 18. Gordon, S. & Beam, A. G. (1966) Proc. Soc. Exp. Biol. Med. 121, duplications of the kringle regions of plasminogen and pro- 846-850. thrombin. Strikingly, the A kringle of prothrombin is much 19. Barnett, D. R., Lee, T.-H. & Bowman, B. H. (1972) Biochemistry more similar to the kringles of plasminogen than to the S krin- 11, 1189-1194. gle. This suggests that the A kringle of prothrombin was derived 20. Kurosky, A., Barnett, D. R., Rasco, M. A., Lee, T.-H. & Bowman, B. H. (1974) Biochem. Genet. 11, 279-293. from a plasminogen kringle long after both gene functions had 21. Black, J. A. & Dixon, G. H. (1970) Can. J. Biochem. 48, 133- been established. Finally, the phylogenetic consistency of the 146. two regions examined (activation peptide region and enzyme 22. Malchy, B1. & Dixon, G. H. (1973) Can. J. Biochem. 51, 321- region) gives further evidence that these sequences have been 322. 23. Kuiosky, A., Kim, H.-H. & Touchstone, B. (1976) Comp. Bio- genetically linked for an extended period of their evolutionary chem. Physiol. 55B, 453-459. history and suggests that this is also true for the a and 3 chains 24. Kurosky, A., Hay, R. E., Kim. H.-H., Touchstone, B., Rasco, M. of Hp. The only exception to the mutual consistency of the trees A. & Bowman, B. H. (1976) Biochemistry 15,5326-5336. for the two regions is kringle A of prothrombin as noted 25. Kurosky, A. & Hofmann, T. (1976) Can. J. Biochem. 54,872- 884. above. 26. Kurosky, A., Barnett, D. R., Rasco, M. A., Lee, T.-H. & Bowman, The present evidence strongly indicates that the Hp gene B. H. (1974) in Protides of the Biological Fluids, Proceedings resulted from a duplication of a serine protease precursor gene of the Colloquium, ed. Peeters, H. (Pergamon, New York), Vol. that subsequently diverged, resulting in a loss of proteolytic 22, pp. 597-602. 27. Fitch, W. M. (1966) J. Mol. Biol. 16,9-16. function and acquisition of a new function. Because Hp has 28. Fitch, W. M. (1970) J. Mol. Biol. 49, 1-14. been reported to occur in early vertebrates such as eel (44), it 29. Needleman, S. & Wunsch, C. D. (1970) J. Mol. Biol. 48, 443- cannot be considered a new protein provided that the eel and 453. human sequences are orthologous. Therefore, why should the 30. Mikes, O., Holeysovsky, V., Tomasek, V. & Sorm, F. (1966) Biochem. Biophys. Res. Commun. 24,346-352. structure of Hp have been so well conserved over such a large 31. Blow, D. M., Birktoft, J. J. & Hartley, B. S. (1969) Nature (Lon- evolutionary time span? One obvious possibility is that the new don) 221, 337-340. function acquired by Hp involves a considerable portion of the 32. Butkowski, R. J., Elion, J., Downing, M. R. & Mann, K. G. (1977) molecule, resulting in a low substitution rate. Some evidence J. Biol. Chem. 252,4942-4957. j3 33. Wiman, B. (1977) Eur. J. Biochem. 76,129-137. of a moderately low substitution rate for the hp chain was 34. Stroud, R. M., Kay, L. M. & Dickerson, R. E. (1971) Cold Spring reported (23). The strong binding of Hp to Hb could explain Harbor Symp. Quant. Biol. 36,125-140. such a molecular constraint and thereby emphasizes Hb binding 35. Walz, D. A., Hewett-Emmett, D. & Seegers, W. H. (1977) Proc. as an important Hp function. Natl. Acad. Sci. USA 74,1969-1972. 36. Sottrup-Jensen, L., Claeys, H., Zajdel, M., Peterson, T. E. & We thank Dr. Jonathan Greer for allowing us to preview his man- Magnusson, S. (1978) in Progress in Chemical Fibrinolysis and uscript. We are indebted to Linda Merryman, Horace D. Kelso, Fu-Mei Thrombolysis, eds. Davidson, J. F., Rowan, R. M., Samama, M. Lo, Terry M. Ward, and Ronald Niece for excellent technical assis- M. & Desnoyers, P. C. (Raven, New York), VoL 3, pp. 191-209. 37. Magnusson, S., Peterson, T. E., Sottrup-Jensen, L. & Claeys, H. tance. We thank Dr. Julian Smith (Department of Gynecology, M.D. (1975) in Proteases and Biological Control, Cold Spring Harbor Anderson Hospital, Houston, Texas) for ascites fluid. This work was Conferences on Cell Proliferation, eds. Reich, E., Rifkin, D. B. supported by Grant HD 03321 from the National Institute of Child & Shaw, E. (Cold Spring Harbor Laboratory, Cold Spring Harbor, Health and Human Development, by Grant H-378 from the Robert NY), Vol. 2, pp. 123-149. A. Welch Foundation, by Grant CA 17701 from the National Cancer 38. Black, J. A., Chan, G. F. Q. & Dixon, G. H. (1970) Can. J. Bio- Institute, by the Harris and Eliza Kempner Fund, by the Burkitt chem. 40, 123-132. Foundation, and by Grant DEB-7814197 from the National Science 39. Barnett, D. R., Kurosky, A., Fuller, G. M., Kim, H.-H., Rasco, M. A. & Bowman, B. H. (1974) in Protides of the Biological Fluids, Foundation. Proceedings of the Colloquium, ed. Peeters, H. (Pergamon, New 1. Smithies, 0. (1955) Biochem. J. 61, 629-641. York), Vol. 22, pp. 589-595. 2. Smithies, 0. &,Walker, N. F. (1955) Nature (London) 176, 40. Kurachi, K. & Davie, E. W. (1977) Biochemistry 16, 5831- 1265-1266. 5839. 3. Smithies, 0. & Walker, N. F. (1956) Nature (London) 178, 41. Javid, J. (1967) Proc. Natl. Acad. Sci. USA 57,920-924. 694-695. 42. Greer, J. (1980) Proc. Natl. Acad. Sci. USA 77,3393-397. 4. Smithies, 0., Connell, G. E. & Dixon, G. H. (1962) Nature 43. Smithies, 0. (1964) Cold Spring Harbor Symp. Quant. Biol. 29, (London) 196,232-236. 309-319. 5. Black, J. A. & Dixon, G. H. (1968) Nature (London) 218,736- 44. Kodama, M., Hashimoto, K. & Matsuura, F. (1975) Bull. Jpn. Soc. 741. Sci. Fish. 41, 1015-1019. Downloaded by guest on October 1, 2021