Volume 14 Number 2 1986 Nucleic Acids Research

A new member of the plasma protease inhibitor family

Hermann Ragg

Hoechst AG, D-6230 Frankfurt 80, FRG

Received 30 September 1985; Revised and Accepted 2 December 1985

ABSTRACT
A 2.1-kb cDNA clone representing a new member of the protease inhibitor family was isolated from a human liver cDNA library. The inhibitor, named human Leuserpin 2 (hLS2), comprises 480 amino acids and contains a leucine residue at its putative reactive center. hLS2 is about 25-28% homologous to three human members of the plasma protease inhibitor family: antithrombin III, α1-antitrypsin and α1-antichymotrypsin. A comparison with published partial amino acid sequences shows that hLS2 is closely related to the thrombin inhibitor heparin cofactor II.

© IRL Press Limited, Oxford, England.

INTRODUCTION
Plasma protease inhibitors are involved in the control of blood coagulation, complement activation and aspects of inflammation (1). Most of these inhibitors interact with their protease counterparts by forming 1:1 complexes. The inhibitors are believed to serve as bait by presenting their reactive site as a substrate to the appropriate protease (1,2), a process followed by a strong association between the inhibitor and the protease, with the consequence that both molecules become inactive.

It has been suggested that the protease specificity of the inhibitors is determined, at least in part, by a single amino acid (P1) at the reactive center (2). This view is strongly supported by the recent identification of a naturally occurring mutant of α1-antitrypsin (an inhibitor of elastase), in which a methionine residue at the reactive center has been changed to arginine. This amino acid is usually found in the P1 position of antithrombin III (ATIII), and consequently the mutated α1-antitrypsin molecule acts as a thrombin inhibitor (3).

It has recently been proposed that several inhibitors, including ATIII, α1-antitrypsin, α1-antichymotrypsinogen and mouse contrapsin (4), belong to a family of proteins which has evolved from a common ancestor over about 500 million years (5-7). Surprisingly, ovalbumin and angiotensinogen are also members of this family (5,8,9). Angiotensinogen is the precursor of angiotensin I and II, which participate in the regulation of blood pressure and water balance (10). An inhibiting function, however, has not been attributed to the parent molecule. The biological function of ovalbumin is unknown.

The plasma protease inhibitors perform important physiological functions. α1-Antitrypsin and ATIII deficiencies, which may be inherited or acquired, can result in serious illnesses. Lack of α1-antitrypsin may be associated with emphysema, a condition which is probably due to the uncontrolled proteolytic action of elastase, a target enzyme of α1-antitrypsin (11,12). ATIII inhibits thrombin and several other coagulation factors (13,15). Individuals displaying ATIII deficiency are prone to spontaneous thrombosis and the risks associated with it. Many of the reactions which involve ATIII are accelerated by the mucopolysaccharide heparin (16) and, until recently, the anticoagulant effect of heparin has been attributed to ATIII (13).

Increasing evidence that heparin activates another inhibitor of thrombin has resulted in the isolation and characterization of heparin cofactor II, also termed antithrombin BM (17,18). Both ATIII and heparin cofactor II are thrombin inhibitors; however, in contrast to ATIII, the activity of heparin cofactor II against other blood coagulation factors is limited. In two recent reports (19,20) heparin cofactor II deficiency has been shown to be associated with thrombosis.

This article describes the isolation and characterization of a cDNA clone which encodes a new member of the plasma protease inhibitor family that is closely related to heparin cofactor II.

MATERIALS AND METHODS
Synthesis of oligonucleotides.
Oligonucleotides were synthesized as previously described (21). They were purified by preparative polyacrylamide gel electrophoresis, followed by HPLC on a Du Pont Zorbax C8 column (22).


RNA preparation and analysis.
Total RNA was isolated from frozen human liver biopsy material using a modified version of the guanidinium thiocyanate method (23). After pelleting through a cushion of 5.7 M cesium chloride, the RNA was dissolved in 50 mM Tris-HCl, 10 mM EDTA, 0.2% SDS, pH 7.4, extracted with chloroform/1-butanol and reprecipitated twice with ethanol. The precipitate was redissolved and extracted twice with phenol and once again with chloroform. One volume consisting of 8 M urea and 4 M LiCl was added to the final aqueous phase and the RNA precipitated at 0°C overnight (24). Total RNA was fractionated by two passages over oligo(dT)-cellulose to obtain poly(A)+ RNA (25). Agarose gel electrophoresis and transfer of RNA to nitrocellulose paper were performed as described (26).

cDNA cloning
Synthesis of cDNA was basically performed as described (27). One unit of RNAsin (Biotec) was included per µl final volume during first strand synthesis. The Klenow fragment (Boehringer Mannheim) of DNA polymerase I (100 units/200 µl) was used for second strand synthesis. The reaction was incubated for 4 h at 15°C and for a further 2 h at 20°C. Single-stranded ends and loop structures were digested with nuclease S1 and the cDNA was subsequently repaired with the Klenow fragment of DNA polymerase I (28). The double-stranded cDNA was ligated into the vector pUC13, a derivative of the pUC plasmid series (29), which had been treated with SmaI and calf intestine phosphatase. E. coli HB101 was transformed and selected for ampicillin resistance.

For synthesis of cDNA by specific priming, 10 µg of poly(A)+ RNA and 500 pmol of 5'-end-labeled primer (5'CCCGGGGTGTCAGTTGCGCTTCGA) were heated for 3 min at 70°C in 1 mM Tris-HCl pH 7.5, 0.1 mM EDTA, 0.5 M KCl (total volume = 52 µl) and subsequently cooled for 30 min to 43°C. The subsequent reactions were in accordance with published procedures (30). T4 DNA polymerase (PL-Biochemicals) was used to produce blunt ends (31). 32P 5'-end-labeled HindIII linkers (5'pGCAAGCTTGC, BRL) were attached (32) and, after cleaving with HindIII, the cDNA was ligated into pAT153 (33), which had been cleaved with HindIII and treated with phosphatase.
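The specific primer is quoted 5' to 3' in this section and 3' to 5' in the Results. The following is a minimal sketch, in plain Python with no external libraries, of how the two notations can be checked against each other and how the mRNA-sense target of the primer follows by reverse complementation. The function and variable names are illustrative assumptions and are not part of the original work.

```python
# Illustrative helper; names are assumptions, not taken from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer as given in Materials and Methods (written 5'->3').
PRIMER_5_TO_3 = "CCCGGGGTGTCAGTTGCGCTTCGA"
# The same primer as written in the Results (written 3'->5').
PRIMER_3_TO_5 = "AGCTTCGCGTTGACTGTGGGGCCC"

# Reading the 3'->5' notation backwards must give the 5'->3' notation.
same_oligo = PRIMER_3_TO_5[::-1] == PRIMER_5_TO_3

# mRNA-sense sequence to which the primer anneals (written 5'->3').
target = reverse_complement(PRIMER_5_TO_3)

print("same oligonucleotide:", same_oligo)
print("mRNA-sense target   :", target, f"({len(target)} nt)")
```

According to the Results, this 24-nucleotide target should correspond to nucleotides 1051-1074 of the assembled cDNA (Figure 3).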


Screening of the cDNA libraries
Colonies were grown on nitrocellulose filters, incubated with chloramphenicol and prepared for hybridization as described (34). Prehybridization (4-8 h) and hybridization (15-24 h) were carried out in 6 x SET (1 x SET = 0.15 M NaCl, 30 mM Tris-HCl, 1 mM EDTA, pH 8.0) (34), 5 x Denhardt's solution, 0.2% SDS, 200 µg/ml calf thymus DNA, 200 µg/ml yeast RNA, 0.5% Nonidet P-40 at 42°C with 1.25 pmol/ml 32P 5'-end-labeled oligonucleotide probe (8 x 10^4 c.p.m./pmol) or with nick-translated restriction fragments (> 10^8 c.p.m./µg). Oligonucleotides used for hybridization and primer extension were labeled by phosphorylation of the 5'-end with [γ-32P]ATP (New England Nuclear) using T4 polynucleotide kinase (28). The filters used for screening with oligonucleotides were washed twice in 6 x SSC (1 x SSC = 0.15 M NaCl, 15 mM sodium citrate, pH 7.0) for 20 min at room temperature, and twice in the same solution at 33°C. The filters used for hybridization with nick-translated restriction fragments were washed for one 15-minute period in each of the following: 6 x SSC at room temperature, 2 x SSC at 42°C, 1 x SSC at 50°C and finally 0.1 x SSC at 50°C.

Sequence determinations.
Plasmid DNA was cleaved with the appropriate restriction enzymes and the fragments were labeled at the 3'-end by filling in or at the 5'-end by treating with kinase (28). After secondary cleavage and agarose gel electrophoresis, the DNA fragments were electroeluted, subjected to the chemical degradation procedure (35) and analyzed on thin sequencing gels (36).
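The hybridization and wash solutions in the screening protocol are specified as multiples of 1 x SET and 1 x SSC. As a small worked example, the sketch below converts those multiples into component concentrations, using only the 1 x compositions stated in the text. The dictionary layout and the function name `dilute` are assumptions made for illustration.

```python
# 1 x compositions as stated in the text, in mM.
ONE_X = {
    "SSC": {"NaCl": 150.0, "sodium citrate": 15.0},
    "SET": {"NaCl": 150.0, "Tris-HCl": 30.0, "EDTA": 1.0},
}


def dilute(buffer: str, factor: float) -> dict:
    """Component concentrations (mM) of a `factor` x solution of `buffer`."""
    return {name: round(conc * factor, 3) for name, conc in ONE_X[buffer].items()}


# Wash series used for nick-translated probes: (SSC factor, temperature).
WASHES = [(6, "room temperature"), (2, "42 C"), (1, "50 C"), (0.1, "50 C")]

print("6 x SET (hybridization):", dilute("SET", 6))
for factor, temperature in WASHES:
    print(f"{factor} x SSC at {temperature}:", dilute("SSC", factor))
```

Listing the series this way makes the stepwise increase in stringency explicit: the salt concentration falls while the temperature rises or stays constant.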

RESULTS
In an attempt to isolate cDNA sequences which encode plasma protease inhibitors, a human liver cDNA library was screened using an oligonucleotide (5'-GGGTTGGCTACTCTGCCCATGAAGA) complementary to the coding strand of an amino acid sequence which is strongly conserved in ATIII, α1-antitrypsin and ovalbumin (Figure 1). Under moderately stringent hybridization and washing conditions, 58 hybridization-positive clones were isolated from about 24 000 transformants. Plasmid DNA from these clones was analyzed using restriction enzymes and one of the clones (pL10/2), containing a 1.5 kilobase (kb) insert, displayed none of the restriction fragments to be expected in ATIII or α1-antitrypsin cDNA (56, 57). Southern analysis of a series of double digestions using various restriction enzymes localized the hybridizing sequence in a 160-bp PvuII-EcoRI fragment (data not shown). Subsequent sequencing of this region revealed a 24-base sequence which, with the exception of two mismatch positions, was complementary to the oligonucleotide used for hybridization. This sequence could potentially encode the amino acids Phe-Met-Gly-Arg-Val-Ala-Asn, which show a high degree of homology to the conserved C-terminal amino acid sequence in ATIII, α1-antitrypsin and ovalbumin (Figure 1). Furthermore, plasmid pL10/2 showed an in-frame stop codon close to the homology region in an identical position to that in ATIII and α1-antitrypsin (Figure 5). Further sequencing identified an open reading frame coding for 285 amino acids. Multiple stop codons occurred in both of the other reading frames. The TAG termination codon was followed by a 3'-untranslated region, which was 636 nucleotides in length and contained a single poly(A) addition signal AATAAA (positions 2075-2080) (37,38). No poly(A) tail could be found, possibly due to the action of RNase H during reverse transcription (58).

α1-antitrypsin: -Leu-Phe-Met-Gly-Lys-Val-Val-Asn-Pro-
ATIII:          -Ile-Phe-Met-Gly-Arg-Val-Ala-Asn-Pro-
Ovalbumin:      -Leu-Phe-Phe-Gly-Arg-Cys-Val-Ser-Pro-

Figure 1. Alignment of the conserved amino acid sequences of α1-antitrypsin, ATIII and ovalbumin (7), from which the oligonucleotide probe used for screening the cDNA library was deduced. The actual positions of the amino acid blocks in the respective proteins are 383-391 (α1-antitrypsin), 421-429 (ATIII) and 377-385 (ovalbumin).

Since the members of the plasma protease inhibitor gene family comprise between 385 and 452 amino acid residues (7,9), and no initiator ATG codon could be found in the vicinity of the 5'-end of the cDNA present in plasmid pL10/2, the size of the corresponding mRNA was determined by Northern blotting. Poly(A)+ RNA was denatured, run on a 1.1% agarose gel and transferred to nitrocellulose (26). A 788-bp AccI fragment from plasmid pL10/2 (the second AccI site is located in the polylinker region of the vector pUC13), which did not comprise the nucleotide sequence used for screening the cDNA library, was nick-translated (28) and used for hybridization under high stringency conditions. A single signal corresponding to about 2.3 kb appeared (Figure 4), indicating that there was no cross-hybridization with related RNAs of different length.

[Figure 2 image: restriction map of the hLS2 cDNA on a 0-2.0 kb scale, with the inserts of pH14 and pL10/2 drawn beneath the map.]

Figure 2. Restriction map representing the cDNA of hLS2. The map displays the major restriction sites used for sequencing. Closed box: coding region; open box: 3'-untranslated region. The cDNA inserts of the plasmids pH14 and pL10/2, from which the hLS2 cDNA was assembled, are represented by thin lines.

Since previous rescreening of the cDNA library yielded no further clones which extended beyond the 5'-end of the cDNA of clone pL10/2, a second round of cDNA synthesis was undertaken. For specific priming of cDNA synthesis another oligonucleotide (3'AGCTTCGCGTTGACTGTGGGGCCC) was used which was complementary to nucleotides 1051-1074 (Figure 3). After producing blunt ends with T4 DNA polymerase, HindIII linkers were added to the double-stranded DNA and, after cleaving with HindIII, the cDNA was ligated into the vector pAT153, which had been cleaved by HindIII and treated with phosphatase. Two clones with inserts of about 1060 and 450 bp were isolated by hybridization with the aforementioned nick-translated AccI fragment. The larger clone, designated pH14, was analyzed by restriction mapping and DNA sequencing. Inspection of the DNA sequence revealed that the insert of plasmid pH14 partially overlapped with the cDNA of pL10/2 and extended beyond the


5'-end of the latter clone. The nucleotide sequence in the overlapping region of both clones was identical. The whole cDNA sequence was assembled by ligating appropriate restriction fragments from clones pH14 and pL10/2 (Figure 2) into the vector pUC13. The result of this was a plasmid (phLS2) with an insert of 2081 bp, which is close to the value expected from the Northern blotting experiment. The reading frame shown in Figure 3 is the only one without termination codons and comprises 480 amino acids specified by 1440 nucleotides. Neither a translational initiation site nor any sign of a putative signal peptide could be found, indicating that some 100-200 nucleotides are missing at the 5'-end of the clone.

Inspection of the deduced amino acid sequence (Figure 3) revealed that the coding region of hLS2 contained a block of acidic amino acids (positions 53-64), a shortened version of which reoccurred with little variation after a short distance (amino acid residues 69-76). The presence of 3 cysteine residues located at positions 273, 323 and 467 might indicate the existence of a single intramolecular disulfide bond (see Discussion). There are three potential N-glycosylation signals (positions 30-32, 169-171 and 368-370) corresponding to the canonical Asn-X-Ser/Thr sequence. The amino acid sequence Asn-Pro-Ser (residues 476-478) is probably not suited for the attachment of carbohydrate chains (39).

The amino acid sequences of three human members of the plasma protease inhibitor gene family and the coding region of hLS2 were aligned with the aid of a computer program. Figure 5 clearly demonstrates the relationship of the proteins. 55 amino acids (marked by an asterisk) are invariant in all four sequences. The homology between hLS2 and each of the other three proteins ranges between 25 and 28%. The reactive site region of the plasma protease inhibitors is located close to the COOH-terminus of these proteins (Figure 5). For homology reasons, it is suggested that the amino acids Leu-Ser (positions 444-445) are situated in the reactive center of hLS2.
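The search for potential N-glycosylation signals described above can be illustrated with a short motif scan. This is a sketch under stated assumptions, not the procedure used in the original work: the two fragments are transcribed from Figure 3 in one-letter code, and the regular expression encodes Asn-X-Ser/Thr with X other than Pro, following the remark that Asn-Pro-Ser is probably not glycosylated. The function name `find_sequons` is hypothetical.

```python
import re

# Asn-X-Ser/Thr with X != Pro; the lookahead allows overlapping matches.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(fragment: str, first_residue: int) -> list:
    """Return the sequence numbers of the Asn in each Asn-X-Ser/Thr motif.

    `first_residue` is the sequence number of fragment[0].
    """
    return [first_residue + m.start() for m in SEQUON.finditer(fragment)]


# Two 20-residue fragments transcribed from Figure 3 (assumed transcription).
FRAG_21_40 = "PQWEQLNNKNLSMPLLPADF"    # residues 21-40
FRAG_461_480 = "YEHRTSCLLFMGRVANPSRS"  # residues 461-480

print("residues 21-40  :", find_sequons(FRAG_21_40, 21))
print("residues 461-480:", find_sequons(FRAG_461_480, 461))
```

The first fragment yields the signal at positions 30-32, while the Asn-Pro-Ser at 476-478 in the second fragment is skipped by the X != Pro rule.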

DISCUSSION
Using an oligonucleotide (25-mer) derived from a region in ATIII, α1-antitrypsin and ovalbumin, which is conserved both on


GT 2 1 10 20 Gly Ser Lys Gly Pro Leu Asp Gln Leu Glu Lys Gly Gly Glu Thr Ala Gln Ser Ala Asp GGG AGC AA GGC CCG CTG GAT CAG CTA GAG MA GGA GGG GAA ACT GCT CAG TCT GCA GAT 62 21 30 40 Pro Gln Trp Glu Gln Leu Asn Asn Lys Asn Leu Ser Met Pro Leu Leu Pro Ala Asp Phe CCC CAG TGG GAG CAG TTA AAT MCMAAA C CTG AGC ATG CCT CTT CTC CCT GCC GAC TTC 122 41 S0 60 His Lys Glu Asn Thr Val Thr Asn Asp Trp Ile Pro Glu Gly Glu Glu Asp Asp Asp Tyr CAC MG GAA MC ACC GTC ACC MC GAC TGG ATT CCA GAG GGG GAG GAG GACGGACGAC TAT 182 61 70 80 Leu Asp Leu Glu Lys Ile Phe Ser Glu Asp Asp Asp Tyr Ile Asp Ile Val Asp Ser Leu CTG GAC CTG GAG MG ATA TTC AGT GAA GAC GAC GAC TAC ATC GAC ATC GTC GAC AGT CTG 242 81 90 100 Ser Val Ser Pro Thr Asp Ser Asp Val Ser Ala Gly Asn Ile Leu Gln Leu Phe His Gly TCA GTT TCC CCG ACA GAC TCT GAT GTG AGT GCT GGG AAC ATC CTC CAG T T[T CAT GGC 302 101 110 120 Lys Ser Arg Ile Gln Arg Leu Asn Ile Leu Asn Ala Lys Phe Ala Phe Asn Leu Tyr Arg MG AGC CGG ATC CAG CGT CTT MC ATC CTC MC GCC MG TTC GCT TTC MC CTC TAC CGA 362 121 130 140 Val Leu Lys Asp Gln Val Asn Thr Phe Asp Asn Ile Phe Ile Ala Pro Val Gly Ile Ser GTG CTG AAA GAC CAG GTCMC ACT TTC GAT GC ATCTG C ATA GCA CCC GTT GGC ATT TCT 422 141 150 160 Thr Ala Met Gly Met Ile Ser Leu Gly Leu Lys Gly Glu Thr His Glu Gln Val His Ser ACT GCG ATG GGT ATG ATT TCC TTA GGT CTG MG GGA GAG ACC CAT GAA CAA GTG CAC TCG 482 161 170 180 Ile Leu His Phe Lys Asp Phe Val Asn Ala Ser Ser Lys Tyr Glu Ile Thr Thr Ile His ATT ITG CAT TTT AMA GAC TTT GTT T GCC AGC AGCMG TAT GA ATC ACG ACC ATT CAT 542 181 190 200 Asn Leu Phe Arg Lys Leu Thr His Arg Leu Phe Arg Arg Asn Phe Gly Tyr Thr Leu Arg MT CTC TTC CGT MG CTG ACT CAT CGC CTC TTC AGG AGG MT 1TT GGG TAC ACA CTG CGG 602 201 210 220 Ser Val Asn Asp Leu Tyr Ile Gln Lys Gln Phe Pro Ile Leu Leu Asp Phe Lys Thr Lys TCA GTC MT GAC CTT TAT ATC CGAMG CAG T CCA ATC CTG CTT GACGTC MA ACT AA 662 221 230 240 Val Arg Glu Tyr Tyr Phe Ala Glu Ala Gln Ile Ala Asp Phe Ser Asp Pro Ala Phe Ile GTA AGA GAG TAT TAC 
TTT GCT GAG GCC CAG ATA GCT GACTG C TCA GACCCG GCC TGC ATA 722 241 250 260 Ser Lys Thr Asn Asn His Ile Wiet Lys Leu Thr Lys Gly Leu Ile Lys Asp Ala Leu Glu TCA AA. ACC MC MC CAC ATC ATG MG CTC ACC MG GGC CTC ATA AM GAT GCT CTG GAG 782 261 270 280 Asn Ile Asp Pro Ala Thr Gln Met Met Ile Leu Asn Cys Ile Tyr Phe Lys Gly Ser Trp MT ATA GAC CCT GCT ACC CAG ATG ATG ATT CTC MC TGC ATC TACGTC AM GGA TCC TGG 842 281 290 300 Val Asn Lys Phe Pro Val Glu Met Thr His Asn His Asn Phe Arg Leu Asn Glu Arg Glu GTG MT MA TTC CCA GTG GAA ATG ACA CAC MC CGC MGC TGC CGG CTG MT GAG AGA GAG 902


301 310 320 Val Val Lys Val Ser Met Met Gln Thr Lys Gly Asn Phe Leu Ala Ala Asn Asp Gln Glu GTA GTMAAG GTr TCC ATG ATG CAG ACC AAGGGG AAC TTC CTC GCA GCA AAT GAC CAG GAG 962 321 330 340 Leu Asp Cys Asp Ile Leu Gln Leu Glu Tyr Val Gly Gly Ile Ser Met Leu Ile Val Val CTG GAC TGC GAC ATC CTC CAG CTG GAA TAC GTG GGG GGC ATC AGC ATG CTA ATr GTG GTC 1022 341 350 360 Pro His Lys Met Ser Gly Met Lys Thr Leu Glu Ala Gln Leu Thr Pro Arg Val Val Glu CCA CAC MG ATG TCT GGG ATG MG ACC CTC GAA GCG CM CTG ACA CCC CGG GTG GTG GAG 1082 361 370 380 Arg Trp Gln Lys Ser Met Thr Asn Arg Thr Arg Glu Val Leu Leu Pro Lys Phe Lys Leu AGA TGG CM AAA AGC ATG ACA MC AGA ACT CGA GAA GTG CTT CTG CCG AAA TTC MG CTG 1142 381 390 400 Glu Lys Asn Tyr Asn Leu Val Glu Ser Leu Lys Leu Met Gly Ile Arg Met Leu Phe Asp GAG MG MC TAC MT CTA GTG GAG TCC CTG MG TTG ATG GGG ATC AGG ATG CTG ¶TT GAC 1202 401 410 420 Lys Asn Gly Asn Met Ala Gly Ile Ser Asp Gln Arg Ile Ala Ile Asp Leu Phe Lys Hlis AM MT GGC MC ATG GCA GGC ATC TCA GAC CAA AGG ATC GCC ATC GAC CTG TTC MG CAC 1262 421 430 440 Gln Gly Thr Ile Thr Val Asn Glu Glu Gly Thr Gln Ala Thr Thr Val Thr Thr Val Gly CM GGC ACG ATC ACA GTG MC GAG GAA GGC ACC CAA GCC ACC ACT GTG ACC ACG GTG GGG 1322 441 450 460 Phe Met Pro Leu Ser Thr Gln Val Arg Phe Thr Val Asp Arg Pro Phe Leu Phe Leu Ile TTC ATG CCG CTG TCC ACC CAA GTC CGC UTC ACT GTC GAC CGC CCC ITT CTT TTC CTC ATC 1382 461 470 480 Tyr Glu His Arg Thr Ser Cys Leu Leu Phe Met Gly Arg Val Ala Asn Pro Ser Arg Ser TAC GAG CAC CGC ACC AGC TGC CTG CTC TTC ATG GGA AGA GTG GCC MC CCC AGC AGG TCC 1442

xxx TAG AGGTGGAGGTCTAGGTGTCTGMGTGCCTTGGGGGCACCCTCATTTTGMTCCATTCCAACAACGAGAACAGAGA 1520 TGTTCTGGCATCAmACGTAGTTTACGCTACCAATCTGAATTCGAGGCCCATATGAGAGGAGCTTAGAAACGACCAAG 1599 MGAGAGGrGflTC GCACATAGCCCATGCTGTMGCTCATAGMGTCACTGTMCTGTAGTGTGTC 1678 TGCTGTTACCTAGAGGGTCTCACCTCCCCACTCTTCACAGCAAACCTGAGCAGCGCGTCCTMGCACCTCCCGCTCCGG 1757 TGACCCCATCCTTGCACACCTGACTCTGTCACTCAAGCC1TTCTCCACCAGGCCCCTCATCTGAATACCAAGCACAGM 1836 ATGAGTGGTGTGACTMTTCCTTACCTCTCCCAAGGAGGGTACACAACTAGCACCATTCTTGATGTCCAGGGAAGAAGC 1915 CACCTCAAGACATATGAGGGGTGCCCTGGGCTMTGT AGGGCTFMTTTCTCAMGCCTGACCTTTCAAATCCATGA 1994 TGAATGCCATCAGTCCCTCCTGCTGflGCCTCCCTGTGACCTGGAGGACAGTGTGTGCCATGTCTCCCATACTAGAGAT 2073 AAATAAAT 2081

Figure 3. Nucleotide and deduced amino acid sequences of hLS2. The underlined region indicates the nucleotide sequence which hybridizes to the oligonucleotide probe used for screening the cDNA library.
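The relationship between the screening probe and the hybridizing region can be sketched in a few lines of Python. The probe sequence is taken from the Results. The hLS2 window is an assumed transcription of the region around residues 470-476 in Figure 3, with the extraction-damaged Asn codon read as AAC. The window covers all 25 probe positions, whereas the text reports a 24-base complementary sequence, so the exact boundaries are an assumption. All function names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of `seq` into one-letter amino acids."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


PROBE = "GGGTTGGCTACTCTGCCCATGAAGA"        # screening oligonucleotide, 5'->3'
HLS2_WINDOW = "TCTTCATGGGAAGAGTGGCCAACCC"  # assumed transcription from Figure 3

sense = reverse_complement(PROBE)          # coding-strand sequence the probe detects
print("probe, sense strand:", sense)
print("encoded peptide    :", translate(sense[2:23]))
print("hLS2 peptide       :", translate(HLS2_WINDOW[2:23]))
print("mismatches         :", mismatches(sense, HLS2_WINDOW))
```

Under these assumptions both sequences encode Phe-Met-Gly-Arg-Val-Ala-Asn and differ at two third-codon positions, in line with the two mismatches reported in the text.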


[Figure 4 image: blot with lanes a and b; marked positions: origin, 28S and 18S rRNA.]

Figure 4. Determination of the size of hLS2 mRNA by blot hybridization. 9 µg (a) and 3 µg (b) poly(A)+ RNA were analyzed as described (26) using a nick-translated 788-bp AccI restriction fragment derived from the plasmid pL10/2. The size markers used were human rRNA.

the amino acid and DNA level has made it possible to isolate a new member of the protease inhibitor gene family. The assembled cDNA insert comprises 2081 bp, a value close to that of the corresponding mRNA determined by Northern blotting. This suggests that only about 100-200 bp at the 5'-end of the clone are missing. The cDNA insert encodes a protein consisting of 480 amino acids. A comparison with recently published partial amino acid sequences obtained by peptide sequencing (40,41) indicates a strong homology between heparin cofactor II and the factor described here. There is total identity between the 24 amino acids determined by protein sequencing at the N-terminus of mature heparin cofactor II and the deduced amino acid sequence of phLS2. Furthermore, of 36 amino acids, 33 are conserved at the C-terminus of both proteins (Figure 5).

HCII   GSMGPLDQLEKGGETAQSADPQWE
HLS2   GSKGPLDQLEKGGETAQSADPQWEQLNNKNLSMPLLPADFHKENTVIND

       ** * *
HLS2   WIPEGEEDDDYLDLEKIFSEDDDYIDIVDSLSVSPTDSDVSAGNILQLFHGKSRIQRLNILNAKFAFNLYRVLKD
ATIII  HGSPVDICrAKPRDIP NPMCIYRSPEKKATEDEGSEQKIPEATNRRVWELSKANSRFATrFYQHLAD
A1AT   EDPQGDAAQKTDTSHHDQDHPTFNKITPNLAEFAFSLYRQLAH
A1AC   NSPLDE}NLTQENQDRGTHVDLGLASANVDFAFSLYKQLVL

       ** * * ** ** * * *
HLS2   QVNTFDNIFIAPVGISTAMQISLGLKGErHHQVHSILKDFVNASSKYEITrI4LFRKLRLFRR-NFGYT
ATIII  SKDNIIFLSPLSISTAFAMrICLGAO4DTLQQIMEVFKFDTISEKTSD----QIHFFFAKILNCRLYRKANKSSK
A1AT   QSNS-INIFFSPVSIATAFAMLSLGTKADTDEILEG;LNFN-----LTE3IPEA44IHEGFQEILLRTLNQP-DSQLQ
A1AC   KALDRNVIF-SPLSISTAALSLGAH=LT.EIL------SWPHGDLLRQY.SFQELWPSISS-SDELQ

       * * * * * ** *
HLS2   LRSVNDLYIQKQFPILLDFKTKREYYFAEAQIADFSDPAFISKTNNHIM--KLTKGLIKDA--LENIDPATQ41
ATIII  LVSANRLFGDKSLTFNETYQDISELVYGLQPLDFKENABQSR NKWVSNKTEGRITDVIPSE4INELTVLV
A1AT   LTrDGGLFLSEGLKLVDKFLEDVKKLYHSEAFNNPTEEKQ-INDYVEKGIQGIVDL--VKELDRIVFA
A1AC   LSMAMFNVEQLSLLDRFTEDAKRLYGSEAFATDFQDSAAkKL-INDYVKNGTRGKITDL--IKDPDSQTI

       * *** * * * * ** * ,*
HLS2   ILNCIYFKGSWVNKPVf*G4NFRLNEEVKV$S*QTNFL-AANDQEXCDILQLEYVGG-ISLIVWPH
ATIII  LVNTIYFKGLWKSKFSPENTRKELFYKADGEssASMMffQEGKFR-YRRVAEGTQVLELP-FKGDDI TMVLILPK
A1AT   LVNYIFFKGKWERPFEVKTEDFHVDQVKMVPKRGLFN-IICKLSWaVLMGN-ANAIFD
A1AC   LVNYIFFKAWEMPFDPQThQ5RFYLSKKKWWMV ULTIPYFRDEELSCNVKYTGN-ASALFILPD

       * * * * * * * *
HLS2   M;GMKTLEAQL7PRWER)nISKNRTRE-VLLPKNVESLKLMGIRMLFDKNOM-GISDQRIAI
ATIII  PEKSLAKVEKELTPEVLQEWLDELEB4.LV-VIWRFRIEDGFSLKBQLQ4MGLVDLFSPEKSKLPGIV-AEGRD
A1AT   -EGKLQHLENELTHDIITKFLENEDMSAS-LHLPKLSITGTYDLKSVLGQLGITKVFSNGADLS-GVT-EEAP-
A1AC   -QDDIEEVEAMLLPEFLKRWRDSLEFREIGELYLPKFSISRDYNLNDILLQLGIEAsKADLS-GITGAR---

       * * * * * ** ** * **
HCII   STQVRFIDRPFLFLIYEHPTSTLLFMGRVMAPSRQ
HLS2   DLFK----HQGTITVNEEGTQA---TVrrVGFM-PLSTQVRFIVDRPFLFLIYEHRTSCLLFMGRVANPSRS
ATIII  DLYVSDAFHKAFLEVNEEGSEAAASTAVVIAGRS-LNPNRVTFKAWPFLVFIREVPLNTIIFMGRVANPCVK
A1AT   -LKLSKAVMHAVLTIDEKGrEAAGA---MFLEAI-PMSIPPEVKFNPFVFLIBEQNTKSPLF4GKVNPTQK
A1AC   NLAVSQVV(VVSDVFEEAG TAVATAVKIThLALVETRTIVRFNRPFLMIIVITlNIFF4KVINPSKPRA

A1AC   CIKQWGSQ

Figure 5. Amino acid sequence comparison of hLS2, heparin cofactor II and 3 human plasma protease inhibitors which demonstrate an inhibitor function. The following abbreviations are used: A1AT: α1-antitrypsin (56), A1AC: α1-antichymotrypsinogen (6), ATIII: antithrombin III (57), HCII: heparin cofactor II (47), HLS2: human Leuserpin 2. The signal peptide regions are not included. Amino acids common in all sequences are denoted by an asterisk and the reactive sites are underlined.

A molecular weight of 65 600 - 70 000 has been estimated for heparin cofactor II, some 10% of which is due to carbohydrate (17). The calculated Mr of hLS2 is 54 968. The presence of 3 potential N-glycosylation sites and other possible post-translational modifications may well increase the actual Mr of hLS2 to the size estimated for heparin cofactor II. However, one important fact suggests that hLS2 and heparin cofactor II are not identical. There is a cysteine residue in the carboxy-terminal region of hLS2 (amino acid position 467), which may be linked to another cysteine residue in the molecule (cys 273 or cys 323). A disulfide bond (cys 247 - cys 430) links the C-terminus of ATIII to the remainder of the molecule (42). It has been suggested that this disulfide bond is important for the heparin-mediated conformational changes in ATIII, which seem to be connected with the dramatic acceleration in the catalytic activity of the protease inhibitor due to the addition of heparin (43). However, in the corresponding position of heparin cofactor II there is a threonine residue (Figure 5).

Apart from possible sequencing errors, these results can be explained in several different ways. Since it is not known whether the heparin cofactor II molecule has an intramolecular disulfide bond with a function comparable to that of ATIII, one may argue that hLS2 and heparin cofactor II are allelic. Since polymorphisms are not uncommon in plasma proteins, their existence has to be taken into account. Alternatively, hLS2 and heparin cofactor II may derive from different loci, as is the case in the α-interferon gene family (44).
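The molecular weight comparison made earlier in this section can be written out as simple arithmetic. The sketch below uses only the figures quoted in the text (65 600 - 70 000 for heparin cofactor II, about 10% carbohydrate, and a calculated Mr of 54 968 for hLS2). Treating the 10% as an exact fraction is an assumption made for illustration, and the function name is hypothetical.

```python
HCII_MR_RANGE = (65_600, 70_000)  # estimated Mr of heparin cofactor II (17)
CARBOHYDRATE_FRACTION = 0.10      # "some 10%", treated here as exact (assumption)
HLS2_CALCULATED_MR = 54_968       # calculated from the deduced hLS2 sequence


def polypeptide_mr(glycoprotein_mr: float, carbohydrate_fraction: float) -> float:
    """Mr left for the polypeptide after removing the carbohydrate share."""
    return glycoprotein_mr * (1 - carbohydrate_fraction)


for mr in HCII_MR_RANGE:
    protein_part = round(polypeptide_mr(mr, CARBOHYDRATE_FRACTION))
    gap = mr - HLS2_CALCULATED_MR
    print(f"HCII Mr {mr}: polypeptide about {protein_part}, "
          f"difference to calculated hLS2 Mr {gap}")
```

On these numbers the calculated hLS2 value lies roughly 10 600 to 15 000 below the estimates for the glycoprotein, which is the gap that glycosylation and other modifications would have to account for.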
Preliminary results from Southern blotting experiments using a variety of restriction enzymes indicate either the existence of at least one intervening sequence in the gene for hLS2 or the presence of at least two closely related genes per haploid genome (data not shown). Analysis of genomic clones and the elucidation of the amino acid sequence, especially around the P1 residue at the reactive center of heparin cofactor II, will clarify the situation.

Examination of the aligned sequences of hLS2, ATIII, α1-antitrypsin and α1-antichymotrypsin suggests that the P1 residue of hLS2 is leucine (Figure 5). Leucine is also located at the reactive site of human α1-antichymotrypsin (45). In accordance with the proposal to characterize serine protease inhibitors by the amino acid at the P1 position in conjunction with the term "serpin" (46), it is suggested that the factor described in this article should be called human Leuserpin 2 (hLS2), in which case human α1-antichymotrypsin would be termed human Leuserpin 1. The identity of the reactive site in these two factors may indicate that hLS2 reacts in vivo with proteases which resemble the target enzymes of α1-antichymotrypsin (e.g. cathepsin G, mast cell chymase and chymotrypsin).

A comparison between the amino acid sequences of ATIII, α1-antitrypsin, α1-antichymotrypsinogen and hLS2 clearly indicates that the latter belongs to the plasma protease inhibitor family. The homologies support the hypothesis that these four proteins share a common genetic ancestor. In addition to the homologies discussed here, hLS2 shows considerable homology to ovalbumin and to the amino acid sequences derived from the ovalbumin-related genes X and Y (47,48) in chicken (data not shown).

As previously observed, the N-terminal regions of the plasma protease inhibitors vary in length and show little homology with each other (8,46). In the case of angiotensinogen, the N-terminal tail encodes angiotensin I and II (49) and it has been proposed that the N-terminal sequences of the other members of the gene family may have a distinct physiological function (8,46). Evidence that N-terminal amino acids are involved in heparin binding to ATIII is increasing (50). Heparin cofactor II also binds to heparin but less readily (17,18). Since there is little homology at the N-terminal tail of ATIII, in comparison with heparin cofactor II and hLS2, the N-termini of the latter 2 proteins are probably not involved in heparin binding or else the heparin binding mechanism is different.
In this context it is interesting to note the presence of a stretch of acidic amino acid residues close to the N-terminus of hLS2 (see Results). At present, it is not known whether a separate physiological function is associated with this structure.

The classification of plasma protease inhibitors in one family is not reflected in the genomic organization of its individual members (51). The angiotensinogen gene in rats and the α1-antitrypsin gene in man each contain 4 introns which are located in equivalent positions, mainly in the 3'-half of the genes (52,53). However, in the ovalbumin gene and the X and Y genes, 7 introns are situated in the 5'-half (47). This is in contrast to the situation in most other multi-gene families, where the number and positions of introns are strictly conserved (54,55). The analysis of genomic clones which encode hLS2 may, therefore, help to elucidate how the exon-intron structure of the plasma protease inhibitor gene family evolved. Furthermore, the availability of λ or cosmid clones will permit the molecular analysis of genetic defects situated in and around the hLS2 gene.

ACKNOWLEDGMENTS
I would like to thank Dr. Schone and my colleagues for their continuous support and encouragement, and I am also indebted to Dr. Engels and Dr. Uhlmann for synthesizing the oligonucleotides. I am grateful to M. Henke, M. Stephan and T. Ulshofer for their technical assistance and to Mrs. P. Schmidt and M. Hill for their help in preparing the manuscript.

REFERENCES
1. Travis, J. and Salvesen, G.S. (1983) Annu. Rev. Biochem. 52, 655-709.
2. Laskowski, M. and Kato, I. (1980) Annu. Rev. Biochem. 49, 593-629.
3. Owen, M.C., Brennan, S.O., Lewis, J.H. and Carrell, R.W. (1983) New Engl. J. Med. 309, 694-698.
4. Hill, R.E., Shaw, P.H., Boyd, P.A., Baumann, H. and Hastie, N.D. (1984) Nature 311, 175-177.
5. Hunt, L.T. and Dayhoff, M.O. (1980) Biochem. Biophys. Res. Commun. 95, 864-871.
6. Chandra, T., Stackhouse, R., Kidd, V.J., Robson, J.H. and Woo, S.L.C. (1983) Biochemistry 22, 5055-5061.
7. Carrell, R.E., Jeppsson, J.-O., Laurell, C., Brennan, S.O., Owen, M.C., Vaughan, L. and Boswell, D.R. (1982) Nature 298, 329-334.
8. Doolittle, R.F. (1983) Science 222, 417-419.
9. Kageyama, R., Ohkubo, H. and Nakanishi, S. (1984) Biochemistry 23, 3603-3609.
10. Snyder, S.H. and Innis, R.B. (1979) Annu. Rev. Biochem. 48, 755-782.
11. Balldin, G., Laurell, C.-B. and Ohlsson, K. (1978) Hoppe-Seylers Z. physiol. Chem. 359, 699-708.
12. Gadek, J.E., Fells, G.A., Zimmerman, R.L., Rennard, S.I. and Crystal, R.G. (1981) J. clin. Invest. 68, 889-898.
13. Rosenberg, R.D. and Damus, P.S. (1973) J. Biol. Chem. 248, 6490-6505.


14. Damus, P.S., Hicks, M. and Rosenberg, R.D. (1973) Nature 246, 355-357.
15. Stead, N., Kaplan, A.P. and Rosenberg, R.D. (1976) J. Biol. Chem. 251, 6481-6488.
16. Rosenberg, R.D. (1977) Fed. Proc. 36, 10-18.
17. Tollefsen, D.M., Majerus, D.W. and Blank, M.K. (1982) J. Biol. Chem. 257, 2162-2169.
18. Wunderwald, P., Schrenk, W.J. and Port, H. (1982) Thromb. Res. 25, 177-191.
19. Tran, T.H., Marbet, G.A. and Duckert, F. (1985) Lancet ii, 413-414.
20. Sie, P., Dupony, D., Pichon, J. and Boneu, B. (1985) Lancet ii, 414-416.
21. Caruthers, M.H. (1982) in Gassen, H.G. and Lang, A. (eds), Chemical and Enzymatic Synthesis of Gene Fragments, Verlag Chemie, Weinheim, F.R.G., pp. 71-79.
22. Engels, J., Hashimoto-Gotoh, T., Müllner, H., Uhlmann, E. and Wetekam, W. (1985) Gene, in the press.
23. Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J. (1979) Biochemistry 18, 5294-5299.
24. Auffray, C. and Rougeon, F. (1980) Eur. J. Biochem. 107, 303-314.
25. Aviv, H. and Leder, P. (1972) Proc. Natl. Acad. Sci. USA 69, 1408-1412.
26. Thomas, P.S. (1980) Proc. Natl. Acad. Sci. USA 77, 5201-5205.
27. Wickens, M.P., Buell, G.N. and Schimke, R.T. (1978) J. Biol. Chem. 253, 2483-2495.
28. Maniatis, T., Fritsch, E.F. and Sambrook, J. (eds), Molecular Cloning; Cold Spring Harbor Laboratory, New York, 1982.
29. Vieira, J. and Messing, J. (1982) Gene 19, 259-268.
30. Gubler, U. and Hoffman, B.J. (1983) Gene 25, 263-269.
31. Toole, J.J., Knopf, J.L., Wozney, J.M., Sultzman, L.A., Buecker, J.L., Pittman, D.D., Kaufman, R.J., Brown, E., Shoemaker, C., Orr, E.C., Amphlett, G.W., Foster, W.B., Coe, M.L., Knutson, G.J., Fass, D.N. and Hewick, R.M. (1984) Nature 312, 342-347.
32. Ragg, H. and Weissmann, C. (1983) Nature 303, 439-442.
33. Twigg, A.J. and Sherratt, D. (1980) Nature 283, 216-218.
34. Hanahan, D. and Meselson, M. (1980) Methods Enzymol., 65, 499-560.
36. Sanger, F. and Coulson, A.R. (1978) FEBS Letters 87, 107-110.
37. Proudfoot, N.J. and Brownlee, G.G. (1976) Nature 263, 211-214.
38. Breathnach, R. and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
39. Mononen, J. and Karjalainen, E. (1984) Biochim. Biophys. Acta 788, 364-367.
40. Witt, J., Kaiser, C., Schrenk, W.J. and Wunderwald, P. (1983) Thromb. Res. 32, 513-518.
41. Griffith, M.J., Noyes, C.M. and Church, F.C. (1985) J. Biol. Chem. 260, 2218-2225.
42. Petersen, T.E., Dudek-Wojcieckowska, G., Sottrup-Jensen, L. and Magnusson, S. (1979) in Collen, D., Wiman, B. and Verstraete, M. (eds), The Physiological Inhibitors of Blood Coagulation and Fibrinolysis, Elsevier/North-Holland, Amsterdam, pp. 43-54.
43. Ferguson, W.S. and Finlay, T.H. (1983) Arch. Biochem. Biophys. 221, 304-307.


44. Weissmann, C., Nagata, S., Boll, W., Fountoulakis, M., Fujisawa, A., Fujisawa, J.-I., Haynes, J., Henco, K., Mantei, N., Ragg, H., Schein, C., Schmid, J., Shaw, G., Streuli, M., Taira, H., Todokoro, K. and Weidle, U. (1982) Phil. Trans. R. Soc. Lond. B 299, 7-28.
45. Morii, M. and Travis, J. (1983) J. Biol. Chem. 258, 12749-12752.
46. Carrell, R. and Travis, J. (1985) TIBS 10, 20-24.
47. Heilig, R., Perrin, F., Gannon, F., Mandel, J.L. and Chambon, P. (1980) Cell 20, 625-637.
48. Heilig, R., Muraskowsky, R., Kloepfer, C. and Mandel, J.L. (1982) Nucleic Acids Res. 10, 4363-4382.
49. Bouhnik, J., Clauser, E., Strosberg, D., Frenoy, J.-P., Menard, J. and Corrol, P. (1981) Biochemistry 20, 7010-7015.
50. Koide, T., Odani, S., Takahashi, K., Ono, T. and Sakuragawa, N. (1984) Proc. Natl. Acad. Sci. USA 81, 289-293.
51. Leicht, M., Long, G., Chandra, T., Kurachi, K., Kidd, V., Mace Jr, M., Davie, E.W. and Woo, S.L.C. (1982) Nature 297, 655-659.
52. Tanaka, T., Ohukubo, H. and Nakanishi, S. (1984) J. Biol. Chem. 259, 8063-8065.
53. Long, G.L., Chandra, T., Woo, S.L.C., Davie, E.W. and Kurachi, K. (1984) Biochemistry 23, 4828-4837.
54. Maniatis, T., Fritsch, E.F., Lauer, J. and Lawn, R.M. (1980) Annu. Rev. Genet. 14, 145-178.
55. Wahli, W., David, I.B., Wyler, T., Weber, R. and Ryffel, G.U. (1980) Cell 20, 107-117.
56. Bollen, A., Herzog, A., Cravador, A., Herion, P., Chuchana, P., van der Straten, A., Loriau, R., Jacobs, P. and van Elsen, A. (1983) DNA 2, 255-264.
57. Bock, S.L., Wion, W.L., Vehar, G.A. and Lawn, R.W. (1982) Nucleic Acids Res. 10, 8113-8125.
58. Berger, S.L., Wallace, D.M., Puskas, R.S. and Eschenfeldt, W.H. (1983) Biochemistry 22, 2365-2372.
