Proc. Nati. Acad. Sci. USA Vol. 82, pp. 2282-2286, April 1985 Biochemistry

Nucleotide sequence of cDNA encoding human a2-macroglobulin and assignment of the chromosomal locus (human serum proteins/proteinase inhibitors/human chromosome mapping/thiol ester bond) CHEN-CHEN KAN*, ELLEN SOLOMONt, K. TERTIA BELTt, AILEEN C. CHAIN*, LYNNE R. HIORNSt, AND GEORG FEY* *Department of Immunology, Research Institute of Scripps Clinic, 10666 North Torrey Pines Road, La Jolla, CA 92037; tImperial Cancer Research Funds Laboratory, P.O. Box 123, Lincoln's Inn Fields, London WC2A 3PX, England; and tDepartment of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, England Communicated by Frank J. Dixon, December 6, 1984

ABSTRACT Six a2-macroglobulin (a2M) cDNA clones ester sites are conserved. These observations led to the pro- were isolated from a human liver cDNA library by using syn- posal that the C3, C4, and a2M genes are derived from a thetic oligonucleotides as hybridization probes. One of these, common ancestral gene (16). The sequences of murine and pa2M1, carries a 4.6-kilobase-pair insert, which was se- human C3 and human C4 and partial cDNA sequences of quenced. The insert contains the coding sequences for the ma- murine C4 have recently been determined (17-21). Compari- ture a2M polypeptide (1451 amino acids) and for a 23-amino son of human a2M with murine C3 revealed a 25% overall acid signal at the NH2 terminus of the precursor pro- sequence homology (17, 18, 22), supporting the proposal of a2M. At the 3' end of the insert a poly(A) addition signal A-A- their common evolutionary origin. The human C3 gene is lo- T-A-A-A and part of the poly(A) tail of the messenger RNA cated on chromosome 19 (23), and the human C4 genes map were found. The protein sequence deduced from the nucleotide to the histocompatibility complex on chromosome 6 (24). sequence agrees with the published a2M sequence In addition to binding to proteinases, a2M can also adhere for all except three residues. The a2M locus was assigned to to small molecules, including lectins, cations (zinc), basic human chromosome 12 by Southern blot analysis with DNA proteins (myelin basic protein, histone H4), and platelet-de- from a panel of mouse/human somatic hybrids, using a2M rived growth factor (PDGF) (1, 25). The binding mechanisms cDNA as a hybridization probe. involved are probably unrelated to the trapping of protein- ases because adherence of small molecules does not prevent a2-Macroglobulin (a2M) is a serum glycoprotein and a major the proteinase binding (1). A physiological role for the com- plasma proteinase inhibitor with a wide specificity. a2M-re- plex formation between a2M and these small molecules has lated proteins are present in all vertebrate species (1-4). Hu- not been established. The striking similarity of the amino man a2M is a tetramer of four identical 185-kDa subunits, acid sequences of PDGF and p28Sis, the transforming protein arranged as a pair of dimers each consisting of two disulfide- of the simian sarcoma virus (26, 27), suggests that viral and linked monomers (5, 6). The a2M polypeptide has a so-called other cellular transformation mechanisms may be mediated bait region and an internal thiol ester bond, which account by growth factor-like proteins. Thus, a2M may serve as an for its properties as a proteinase inhibitor. The bait region, important role in limiting the effects of PDGF-like mitogens. composed of a series of target peptide bonds for plasma pro- The amino acid sequence of a2M has recently been report- teinases (7, 8), is located close to the center of the monomer- ed (28). For our future studies of the structural properties of ic subunit. The thiol ester bond is a covalent linkage between a2M that underlie its functions and of the regulation of the y-carboxyl function of glutamate residue 975 and the sul- expression of the a2M gene, it is necessary to prepare cDNA hydryl group of cysteine residue 972 (9-12). Cleavage of the clones and to derive the complete cDNA sequence. Here we bait region by proteinases leads to a conformational change report the isolation and sequence analysis of human a2M of the a2M molecule. This conformational change activates cDNA clones and the assignment of the human a2M locus to the thiol ester bond, leading to its hydrolysis and as a conse- chromosome 12. quence to covalent binding between the proteinase and a2M via the reactive glutamate 975. a2M receptors are present on MATERIALS AND METHODS the surfaces of fibroblasts and macrophages. These receptor proteins preferentially bind a2M-proteinase complexes or Materials. Restriction enzymes were from Boehringer a2M molecules with hydrolyzed thiol ester bonds (13, 14). Mannheim, DNA polymerase I (Klenow fragment) was from This suggests that the conformational change, which accom- Bethesda Research Laboratories, and the 17-nucleotide uni- panies the complex formation between a2M and proteinases versal sequencing primer was from Collaborative Research. and the hydrolysis of the thiol ester bond, exposes regions of T4 DNA polymerase and dideoxy- and deoxynucleotide tri- the a2M molecule that are recognized by these receptors. phosphates were from P-L Biochemicals. The radionucleo- Receptor-mediated endocytosis of proteinase-a2M com- tides [a-32P]dCTP (>400 Ci/mmol; 1 Ci = 37 GBq) and plexes by macrophages and liver cells leads to clearance of deoxyadenosine 5'-(a-[35S]thio)triphosphate (>400 Ci/ the complexes from the circulation. mmol) were from Amersham. T4 DNA ligase and the Esche- Internal thiol ester bonds are also found in the comple- richia coli strain JM101-TG1 were gifts from D. Bentley and ment proteins C3 and C4 (15), which are derived from pre- T. Gibson. Oligonucleotide probes were synthesized by Cre- cursor polypeptides of size similar to a2M (180-200 kDa). ative Biomolecules (South San Francisco, CA). Their thiol ester sites are found in positions comparable to Synthetic Oligonucleotide Probes and Isolation of cDNA those in a2M, and the amino acid sequences of all three thiol Clones. The construction of the human liver cDNA library enriched for mRNA species over 2000 nucleotides in length The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" Abbreviations: C3 and C4, third and fourth components of comple- in accordance with 18 U.S.C. §1734 solely to indicate this fact. ment; a2M, a2-macroglobulin; kb, kilobase pair(s). 2282 Biochemistry: Kan et aL Proc. Natl. Acad. Sci. USA 82 (1985) 2283 with the plasmid vector pAT 153 Pvu II-8 has been previous- Schuell) (43). Filters were hyridized with nick-translated (44) ly described (19). Two methionine residues at positions 1378 a2M cDNA in 2.5x NaCl/Cit (lx NaCl/Cit is 150 mM sodi- and 1385 of the human a2M sequence (28) and their sur- um chloride/15 mM trisodium citrate), lOx Denhardt's solu- rounding sequences were chosen as probe sites because they tion (2% Ficoll/2% bovine serum albumin/2% polyvinylpyr- were encoded by nucleotide triplets with minimal degenera- rolidone), 0.1% sodium dodecyl sulfate, 6% polyethylene 1378 glycol 6000 (Fisons, Loughborough, England) or 10% dex- cy. These amino acid sequences are Asn-Met-Ala-Ile- 1385 tran sulfate, and denatured salmon sperm DNA at 50 ,ug/ml. Val-Asp and Asp-Val-Lys-Met-Val. The corresponding Hybridization was carried out at 65°C overnight. Filters oligonucleotide mixtures were were washed at 65°C successively with 2x, 1x, 0.3x, and AA 0.lx NaCl/Cit, three times for 20 min each. ST'-T-C-N-A-C-G-A-T-N-G-C-C-A-T-G _T-T-3' RESULTS (N represents all four nucleotides) and Identification and Sequence Analysis of cDNA Clones. A S5'-A-C-C-A-T- C_T-T-N-A-C-G T-C-3'. human liver cDNA library was screened with two mixed synthetic oligodeoxynucleotide probes, a 17-mer and a 14- This cDNA library consisted of about 15,000 different mer constructed on the basis of known amino acid se- recombinant clones; 30,000 colonies were plated on nitrocel- quences. Of 30,000 colonies screened, 34 clones hybridized lulose discs. A set of replica filters were prepared from the with both the 17-mer and 14-mer probes. master filters (29). Colonies on replica filters were grown, After analysis of the cDNA inserts, six then transferred to chloramphenicol plates to amplify the recombinant plamids were selected for detailed analysis. plasmid. The colonies were processed (30, 31) and hybrid- One of them, pa2M1, carries a 4.6-kilobase pair (kb) insert ized with kinase 32P-labeled (32) 14-mer and 17-mer oligonu- whose nucleotide sequence was determined. The nucleotide cleotide mixtures. Positive colonies were rescreened until sequence from the 5' nontranslated region to the poly(A) tail genetic homogeneity was reached. The length of cDNA in- of the mRNA and the derived amino acid sequence are given serts was determined by restriction analysis with the en- in Fig. 1. The complete coding sequence was determined on zymes HindIII and Sal I. both DNA strands; each character was determined 3.75 DNA Sequence Analysis. Plasmid DNAs carrying the long- times on average. A methionine encoding triplet (nucleotides est cDNA inserts were digested with HindIII and Sal I. The 44 to 46) was identified as the translational initiation site. fragments were separated by electrophoresis in low-melting This methionine, together with the following 22 amino acid agarose gels and recovered from the agarose by extraction residues, forms the of the a2M protein, which with phenol. Shotgun cloning of random 200- to 400-base- has not previously been described. Thus, a2M must be syn- pair subfragments in the Sma I site of the phage M13 mp8 thesized as a pro-a2M precursor polypeptide. The a2M sig- vector and preparation of single-stranded DNA from phage nal peptide has a 10 out of 23 amino acid residue sequence particles were performed as described (33). DNA sequences homology with the mouse complement C3 signal peptide were determined with the dideoxynucleotide chain-termina- (Fig. 2 and ref. 17). This similarity further supports the idea tion method (34), using deoxyadenosine 5'-(a-[35S]thio)tri- that the C3 and a2M genes must have a common - phosphate and buffer gradient gels (35). DNA sequence data ary origin (17, 22). The amino acid sequence derived from were analyzed by using Staden's computer programs (36- the cDNA sequence was compared with the a2M amino acid 38). sequence determined on the protein level by Sottrup-Jensen Chromosomal Assignment of the a2M Locus. All human/ and collaborators (28). Only three variant residues were rodent somatic cell hybrids grown for DNA preparations found: cysteine vs. glutamic acid at position 563, glutamine (Table 1), have been recharacterized as to human chromo- vs. glutamic acid at 975, and valine vs. isoleucine at 1000. some composition by karyotyping (39, 40), human isozyme Residue 975 participates in the formation of the thiol ester analysis (41, 42), or both. High molecular weight DNA from bond (28). When the mature protein is sequenced, the bond human, rodent and hybrid clones was digested with the re- is hydrolyzed and this residue manifests itself as glutamic striction enzymes EcoRI, Pst I (Boehringer Mannheim), or acid. However, the cDNA sequence reveals the original resi- Taq I (P-L Biochemicals). Resulting fragments were electro- due at this position to be a glutamine. Similarly, cDNA se- phoretically separated in 0.7% agarose gels. DNA was dena- quence predicted glutamine residues in the thiol ester se- tured and transferred to nitrocellulose filters (Schleicher & quences of the related murine C3, C4, and sex-limited pro-

Table 1. Assignment of the a2M locus to chromosome 12 Human chromosomes Hybrid Human a2M gene 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X

laA9.498 + + - - - - - + ------+ - + MOG2 + + + + + + + + + + + + + + + + + + + + - + + + DT2.1.2 + ------+ + + + - + - + - - + + + SIR7.4ii + + + + - - - - + + - - - - - + -+ SCC16.5 - HORL9X ------+ MCP6 -...... +...... + MOG2.1ii - + - + + + + - + + - + - + + + + + + + + + - + CTP41.2.A2 - - + - - ++ - - - - + - - - - + Human/rodent somatic cell hybrids tested for the presence of the human a2M gene. References for the hybrids are 1aA9.498, SCC16.5 (gift of D. Cox, Department of Pediatrics, University of California, San Francisco); MOG2, MOG2.1ii (49); HORL9X (50); MCP6 (51); CTP41.2.A2 (52). A human chromosome is scored + if it is present in at least 10%o of cells by karyotyping or the marker enzyme is clearly present by electrophoretic or immunological assay; ± indicates that the marker is present very weakly by enzyme assay. An empty space means not determined by either karyotyping or marker enzyme assay. 2284 Biochemistry: Kan et al. Proc. Natl. Acad. Sci. USA 82 (1985)

M 6 K N K L L H P S L V L L L L V L L P T D A S V S 26 GCTr ACAATrc CATrcTrGGTrcrCTrCCTAOC TCC r TTTTCTrGCAACATGGGGAAGAACAAACTCCTrc TCATCCAaGTCrGGTTr C TTCTCCTC TTGGTCCTCCTGCCCACAGACGCC TCAGCTCrc 120 G K P 6 Y N V L V P S L L H T E T T E K 6 C V L L S Y L N E T V T V S A S L E S 66 TGGAAAACCGC AG rATr ATrGGTTrcT TGrCcCrCCCCC TGCTrCCACACTrGAGACCAC TGAGAABGGCTGTGTCCTTCTGAGCTACCTGAATGAGACAB TGACTGTAAG TGCT TCCT TOGAGTC 240 V R G N R S L F T D L E A E N D V L H C V A F A V P K S S S N E E V H F L T V a 106 rGTrCAGGGGAAACAGGAGCCTrcTr TCAC TGACCTGGAGGCGGAGAATGACGTACTCCACTGTGTCGCCT TCGCTGTCCCAAAGTCTTCATCCAA TGAGGAGGTAATGTTCCTCAC TGTCCA 360 V K 6 F T 0 E F K K R T T V N V K N E D S L V F V 0 T D K S I Y K P 6 a T V K F 146 AGTr GAAAGGAC CAAC C CAAGAATrTr TAAGAAGCGGACCACAG T GA TGGT TAAGAACGAGGACAGTC TGGTCT T TGTCCAGACAGACAAATCAATC TACAAACCAGBGCAGACAGTGAAATTr 480 R V V S N D E N F H P L N E L I P L V Y I a D P K G N R I A 0 W 0 S F 0 L E 6 6 186 TCGTOTTrGrcrCTCCATGGATGAAAAC rrTTCACCCCCTrGAArGAGTrTrGATTCCACTrAOTATACAT TCAGGATCsCAAAGGAAATrCOCATCOCACAATGOCAGAOTTrTCCAOTTrAGAGGGTGG 600 L K 6 F S F P L S S E P F Q 6 S Y K V V V 0 K K S 6 6 R T E H P F T V E E F V L 226 CC T CAAGC AArrTTTCTTTr T CCCCTrcTrCATCAGAGCCCT TC CAGGGC TCCTACAAGGTGGTGGTACAGAAGAAATCAGGTGGAAGGACAGAGCACCCrrTCACCG T GGAGGAATTrrGTTrcr 720 P K F E V 6 V T V P K I I T I L E E E H N V S V C 6 L Y T Y 6 K P V P 6 H V T V 266 rTC C CAAG rrTTGAAGTr ACAAGT AACAGTr GCCAAAGATr AATCACCAT C rTTGGAAGAAGAGATGAArGrATCASGTBTGTG~CCT ATACACATATCGGAAGCCTGTCCCTGACATGTGAC TGT 840 S I C R K Y S D A S D C H 6 E n s a A F C E K F S 6 a L N S H 6 C F r o 6 V K T 306 GAGCAT T T GCAGAAAGTATAGTGACGCT TCCGACTGCCACGGTGAAGATTCACAGGCTTTCTGTGAGAAATTCAGTGGACAGCTAAACAGCCATGBCTGCTTCTATCAGCAAGTAAAAAC 960

K V F 0 L K R K E Y E N K L H T E A 6 I a E E 6 T V V E L T G R 6 S S E I T R T 346 CAAGGTCTTCCAGC GAAGAGGAAGGAGT ATGAAATGAAACTTCACACTGAGGCCCAGATCCAAGAAGAAGGAACAGTGGTGGAATTACTGGAAGGCAGTCCAGTGAAATCACAAGAAC 1080

I T K L S F V K V D S H F R 0 G I P F F 6 6 V R L V D G K 6 V P I P N K V I F I 386 CAT AACCAACTCTCATTTGTGAAAGT1GGACTCACACrrTCACAGGGAA0TCCCTTCTTTGGGCAGGTGCGCCTAGTAGATGGGAAAGGCGTCCCTATACCAAATAAAGTCATATTCAT1200 R G N E A N Y Y S N A T T D E H 6 L V 6 F S I N T T N V N 6 T S L T V R V N Y K 426 CAGAGGAAAT GAAGCAAACTAT T ACTCCAATGCT ACCACGGATGAGCATGGCCTTGTACAGTTCTCTATCAACACCACCAACGTTATGGGTACCTC TCTTACTGTTAGGGTCAATTACAA 1320 11 R S P C Y 6 Y a W V S E E H E E A H H T A Y L V F S P S K S F V H L E P N S H 466 GGAT CGTAGTCCCTGTTrACGGCTACCAGTGGGTGTCAGAAGAACACGAAGAGGCACATCACACTGCT TATCT TGTGT TCTCCCCAAGCAAGAGCTTTGTCCACCTTGAGCCCATGTCTCA 1440 E L F C G H T 6 r V 6 A H Y I L N G 6 T L L G L K K L S F Y Y L I N A K G G I V 506 TGAACT ACCC T GTGGCCATACTCAGACAGTCCAGGCACAT T ATAT TCTGAAT GGAGGCACCCTrGCT GGGGCTGAAGAAGCTCTCCTTT TAT TATCTGATAATGGCAAAGGGAGGCATTGT 1560 R T G T H G L L V K a E D N K G H F S I S I P V K S D I A P V A R L L I Y A V L 546 CCGAACTGGGAC TCATGGACTGCT TGTGAAGCAGGAAGACATGAAGGGCCATTTTTCCATCTrCAATCCCTGTGAAGTCAGACATToC TCCTGTCGCTCGGT TGCT CATC TATGCTrGT TTTr 1680 P T 6 D V I G D S A K Y D V E N C L A N K V n L S F S P 5 0 S L P A S H A H L R 586 AGCT AECGGGGACGTBA'TTGGGGAT TCTGCAAAATATGATGTTr GAAAAT TGTCTGGCCAACAAGGTGGAT T TGAGCT TCAGCCCATCACAAAGTCT CCCAGCCTCACACGCCCACCTGCG 1800 V T A A F Q S V C A L R A V D 0 S V L L N K P D A E L S A S S V Y N L L P E K D 626 AGTCACAGCGGCTCCT CAGTCCGTC TGCGCCCTCCGrGC TGT GGACCAAAGCGTGCTGCTCATGAAGCCTGATGCT GAGCTCTCGGCGTCCTrCGGTT TACAACCTGCTACCAGAAAAGGA 1920 L T 6 F P G P L N D 0 D D E D C I N R H N V Y I N G I T Y T P V S S T N E K D N 666 TCCCTOGGGCCTTrTGAATGACCAGGACGAT GAAGAC TGCATCAATCGTCATAATGTCT ATATTAATGGAATCACATATACTCCAGTATCAAGTACAAATGAAAAGGATAT 2040 CCTrCACTGGCT PrOtsslytic

Y S F L E D N G L K A F T N S K I R K P K N C P 0 L 0 0 Y E N H G P E 6 L R V 6 706 GT ACAGCTTCCTAGAGGACATGGGCTrTAAAGGCATTrCACCAACTCAAAGAT TCGTAAACCCAAAATGTGTCCACAGCTrTCAACAGTrATGAAATGCATGGACCTGAAGGTTc ACGTOTAGG 2160 Site F Y E S D V N G R G H A R L V H V E E P H T E T V R K Y F P E T W I U D L V V V 746 r rT TTTATGAGTCAGATGT AATGGGAAGAGGCCATrGCACGCC TGGTGCATGTTGAAGAGCC TCACACGGAGACCGTACGAAAGTACT TCCC TGAGACATGGATCTGGGATT TGGTGGTGGT 2280 N S A G V A E V G V T V P D T I T E W K A 6 A F C L S E D A G L 6 I S S T A S L 786 AAACTCAGCAGGGGTGGCTGAGGTAGGAGTAACAGTcCCTGACACCATCACCGAGTGGAAGOCAGGGGCCTTCTGCCTGTCTGAAGATGCTrGGAC TTGGTATCTCTr TCCACTGCCTCTCT 2400 R A F 6 P F F V E L T N P Y S V I R 6 E A F T L K A T V L N Y L P K C I R V S V 826 CCGAGCCTTCCAGCCCrcTTrTGTGGAGC T TACAATGCC TTr AC TCTGTGAT TCTGGAGAGOCCT TCACAC TCAAGGCCACGGTCCTAAAC TACCTr TCCCAAATGCATCCGGG TCAGTGT 2520 6 L E A S P A F L A V P V E K E 6 A P H C I C A N G R 0 T V S W A V T P K S L 6 866 GCAGCTGGAAGCCTrcTrCCCGCCTTCCTr TGCTGTCCCAG TGGAGAAGGAACAAGCGCCTCAC TGCATCTGTGCAAACGGGCGGCAAACTGTGTCCTGGGCAGTAACCCCAAAGTCATTAGG 2640 N V N F T V S A E A L E S 0 E L C G T E V P S V P E H G R K D T V I K P L L V E 906 AAATGTGAAT T TCACT G T GAGCGCAGAGGCACTAGAGTC TCAAGAGC TGrTGTGGGACTGAGGTGCCTr TCAGTTCCT GAACACGGAAGGAAAGACACAGTCATCAAGCC TCTGTTGGTTGA 2760 6 P E L E K E T T F N S L L C P S G 6 E V S E E L S L K L P P N V V E E S A R A 946 Thioester Site S V S V 1 6 El I L 6 5 A N 6 N T 6 N L L 6 N P V 6 Cjjjj N N V L F A P N I V V 986 T TC TGT CTC AGT TT T GGG AGACATA T TAGG CT C TGC CAT GCAAAA CACAC AAAAT CT T CTC CAGAT GCCC TAT GGC T GT GAGAG CA-GAA TAT GGT C CTCT TTGCT CC TAACATC TA TGT 30G00 L Dl Y L N E T 6 6 L T P C V K S K A I 6 Y 1 N T 6 V 0 K 6 1 N Y K H V El 6 5 V 5 1026 ACT GG AT T AT CT AAATGAAACACAGCAGCT T AC T CCAGAGGTCAAGT CCAAGGCCAT TGGCTATCTCAACACTGGTTACCAGAGACAGTTGAACTACAAACACTATGAT GGCTCCTACAG 3120 T F 6 C K V 6 K N 6 6 N T U L T A F V L K T F A 6 A R A Y I F IDEl A H I T 6 A 1066 CACCTT TGGGGAGCGATATGGCAGGAACCAGGGCAACACCTGGCTCACAGCCTTTGTTCTGAAGACTT TTGCCCAAGCTCGAGCCT ACATCT TCATCGATGAAGCACACATTACCCAAGC 3240 L I U L 5 6 K 6 K El N 6 C F R 5 5 6 5 L L N N A I K 6 6 V C El V T L S A Y I T 1106 CC TCATAT GGC TCT CCCAGAGGCAGAAGGACAAT GGC TGT TTCAGGAGCT C TGGGTCAC TGC TCAACAATGCCA TAAAGGGAGGAGTAGAAGATGAAGTGACCCTC TCCGCCTATATCAC 3360 I A L L E I F L T V T H P V V R N A L F C L E S A W K T A Q E G D H G S H V Y T 1146 CAT CGCCCT TCT GGAGAT TCCT CTCACAGTCACTCACCCTGT TGTCCGCAATGCCCTGT TTTGCCTGGAGTCAGCCTGGAAGACAGCACAAGAAGGGGACCAT GGCAGCCATGTATATAC 3480 K A L L A Y A F A 1 A 6 N 6 El K K K C V L K S L N E C A V K K El N S V H U C K P 1186 C AAAGCAC TGC TGGCC TAT GC TT T TGCCC TGGCAGGT AACC AGGACAAGAGGAAGGAAG TACTCAAGT CACT TAAT GAGGAAGCT GTGAAGAAAGACAACT CT GTCCATT GGGAGCGCCC 3600 6 K P K A P V 6 H F Y C P 6 A P 5 A C V C N T S V V L L A Y L T A 6 P A F T S E 1226 T CAGAAACCCAAGGCACCAG TGGGGCAT TT T TACGAACCCCAGGCT CCCT C TGCT GAGGTGGAGATGACAT CC TAT GTGC TCCTCGCT TAT CT CACGGCCCAGCCAGCCCCAACCT CBGA 37 20

GGACC TGACC TCT GCAACCAACATCGTGAAGT GGATCACGAAGCAGCAGAAT GCCC AGGGCGGT TT CTCCTCCACCCAGGACACAGTGGT GGCTCTCCATGCT CTGT CCAAATATGGAGC 3840

C GCCACAT T TACCAGGACT GBGAAGGCT GCACAGGT GACT ATC CAGT CT TCAGGGACAT T TTCCABCAAAT TCCAAGTGGACAACAACAAT CGCCTGT TACTGCAGCAGGTC TCAT TGCC 3960

AGAGC TGCCT GGGGAAT ACAGCATGAAAG TGACAGGAGAAGBAT GTGTC TACCTCCAGACCT CCT TGAAAT ACAATAT TCT CCCAGAAAAGGAAGAGT TCCCCT TTGCT T TAGGAGTGCA 4080 rTCTGCTCATTCCCGAAGCAAASTGAAAGCTTAAAATCTACCTGTGGCCGACGAAGCAOAGCACCAGTCTGATTACCTGAAAGGOTCAATCG 39602 T L P 0 T C D E P K A H T S F 0 N S L S V S Y T G S R S A S AMN H A I S V K V 13626

S G F I K' L K P VA K M L E R S N H V S R T E V S S N H V L I Y LI 0K V S N P 1426

L S L F F T V L 6 D V P V R D L K P A I V K V Y D Y Y E T D E F A I A E Y N A P 1466 ACT GAGCT TGT TC TTCACGGT TCTGCAAGATGTCCCAGTAAGAGATCTCAAACCAGCCATAGTGAAAGTCTATGATTACTACGAGACGGATGAGT TTGCAATCGCTGAGTACAATGCTCC 4440 C S K D 1 6 N A * 1474 T TGCAGCAAAGAT CT T GGAAAT GCT TGAAGACCACAAGGCTGAAAAGTGCT T TGCTGGAGTCCTGTTCTCTGAGCTCCACAGAAGACACGTGTTT TTGTATCT TTAAAGACT TGATGAAT 4560 AAACACTTTTTCTGGTCAAAAAAAAA 4586 Fig. 1. (Legend appears at the bottom of the next page.) Biochemistry: Kan et aL Proc. NatL Acad. Sci. USA 82 (1985) 2285

Murine C3 [T]P A S G S Q L - L V LL L L A S S I- L A L G

Human A2M M G K N K L L H P S L V L L L L - V L L _ T D A

FIG. 2. Amino acid sequence homology between signal of human pro-a2M and murine prepro-C3. The sequences were aligned to give the best homology. The C3 sequence is from ref. 17. tein and of human C3 and C4 (17-21, 46). At the 3' end of the plete concordance with the a2M gene is 12. The other chro- cDNA sequence (nucleotide positions 4558-4563) an A-A-T- mosomes show at least two discordancies. Identical results A-A-A poly(A) addition signal was observed, 89 nucleotides were obtained when the same hybrids were digested with the downstream from the translational stop codon and 14 nucde- restriction Taq I. We therefore conclude that the otides upstream from a stretch of 9 adenines, which appears a2M gene is located on chromosome 12. to be the beginning portion of the poly(A) tail. Assignment of the a2M Locus to Chromosome 12. The chro- DISCUSSION mosomal locus of the a2M gene was assigned by using so- matic cell genetics. Fig. 3 shows human, mouse, and hybrid a2M cDNA clones were isolated in this study by using two DNA digested with Pst I and probed with a nick-translated synthetic oligonucleotide probes representing neighboring 1600-base-pair cDNA fragment representing the 3' proximal regions in the protein, located in its carboxyl-terminal por- portion of human a2M mRNA. The human DNA in track 1 tion. Successive screening with both probes eliminated the shows restriction fragments of 16, 5.8, 5.5, 4.3, and 2.5 kb, problem of false positives. One cDNA clone was found to whereas the mouse DNA in track 10 shows major restriction contain an insert with the entire coding sequence for the a2M fragments of 9.5, 6, and 5.5 kb. Three low-abundance mouse protein, including its signal peptide. To date, no amino acid fragments are seen in some of the hybrids at 19, 14, and 3.5 sequence data are available to confirm the signal peptide se- kb. Hybrids in tracks 2, 4, and 9 are positive for the diagnos- quence predicted from cDNA; however, the sequence shows tic human fragments. Track 4 contains DNA from the hybrid substantial similarity with the corresponding human and mu- cell line laAq.498, which contains human chromosomes 6, rine C3 signal peptides. The amino acid sequence derived 12, 21, and X. Hybrid cell line SCC16.5 (track 5) is negative from cDNA confirms the protein sequence reported by Sot- and contains human chromosomes 21 and X. Hybrids MCP6 trup-Jensen and co-workers (28) in all but three residues. (track 8) and CTP41.2.A2 (track 6) are also negative and con- The glutamine/glutamic acid difference in the thiol ester site tain chromosome 6, among others. These combined results (residue 975) has been explained in Results. The finding of a indicate that the human a2M gene is located on chromosome glutamine further supports our proposal that the formation of 12. Table 1 presents all of the hybrids tested and their chro- the thiol ester is initiated with a glutamine rather than a glu- mosome compositions. The only chromosome showing com- tamic acid (46, 53). The difference between a cysteine and a glutamic acid residue at position 563 raises a question about the reported disulfide-bridge structure of the molecule (28). 1 2 3 4 5 6 7 8 9 10 Further sequence analysis of this region of the protein would be desirable. The third difference, valine vs. isoleucine at position 1000, might be due to a DNA polymorphism, be- cause the corresponding coding triplets can be converted -19 from one to the other by a single base resulting in a 16- conservative replacement between two functionally similar -14 amino acids. The a2M gene belongs in a gene family together with the -9.5 complement C3, C4, and C5 genes. In mice, the C3 and C4 genes reside on the same chromosome, chromosome 17 (54); in humans they reside on different chromosomes: C3 on -6.0 chromosome 19 (23) and C4 on chromosome 6 (24). Thus, no 58*- OI* -5.5 selective pressure is apparent to maintain these loci on the same chromosome after duplication and divergence. The as- signment of the human a2M locus to yet a different chromo- some -3.5 suggests that no selective pressure keeps the C3, C4, and a2M loci together on the same chromosome. Frotn the data presented here we cannot definitively eval- 2.5- +0 uate the number of a2M loci on chromosome 12. The possi- bility exists that more than one a2M locus or cross-hybridiz- ing loci might be present on chromosome 12, in analogy to HU hy hy hy hy hy hy hy hy M the duplicated C4 loci on chromosome 6. Rats and rabbits have two a-macroglobulins in their serum: a1-macroglobulin FIG. 3. Detection of human gene in a2M sequences human/ (a1M) and a2M (4). Both are similar in size, are proteinase mouse somatic cell hybrids. Track 1, human (HU) line Molt-4 (47); track 2, hybrid (hy) MOG2; track 3, hybrid MOG2.1.ii; track 4, hy- inhibitors, contain thiol ester bonds, share antigenic determi- brid laA9.498; track 5, hybrid SCC16.5; track 6, hybrid nants, and show similar amino acid composition and glyco- CTP41.2.A2; track 7, hybrid Horl9X; track 8, hybrid MCP6; track sylation patterns (4). Thus, most likely they are products of 9, hybrid DT2.1.2; track 10, mouse (M) line IR (48). DNA was di- duplicated genes. For humans, an equivalent of the ajM pro- gested with Pst I and 25 ,ug of DNA was loaded per track. The + and tein has not been found. We had anticipated that an a1M- - refer to the presence and absence of diagnostic human fragments. related gene might be detected by cross-hybridization in the

FIG. 1 (on preceding page). cDNA and deduced amino acid sequences of pa2Ml, showing the 5' nontranslated sequence, the signal peptide, the entire a2M protein sequence, the poly(A) addition signal, and a portion of the poly(A) tail. The standard one-letter code for amino acids (45) is used. The proteolytic cleavage site and the thiol ester site are underlined. Also shown are regions of nucleotide sequences from which the 14- mer and 17-mer oligonucleotide probes were derived (broken underlines). 2286 Biochemistry: Kan et aL Proc. NatL. Acad. Sci. USA 82 (1985) human , but until now we have not found its trace. 25. Huang, J. S., Huang, S. S. & Deuel, T. F. (1984) Proc. Natl. The cDNA sequences reported here can be used for isola- Acad. Sci. USA 81, 342-346. tion of the corresponding rat and human a2M genomic DNA 26. Waterfield, M. D., Scrace, G. T., Stroobant, P., Johnsson, A., clones and for further studies of the control of expression of Wasteson, A., Westermark, B., Heldin, C.-H., Huang, J. S. & Deuel, T. F. (1983) Nature (London) 304, 35-39. these genes during acute inflammations (55). 27. Doolittle, R. F., Hunkapiller, M. W., Hood, L. E., Devare, S. G., Robbins, K. L., Aaronson, S. A. & Antoniades, H. N. We thank Dr. Lars Sottrup-Jensen for making the a2M sequence (1983) Science 221, 275-277. available prior to publication, Dr. Maarten de Bruijn for making C3 28. Sottrup-Jensen, L., Stepanik, T. M., Wierzbicki, D. M., data available and for helpful discussions, Dr. Hans Muller-Eber- Jones, C. M., L0nblad, P. B., Kristensen, T., Mortensen, hard for support, Dr. Michael C. Carroll and Prof. Rodney R. Porter S. B., Petersen, T. E. & Magnusson, S. (1983) Ann. N.Y. for their cooperation, and Keith Dunn for preparation of the manu- Acad. Sci. 241, 41-60. script. This work was supported by Research Grant AI19651 from 29. Hanahan, D. & Meselson, M. (1980) Gefte 1 7. the National Institute of Allergy and Infectious Diseases. This is 30. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular publication no. 3597IMM from the Research Institute of Scripps Cloning: A Laboratory Manual (Cold Spring Harbor Labora- Clinic. tory, Cold Spring Harbor, NY). 31. Grunstein, M. & Hogness, di. (1975) Proc. Natl. Acad. Sci. 1. Barrett, A. J. (1981) Methods Enzymol. 80, 737-753. USA 72, 3961-3965. 2. Harpel, P. C. & Brbwer, M. S. (1983) Ann. N.Y. Acad. Sci. 32. Studencki, A. B. & Wallace, R, B. (1984) DNA 3, 7-15. 241, 1-9. 33. Bankier, A. T. & Barrell, B. G. (1983) Techniques in Nucleic 3. Starkey, P. M. & Barrett, A. J. (1982) Biochem. J. 205, 91-95. Acid Biochemistry, ed. Flavell, R. A. (Elsevier Scientific, 4. Gordon, A. H. (1976) Biochem. J. 159, 643-650. Limerick, Ireland), Vol. B5, pp. 1-34. 5. Harpel, P. C. (1973) J. Exp. Med. 138, 508-521. 34. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. 6. Wang, D., Yuan, A. I. & Feinman, R. (1983) Ann. N. Y. Acad. Acad. Sci. USA 74, 5463-5467. Sci. 241, 90-97. 35. Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983) Proc. Natl. 7. Barrett, A. J. & Starkey, P. M. (1973) Biochem. J. 133, 709- Acad. Sci. USA 80, 3963-3965. 724. 36. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751. 8. Sottrur;-Jnsen, L., L0nblad, P. B., Stepanik, T. M., Peter- 37. Staden, R. (1984) Nucleic Acids Res. 12, 521-538. sen, T. E., Magnusson, S. & Jornvall, H. (1981) FEBS Lett. 38. Staden, R. (1982) Nucleic Acids Res. 10, 2951-2961. 127, 167-173. 39. Bobrow, M. & Cross, J. (1974) Nature (London) 251, 77-79. 9. Howard, J. B. (1983) Ann. N. Y. Acad. Sci. 241, 69-80. 40. Solomon, E., Bobrow, M., Goodfellow, P. N., Bodmer, 10. Swenson, R. P. & Howard, J. B. (1980) J. Biol. Chem. 255, W. F., Swallow, D. M., Povey, S. & Noel, B. (1976) Somatic 8087-8091. Cell Genet. 2, 125-140. 11. Sottrqp-Jensen, L., Petersen, T. E. & Magnusson, S. (1980) 41. Harris, H. & Hopkinson, D. A. (1976) Handbook of Enzyme FEBS Lett. 121' 275-279. Electrophoresis in Human Genetics (North-Holland, Amster- 12. Sottrup-Jensen, L., Hansen, H. F., Mortensen, S. B., Peter- dam); supplements 1977 and 1978. sen, T. E. & Magnusson, S. (1981) FEBS Lett. 123, 145-148. 42. Swallow, D. M., Solomon, E. & Pajunen, L. (1977) Cytogenet. 13. Hanover, J. A., Willingham, M. C. & Pastan, I. (1983) Ann. Cell Genet. 18, 136-148. N. Y. Acad. Sci. 241, 410-423. 43. Southern, E. (1980) Methods Enzymol. 68, 152-174. 14. Kaplan, J. & Keogh, E. A. (1983) Ann. N.Y. Acad. Sci. 241, 44. Bernards, R. & Flavell, R. A. (1980) Nucleic Acids Res. 8, 442-456. 1521-1534. 15. Tack, B. F. (1983) Springer Semin. Immunopathol. 6, 259- 45. IUPAC-IUB Commission on Biochemical Nomenclature 282. (1968) J. Biol. Chem. 243, 3557-3559. 16. Sim, R. B. & Sim, E. (1981) Biochem. J. 193, 129-141. 46. Domdey, H., Wiebauer, K., Kazmaier, M., Muller, M., 17. Fey, G. H., Lundwall, A., Wetsel, R. A., Tack, B. F., de Odink, K. & Fey, G. H. (1982) Proc. Natl. Acad. Sci. USA 79, Bruijn, M. H. L. & Domdey, H. (1984) Philos. Trans. R. Soc. 7619-7623. London Ser. B 306, 333-344. 47. Minowada, J., Ohnoma, T. & Moore, G. E. (1972) J. Natl. 18. de Bruijn, M. H. L. & Fey, G. H. (1985) Proc. Natl. Acad. Cancer Inst. 49, 891-895. Sci. USA 82, 708-712. 48. Nabholz, M., Miggiano, V. & Bodmer, W. F. (1969) Nature 19. Belt, K. T., Carroll, M. C. & Porter, R. R. (1984) Cell 36, 907- (London) 223, 358-363. 914. 49. Solomon, E., Swallow, D. M., Burgess, S. & Evans, L. (1979) 20. Ogata, R. T., Schreffler, D. C., Sepich, D. S. & Lilly, S. P. Ann. Human Genet. 42, 273-281. (1983) Proc. Natl. Acad. Sci. USA 80, 5061-5065. 50. Goodfellow, P. N., Banting, G., Wiles, M. V., Tunnacliffe, 21. Nonaka, M., Takahashi, M., Natsuume-Sakai, S., Nonaka, A., Parkar, M., Solomon, E., Dalchau, R. & Fabre, J. W. M.,, Tanaka, S., Shimizu, A. & Honjo, T. (1984) Proc. Natl. (1982) Eur. J. Immunol. 12, 659-663. Acad. Sci. USA 81, 6822-6826. 51. Goodfellow, P. N., Banting, G., Trowsdale, J., Chambers, S. 22. Sottrup-Jensen, L., Stepanik, T. M., Kristensen, T., L0nblad, & Solomon, E. (1982) Proc. Natl. Acad. Sci. USA 79, 1190- P. B., Jones, C. M., Wierzbicki, D. M., Magnusson, S., Dom- 1194. dey, H., Wetsel, R. A., Lundwall, A., Tack, B. F. & Fey, 52. Jones, E. A., Goodfellow, P. N., Kennett, R. H. & Bodmer, G.sH. (1985) Proc. Natl. Acad. Sci. USA 82, 9-13. W. F. (1976) Somatic Cell. Genet. 2, 483-496. 23. Whitehead, A. S., Solomon, E., Chambers, S., Bodmer, W., 53. Fey, G. H., Wiebauer, K. & Domdey, H. (1983) Ann. N.Y. Povey, S. & Fey, G. H. (1982) Proc. Natl. Acad. Sci. USA 79, Acad. Sci. 421, 307-312. 5021-5025. 54. Fey, G. H., Domdey, H., Wiebauer, K., Whitehead, A. S. & 24. Carroll, M. C., Campbell, R. D., Bentley, D. R. & Porter, Odink, K. (1983) Springter Semin. Immunpathol. 6, 119-147. R. R. (1984) Nature (London) 307, 237-241. 55. Kushner, I. (1982) Ann. N. Y. Acad. Sci. 389, 39-48.