MOLECULAR AND CELLULAR BIOLOGY, May 1987, p. 1841-1847
Vol. 7, No. 5
0270-7306/87/051841-07$02.00/0
Copyright © 1987, American Society for Microbiology

Identification and Characterization of cDNA Clones Encoding Two Homologous Proteins That Are Part of the Asialoglycoprotein Receptor

MICHAEL McPHAUL†* AND PAUL BERG

Department of Biochemistry, Stanford University Medical Center, Stanford, California 94305

Received 16 October 1986/Accepted 11 February 1987

The asialoglycoprotein receptor (ASGP-R) from rat liver contains the following three distinct protein species when it is analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis: RHL1 (42 kilodaltons), RHL2 (49 kilodaltons), and RHL3 (54 kilodaltons). In this paper we describe the isolation of cDNA clones encoding RHL1 and RHL2 from a cDNA library constructed from rat liver mRNA. A comparison of the predicted coding sequence for RHL2 with that for RHL1 showed that these sequences are highly homologous. The library also contained numerous cDNA clones for both RHL1 and RHL2 that were derived from unspliced precursor mRNAs. Differential splicing at the 5' end of the RHL1 transcript was inferred from the finding that two different types of RHL1 cDNA were identified, each having a different 5' terminus.

The rat asialoglycoprotein receptor (ASGP-R) binds terminal galactose residues in the oligosaccharide chains of numerous serum glycoproteins (1). Electrophoresis of receptor preparations on sodium dodecyl sulfate (SDS)-polyacrylamide gels has indicated that the receptor is composed of three protein chains with apparent molecular weights of 42,000 (RHL1), 49,000 (RHL2), and 54,000 (RHL3) (8). Whereas RHL1 appears to be unique on the basis of its amino acid sequence, RHL2 and RHL3 share a common 101-amino acid sequence at their carboxy termini (6). The latter finding suggests that RHL2 and RHL3 may differ at their N-terminal amino acid sequences or they may be identical but differ in their glycosylation patterns. To examine this question and to explore the mechanism of ASGP-R synthesis and assembly, we set out to isolate the cDNAs encoding these proteins from rat liver RNA.

In this paper, we describe the isolation and characterization of cDNA clones encoding RHL1 and RHL2. A comparison of the amino acid sequences predicted from the cDNA sequence indicated that there is extensive homology between RHL1 and RHL2. In addition, a comparison of the amino acid sequences of RHL1 and RHL2 with their human homologs (abbreviated H1 and H2) revealed conserved regions among the related pairs, as well as among all four polypeptides, that may be important for ASGP-R formation or function. Numerous RHL1 and RHL2 cDNAs were isolated from the library that contained one or more introns. Furthermore, we isolated a cDNA clone containing the coding sequence for RHL1 that possessed a 5' terminus different from that of the major class of RHL1 cDNAs. Judging from the sequence at the 5' end of the RHL1 gene, it appears that the odd RHL1 cDNA arose from an alternately spliced first intron. This alternatively spliced mRNA may encode an RHL1 protein whose amino terminus is different from the termini which are most frequently isolated.

* Corresponding author.
† Present address: Department of Internal Medicine, University of Texas Health Science Center at Dallas, Southwestern Medical School, Dallas, TX 75235.

MATERIALS AND METHODS

Materials. All restriction enzymes were purchased from New England BioLabs, Inc., Beverly, Mass. Escherichia coli DNA ligase, terminal deoxynucleotide transferase, and RNase H were obtained from P-L Biochemicals, Inc., Milwaukee, Wis. E. coli DNA polymerase I was obtained from Boehringer Mannheim Biochemicals, Indianapolis, Ind., and avian myeloblastosis virus reverse transcriptase was obtained from Seikagaku America. The vector-primer carrying polythymidylate tails at one end (14) was supplied by Takashi Yokota (DNAX, Palo Alto, Calif.). Oligo(dT)-cellulose was purchased from Collaborative Research, Inc., Waltham, Mass. E. coli MC1061 was used for all bacterial transformations (3).

Total rat liver RNA was obtained by homogenizing 6 g of fresh rat liver from a 200-g female Sprague Dawley rat in 60 ml of a 5 M guanidinium isothiocyanate solution in a Polytron homogenizer (Brinkmann Instruments) for 1 min. This preparation was then layered over a 5.7 M cesium chloride-0.1 mM EDTA solution and centrifuged at 27,000 rpm in a Beckman type SW28 rotor for 18 h (5). The resulting pellets were solubilized and precipitated twice from 0.25 M sodium acetate (pH 5.2) by adding 2.5 volumes of ethanol. Poly(A)+ RNA was isolated by oligo(dT)-cellulose chromatography (2) and stored as a precipitate at -70°C until use.

cDNA library construction. The cDNA library was prepared with the pcD expression vector of Okayama and Berg, as described previously (14). Briefly, 4.5 µg of the deoxyribosylthymine-tailed plasmid was annealed with 10 µg of poly(A)+ RNA. Sufficient buffer (10 mM Tris, pH 8.35, 6 mM MgCl2, 30 mM KCl) was added to achieve a final volume of 60 µl. The first cDNA strand was synthesized by adding 40 U of avian myeloblastosis virus reverse transcriptase and incubating the preparation at 37°C for 90 min; the reaction was terminated by adding 1 µl of 20% SDS and 1 µl of 0.5 M EDTA. The reaction mixture was extracted once with phenol-CHCl3 (1:1), and the aqueous phase was precipitated twice from 2 M ammonium acetate by adding 2.5 volumes of ethanol at room temperature. The product was then precipitated once from 0.25 M sodium acetate and 2.5 volumes of ethanol at -20°C. Deoxycytidylate tails were added to the first cDNA strand with terminal deoxynucleotide transferase. The product was cleaved with HindIII endonuclease, and the vector fragment was annealed with the linker fragment tailed with deoxyguanylate tails (kindly provided by R. F. Margolskee, Stanford University). After the annealing, cyclization, and synthesis of the second cDNA strand, portions of the reaction mixture were used to transform competent E. coli MC1061. The library, selected on ampicillin medium, contained 2.3 x 10^6 independent transformants. A portion was grown to saturating density (overnight), and plasmid DNA representing the entire library was prepared by cesium chloride gradient centrifugation (12). Samples (10 µg) of the plasmid DNA were digested with either SalI, PvuI, or ClaI restriction endonuclease, and after the three digests were pooled, the DNA was fractionated by electrophoresis on a 0.8% low-melting-temperature agarose gel. The gel was sliced so as to separate regions corresponding to plasmids containing cDNA inserts of 0 to 800 base pairs, 800 base pairs to 1.2 kilobases (kb), 1.2 to 2.2 kb, 2.2 to 2.8 kb, 2.8 to 5.8 kb, and more than 5.8 kb. Then the DNA was isolated from the slices by melting the agarose, cyclized by ligation, and introduced into bacteria. Each sublibrary contained 2 x 10^5 to 5 x 10^5 independent transformants with cDNA inserts corresponding to the size classes indicated above.

Genomic library construction. A DNA blot analysis of restriction enzyme digests of rat genomic DNA showed that the entire coding region was contained on a 15-kb EcoRI fragment. A 40-µg portion of genomic DNA prepared from isolated rat liver nuclei (19) was digested with EcoRI endonuclease and loaded onto a 0.6% low-melting-temperature agarose gel. The fragments migrating in the region corresponding to 15 to 18 kb were isolated, and 400 ng was ligated with 250 ng of purified arms of EcoRI endonuclease-digested λ EMBL-3. Following in vitro packaging (16), we obtained a library with a complexity of 5 x 10^5 recombinants (library A).

To isolate the RHL2 genomic sequences, total genomic rat liver DNA was partially digested with Sau3AI endonuclease. The digestion was titrated so that the majority of fragments were between 15 and 30 kb long. A 40-µg portion of partially digested genomic DNA was fractionated on a 0.4% low-melting-temperature agarose gel, and the 15- to 30-kb fragments were recovered. A 500-ng portion of this DNA was ligated with 400 ng of purified arms prepared from BamHI endonuclease-digested λ EMBL-3. After in vitro packaging, we obtained a library with a complexity of 2 x 10^6 recombinants (library B).

Oligonucleotide probes. Degenerate oligonucleotides were synthesized based on previously published protein sequences (6). The RHL1 hybridization probe was a degenerate 20-mer molecule (5' TCATCATT CCAATGA CCATC 3'), and the RHL2-RHL3 probe was a 17-mer molecule (5' AT CCA GAT ATG AAA TGC 3'). The alternative bases printed at the degenerate positions of the probes were A, A and T, G, G, G, C.

Screening the cDNA library. The oligonucleotide probes were labeled with [γ-32P]ATP by using polynucleotide kinase under standard conditions. Colonies from the pcD library were plated onto LB plates containing ampicillin (50 µg/ml), transferred to nitrocellulose disks, and amplified for 12 h on LB agar plates containing ampicillin (50 µg/ml) and chloramphenicol (15 µg/ml). In screening experiments with both oligonucleotides, the lifts were processed sequentially for 15 min each on the following solutions: (i) 10% SDS, (ii) 0.5 N NaOH-1.5 M NaCl, (iii) 1 M Tris (pH 8)-1.5 M NaCl, and (iv) 6x SSC (1x SSC is 0.15 M NaCl plus 0.15 M sodium citrate, pH 7.0). The filters were dried and baked for 2 h at 80°C in vacuo. The baked filters were washed for 2 h in 1 M NaCl-5 mM Tris (pH 8)-1 mM EDTA (pH 8.0) and in 0.1% SDS at 42°C for 1 h. The filters were then prehybridized at 37°C in a solution containing 5x SSC, 50 mM sodium phosphate (pH 6.8), 7x Denhardt solution (1x Denhardt solution contains 50 mg of bovine serum albumin per ml, 50 mg of Ficoll per ml, and 50 mg of polyvinylpyrrolidone per ml), 0.1% SDS, and 150 µg of tRNA per ml at 37°C for 12 h. Hybridization was performed at 37°C in fresh portions of the same solution to which we added approximately 10^5 cpm of each degenerate oligonucleotide mixture per ml. Washes were performed once at room temperature and twice (5 min each) at the washing temperature in 4x SSC. The wash temperature was determined empirically for each oligonucleotide (42°C for RHL1 and 40°C for RHL2-RHL3). Each positive colony was purified twice by replating and rehybridization.

Screening the genomic libraries. The genomic libraries described above were screened by hybridization with cDNAs known to encode RHL1 and RHL2 (p2AA and p11A1, respectively). Two clones from library A hybridized to the RHL1 cDNA and were subsequently found to contain the entire gene. One clone, which hybridized to the RHL2 cDNA, was identified from library B and was found to contain the entire coding sequence for RHL2.

FIG. 1. RNA blot analysis of rat liver RNA. The RNA samples were denatured and electrophoresed as described in Materials and Methods. Lane A contained denatured end-labeled AccI-cut λ DNA. Lanes B and C contained poly(A)+ RNA samples from rat liver hybridized with probes specific for RHL1 and RHL2, respectively (see text).

DNA sequencing. Clones were sequenced after subcloning in appropriate M13 vectors (either M13mp18 or M13mp19

ggt gcc tag att agc ccc ctc ctc ctt ctc gcc tgc tgt cct gct gtc cca ggt tta acc ccc ttt ttc tcc ttg gac tca ggc tgc ctc cgg aag cag agt agc tct cta tac att taa cag tcc cag atc tgt ctc cag cct agg gcc atc

ATG GAG AAG GAC TTT CAA GAT ATC CAG CAG CTG GAC TCT GAG GAA AAC GAC CAT CAG
met glu lys asp phe gln asp ile gln gln leu asp ser glu glu asn asp his gln   19
CTC ATT GGC GAT GAG GAA CAA GGC TCT CAT GTG CAG AAT CTT AGG ACC GAA AAT CCA
leu ile gly asp glu glu gln gly ser his val gln asn leu arg thr glu asn pro   38
CGT TGG GGA GGA CAG CCT CCT TCC AGG CCC TTT CCA CAG CGC CTC TGC TCC AAG TTC
arg trp gly gly gln pro pro ser arg pro phe pro gln arg leu cys ser lys phe   57
CGC CTC AGT CTG CTC GCC CTG GCC TTC AAC ATT CTC CTG CTG GTG GTC ATC TGT GTG
arg leu ser leu leu ala leu ala phe asn ile leu leu leu val val ile cys val   76
GTT TCA TCC CAA AGC ATG CAG CTG CAA AAG GAG TTC TGG ACC CTG AAA GAA ACC TTG
val ser ser gln ser met gln leu gln lys glu phe trp thr leu lys glu thr leu   95
AGC AAC TTC TCC ACC ACC ACC CTG ATG GAG TTC AAG GCT CTG GAC TCC CAC GGA GGT
ser asn phe ser thr thr thr leu met glu phe lys ala leu asp ser his gly gly   114
AGC AGG AAT GAC AAC TTG ACT TCT TGG GAA ACA ATA CTG GAG AAA AAG CAG AAG GAC
ser arg asn asp asn leu thr ser trp glu thr ile leu glu lys lys gln lys asp   133
ATA AAA GCA GAT CAC TCC ACG CTG CTC TTC CAC CTG AAG CAC TTC CCC CTG GAT CTG
ile lys ala asp his ser thr leu leu phe his leu lys his phe pro leu asp leu   152
GCA ACC CTG ACC TGT CAG CTG GCG TTC TTC CTG AGC AAC GGC ACA GAA TGC TGC CCC
arg thr leu thr cys gln leu ala phe phe leu ser asn gly thr glu cys cys pro   171
GTT AAC TGG GTG GAG TTT GGT GGA AGC TGC TAC TGG TTT TCT CGG GAT GGG CTC ACC
val asn trp val glu phe gly gly ser cys tyr trp phe ser arg asp gly leu thr   190
TGG GCT GAG GCT GAC CAG TAC TGC CAA ATG GAG ATT GCC CAT CTG CTG GTC ATC AAC
trp ala glu ala asp gln tyr cys gln met glu asn ala his leu leu val ile asn   209
TCA AGG GAG GAG CAG GAA TTC GTT GTA AAG CAC AGG GGC GCG TTT CAC ATT TGG A_
ser arg glu glu gln glu phe val val lys his arg gly ala phe his ile trp ile   228
GGT CTC ACC GAC AAG GAT GGC TCC TGG AAA TGG GTG GAT GGG ACG GAA TAT AGA AGT
gly leu thr asp lys asp gly ser trp lys trp val asp gly thr glu tyr arg ser   247
AAC TTC AAG AAT TGG GCT TTC ACT CAG CCA GAT AAC TGC CAG GGG CAT GAA GAG GGG
asn phe lys asn trp ala phe thr gln pro asp asn cys gln gly his glu glu gly   266
GGA AGT GAA GAC TGT GCT GAA ATC CTG TCA GAT GGC CTC TGG AAT GAC AAC TTC TGC
gly ser glu asp cys ala glu ile leu ser asp gly leu trp asn asp asn phe cys   285
CAG CAG GTG AAC CGC TGG GCT TGT GAA AGG AAA CGG GAC ATC ACC TAC TAG
gln gln val asn arg trp ala cys glu arg lys arg asp ile thr tyr               301

gag tct gct cta cta tgt ctt tgt cac cct ccg gga acc ccg cat cac tca tta gga

FIG. 2. Nucleotide and predicted amino acid sequence for RHL2 clone 13D. The single apparent transmembrane segment is underlined (residues 59 to 77). The sites of potential glycosylation are indicated by dots. The single discrepancy with the partial sequence determined previously (6) is marked with an asterisk. The region detected by the synthetic oligonucleotide probe is enclosed in a box.
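The potential glycosylation sites referred to in the legend to Fig. 2 can be located mechanically by scanning the predicted protein for the N-linked glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The following is a minimal sketch in Python and not the procedure used in this study; the helper name `find_sequons` and the one-letter transcription of residues 96 to 171 are illustrative assumptions based on the translation printed in Fig. 2.

```python
# One-letter transcription (an assumption of this sketch) of residues 96-171
# of the predicted RHL2 protein, following the translation printed in Fig. 2.
SEGMENT_START = 96
SEGMENT = (
    "SNFSTTTLMEFKALDSHGG"   # residues 96-114
    "SRNDNLTSWETILEKKQKD"   # residues 115-133
    "IKADHSTLLFHLKHFPLDL"   # residues 134-152
    "RTLTCQLAFFLSNGTECCP"   # residues 153-171
)


def find_sequons(protein, first_residue=1):
    """Return residue numbers of Asn in Asn-X-Ser/Thr sequons (X is not Pro)."""
    sites = []
    for i in range(len(protein) - 2):
        if (protein[i] == "N"
                and protein[i + 1] != "P"
                and protein[i + 2] in "ST"):
            sites.append(first_residue + i)
    return sites


# Residue numbers of candidate N-glycosylation sites in the segment.
print(find_sequons(SEGMENT, SEGMENT_START))
```

On this segment the scan reports asparagines 97, 119, and 165, all of which lie on the carboxy-terminal side of the transmembrane segment (residues 59 to 77). Asn-117 is skipped because the residue two positions downstream is Asn rather than Ser or Thr.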

[20]) by using [35S]dATP and the Sanger dideoxy sequencing procedure (15). Reaction preparations were run on 6% gradient polyacrylamide gels, dried, and exposed without intensifying screens to Kodak XAR-5 film.

RNA blot analysis. Two 10-µg samples of poly(A)+ RNA prepared from a rat liver were denatured with glyoxal, loaded onto duplicate slots of a 1.5% agarose gel, and electrophoresed in 10 mM sodium phosphate (pH 6.8) (18). DNA markers (AccI endonuclease-cleaved λ DNA labeled at the 3' end with [α-32P]ATP) were also denatured with glyoxal and electrophoresed on the same gel. Following electrophoresis, the gel was blotted onto Gene Screen Plus (New England Nuclear Corp., Boston, Mass.), and then the filter was cut into two pieces, each containing identical marker and sample lanes. These filters were prehybridized at 42°C for 12 h in a solution containing 50% formamide, 5x SSC, 7.5x Denhardt solution, 1% SDS, 50 µg of denatured salmon sperm DNA per ml, 20 µg of poly(A) per ml, and 20 µg of poly(G) per ml. The filters were then hybridized with DNA probes specific for RHL1 (p51XX) or RHL2 (p11A1) labeled by incorporation of [32P]dCTP primed by random hexamers (7). The probe concentration was approximately 10^6 cpm/ml, and the specific activities were approximately 3 x 10^8 cpm/µg in both cases. The filters were washed twice (20 min each) in 0.1x SSC-0.1% SDS at 60°C. The filters were then exposed to Kodak XAR-5 film with intensifying screens for 2 days at -70°C.

DNA blot analysis. Rat liver genomic DNA was cleaved with restriction endonucleases, and the digest was electrophoresed on a 0.6% agarose gel. The gel was denatured, and the DNA fragments were transferred to Gene Screen Plus (New England Nuclear Corp.). Prehybridization, hybridization, and labeling were as described above for the RNA blot analysis.
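Because the RHL2-RHL3 oligonucleotide probe is complementary to the coding strand, reverse-complementing it and translating the result should recover the peptide against which it was designed. The sketch below does this in Python for the 17-mer as printed under "Oligonucleotide probes" (ignoring its degenerate alternatives) and compares it with the boxed region of clone 13D in Fig. 2 (GCG TTT CAC ATT TGG, residues 223 to 227). The function and variable names are illustrative, and this is a consistency check rather than part of the original procedure.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


PROBE_17MER = "ATCCAGATATGAAATGC"      # RHL2-RHL3 probe as printed, 5' to 3'
CLONE_13D_REGION = "GCGTTTCACATTTGG"   # boxed region of Fig. 2, residues 223-227

target = reverse_complement(PROBE_17MER)
# 1-based positions at which the probe's complement and the clone differ.
mismatches = [i + 1 for i, (p, c) in enumerate(zip(target, CLONE_13D_REGION)) if p != c]

print(target)                        # coding-strand sequence matched by the probe
print(translate(target))             # peptide encoded by that sequence
print(translate(CLONE_13D_REGION))   # peptide encoded by the clone
print(mismatches)
```

Both sequences translate to Ala-Phe-His-Ile-Trp, and the three nucleotide differences fall at positions 3, 9, and 12, that is, at third codon positions, where a degenerate probe would be expected to carry alternative bases.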

[Aligned amino acid sequences of H1, R1, R2, and H2.]

FIG. 3. Alignment and comparison of the human and rat ASGP-Rs. R1 and R2, RHL1 and RHL2, respectively; H1 and H2, human homologs of RHL1 and RHL2, respectively. The positions at which the amino acids of H1 and RHL1 are conserved are indicated by dots; the positions at which the amino acids of H2 and RHL2 are conserved are indicated by open circles. Where the amino acids in RHL1 and RHL2 are identical, they are marked by a vertical line between them. Those residues that are identical in all four proteins are indicated by shading.
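The identity values quoted in the text for the alignment of Fig. 3 follow directly from counts of identical positions. As a minimal sketch in Python (the function names and the short toy alignment are hypothetical; the counts are those given in the text), percent identity can be computed as follows.

```python
def percent_identity(identical, aligned):
    """Percentage of aligned positions occupied by identical residues."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * identical / aligned


def count_identities(seq_a, seq_b, gap="-"):
    """Count columns of two equal-length aligned sequences that hold the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)


# Counts quoted in the text: (identical residues, positions compared).
COMPARISONS = {
    "RHL1 vs RHL2": (144, 301),
    "RHL1 vs H1": (221, 284),
    "RHL2 vs H2": (190, 301),
    "all four chains": (113, 284),
}

for label, (identical, aligned) in COMPARISONS.items():
    print(f"{label}: {percent_identity(identical, aligned):.1f}%")

# Hypothetical toy alignment, used only to show the counting step.
print(count_identities("MTKDY-Q", "MEKDFAQ"))
```

With these counts, each rat chain is more similar to its human counterpart (about 78% and 63%) than RHL1 is to RHL2 (about 48%), while about 40% of the alignable positions are identical in all four chains.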

RESULTS

Screening of cDNA clones with oligonucleotide probes. The degenerate oligonucleotide probe for RHL1 hybridized to 11 cDNA clones from the sublibrary containing 1- to 2-kb inserts. DNA sequencing established that five of these clones contained intact RHL1 coding regions which were the same as the region reported previously for RHL1 (9). Clone p51XX contained the fewest G·C pairs at the 5' end of the cDNA (12 pairs) and, therefore, was used in the tests for RHL1 expression (13).

A region common to RHL2 and RHL3, but with only minimal homology with RHL1, was used to construct the oligonucleotide probe for screening the library for RHL2 and RHL3 cDNAs. Out of 50,000 colonies of the total library, 3 hybridized with the RHL2-RHL3 probe and were purified as described above. One clone, p11A1, contained a 1-kb insert and was shown by DNA sequencing to encode a portion of the RHL2-RHL3 coding sequence. This cDNA was used as a hybridization probe in an RNA blot analysis of rat liver poly(A)+ RNA. Figure 1 shows that the RHL2-RHL3 mRNA was about 1.6 kb long (somewhat larger than the RHL1 mRNA).

This information suggested that the full-length RHL2-RHL3 cDNA was most likely in the sublibrary containing cDNA inserts of 1 to 2 kb. Accordingly, this set of clones was screened by hybridization, using the p11A1 cDNA as the probe; 23 positive clones were detected. Six representative clones were chosen for this study, one of which, 13D, was sequenced in its entirety. Figure 2 shows the sequence of clone 13D, the putative RHL2-RHL3 cDNA. This clone contains a 903-base pair open reading frame which is sufficient to code for a 301-amino acid protein with a predicted molecular weight of 34,946. Although the other clones were only partially characterized, it is clear that some of these clones contain intact coding regions, whereas others lack sequences at their 5' ends or contain introns that interrupt the coding sequence. Clone 13D was chosen for use in expression experiments (13).

Predicted protein sequence. Aligning the predicted protein sequences of RHL1 and RHL2 showed that they are identical at 144 of 301 positions, for an overall level of homology of 48% (Fig. 3). Notably, the nucleotide sequence indicates that RHL2, like RHL1, is not made with a classic signal peptide. Furthermore, RHL2 appears to have a single transmembrane segment (amino acid residues 59 to 77). These features are compatible with previous deductions about the orientation of the rat ASGP-R; the N terminus occurs on the cytoplasmic face of the membrane, and the carboxy terminus is oriented externally (4). It is notable that RHL2 contains additional sequences in its cytoplasmic tail compared with RHL1 (residues 23 to 42). In fact, this exact configuration has been recently shown for the human counterpart of RHL2 (17).

RHL2 and RHL3 arise from a single gene. Despite the fact that RHL2 and RHL3 share a common amino acid sequence over large portions of their carboxy termini, the possibility exists that two separate but related genes give rise to the two polypeptides. However, the results of two experiments lead us to doubt that. First, DNA blot analyses of rat genomic DNA that had been digested with restriction enzymes known to cut within and flanking the gene indicated that there is only one structural gene homologous to the common sequence of RHL2 and RHL3 (Fig. 4). Second, the sequences of six representative clones for RHL2-RHL3 from a common internal restriction site for approximately 250 base pairs toward the 3' end show complete identity at all residues.

FIG. 4. Southern analysis of rat genomic DNA. (A) Restriction map of the RHL2 gene. Restriction enzymes: B, BamHI; R, EcoRI; P, PstI. The solid box over the central portion of the restriction map corresponds to the sequences recognized by the cDNA probe. (B) Electrophoresis and transfer were performed as described in Materials and Methods. Each lane contained 10 µg of rat genomic DNA digested with one restriction enzyme or sequentially with two different enzymes. Lane A, XbaI; lane B, SalI plus XhoI; lane C, XhoI; lane D, SalI plus EcoRI; lane E, markers; lane F, EcoRI; lane G, SalI; lane H, BamHI plus EcoRI; lane I, BamHI plus SalI; lane J, BamHI plus XhoI; lane K, BamHI; lane L, PstI; lane M, BamHI plus PstI. The blot was analyzed by hybridization with a cDNA probe (p11A1) that contained a 1-kb insertion specific for the 3' end of the RHL2-RHL3 coding sequence shown above.

cDNA library contains numerous unspliced precursors. Our characterization of RHL1 and RHL2 cDNA clones revealed that there were differences among some of the clones of a particular type. Part of this heterogeneity was due to the presence of interruptions within the RHL1 or RHL2-RHL3 coding regions. Figure 5 shows the presumptive structures of two RHL1 cDNA clones relative to their genomic version and of one RHL2-RHL3 cDNA relative to its genomic counterpart. Quite clearly, the interruptions correspond to introns that were not removed from the RNA from which these cDNA copies were derived. In the 1- to 2-kb sublibrary, approximately 50% of the RHL1 cDNAs and 50% of the RHL2 cDNAs still contained one or more introns. In the total cDNA library as many as one-third to one-half of the cloned cDNAs for RHL1 and RHL2 contained one or more introns. The reason for such a high level of partially spliced or unspliced RNAs is not clear.

RHL1 transcripts exist in at least two forms. Most of the

[Diagram: RHL-1 genomic DNA (exons E1 to E9 and intervening introns) with cDNA clone 4AA; RHL-2 genomic DNA (exons E5 to E9 and intervening introns) with cDNA clone 11A1.]

FIG. 5. Origin of unspliced precursors found in the cDNA library. The RHL1 gene (11; McPhaul, unpublished data) and a portion of the RHL2 gene (McPhaul, unpublished data) are shown. Exon (E) sequences are shown as open bars; intron (I) sequences are shown as solid lines. The pattern of introns that remain does not appear to favor the retention of any particular intron. As shown, clone 4AA retains intron 1, and clone 2AB contains intron 7. Clone 11A1 is a partial-length cDNA, and its 5' terminus corresponds to sequences in intron 6 and RHL2.

Genomic sequence

Major 5' splice junction:
CTGTCCACCTGGTTCTTCAGGCTTCAGCCCCCTCCTTAGCCTGGGCTCTTCGTGGTGCTGAGGGACCT7 AGTACA

Alternate 5' splice junction:
GCAGGGGAGAATGGATGGGTGGCTCCAGGACAGAGTGGCTGGAGGGGTCCTGAATGTGCAGTTCACATGTG @ T

3' splice junction:
GTGTAA .. INTRON .. AAACTTGGCTGTCC ICTGCCCCAGT

cDNA sequence

Clone 2AA:
GGAGAATGGATGGGTGGCTGCCAGGACAGAGTGGCTGGAGGGGTCCTGAATGTGCAGTTCACATGTGCAGTCCTGC CCCAGTGCTATCATGACAAAGGATTAT
Predicted amino terminus: met cys ser ser his val gln ser cys pro ser ala ile met thr lys asp tyr

Clone 51:
CTGTCCACCTGGTTCTTCAGGCTTCAGCCCCCTCCTTAGCCTGGGCTCTTCGTGGTGCTGAGGACCTTCAGTCC TGCCCCAGTGCTATCATGACAAAGGATTAT
Predicted amino terminus: met thr lys asp tyr

FIG. 6. Origin of alternatively spliced cDNA for RHL1. A portion of the genomic sequence of RHL1 is shown, along with the sequences at the 5' termini of two RHL1 cDNAs. The 5' and 3' splice junctions are enclosed in boxes in the genomic sequences. Utilization of the major 5' splice junction yields mRNA with a 5' end corresponding to clone 51, whereas splicing from the alternate 5' splice junction yields mRNAs containing the 5' terminus found in clone 2AA. The sequence at the 3' splice junction marking the start of sequences common to the two cDNAs is underlined.
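The effect of the alternate splice on the reading frame can be checked directly from the 5' sequence of clone 2AA in Fig. 6. The Python sketch below is an illustration only, under the assumption that the sequence is transcribed correctly from the figure; the function names are hypothetical. It walks upstream from the usual RHL1 initiation codon in frame, stops at the first in-frame stop codon, and reports the most upstream in-frame ATG found before that stop.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# 5' end of cDNA clone 2AA as printed in Fig. 6 (the transcription is an
# assumption of this sketch).
CLONE_2AA = (
    "GGAGAATGGATGGGTGGCTGCCAGGACAGAGTGGCTGGAGGGGTCCTGAATGTGCAGTTCACATGTGCAGTCCTGC"
    "CCCAGTGCTATCATGACAAAGGATTAT"
)
RHL1_START = "ATGACAAAGGATTAT"  # Met-Thr-Lys-Asp-Tyr, the usual RHL1 amino terminus


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def upstream_inframe_start(cdna, known_start):
    """Return (new_start, old_start): the most upstream in-frame ATG that is not
    separated from known_start by a stop codon, and the index of known_start."""
    old_start = cdna.index(known_start)
    new_start = old_start
    pos = old_start - 3
    while pos >= 0:
        codon = cdna[pos:pos + 3]
        if CODON_TABLE[codon] == "*":   # an in-frame stop closes the reading frame
            break
        if codon == "ATG":
            new_start = pos
        pos -= 3
    return new_start, old_start


new_start, old_start = upstream_inframe_start(CLONE_2AA, RHL1_START)
extra_codons = (old_start - new_start) // 3
print(extra_codons)                       # codons added ahead of the usual start
print(translate(CLONE_2AA[new_start:]))   # predicted amino terminus of the variant
```

Run on the printed sequence, the scan finds an in-frame ATG 13 codons upstream of the usual start, immediately preceded by an in-frame TGA, and the translation (MCSSHVQSCPSAIMTKDY) agrees with the amino acids printed for clone 2AA in Fig. 6.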

RHL1 clones that were characterized had 5' ends that were identical to those reported previously (9). However, one clone (p2AA) contained, in addition to all of the sequences present in the most frequently encountered RHL1 cDNAs, extra nucleotides at its 5' end. The origin of these extra 5' nucleotides became apparent in a comparison with the 5' end of the RHL1 genomic sequence (Fig. 6). The extra nucleotides of clone p2AA are present in the first intron. Thus, the abundant RHL1 cDNA clones arose from mRNAs that were spliced between the major 5' splice junction and a 3' splice junction just upstream of the beginning of the RHL1 protein coding sequence. Clone p2AA appears to represent an mRNA that was spliced differently, from a 5' splice junction within the intron to the same 3' splice site. However, in this alternatively spliced RNA, a new AUG codon was introduced upstream of the RHL1 start site. Since it is in frame with the RHL1 coding sequence, this mRNA could encode an RHL1 that is 13 amino acids longer at its N terminus.

DISCUSSION

The rat ASGP-R contains three protein chains, RHL1 and a pair of closely related chains, RHL2 and RHL3 (6). In the present work, we used the pcD expression vector and cloning procedure introduced by Okayama and Berg (14) to isolate cDNA clones that encode the amino acid sequences of RHL1 and RHL2. The RHL1 cDNA clone was identified by the correspondence between its nucleotide sequence and the amino acid sequence of RHL1 (6), by the nucleotide sequence of a previously isolated RHL1 cDNA (9), and by its ability to express the RHL1 polypeptide after transfection into rat HTC cells (13). Identification of the RHL2 cDNA clone relied on the occurrence of the shared amino acid sequence in the carboxy-terminal one-third of RHL2 and RHL3 (6) and on its expression of RHL2 in rat HTC cells (13). Indeed, cotransfection with pcD-RHL1 and pcD-RHL2 produced functional ASGP-R that contained RHL1 and RHL2 (13); transfections with either clone alone did not yield ASGP-R (13), nor was there detectable accumulation of either polypeptide (McPhaul, unpublished data).

Isolation of the RHL2 cDNA clone allowed a comparison of the complete amino acid sequence of RHL2 with that of RHL1 (6, 9) and between each of these polypeptides and their counterparts, H1 and H2 (17), from the human ASGP-R (Fig. 3). Except for the additional 19 amino acids in the amino-terminal region of RHL2, RHL1 and RHL2 share significant homology throughout their corresponding sequences. Of the 301 amino acids in RHL2, 144 (48%) are identical to the corresponding positions of properly aligned RHL1 (Fig. 3). Moreover, the region between residues 50 and 76 in RHL2, which contains the putative transmembrane segment, is nearly identical to the corresponding region of RHL1 (20 of 26 residues). In addition, the sequence of RHL2 between residues 177 and 300, which corresponds to the region involved in ligand binding (20), is 62% identical to the analogous region of RHL1 (77 of 123 residues).

Another significant homology emerged from a comparison of the amino acid sequences of the human and rat ASGP-R proteins (Fig. 3). First, H1 and RHL1 lack virtually the same number of amino acids compared with H2 and RHL2, respectively (residues 23 to 42 in RHL2). Moreover, this gap occurs in the same relative position in both proteins. In a comparison of RHL1 and H1, there are 221 identical residues out of a total of 284, and RHL2 and H2 share 190 amino acids of their total of 301. There are numerous instances where RHL1 and H1 have the same amino acid at a particular position, while RHL2 and H2 share a different amino acid at the analogous position.

There is also a striking amino acid sequence homology when the amino acid sequences of the four polypeptide chains are compared with each other. Considering the 284 amino acids that can be aligned in the four proteins, 113 residues are identical. Furthermore, the locations of all of the cysteine and tryptophan residues in the four proteins are also virtually invariant.

Our findings suggest that there are probably significant similarities in the way RHL1 and RHL2 and H1 and H2 associate to form functional ASGP-Rs. Now that a functional ASGP-R can be reconstituted by transfection of expressible RHL1 and RHL2 cDNAs, site-directed mutagenesis and gene reconstructions can be used to explore how the two polypeptide chains are synthesized and associate to form a functional ASGP-R.

An unanticipated outcome of the cloning experiments was the finding that there were a substantial number of RHL1 and RHL2 cDNAs that retained one or more introns. We presume that these cDNAs are copies of partially processed poly(A)+ mRNAs, but the reason for their abundance is unclear and deserves further study. The more enticing finding was the recovery of RHL1 cDNA clones which indicate that the first intron of the primary transcript of the RHL1 gene can be differentially spliced to yield mRNAs with different 5'-terminal sequences (Fig. 6). One consequence of the alternate splicing is that the less abundant mRNA could specify an additional 13 amino acids at the amino terminus of RHL1. At present, however, there is no indication that such variant forms exist in vivo, nor is any special function for an amino-terminal extension on RHL1 apparent. Thus, the physiological significance of the two forms of RHL1 mRNA is not evident.

ACKNOWLEDGMENTS

This work was supported by Public Health Service grant GM-13235 from the National Institute of General Medical Sciences and by a fellowship from the Helen Hay Whitney Foundation.

We thank Eleanor Olson and Brenda Hennis for their excellent secretarial assistance in the preparation of the manuscript.

LITERATURE CITED

1. Ashwell, G., and J. Harford. 1982. Carbohydrate-specific receptors of the liver. Annu. Rev. Biochem. 51:531-540.
2. Aviv, H., and P. Leder. 1972. Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid-cellulose. Proc. Natl. Acad. Sci. USA 69:1408-1412.
3. Casadaban, M. J., and S. N. Cohen. 1980. Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138:179-207.
4. Chiacchia, K. B., and K. Drickamer. 1984. Direct evidence for the transmembrane orientation of the hepatic glycoprotein receptors. J. Biol. Chem. 259:15440-15446.
5. Chirgwin, J. M., A. E. Przybyla, R. J. McDonald, and W. J. Rutter. 1979. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18:5294-5299.
6. Drickamer, K., J. F. Mamon, G. Binns, and J. O. Leung. 1984. Primary structure of the rat liver asialoglycoprotein receptor. J. Biol. Chem. 259:770-778.
7. Feinberg, A. P., and B. Vogelstein. 1983. A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132:6-13.
8. Harford, J., M. Lowe, H. Tsunoo, and G. Ashwell. 1982. Immunological approaches to the study of membrane receptors. J. Biol. Chem. 257:12685-12690.
9. Holland, E. C., J. O. Leung, and K. Drickamer. 1984. Rat liver asialoglycoprotein receptor lacks a cleavable NH2-terminal signal sequence. Proc. Natl. Acad. Sci. USA 81:7338-7342.
10. Hsueh, E. C., E. C. Holland, G. M. Carrera, and K. Drickamer. 1986. The rat liver asialoglycoprotein receptor polypeptide must be inserted into a microsome to achieve its active conformation. J. Biol. Chem. 261:4940-4947.
11. Leung, J. O., E. C. Holland, and K. Drickamer. 1985. Characterization of the gene encoding the major rat asialoglycoprotein receptor. J. Biol. Chem. 260:12523-12527.
12. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning, p. 93-94. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
13. McPhaul, M., and P. Berg. 1986. Formation of functional asialoglycoprotein receptor after transfection with cDNAs encoding the receptor proteins. Proc. Natl. Acad. Sci. USA 83:8863-8867.
14. Okayama, H., and P. Berg. 1983. A cDNA cloning vector that permits expression of cDNA inserts in mammalian cells. Mol. Cell. Biol. 3:280-289.
15. Sanger, F., A. R. Coulson, B. G. Barrell, A. J. H. Smith, and B. A. Roe. 1980. Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143:161-178.
16. Scherer, G., J. Telford, C. Baldari, and V. Pirrotta. 1981. Isolation of cloned genes differentially expressed at early and late stages of Drosophila embryonic development. Dev. Biol. 86:438-447.
17. Spiess, M., and H. F. Lodish. 1985. Sequence of a second human asialoglycoprotein receptor: conservation of two receptor genes during evolution. Proc. Natl. Acad. Sci. USA 82:6465-6469.
18. Thomas, P. S. 1983. Hybridization of denatured RNA transferred to nitrocellulose paper. Methods Enzymol. 100:255-286.
19. Wigler, M., R. Sweet, G. K. Sim, B. Wold, A. Pellicer, E. Lacy, T. Maniatis, S. Silverstein, and R. Axel. 1979. Transformation of mammalian cells with genes from procaryotes and eucaryotes. Cell 16:777-785.
20. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33:103-119.