Proc. Nati. Acad. Sci. USA Vol. 89, pp. 10925-10929, November 1992 Medical Sciences Molecular cloning and primary structure of the human blood group RhD polypeptide (erythocyte membrane/RhD /cDNA/ analysis/PCR) CAROLINE LE VAN KIM, ISABELLE MOURo, BAYA CHtRIF-ZAHAR, VIRGINIE RAYNAL, CATHERINE CHERRIER, JEAN-PIERRE CARTRON*, AND YVES COLIN Unite Institut National de la Sante et de la Recherche M6dicale U76, Institut National de Transfusion Sanguine, 6 rue Alexandre Cabanel, 75015 Paris, France Communicated by Vincente T. Marchesi, July 14, 1992

ABSTRACT The RH (rhesus) blood group from independently by our group (7) and others (8) from a human RhD-positive donors is composed oftwo homologous structural bone marrow cDNA library (clones RhIXb and Rh30A, , one of which encodes the Cc and Ee polypeptides, respectively). It has been shown subsequently by Southern whereas the other, which is missing in the RhD-negative blot analysis that the RH locus is composed of two structural condition, encodes the D that carries the major antigen homologous genes, one encoding the RhD polypeptide and of the RH system. Recently, different splicing isoforms tran- the second encoding the Cc and Ee polypeptides (9), most scribed from the CcEe gene were isolated. We report now the likely by a mechanism of alternative splicing of a primary characterization of two other Rh clones, RhIl and RhXHI, transcript (10). These findings are consistent with a two-locus generated by alternative choices for poly(A) addition sites that model of Rh inheritance (11) and suggest that the two Rh were identified as the RhD gene transcripts. That these cDNAs genes have evolved through duplication of a common ances- represented the RhD messenger and that the previously de- tor gene. In the genome of Rh-negative individuals the RhD scribed Rh clones were derived from the CcEe gene was gene is missing, either as a result of a gene deletion or demonstrated by amplification of RhII/XIII sequences only possibly as a survival of the ancestral RH locus before the from D-positive genomes and by cloning and sequencing ofD- duplication occurs (9). and CcEe-specifIc gene fragments. The predicted traation The Rh antigen specificity of the protein encoded by the product of the RhD mRNA is a 417-amino acid protein (Mr = cDNAs cloned previously (7, 8) has not been formally 45,500) that exhibited a similar membrane oration with 13 established. Partial RhD protein sequence analysis suggested bilayer-sning domains compared with the polypeptide en- that neither clone encoded the postulated D polypeptide (8), coded by the CcEe gene. The D and Cc/Ee polypeptides differ but recent studies with polyclonal raised against by 36 amino acid substitutions (8.4% divergence), but the NH2- synthetic peptides indicated that such a clone more likely and COOH-terminal regions of the two are well encodes the E or e (P. Hermand, I.M., M. Huet, C. conserved. Similarly, five of the six cysteine residues of the Bloy, K. Suyama, J. Goldstein, J.-P.C., and P. Bailly, Cc/Ee proteins were conserved in the D protein, inluding the unpublished data). This report describes the characterization unique exofacial cysteine, which is critical for antigenic reac- ofthe RhD gene product that encodes the major antigen RhD tivity. The between the Cc/Ee and D of the RH system.t proteins supports the concept that the genes encoding these polypeptides have evolved by duplication ofa common ancestor MATERIALS AND METHODS gene. cDNA Library Screening. A human bone marrow Agtll The is important in transfusion and cDNA library (Clontech) was screened with the previously clinical medicine because it is involved in the hemolytic isolated RhIXb cDNA (7). disease of the newborn, transfusion reactions, autoimmune Amplification of Genomic Sequences. One microgram of hemolytic anemias, and hemolytic reactions of nonimmune human leukocyte DNAs from donors of different Rh pheno- origin (1-3). This system is recognized as one of the most types was used as templates in PCR experiments (40 cycles: complex polymorphisms in man (4), but until recently the 940C for 1 min, 460C for 1 min, and 720C for 1.5 min) as structural basis ofblood group antigens has remained poorly described (10) using a Perkin Elmer amplification (con- understood (for review, see ref. 5). Currently, individuals are taining 2.5 units of Taq polymerase) and 300 ng of each classified as Rh-positive and Rh-negative according to the primer. PCR A was performed between amplimers common presence or the absence ofthe major D antigen on the surface to the RhIXb and RhXIII cDNAs: Al (5'-GGTGTTGTAAC- oftheir erythrocytes, but >46 other antigens, including those CGAG-3'; sense primer, positions 941-955; + 1 taken as the ofthe Cc and Ee series, have been identified (6). The absence first residue of the initiator AUG) and A2 (5'-ATCATGC- or the severe reduction of all of these antigens on erythro- CATTGCCG-3'; antisense primer, positions 1076-1062). cytes from Rh-deficient individuals (Rhnun, Rhmod) is associ- Amplification products were analyzed on agarose gel and ated with morphological and functional abnormalities and a hybridized at 450C with an RhXIII-specific probe, pXIII mild (3). (5'-GGCTCCGACGGTATC-3'; positions 1048-1062), or at The D, c, and E epitopes are carried by at least three 650C with the Rh cDNA probe (an equimolar mixture of the distinct homologous unglycosylated integral membrane pro- RhIXb and RhXIII cDNAs). PCR B was performed between teins of apparent Mr 30,000-32,000 that share a common amplimers A3 (5'-TAAGCAAAAGCATCCAAGAA-3'; po- NH2-terminal sequence (reviewed in ref. 5). Using this struc- sitions 1252-1271, sense primer common to RhIXb and tural information, the same Rh mRNA has been cloned RhXIII) and A-XIII (5'-ATGGTGAGATTCTCCTC-3'; po-

The publication costs ofthis article were defrayed in part by page charge *To whom reprint requests should be addressed. payment. This article must therefore be hereby marked "advertisement" t~he sequences reported in this paper have been deposited to the in accordance with 18 U.S.C. §1734 solely to indicate this fact. GenBank data base (accession nos. X63094 and X63097).

10925 Downloaded by guest on October 2, 2021 10926 Medical Sciences: Le Van Kim et al. Proc. Natl. Acad Sci. USA 89 (1992) sitions 1437-1421, antisense primer specific of the RhXIII suggest that the RhXIII mRNA is transcribed from a distinct cDNAs) and probed with the Rh cDNAs. Rh gene rather than from an allelic form of the CcEe gene Southern Blot Analysis. Human leukocyte DNA was di- coding for RhIXb. We assumed that the RhXIII-encoded gested with EcoRI, resolved by agarose gel electrophoresis, polypeptide was a likely candidate for the RhD protein, and transferred to a nitrocellulose membrane (12). Hybrid- although residues 42-54 were different from those attributed izations were performed at 650C in 5x SSPE (SSPE = 0.18 to the specific RhD protein by Avent et al. (8). Because ofthis M NaCl/10 mM sodium phosphate/1 mM EDTA), 3x Den- difference, further evidence was needed to establish unam- hardt's solution, and 0.1% SDS with Rh exon-specific probes biguously whether the RhXIII cDNA corresponded to the (9). Final washes were performed at 650C for 15 min in lx RhD gene transcript. SSC (SSC = 0.15 M NaCl/15 mM sodium citrate) and 0.1% Amplification ofRhXMI Sequences from RhD-Positive DNA. SDS. To determine whether the RhXIII cDNA derived from the Genomic Libraries. Three hundred micrograms of leuko- RhD gene, genomic DNA extracted from RhD-positive cyte DNA from a DCCee donor was digested by EcoRI and (DCCee, DccEE, Dccee) or RhD-negative (ddccee and ddc- resolved by 0.6% agarose gel electrophoresis. Gel slices of 1 cEE) donors was analyzed in different PCR experiments. At cm thick were cut, DNA fractions were electroeluted, and an first, primers were selected to amplify identical size products aliquot was hybridized with exon-specific probes. Three from the CcEe and D genes (or cDNAs), which could be hundred nanograms of each positive fraction was ligated with subsequently distinguished by hybridization to a gene- 1 jig of Agtll EcoRI arms and packaged in vitro. For each specific oligonucleotide located in a region of greater se- library, 3 x 105 plaques were transferred to nitrocellulose quence divergence. Accordingly, using the Al and A2 am- filters and hybridized with PCR probes in standard condi- plimers flanking nucleotide positions +956 to +1061 that tions. show as much as 30% divergence between RhIXb and RhXIII DNA Sequencing. Genomic and cDNA inserts were sub- (Fig. 1), a 136-bp product was amplified and detected in all cloned into pUC18 vectors and sequenced on both strands by DNA preparations with the Rh cDNA probe (Fig. 2, PCR A). the dideoxy chain-termination method (13). However, when the pXIII oligonucleotide probe specific for the RhXIII clone was used, only the 136-bp product from RhD-positive DNAs or from the RhXIII cDNA used as RESULTS AND DISCUSSION control was detected and no hybridization to the 136-bp Preliminary attempts to characterize the RhD mRNA were product amplified from RhD-negative samples occurred. carried out by an oligonucleotide approach using partial Thereafter, primers were chosen to amplify selectively a D sequence information on the NH2-terminal 54 amino acids of gene fragment in RhD-positive genomes. Accordingly, oli- the immunopurified RhD polypeptide published recently (8). gonucleotides A3 and A-XIII flanking nucleotides 1272-1420 Compared to the already cloned polypeptide presumably of the RhXIII cDNA were used as primers. Since the A-XIII produced by the RhCcEe gene (7-9), the proposed RhD amplimer is specific for the 3' noncoding sequence ofRhXIII, protein exhibits an identical NH2-terminal sequence up to an expected fragment of 186 bp could be amplified only in the residue 41 but diverged beyond this point (although still RhD-positive DNA samples and in the control RhXIII related). Different oligonucleotides deduced from the postu- cDNA, but no product was detected in RhD-negative samples lated RhD NH2-terminal sequence were used as primers in (Fig. 2, PCR B). PCR experiments. Surprisingly, when reticulocyte and bone These experiments demonstrated that specific RhXIII se- marrow RNA or leukocyte genomic DNA was used as quences are present in the RhD-positive DNA only, provid- templates no amplification product could be detected. ing proof that this mRNA is transcribed from the RhD gene. Isolation and Nucleotide Sequence of Putative RhD cDNA To add further evidence, sequencing of D and CcEe gene Clones. As the previously isolated RhIXb cDNA cross- fragments was performed. hybridized with the two homologous Rh genes (D and CcEe) Cloning and Sequencing of RhD Genomic Fragments. Al- (7, 9), we have characterized all of the clones detected in the though strongly related, the RhD and RhCcEe genes can be human bone marrow cDNA library with this probe. Two distinguished by restriction analysis of RhD-positive (two clones, RhVI and RhVIII, were clearly related to splicing genes) and RhD-negative (one gene) genomic DNAs (9). isoforms of the RhIXb mRNA transcribed from the CcEe Southern hybridization with probes specific for nucleo- gene (10). Two other clones, RhII and RhXIII, were ana- tides 486-635 and nucleotides 1227-1347 of the RhIXb lyzed. Sequence analysis (Fig. 1) indicates that these cDNAs cDNA, which are complementary to exon 4 and exon 10 of [1522 and 2789 base pairs (bp), respectively] differed only by the RhCcEe gene, respectively (B.C.-Z., C.L.V.K., V.R., the length of their 3' noncoding region (237 bp and 1504 bp, J.-P.C., and Y.C., unpublished data), each revealed two respectively). This is consistent with the detection of a faint fragments in the RhD-positive genomes but only one in the 3- to 3.5-kilobase (kb) band in addition to a major 1.5- to RhD-negative DNAs (Fig. 3). The 5.6- and 3.0-kb fragments 1.7-kb species on Northern blot of erythroblast RNAs (7). detected in RhD-positive and -negative DNAs with the exon Two AATAAA poly(A) signals and a class IV Alu sequence 4 (Fig. 3A) and exon 10 (Fig. 3B) probes correspond to the were identified in the RhXIII 3' untranslated sequence. The RhCcEe gene. The 3.0- and 5.2-kb fragments are D specific AATAAA sequence at positions 2747-2752 precedes the since they are revealed only in RhD-positive genomes. poly(A) tail found at the 3' end of the RhXIII cDNA, whereas Accordingly, four partial genomic libraries were con- the other AATAAA signal at positions 1462-1467 is used to structed. Libraries CE4 and D4 were enriched in the exon generate the RhII cDNA. These data indicate that the RhIl 4-specific CcEe and D genomic fragments, respectively, and RhXIII represent two mRNA isoforms generated by whereas libraries CE10 and D10 were selectively enriched in alternative poly(A) site choices. Examination of the 3' non- the exon 10 complementary fragments derived from the CcEE coding sequences of the RhII and RhXIII clones reveals that and D genes, respectively. Sequence analysis indicated that they are identical to the 3' untranslated region of the Rh the CE4 and CE10 genomic fragments contain sequences cDNAs previously published (7-10) until nucleotide 1358 but identical to those present in the RhIXb cDNA between surprisingly differ after this position. positions 486 and 801 (exons 4 and 5) and 1227 and 1488 (exon The coding sequence of the RhXIII and RhIXb cDNAs 10), whereas the D4 and D10 genomic fragments contain the exhibited 3.5% divergence, which results in a 8.4% diver- homologous but not identical exonic sequences found in the gence at the amino acid level (Fig. 1). Such polymorphism in RhXIII cDNA, including the entire 3' noncoding region (not the coding region and the difference in the 3' end sequence shown). Downloaded by guest on October 2, 2021 Medical Sciences: Le Van Kim et al. Proc. Natl. Acad. Sci. USA 89 (1992) 10927

cagagacggacacagg -1 _t Sor Sr Lys Tyr Pro Ar; Sar Vol Arg Arg Cys Lou Pro Lou Trp Ala Lou Thr Lou Clu Ala Al& Lou I11 0025 ATO MOC TCT AM TAC CCO COO TCT CTC COG COC TOC CTG CCC CTC TOO CCC CTA ACA CTC GAA cCA OCT CTC ATT 007S Lau Lou Ph. Tyr Ph. Phe Thr His Tyr Asp Ala Sar Lou Glu Asp lin Lys Cly Lou Vol Ala SOa Tyr Glin Val 0050 CTC CTC TTC TAT TTT TTT ACC CAC TAT GAC OCT TCC TTA GAG GAT CM AAM 0O CTC CTC ccA TCC TAT CIUGTI 0150 Gly Oin Asp Lou Thr Vol Not Ala Ala C1 ly Lou Gly P60 Lou Thr Soa r Ph0 Arg Arg His SOr Trp SOr 0075 OC CM GAT CTO CCaTc ATO 000 0CC ATT OC TTGO0 TTC CTC MCC TCC AG? TTC COO MA CAC MGC TOC AGC 022S Oar Val Ala Ph* Asn Lou Pha Net Lou Ala Lou Gly Vol Gln Trp Ala I1- Lou Lou Asp Gly Pha Lou SOr Gin 0100 MT Gct 0cC TTC AC CTC TTC ATC CTO OCO CTT OCT GTG CAM TGO CA ATC CTG CTG GAC 0CC TTC CTG AGC CAG 0300 Pha Pro hr Gly Lys Vol Vol 11- Thr Lou Pha Oar I1e Arg Lou Ala Thr Net Oar Ala "a Sor Vol Lou II1 0125 TIC CCT ICT 0OO AAM cTe OTC ATC acA CtG TtC MCT ATT COO CTG 0CC MCC ATG MGT OCT XTG TCG CTG CTG ATC 0375

oar ML A Ala Vol Lou 0ly Lys Vol Asn Lou Ala lin Lou Vo Vol Net Vol Lou Vol Glu Vol Thr Ala Lou O0S1 TCA GM CaT OCT CTC TTO 00 AMA ¢TC AAC TtG OCO CaM TTC GT¢ CTC ATO GTG CTG CTO GAO GTG ACA GCT TTA 0450 0ly A Lou Arg Net Vol 11- SOr Asn I1 Phe Asn Thr Asp Tyr His Net Asn ML ML His S.1 Tyr Vol Pha 0175 000 AAC CTG MG AGTC GTC ATC AMT AAT ATC TTC MAC ACA GAC TAC Cac ATC AMC &TG AZO CAC ATC TAC GTG TTC 052S

Ala Ala Tyr Pha 0ly Lou Ma Vol Ala Trp Cys Lou Pro Lys Pro Lou Pro CJ Gly Thr Glu Asp L" Asp Gin 0200 cA 0cC TAT TT 000 CTG ICT OTO WCC TOO TOC CTG CCA AMG CCT CTA CCC SAG GGA ACG GMG OAT MA GAT CAG 0600 Th Ala Tbr Ile Pro Sor Lou SOr Ala Net Lou Cly Ala Lou Ph. Lou Trp ,a Pha Trp Pro SOr beM Asn SOr 0225 AA ocA AC¢ ATA CCC AMT TTC TCT 0CC ATG CTC 0CC GCC CTC TTC TTG TOG ATO TTC TOO CCA AGT ZTC MC TCT 0675 &" Lou Lou Ar; Oar Pro I1 fl Arg Lys Asn Ala Y Pha Asn Thr Tyr Tyr Ala yaI Ala Val Sar Vol Val 02S0 0CT CTC CTG AMA AT CCA ATC G AAaaMc AaT 0CC GCG TTC AMC ACC TAC TAT OCT 2TA GCA GTC ACC GTG CTC 0750 Thr Ala Ile SOr 0ly SOr SOr Lou Ala His Pro Gin fiU Lys Ile SOa "a Thr Tyr Vol His SOr Ala Val Lou 0275 CA 0CC ATC TCA O00 TCA TCC TTG OCT Cac CCC Caa fGG AMG ATC AGC A&G ACT TAT GTG CAC AGT GCG GTG TTG 0325 Ala 0ly 0ly Vol Ala Vl 01ly Thr SOr Cys His Lou I1I Pro SOr Pro Trp Lou Ala Net Vol Lou Gly Lou Val 0300 OCA GMa Oc CTC OCT CTO GOT ACC TCC TOT Cac CTC ATC CCT TCT CCC TOO CTT CCC ATC CTC CTG GGT CTT GTG 0900 Ala ly Lou Ile SOr Mely0 ly Ala Lys I= Lou Pro O" Cys Cys Asn Arg Vol Lou 0ly Ile Zr His Mr 0325 OCT 006 CTO ATC TCC RTC 000 cGa 0CC AM TAC CTG CCC G00 TOT TOT AAC CCAT CTG GG ATT CLC Cac ArC 0975 Sar "a Net GU1U An Pho SOr Lou Lou Gly Lou Lou 0ly Glu 11. "ae Tyr Ie- Val Lou Lou Val Lou &a 0350 TCC ATC ATO gQC TAC AW TIC AMC TTG CTG GOT CTG CTT GGA GMG ATC AZC TAC ATT GTG CTG CTG GTG CTT OAT 1050

Thr Vol AUa 0ly Asn 0ly Net Ile Gly Pha Gln Vol Lou Lou SOr I1- 0ly Glu Lou Sor Lou Ala Ile Val 0375 ACC CTC M 200C GOC aaT GOC ATO ATT 0CC TTC CAM GTC CTC CTC AMC ATT 000 GMA CTC AGC TTG CCC ATC 0TG 1125 Ile Ala Lou Tbr Sotr ly Lou Lou Thr 0ly Lou Lou Lou Asn Lou Lys lie Trp Lys Ala Pro His 0Lm Ala Lys 0400 ATA OCT CTC ACc TCT GOT CTC CTC ACA GCT TOC CTC CTA MAT CTI MA ATA TOO AAA OCA CCT CAT GAG GCT MA 1200 Tyr Ph0 Asp Asp lin Vol Ph0 Trp Lys Ph, Pro His Lou Ala Vl 01ly Prh 0417 TAT ITT cAT GAC CAA OTT TTC TOO AMG TI? CCT CAT TTG OCT OTT GCA TI? taogcoaoag catccog^aa oaacaaggcc 1201 tgttcooo cacoct tcctotcoct gttgcctgco tttgtocgtg *o-oocgctc *tgacgcao agtetc4at gttcgcgoag g 1372 coctggogt caggo-oaat 0g0ottipat cctttctctg ccoctcttt%..Vag atct cacc ttt-t tatgcactgt ogaotacoc -- 1463 aotcc ogatqtoc cactaJt- cotctggtoo acoocogoct gcotatotga tOqtggOtct ccrgtaogct ;aggttaatt tat 1554 tattott cttgL LLLLLLLLLL LLLLLLLLLL t U t a g 1645 1736 SLOOL LULL"AL rOLagaonLaaCaO.S£aouLr.AAorLoGLaLoogLao&oogoo 1827 aL sCamaa m accc e b G LOtss =ttgttt; Ctttttto&C aatataocgt gtgctcatag oaactgc 1916 ttt gcatgaetg eootCatgtg cttcetog^a oottoatto; ottotoccoc tmaogtcttc agatttttat &cLttttLtt ttgaaacq 2009 ga ;tctcoctct gtcccggC tggqgtgcog tgccgoaotC teg99teCet gCoCCtCC; cotccoggt tCeaaCgatt ctcCtgccL 2100 e octccrgagt agct00ott, coo-tgagce ctccoccc c9ctaottt ttgcottttt acttgtcagg gttteaccat gttggctaqq 2191 ataotttcoc cggatetct tOgectootg *tc gcctgc CtLgOgOtcLC e ogt9Ct9 ggottocogg tgtgaocca cqtgcccaqc c 2782 tot-ettec ctttttgot accatttggt gttttgoega *tta-cacgt ttgtg-acgt ggco-tgctt qtgattcarq cttccattga ga 2373 ec-ooagg" a*ctggt tgcagogc a *cogeggoc gegtgtggc a9gttttoao tgCtCttctg oaggctgata cgacagctct ctq 2464 tgcoct; attgeatat; cotcccoogs ttatottatt ;ttttctoet ;OtotgtgtC *actttgcc ooocoaggat tgLqaootgo oto& 2555 gegtt ttccttaagg coccttcctt aoacagacoo ttOgteoao tgooctccot tgcttoooo aoaocataaac accatttagt cactq 2646 aacat ogctatotgt atggttgtto ctotgga0at cttgttttgc caotttcttt gooaottctq gcoaoccaag gttctttttg ttMaca 2737 toot actaaoaea otgaaca ogctaocaaa ct(An) 2773 FIG. 1. Nucleotide sequence of the RhXIII cDNA and predicted amino acid sequence of the human RhD protein. Numbering of nucleotides and amino acids starts at the ATG codon and initiating methionine, respectively. Nucleotides and amino acid positions that differ from the RhIXb cDNA and deduced protein sequences (7) are underlined. The bracket indicates the limit ofhomology with the 3' noncoding sequence of a splicing isoform of RhIXb (10). The poly(A) signals used in the RhIl and RhXIII cDNAs and the Alu sequence present in the 3' region of RhXIII are underlined. The vertical arrow (at nucleotide 1488) indicates the end of RhIl cDNA. Positions of the oligonucleotides used as amplimers or as probe in PCR experiments (see Fig. 2) are indicated by dashed lines; from 5' to 3': Al, pXIII, A2, A3, and A-XIII. These findings conclusively established that the RhIXb a protein of417 amino acids (Mr = 45,500) that exhibited 8.4% cDNA, as well as its splicing isoforms (10), derive from the divergence compared with the RhIXb-encoded polypeptide. RhCcEe gene and that the RhXIII mRNA is transcribed from Hydropathy plot analysis suggests that the two proteins are the RhD gene. No spliceoform of the RhD cDNA could be organized similarly, starting with an intracellular NH2 termi- identified, either by the screening of the cDNA libraries or nus (14) followed by 13 hydrophobic transmembrane domains after PCR amplification of various RNA preparations (not and terminating with a predicted exofacial COOH end (Fig. shown). Elucidation ofthe D and CcEe gene structures might 4; ref. 7). Among the 36 amino acid substitutions between the allow determination of whether polymorphisms in the in- RhIXb and the RhXIII proteins, only 2 are within the first 100 tronic splicing sequences could account for this difference in residues and only 1 is within the 60 last amino acids, the expression patterns of the two Rh genes. The strong indicating a high degree of conservation of the NH2- and homology between the CcEe and D cDNAs is in agreement COOH-terminal regions of the Rh proteins. Five of the six with Southern blot analysis indicating that the two genes cysteine residues of the RhIXb polypeptide are conserved at composing the RH locus of RhD-positive donors are related the same positions in RhXIII, particularly the unique exofa- and may be derived by duplication from a common ancestral cial cysteine at position 285 (Fig. 4). Another difference gene (9). As expected, the tissue-specific expression of the D includes a potential N-glycosylation site on Asn-331, which gene explored by Northern blot analysis was found to be may not be used since the RhD protein is unglycosylated and identical to that previously reported for the CcEe gene because this residue is predicted to be located within a [represented by the RhIXb cDNA, (7)1-i.e., restricted to hydrophobic domain (Fig. 4). tissues or cell lines expressing erythroid/megakaryocytic Which among the 36 amino acid substitutions are essential characters (not shown). for antigenic expression of RhD is still undetermined. Re- Amino Acid Analysis of the RhD Gene Product. The pre- markably, however, <10 of these substitutions occur on the dicted translation product of the RhD transcript (RhXIII) is predicted extracellular hydrophilic loops connecting the Downloaded by guest on October 2, 2021 10928 Medical Sciences: Le Van Kim et al. Proc. Natl. Acad Sci. USA 89 (1992)

DNA e I pecied te m plate PCR product hybridization a) -I-1. .- pXiY cDNA as aX ) .. _- ; U __ -1_ (2 p :36 bp -C- a= C= . L_ PCRA 36 -v D P or or ;__s .J row D CcEe -IF_IN.._ 1) pos + + Al pool A2 Al A2 . :.! :36 bp 40.4b a -.0 3,n- C -1 fo- :.:: CcEe + 1) -neg - A I A2

-~ a) Qc-- 2 - - 2 PC RB 86 bp :.;(2 (2; 1t (_) (2 .- o D -- I- CcEe prodic, 1 -~- Ol r or or o'Lr D-pos b.- -_ - + A3 AXIII A3 * * _| D -neg CcEe no Product A3 PhcDNAs FIG. 2. Amplification of RhXIII-specific sequences from RhD-positive genomes. (Left) Diagram of D-positive and D-negative RH loci based on Southern blot analysis (9). Amplimers (arrows) and oligoprobe pXIII positions on the D and CcEe genes are indicated as are the size and the hybridization pattern of the expected PCR products. (Right) DNA from donors with the indicated Rh phenotypes were used as templates in PCR experiments. PCR A: amplification between Al and A2 primers and hybridization with the Rh cDNA probe (RhIXb plus RhXIII) or with the oligoprobe pXIII specific for the RhXIII coding sequence. RhXIII and RhIXb cDNAs were used as control. PCR B: amplification between primers A3 and A-XIII (specific for the 3' noncoding sequence of RhXIII) and hybridization with the cDNA probe. See Fig. 1 for the exact locations of the oligoprobes and amplimers. transmembrane domains of the RhD protein or in close epitopes may represent overlapping and conformation- vicinity to the cell surface (Fig. 4), and it is postulated that dependent structures (17). Moreover, the lipid composition these positions might be critical for the antigenic properties of the membrane and possibly also the fatty acylation of the of the Cc/Ee and D proteins. These few amino acid substi- RhD protein itself may modulate the D antigenic expression tutions are spread along the whole molecule (except for the (18-20). Such variations are thought to affect the cell surface first and fifth external loops, Fig. 4) and it is of interest that exposure of antigenic motifs, particularly among those that other investigations carried out with a panel of human are specifically present on the RhD molecule. In addition, as monoclonal antibodies specific for the D antigen have sug- the Rh proteins most likely associate with other glycoproteins gested that at least eight RhD epitopes could be distinguished that are deficient in Rhn,1 cells to form a membrane complex (16). Whether there may be some correlation between the of relatively large size (5, 21), it cannot be excluded that amino acid exchanges and such RhD epitopes is purely interactions with other proteins of the Rh protein cluster (5) speculative, but the possibility that the monoclonal antibod- may also account for D antigen expression. ies may define discontinuous epitopes on the RhD protein is Recently, it was proposed that a 17-kDa chymotryptic open to analysis. However, the folding and assembly of the fragment radiolabeled from intact erythrocytes and contain- loops and transmembrane domains of the RhD protein is not ing the NH2-terminal domain of the RhD protein could form known, but if the Rh proteins are approximately globular in an immune complex with a polyclonal anti-D (22). shape with a diameter of 50-60 A, it is likely that the D That such fragment was radiolabeled is surprising since

X 0 w 0L u u U U U 10 U U u U U U U u U U U u U U U U U U Q 0 Vd v ~

0 ._ CE4 (0.6kb) - _ w f Sw 4W -- D10.I -. Va I n - D4 (3. I~kb) -- CElO t ~ qto

. Exon 4 Exon 1 0

FIG. 3. Identification of RhD and RhCcEe EcoRI genomic fragments. RhD-positive or RhD-negative DNAs were digested by EcoRI and hybridized on Southern blot with exon 4- and exon 10-specific probes ofthe RhCcEe gene. Fragments ofthe DccEE DNA identified as D specific (D4 and D10) or CcEe specific (CE4 and CE10) were subcloned. Downloaded by guest on October 2, 2021 Medical Sciences: Le Van Kim et al. Proc. Natl. Acad. Sci. USA 89 (1992) 10929

41 7

In

FIG. 4. Topology and protein sequence of the RhD gene product. Proposed transmembrane organization of the RhD protein based on hydropathy analysis (15) and on the intracellular orientation of the NH2 terminus of Rh proteins (14). Open symbols refer to identical positions within the CcEe protein (7) and black symbols refer to substitutions typical of the RhD protein: 60L -+I; 68N -. S; 103p -* S; 121M -b L; 127A + V; 128G -+ D; 152T -* N; 169L -- M; 170R -. M; 172F -* I; 182T -. S; 193K -. E; 198N -. K; 201R -. T; 218M -. I; 223V -+ F; 226p -+ A; 233Q -k E; 238M -+ V; 245L -+ V; 263R -. G; 267M -. K; 306I - V; 311C -* Y; 314V -+ G; 323H -, P; 3251 -I S; 327V -) I; 329H -* G; 330S - Y; 3311 N; 342T -* I; 350H -- D; 353W -. G; 354N -- A; 3"V -, E. previous reports indicated that the 1251 label incorporated into 3. Nash, R. & Shojania, A. M. (1987) Am. J. Hematol. 24, tyrosine(s) by cell surface radioiodination was associated 267-275. with a single chymotryptic peptide following two-dimen- 4. Race, R. R. & Sanger, R. (1975) Blood Groups in Man (Black- (23) and was located within the well Scientific, Oxford), 6th ed. sional peptide map analysis 5. Agre, P. & Cartron, J. P. (1991) Blood 78, 551-563. COOH-terminal region of the RhD protein, as it was rapidly 6. Issitt, P. (1989) Transf. Med. Rev. 3, 1-12. removed by carboxypeptidase Y digestion (14, 24). It cannot 7. Cherif-Zahar, B., Bloy, C., Le Van Kim, C., Blanchard, D., be excluded, however, that the 17-kDa fragment is weakly Bailly, P., Hermand, P., Salmon, C., Cartron, J. P. & Colin, Y. labeled under the conditions used, but further investigations (1990) Proc. Natl. Acad. Sci. USA 87, 6243-6247. are required to clarify this point. Since the number of amino 8. Avent, N. D., Ridgwell, K., Tanner, M. J. A. & Anstee, D. J. acid residues specific for the RhD protein that are exposed (1990) Biochem. J. 271, 821-825. and available to antibodies is extremely low at the NH2 9. Colin, Y., Cherif-Zahar, B., Le Van Kim, C., Raynal, V., Van terminus (Fig. 4), only one of the RhD epitopes might be Huffel, V. & Cartron, J. P. (1991) Blood 78, 2747-2752. 10. Le Van Kim, C., Cherif-Zahar, B., Raynal, V., Mouro, I., detected on the 17-kDa fragment. Lopez, M., Cartron, J. P. & Colin, Y. (1992) Blood 80, 1074- From the above considerations on the complexity of the 1078. RhD epitopes, it is not surprising that preliminary attempts to 11. Tippett, P. (1986) Ann. Hum. Genet. 50, 241-247. express the recombinant Rh proteins (D and non-D) alone or 12. Southern, E. (1975) J. Mol. Biol. 98, 503-517. together in eukaryotic cells of erythroid or nonerythroid 13. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. origin have been unsuccessful to date (unpublished data). Acad. Sci. USA 74, 5463-5467. Determination of whether this is due to the incorrect trans- 14. Bloy, C., Hermand, P., Blanchard, D., Cherif-Zahar, B., port or folding of these proteins to the cell surface awaits Gossens, D. & Cartron, J. P. (1990) J. Biol. Chem. 265, further investigation. However, the characterization of var- 21482-21487. sequences encoded by the 15. Engelman, D., Steitz, T. & Goldman, A. (1986) Annu. Rev. ious mRNAs and deduced protein Biophys. Chem. 15, 321-353. two genes at the RH locus is a critical step toward the analysis 16. Tippett, P. (1990) J. Immunogenet. 17, 247-257. of the Rh polymorphisms and of the molecular defects 17. Lomas, C., Tippett, P., Thompson, K. M., Melamed, M. D. & responsible for the Rh-deficiency syndrome. Hughes-Jones, N. C. (1986) Vox Sang. 57, 261-264. 18. Shinitzky, M. & Souroujon, M. (1979) Proc. Natl. Acad. Sci. We thank F. Piller for critical reading. Computer calculations were USA 76, 4438-4440. performed at the CITI2 (Paris). Support from the Institut National de 19. Hughes-Jones, N. C., Gorick, B. D. & Brown, D. (1991) Im- la Sante et de la Recherche Medicale, the Caisse Nationale d'As- munol. Lett. 27, 101-104. surances Maladie des Travailleurs Salaries, and the North Atlantic 20. de Vetten, M. P. & Agre, P. (1988) J. Biol. Chem. 263, Treaty Organization (Grant 0556/88) is gratefully acknowledged. 18193-181%. 21. Hartel-Schenk, S. & Agre, P. (1992) J. Biol. Chem. 267, 1. Mollison, P. L., Engelfriet, C. P. & Contreras, M. (1987) Blood 5569-5574. Transfusion in Clinical Medicine, (Blackwell Scientific, Ox- 22. Suyama, K. & Goldstein, J. (1992) Blood 79, 808-812. ford), 8th ed. 23. Blanchard, D., Bloy, C., Hermand, P., Cartron, J. P., Saboori, 2. Petz, L. D. & Garratty, G. (1980) Acquired Immune Hemolytic A. M., Smith, B. & Agre, P. (1988) Blood 72, 1424-1427. Anemias, (Churchill Livingstone, New York). 24. Krhamer, M. & Prohaska, R. (1987) FEBS Lett. 226, 105-108. Downloaded by guest on October 2, 2021