<<

Divergent and Evolution by the Birth-and-Death Process in the Immunoglobulin VH Family

Tatsuya Ota and Masatoshi Nei Department of Biology and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University

Immunoglobulin diversity is generated primarily by the heavy- and light-chain variable-region gene families. To understand the pattern of long-term evolution of the heavy-chain variable-region (V,) gene family, which is composed of a large number of member , the evolutionary relationships of representative Vn genes from diverse organisms of vertebrates were studied by constructing a . This tree indicates that the vertebrate Vn genes can be classified into group A, B, C, D, and E genes. All Vn genes from cartilaginous fishes such as sharks and skates form a monophyletic group and belong to group E, whereas group D consists of bony- fish Vn genes. By contrast, group C includes not only some fish genes but also amphibian, reptile, bird, and mammalian genes. Group A and B genes were composed of the genes from mammals and amphibians. The phylogenetic analysis also suggests that mammalian Vu genes are classified into three clusters-i.e., mammalian clans I, II, and III-and that these clans have coexisted in the genome for >400 Myr. To study the short-term evolution of Vn genes, the phylogenetic analysis of human group A (clan I) and C (clan III) genes was also conducted. The results obtained show that Vn have evolved much faster than functional genes and that they have branched off from various functional Vu genes. There is little indication that the Vu gene family has been subject to concerted evolution that homogenizes member genes. These observations indicate that the Vn genes are subject to divergent evolution due to diversifying selection and evolution by the birth-and-death process caused by and dysfunctioning . Thus, the evolutionary pattern of this monofunctional multigene family is quite different from that of such gene families as the ribosomal RNA and histone gene families.

Introduction Monofunctional multigene families, where all Concerted evolution was also invoked to explain member genes have the same function, are generally be- the evolution of the immunoglobulin heavy-chain vari- lieved to undergo coincidental or concerted evolution, able-region (Vu) genes (Hood et al. 1975; Ohta 1980, which homogenizes the DNA sequences of the member 1983). Gojobori and Nei ( 1984), however, showed that genes (Smith et al. 197 1; Smith 1974; Zimmer et al. Vu gene (often called “gene segment”) family in the 1980). Good examples of concerted evolution are seen mouse and human undergoes a very slow rate of con- with ribosomal RNA (rRNA) genes (e.g., see Arnheim certed evolution, if any; that the extent of sequence di- 1983) and histone genes (e.g., see Matsuo and Yamazaki vergence between member genes in the mouse or the 1989)) where all the member genes have virtually iden- human is substantial; and that some of these genes di- tical coding sequences within species. This is under- verged much earlier than the mammalian radiation standable, because these gene families produce a large ( N 80 Mya) (also see Tutter and Riblet 1989a). quantity of gene products of the same function that are During the past 10 years, the DNA sequences of essential for the survival of an organism. Concerted evo- Vu genes have been determined for many lower verte- lution is caused either by unequal crossover or intergenic brate organisms (see Litman et al. 1993 ) . We can there- fore study the pattern of evolution of the Vu gene family gene conversion. on a much longer evolutionary time scale. This type of study is particularly interesting, because the gene orga- Key words: birth-and-death process, divergent evolution, im- nization of Vu genes is now known to vary extensively munoglobulin, multigene family, VH genes. among different classes of vertebrates (Du Pasquier Address for correspondence and reprints: Masatoshi Nei, Institute 1993). In this paper, we show that the pattern of Vu of Molecular Evolutionary Genetics, The Pennsylvania State University, gene evolution is very different from the typical con- 328 Mueller Laboratory, University Park, Pennsylvania 16802. certed evolution and that two evolutionary processes, Mol. Biol. Evol. 11(3):469-482. 1994. 0 1994 by The University of Chicago. All rights reserved. which may be called “divergent evolution” and “evo- 0737-4038/94/l 103~0014$02.00 lution by the birth-and-death process,” predominate in

469 470 Ota and Nei

Ancestral form

VDJ C

Cartilaginous fishes Higher vertebrates

I IV -IHHHHW .: ,: B V DDJ C VDDJ C VDDJ C Vn Dn Jn Cn

Horned shark Mammals

V DDJ C VDDJ C VDDJ C vVn V Dn J Cn Little skate Chicken FIG.1.-Genomic organizations of immunoglobulin heavy-chain genes. Cartilaginous fishes have the ( VH-DH-Dn-JH-CH)ntype organization including germ-line-joined genes, whereas most higher vertebrates, including bony fishes and , have the (Vn),-( Dn)“-( Jr_,)“-(Cn), type gene organization. Chicken has a single functional variable gene. V = variable-segment gene; D = diversity-segment gene; J = joining-segment gene; C = constant-segment gene; and w = . For more detailed information, see the work of Litman et al. ( 1993).

the evolution of Vn genes in vertebrates, despite the fact The VH (as well as V,) genes can be subdivided that these genes have the same function. We first study further into two hypervariable or complementarity-de- the long-term evolution of VH genes, examining the termining regions (CDR 1 and CDR2 ) and three frame- evolutionary relationships of the genes obtained from work regions (FWl, FW2, and FW3). (CDR3 is deter- various classes of vertebrates, and then a short-term mined primarily by the Dn and Jn genes, so it was evolution of human Vn genes. excluded in this paper). The human and mouse genomes contain > 100 VH genes and a few to a dozen Dn and Material and Methods Jn genes (see fig. 1). Each of the Vn’s is recombined Background Information with one of the Du’s, one of the Jn’s, and one of the Immunoglobulin or antibody molecules consist of Cn’s (constant region genes), to form a heavy-chain gene two heavy chains and two light chains. Both the heavy (Tonegawa 1983). This process therefore generates a and light chains are composed of the variable region and large number of different immunoglobulins that are the constant region (C). The variable region is respon- necessary for defending an individual from many dif- sible for antigen-binding, whereas the constant region is ferent foreign antigens. Essentially the same humoral responsible for effector function. The heavy-chain vari- immune system exists for most of the bony fishes and able region is encoded by the variable-segment ( Vn ) , tetrapods. diversity-segment (Dn), and joining-segment (Jn) genes, In lower vertebrates such as sharks and skates, whereas the light-chain variable region is encoded by the however, the genomic organization of the immunoglob- variable-segment (V,) and joining-segment ( JL) genes. ulin genes is different from that of higher vertebrates, However, the major portion of the variable region is which may be described as (V,-D,-J,-C,) . In lower ver- determined by either Vn or VL. The Vn and Vi_ genes tebrates, the basic unit of repeat of genes is (V-D-D-J- are evolutionarily homologous with each other, but their C) (fig. 1); that is, the linked gene group Vn-Dn-Dn-Ju- sequence similarity is quite low. Cu is repeated several hundred times in the genome, Evolution of Immunoglobulin VH Gene Family 47 1 and the repeat seems to be scattered on different chro- from each species and to cover a wide range of organisms. mosomes (Litman et al. 1993). Therefore, the immu- We have therefore included (a) almost all sequences noglobulin genes in these organisms may be represented from a species where a small number of genes have been by (V-D-D-J-C),, . studied but (b) a relatively few representative sequences Of course, there are many exceptions to this rule. from a species where a large number of genes have been Some units of shark genes have the Vu and Du genes studied. For the study of long-term evolution, no pseu- that are already united (VDD-J-C) in the germ-line ge- dogenes were included. In both human and mouse, >60 nome, whereas in some other units all VH , Du’s, and JH functional germ-line genes have been sequenced, but are united (VDDJ-C) (Litman et al. 1993). In the little Schroeder et al. ( 1990) showed that they can be classified skate (Raja erinacea), (VD-DJ-C) are also present. into three major clusters, i.e., clans I-III (also see Kirk- Furthermore, in the coelacanth (Latimeria chalumnae), ham et al. 1992). Our preliminary analysis of -600 which belongs to the lobe-finned fishes, (V-D) is appar- mammalian VH genes, which were obtained from ently repeated many times in the 5’ side of the J and C GenBank and Kabatnuc databases, has also suggested genes ( Amemiya et al. 1993). Actually, the number of that these genes can be grouped into the three major lower vertebrates studied so far is very small, so it is clusters (data not shown). We have therefore used 11 likely that some new types of genomic organization of and 7 representative sequences from the mouse and the Vu genes will be discovered in the near future. human, respectively, covering the three major clusters, Another group of vertebrates that has a deviant ge- i.e., clans I-III. More than 30 different Vn sequences nomic organization of the VH genes is that of avian spe- are also available from the African toad Xenopus laevis. cies represented by chicken (for review, see McCormack We chose 11 representative sequences from them. For et al. 199 1) . In these species, there are only one func- other organisms, we used almost all available genes after tional Vn gene and - 80 Vu pseudogenes in the genome exclusion of closely related ones-except for chicken, (Reynaud et al. 1989), and the antibody diversity is where we used only one gene, which is used exclusively generated by gene conversion of the functional gene by for antibody generation. For the purpose of rooting the the pseudogenes (fig. 1) . (The same process operates for Vu gene tree, we included a horned shark light-chain Vi_ the light-chain genes as well; Reynaud et al. 1987; gene and a human preB gene, which is known to be Thompson and Neiman 1987.) However, this system homologous to vertebrate lambda VL genes (Kudo and may not apply for all avian species, because in the Mus- Melchers 1987). covy duck there seem to be at least two functional VL The total number of Vu and outgroup genes used genes (McCormack et al. 1989). It is interesting that a for our study of long-term evolution was 57, and they somewhat similar pattern has been observed in the rab- are listed in table 1. We used the germ-line sequences bit. In this organism, the D-proximal Vn gene is pref- whenever possible, to avoid the effect of somatic mu- erentially used, though there are b 100 VH genes and tation ( Tonegawa 1983 ) . In a few species, however, we many of them seem to be functional (Knight 1992). used cDNA, because germ-line genes were not available The immunoglobulin diversity in this organism seems and sequence divergences were so large that the effect to be generated by gene conversion and somatic muta- of somatic mutation was negligible. The cDNA se- tion. quences are denoted by asterisks in table 1. Since we In the primitive jawless vertebrates such as lampreys used many genes from many different organisms, we and hagfishes, an extensive search for immunoglobulin designated each gene by the first letter of both the genus genes has been conducted, but no genes that are clearly name and the species name plus the gene designation homologous to the immunoglobulin genes in cartilagi- used by the original authors, except in the African toad nous fishes or higher vertebrates have been identified, (X . Zaevis), where we used “Xe” instead of “Xl” because though antibodies are clearly present in these organisms “1” (el) is easily confused with “ 1” (one). (Litman et al. 1993). Therefore, a completely different For the study of “short-term” evolution, we in- immune system may exist in these species. cluded pseudogenes as well as functional genes and used 10 VH genes belonging to the clan I genes and 28 genes Nucleotide Sequences Used belonging to the clan III genes. (Clan II genes were not At the present time, there are >600 Vu gene se- used, because there was only one pseudogene.) All of quences that have been determined for the various these data were taken from recent exhaustive studies of groups of organisms mentioned above. This number in- human Vu genes by Shin et al. ( 199 1) and Matsuda et cludes both germ-line and cDNA sequences, and the al. ( 1993). These Vu genes are scattered throughout a majority of them come from the human and mouse 0.8-Mb region of the human Vn gene cluster of chro- (Kabat et al. 199 1) . For studying the long-term evolution mosome 14 and are intermingled with different Vu clan of Vu genes, it is necessary to use representative genes genes. 472 Ota and Nei

Table 1 VH Genes Used for Phylogenetic Analysis in Figure 4

V Gene(s)” (references)

Heavy chains: Cartilaginous fishes: Sharks: Nurse shark (Ginglymostoma cirratum) . . . GcA4-2* (Vazquez et al. 1992) Horned shark (Heterodontus franc&i) ...... HfFlOl, Hf1207, Hfl315, Hf1320, Hf1403, Hf2806, Ht2807, Hf2809, Hf3083 (Kokubu et al. 1988) Skate: Little skate (Raja erinacea) ...... Re102, Re107, RellO (Harding et al. 1990b) Bony fishes: Ray-finned fishes: Goldfish (Carassius auratus) . . . . CaVH99A (Wilson et al. 199 1) Ladyfish (Elops saurus) ...... Es14501*, Es14515* (Amemiya and Litman 1990) Channel catfish (Zctalurus punctatus) . IpNG lo*, IpNG22*, IpNG54*, IpNG64*, IpNG66* (Ghaffari and Lobb 199 1); Ip10.4* (Warr et al. 199 1) Rainbow trout (Oncorhynchus mykiss) ...... 0mRTVH431 (Matsunaga et al. 1990) Lobe-finned fishes: Coelacanth (Latimeria chalumnae) Lc1501 la (Amemiya et al. 1993) Amphibians: African toad (Xenopus laevis) . . . . . XeVH 1 (Yamasaki-Kataoka and Honjo 1987); XeLL2.8, XeLL3.4 (Schwager et al. 1989); Xe4, Xe5, Xe6, Xe7, Xe9, XelO, Xel l.lb (Haire et al. 1991) Reptiles: Caiman (Caiman crocodylus) . . Ccc3 (Litman et al. 1983) Birds: Chicken (Gallus domesticus) . . . GdVHl (Reynaud et al. 1989) Mammals: Human (Homo sapiens) . . . HsVH251 (Humphries et al. 1988); HsVH26 (Mathyssens and Rabbitts 1980); HsVH6 (Schroeder et al. 1988); HsV 1.4.l.b, HsV2.5 (Shin et al. 1991); HsV35 (Matsuda et al. 1988); Hsl2G-1 (Lee et al. 1987) Mouse (Mus musculus) . . MmA/J (Gefter et al. 1984); MmH30 (Schiff et al. 1985); MmH4a-3 (Schiff et al. 1986); MmME3 (Kasturi et al. 1990); MmPJ 14 (Sakano et al. 1980); MmVH283 (0110 et al. 1983); MmVH441 (0110 et al. 1981); MmVl 1 (Siu et al. 1987); Mm1.8, Mm43X (Rathbun et al. 1988); Mm186-2 (Bothwell et al. 1981) Rabbit (Oryctolagus cuniculus) . . . . Ocp26.9.p2 (Currier et al. 1988) Light chains (outgroups): Horned shark (Heterodontus francisci) Hf122 (Shamblott and Litman 1989) Human (Homo sapiens) ...... HsVPB6 (Bauer et al. 1988)

’ An asterisk (*) indicates a cDNA sequence.

Phylogenetic Analysis (see fig. 2), so they were excluded from the analysis. In In the study of the evolutionary relationships of Vn the present analysis even the FW regions showed some genes for the entire group of vertebrates, we used amino indels. All sites showing indels were excluded from the acid sequences rather than nucleotide sequences, because analysis. Therefore, the total number of amino acid sites the Vn genes are highly divergent and the nucleotide used for phylogenetic analysis was 64. Three sequences, differences at certain positions apparently have reached one from each of the horned shark ( Hf 1113 ) , little skate the saturation level. All amino acid sequences were ( Re20 * ) , and African toad ( Xe8 ) , had several indels in aligned by using the program CLUSTAL V (Higgins et FW2 and FW3, and it was difficult to align them with al. 1992 ) , with some modification by visual inspection. other sequences ( fig. 2 ) . Therefore, they were not used As mentioned earlier, a Vu gene consists of CDRs for the present analysis. In the study of a “short-term” and FWs. However, CDRs are highly variable and con- evolution of Vn genes, we used nucleotide sequences, tain many insertions/deletions (indels) when the Vn since the extent of sequence divergence was relatively and VL sequences from diverse organisms are included small in this case. Because we were interested in the Evolution of Immunoglobulin Vu Gene Family 473

Heavy chain FRl CDRl FR2 CDR2 FR3

I I I I HsV35 QVQLVQSGAEV-KKPGASVKVSCXA-SGYTFTGYYM-IRINPNSGGTNYAQKFQGRVTSTRDTSISTAYMELSRLRSDDTVWYCAR MmH30 K.R....-...I.Y...S..Y...N...KDKA.L.A.K.S.....Q..S.T.E.SA...... HsV12G-1 I.Y.Y-Y..S.Y.NPSLKS MSV...KNQFSLK..SVTAV..A...... m/J E . ..QE..PSL-V..SQTLSLT.SV-T.DSI.S-DYWN.I.~~::~~::~..Y.S-Y..S.Y.NPSLKS:~~I . .. ..KNQY.LQ.NSVT.E..AT..... HsVH26 E .LE..GGL-VQ..G.LRL..A.-..F..SS.A.-S...... K.-.. .VSA.SGSG.S.Y.GDSVK..F.IS..N.KN.L.LQMNS..AE..A.....K MmVH283 E:id..E..GGL-V...G.L.L..A.- ..F..SS.T.-S....T.EKR-.. .VAT.SSGG.N.Y.PDSVK..F.IS..NAKNNL.LQM.S...E..AL..... CaVH99A .-S.TS.DSV.-....E..TL..T.- ..FSMSS.W.-G.I..K..K-...I.Q.SGST--~...SL..QFSI.T...~HL.~S.KTE..A...... HfFlOl D.V.T.PE..T-RS..C.LRLT..T-..FN.GSHT.-.....V....-.. .LVYY-YI.SNIH..PGVKD.F.ASK..LNNIFALDMKN.KTE.NAI.F... XelO DPA.T.EESQLE..HND.F.LT.RG- ..FD.SQ.G.-N.I..T..K.-.. .L.L.WYDASK.V..KSVE..LVI..NNAEQVTF...KN.VYQ..A....T. Xe8 EIL.S.KAS.I-ANV.T.ILLQ.EV-....NI~HH.-...... P.TI..~YR--.DT.YISES.KE...PS--..G...QLRI.K.S.S..AT...T. Hf1113 DIV.T..ASVT-....E.H.L..SV-..FSLDS..V-...K....K.-...LLAYRKP.DTIY..PGV...IIPS.AS-ST.Y-I.IKS...E..AM..... Re20' A.V.N.KPT.A-A.S.E.L.LT.VT-..FSLSSSNV-...K.V..K.-.. .V-A.MWYDDDKD..PA.S..F.VS..S-SNVY-LQMTN...AT....A

Light chain HsVPB6 .PV.H.PP.MS-SAL.TTIRLT.TLRNDHDIGV.SV-Y.YQ.R..HP-PRFLL.YF-SQSDKSQGPQVPP.FSGSK.VA~G.LSI.E.QPE.~....M Hr122 VPV.N.TPISDPVSA.ETSELK.AMQN.-KMGS.. .-SiY..R..EA-PV.~HS-T..SIYRGTG.TD.FKPS....SNSHILTIGS.EPG.SA.....A

+++++++++ +++++++++++ +++++++++ +++++ ++++++++++++++++++++++++++++++

FIG. 2.-Amino acid sequences of the variable regions of immunoglobulins from diverse organisms. They are representative sequences from five major groups of Vn segments (A, B, C, D, and E) in fig. 4 and three divergent sequences (Hfll13 [Hinds-Freyet al. 19931; Re20* [Harding et al. 1990~1; and Xe8 [Haire et al. 19911) that contain additional indels in FW2 and FW3. Two VL sequences, which were used as outgroups, are also included. A dash (-) indicates an indel, and the 64 sites marked with a plus sign ( +) were used for phylogenetic construction in fig. 4. CDRl and CDR2 were not used because their sequence divergence was extensive and the sequence alignment was unreliable. dynamics of the evolution of Vu genes in this study, we This is reasonable, because the genomic organization of included both functional genes and pseudogenes. There the genes from cartilaginous fishes is different from that was no difficulty in the alignment of the sequences, ex- of other vertebrates and because cartilaginous fishes sep- cept in the case of a few pseudogenes. These few pseu- arated from the other vertebrates -450 Mya. The second dogenes were excluded from the analysis. split of VH genes in the rest of vertebrates occurs between The phylogenetic analysis of both amino acid and bony fishes (group D) and tetrapods, though group C nucleotide sequence data was conducted by using the includes some bony (ray-finned) -fish and coelacanth minimum-evolution method ( Rzhetsky and Nei 1992). (lobe-finned fish) genes. However, the confidence prob- This method has a solid theoretical basis and is known ability (CP) ( 1 - type I error) of group D genes is rel- to be efficient in obtaining correct phylogenetic trees atively low (63%). This is caused mainly by the fact that (Rzhetsky and Nei 1993). The evolutionary distances the genes CaVH99A, IpNG64 *, and Es 145 15 * in this used were the Poisson-correction distance (see Nei 1987, group are relatively closely related to the genes XeVH 1, p. 4 1 ), for amino acid sequence data, and Jukes and Es1450 1 *, and 0mRTVH43 1 of group C (fig. 3). This Cantor’s ( 1969) distance, for nucleotide sequence data. somewhat anomalous relationship of genes may be due The reliability of the tree obtained was tested by com- to either stochastic errors of amino acid substitution or puting the standard error of each interior branch and some intergenic recombination or gene conversion conducting a t-test (Rzhetsky and Nei 1992). events that occurred a long time ago. The next level of splitting is between group A genes and group (B+C) Results genes. The group A gene cluster also has a relatively low Evolutionary Relationships of Vn Genes from Diverse CP value. This low CP value (64%) is mainly due to the Organisms of Vertebrates fact that the gene Xe 10 is similar to some group C and The Poisson-correction distances for all pairs of 55 D genes (see fig. 3 ) . If we exclude this gene, the cluster Vu and two outgroup sequences are presented in figure of group A genes shows a CP value of 96%. The final 3, whereas the phylogenetic tree obtained is given in clusters of group B and C genes show a relatively high figure 4. Figure 4 shows that the Vu genes in vertebrates CP value. can be classified into five groups, i.e., groups A, B, C, Within each group of Vu genes, there are several D, and E. Group A includes all mammalian clan I genes clusters that are highly significant. In group E, for ex- and a few Xenopus genes, whereas group B consists of ample, the skate genes are substantially different from all mammalian clan II genes and some Xenopus genes. the shark genes and form a cluster with a 99% proba- Group C includes mammalian clan III genes and Vn bility. Apparently, this cluster separated from the shark genes from chicken, caiman (reptile), Xenopus, coel- cluster in an early stage of evolution, and, if more data acanth, and some bony fishes. Group D consists of bony- accumulate and the cluster is still supported, we may fish VH genes only, whereas all group E genes are from have to separate it from the shark group. Note that sharks cartilaginous fishes. and skates diverged -2OOMya(Caroll1988, p. 62-83). Figure 4 shows that the first evolutionary split of Significant clusters (CP>95%) are also observed in all Vu genes occurred between group E genes and others. other groups. An interesting one is the mouse Vu cluster Gene 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56

1.H.svH251 2 HsVlA.lb 30 3. Hsv35 35 28 4. MmH30 44 39 37 5. Mml.8 54 49 47 11 6. IUm43X 49 47 44 11 16 (A) 7. h4mVHl862 49 44 42 9 16 16 8. MmH4a-3 49 39 44 18 26 22 22 9. xes 49 44 52 60 69 63 63 63 lO.Xell.lb 44 42 52 60 69 69 60 54 33 ll.Xe9 54 54 72 75 79 69 69 60 57 54 12. xc10 90 79 75 90 98 98 86 94 79 75 75 13. HsVH6 72 66 79 69 75 69 66 54 82 66 82 821 14. MmAiJ 79 72 79 75 82 79 75 57 82 72 98 94 28 1s. H8Vl2G-1 63 63 72 69 75 79 66 60 79 69 90 82 24 37 16. MmPJl4 60 66 79 82 86 82 86 75 63 66 82 79 47 52 42 17. Hsv2.5 86 90 94 102 111 111 102 86 82 66 102 82 47 49 44 44 (B) 18. XeLL28 63 82 82 82.90 90 79 79 66 63 79 82 52 54 42 44 39 19. xeLL3.4 72 75 79 90 98 98 79 82 75 69 86 86 57 72 60 69 66 54 20. xe4 72 63 69 69 75 75 63 66 72 60 72 98 66 57 63 90 63 54 60 21.X& 69 66 63 72 82 79 66 75 75 69 106 90 69 57 60 79 66 69 63 26 22. xe7 86 86 98 106121 116102 98 94 98 116 106 79 75 69 94 69 72 75 82 72 23. HsVH26 49 44 60 63 66 66 63 54 69 54 69 86 152 57 49 44 63 52 69 66 66 86 24. MmVH283 54 54 66 69 79 72 66 57 75 60 72 82 63 63 60 57 63 57 75 69 60 75 -8-3 2.5.htUlVH441 57 54 66 63 69 72 60 52 69 66 72 86 66 63 57 57 72 54 75 66 60 86 ;; 22 26. Gcp26.9p2 52 52 72 66 63 63 63 57 66 57 66 86 60 66 52 44 63 54 82 72 72 90 22 35 28 27.hhVll 54 54 69 69 72 72 69 72 75 66 69 90 66 66 63 47 60 54 75 72 72 94 20 30 30 30 28.MmME3 57 54 69 72 75 69 72 63 75 63 72 98 63 72 66 52 69 63 79 69 72 94 22 26 35 35 28 (0 29.GdVI-n 69 69 75 86 94 90 82 79 79 69 75 94 72 72 69 60 69 66 82 79 69 90 37 49 52 42 42 39 5 30. CCC3 49 54 66 69 79 79 69 63 66 52 75 98 69 69 66 66 69 66 82 75 79 98 P 28 33 39 39 37 47 52 3l.LclSOlla 49 57 66 79 79 82 79 75 64 60 66 90 82 79 75 60 79 60 79 72 79 90 30 39 47 49 37 39 47 35 32. XeVHl 52 52 66 79 86 79 75 72 63 52 54 75 75 82 82 69 82 69 72 69 79 98 35 44 47 47 49 52 63 42 42 33.Es14501* 52 52 63 79 86 79 79 72 57 54 63 63 52 57 52 52 52 47 66 57 72 69 28 37 44 47 42 39 52 39 33 35 34. GmRw31 52 54 72 75 82 75 75 19 60 49 63 66 66 69 66 57 66 52 66 57 75 94 31 49 54 54 52 47 60 47 37 33 16 35. IpNG54* 72 66 79 90 98 90 90 86 69 54 75 75160 66 60 57 51 54 66 66 12 82 44 47 54 63 60 44 63 54 52 47 30 28 36. CaVH99A 60 57 69 79 86 75 79 63 60 57 63 72172 72 69 75 69 12 75 66 86 lo; 60 66 66 72 69 75 98 63 63 57 47 42 57 37. IpNG64 ’ 63 60 72 79 19 82 72 60 69 57 69 75 79 79 75 75 19 69 69 63 79 111 60 69 63 72 72 75 98 66 60 57 52 44 66 13 38.E914515* 52 54 72 63 66 66 60 63 66 66 63 86 69 79 66 60 69 66 60 63 75 lo; 54 63 60 60 54 63 75 63 60 66 57 52 63 47 54 @I 39. IpNGlO ’ 79 75 98 86 94 94 86 82 82 75 75 82 94 102 82 72 79 82 86 106111 121 75 86 19 82 79 86 86 82 82 72 72 63 69 57 63 35 40. IpNG22 ’ 86 82 98 94 98 94 86 82 94 98 79 86 86 90 82 72 90 90 82 90 82 111 72 82 66 75 75 94 94 90 94 79 79 15 82 66 66 35 35 41. IpNG66* 72 72 90 98 106 94 94 90 82 72 82 94 82 90 19 75 69 82 69 86 86 1oC 79 90 82 94 82 90 90 82 82 72 69 60 57 60 72 30 35 47 42. IplO.4. 86 75 86 94 98 102 86 82 79 75 69 94 I82 82 75 75 69 82 69 82 94 lit 75 90 79 82 75 82 86 94 82 82 75 75 63 69 75 37 42 44 39 43. HfFlOl 19 72 86 94 90 90 86 86 79 79 79 90182 106 86 79 82 82 69 94 94 111 69 79 82 72 69 82 90 75 75 60 66 69 82 79 72 79 82 75 90 75 44.Hf2809 72 66 79 90 90 86 86 82 75 72 79 86 75 94 79 75 72 79 66 90 94 101 63 69 79 69 63 75 90 72 72 54 60 66 75 69 75 69 69 72 79 69 11 45.Hf2807 69 63 75 86 90 90 86 75 72 69 79 86 69 94 72 69 75 79 69 90 94 101 57 69 69 69 60 69 86 69 66 54 57 63 12 69 69 72 75 82 82 75 16 11 46. Hfl320 72 66 79 90 98 94 86 82 75 72 72 79 69 90 72 75 69 72 63 82 90 98 63 72 79 75 66 75 90 72 72 54 54 60 69 66 69 66 64 72 75 69 15 9 13 47. r-m806 79 69 86 98 106102 94 90 75 72 72 82 72 90 75 79 69 75 12 86 98 98 66 72 82 79 66 79 94 72 75 54 57 63 72 66 72 66 69 72 72 72 18 11 16 9 48. Hfl403 66 63 19 86 94 90 82 79 72 69 79 82 66 86 69 66 72 69 60 79 82 98 57 66 75 69 63 72 90 63 60 52 54 60 66 66 69 66 75 79 79 75 16 11 13 11 15 03 49. Hfl2u7 63 57 72 79 86 82 75 72 66 63 75 82 63 82 66 63 69 69 57 75 19 94 52 60 69 66 60 63 86 63 66 54 49 54 63 6063637275757213 8 9 8 11 6 50.Hf1315 66 60 72 79 86 82 75 72 66 66 72 86 69 86 66 69 72 66 60 75 82 98 54 63 63 66 63 66 86 66 69 54 54 57 66 60 63 63 69 72 75 69 15 9 9 9 13 9 3 5l.Hf3083 IS 69 86 94 98 94 90 90 75 72 82 94 79 94 82 69 75 79 69 94 98 101 60 66 75 75 57 66 79 75 69 66 66 69 72 72 79 72 72 82 75 79 26 20 24 22 18 24 16 18 52. GcA4-2, 75 72 75 90 90 94 86 79 75 69 79 94 66 82 63 75 72 69 79 75 90 98 69 79 75 72 79 79 94 82 75 69 63 69 66 72 72 72 86 82 82 72 33 28 28 22 28 26 22 22 42 53. RclO2 34 79 90 102 102 106 98 98 86 82 79 86 79 111 82 86 86 86 82 98 111 121 69 82 86 82 79 82 90 86 79 69 63 72 79 75 75 75 75 79 90 75 49 42 47 39 39 44 44 47 49 49 54.RellO a6 72 86 94 94 98 90 90 75 79 79 86 75 111 82 82 90 90 75 102106121 63 75 79 75 72 75 86 79 72 63 63 72 75 72 79 75 79 79 90 75 49 42 47 44 44 44 44 47 47 54 6 55. RelO7 32 66 79 90 90 94 86 86 72 79 79 90182 111 86 90 90 102 79 94 102116 66 82 82 79 75 82 86 82 82 66 63 72 82 75 82 72 79 79 90 75 49 42 44 44 44 47 42 44 44 54 16 11 56. Hf122 06 90 111 102 102 116 102 94 98 86 106 121194 102 94 111 94 116 102 98 106 111 82 94 106102 90 94 116 94 98 98 86 86 98 90 98 82 102102 86 86 94 94 86 94 86 90 86 90 94 98 94 90 90 57. IwTB6 11 102126 116 126 138 106 102106 86 106126111 116116 106 111 116 106 116111 111 02 102 106 116102 121 121 102 106 98 98 111 116 116 106111106 102 111 98 82 82 79 82 79 79 79 82 82 102 90 90 90 75

FIG. 3.-Pairwise Poisson-correction distances for 55 V, and two VL sequences. The five major groups of VH genes and two VL outgroup genes are separated by lines. An asterisk (*) indicates a sequence deduced from a cDNA. Mammalian clan I Group A

HsVH6 97 MmA/J 91 HsV12G-1 Mammalian 1clan II Group B

African toad Xe7 1 HsVH26 rl . MmVH283 Mammalian clan III

Chicken Group C Caiman Coelacanth African toad

99 CaVH99A IpNG64” I 1 Ray-finned fishes IGroup D

_, Hf.2807 Hf1320 65f Hi2806 71 Hf1403 Hf1207 Cartilaginous 62c Group E Hf1315 fishes 86 , Hf3083 GcA4-2*

HsVPB6 1 J Outgroup

FIG. 4.-Phylogenetic tree of 55 Vu and two VL (outgroup) genes. There are five major groups (A, B, C, D, and E) of Vn genes identified from this tree, though the grouping is not necessarily statistically significant. The number given to each interior branch is the probability at which the branch length is different from 0 (confidence probability). Confidence probability ~60% is not given. The branch lengths are measured in terms of the number of amino acid substitutions, with the scales given below the tree.

475 476 Ota and Nei

Table 2 Jukes-Cantor Distances (X100) for 10 Human Group A VH Genes

1 2 3 4 5 6 7 8 9 10

1. 1.2 ...... 2. 1.3 ...... 7.7 3. 1.8 ...... 3.5 7.1 4. 1.18 ...... 6.1 7.7 6.6 5. 1.46 ...... 4.0 5.5 4.0 6.1 6. ~1.17 ...... 10.4 11.0 11.5 12.7 11.5 7. ~1.24 ...... 9.8 10.4 8.7 10.4 9.3 16.8 8. ~1.27 ...... 21.2 23.8 21.8 26.5 22.5 31.5 29.3 9. ~1.58 ...... 10.4 11.0 9.8 12.1 9.8 16.8 13.2 28.6 10. 5.51 ...... 27.2 31.5 29.3 29.3 30.0 37.7 33.0 37.7 32.3 11. Xenopus 1 l.lba . 50.8 53.7 51.7 48.9 49.8 56.7 53.7 61.0 54.7 49.8

NOTE.-“I+I” denotes a pseudogene. ’ Outgroup gene. in group A. Although this cluster belongs to the mam- al. 199 1). These genes again belong to group C (see fig. malian clan I, it is quite different from the human gene 4) and are relatively closely related to each other (au- belonging to the same clan. thors’ unpublished data). The V region diversity of this An important feature that emerges from figure 4 is organism is also generated by somatic gene conversion. that the three mammalian VH clans separated a long time ago before mammals and amphibians diverged Evolutionary Relationships of Vu Genes in the ( -350 Mya), because group A, B, and C genes are Human Genome shared by both mammals and Xenupus; that is, mam- In the study mentioned above, we chose distantly malian Vu gene clans I-III have coexisted in the genome, related VH genes from diverse organisms to study the for 2350 Myr, as separate lineages. Figure 4 shows that pattern of long-term evolution. To understand the dy- group C includes VH genes from both bony fishes and namics of evolutionary change of Vn genes, however, it tetrapods (including mammalian clan III genes) and that is necessary to examine the evolutionary relationships separation of group C genes from bony-fish of genes within either the same species or closely related group C genes occurred after group A, B, and C genes species. The most extensive study of the structure and diverged. This suggests that the divergence of group A, physical map of Vu genes has been done in the human. B, and C genes occurred even before bony fishes and Matsuda et al. ( 1993 ) constructed the physical map of tetrapods diverged -400 Mya. It is impressive that dif- the O&Mb DNA fragment that contains 64 VH genes, ferent copies of VH genes have persisted for such a long and the nucleotide sequences of these genes are available time in the tetrapod genome, and this raises a serious from the GenBank. question about the importance of concerted evolution We examined the evolutionary relationships of Vu in Vu genes. genes belonging to groups A and C (groups a and c in It should be noted that the mammalian Vu clans the notation of Matsuda et al.), as mentioned earlier. I-III may not exist in many mammalian species. Indeed, Both of these groups include many pseudogenes, but all Vu genes in the rabbit seem to belong to clan III, and some of the pseudogenes were truncated or unalignable they are relatively closely related to.each other in terms with the functional genes. We used the Xenopus gene of sequence divergence (data not shown). In the rabbit, 11.1 b as an outgroup for group A genes and used the however, the VH gene, which is located proximally to caiman gene C3 for group C genes. (Closely related genes the D segment gene cluster, is used with a high frequency are preferable as outgroups.) ( 70%90% ) for generating antibody diversity, though The Jukes-Cantor distances for group A genes are there are 2 100 Vu genes in the genome and the rest of presented in table 2, and the phylogenetic tree for these the genes seem to be used with a low frequency (Knight genes is given in figure 5. This figure clearly shows that 1992). pseudogenes evolve much faster than functional genes. A more extreme case of preferential use of one VH This is expected because pseudogenes do not have func- gene is known in chicken. In this organism, only one tional constraint and thus accumulate more Vn gene (Vu 1) is functional, and the other Vn genes are rapidly than do functional genes (Li et al. 198 1; Miyata all pseudogenes (Reynaud et al. 1989; McCormack et and Yasunaga 198 1; Kimura 1983). This result is again Evolution of Immunoglobulin Vn Gene Family 477

whose branch lengths are nearly the same as those of

v1.17 the functional genes. These pseudogenes probably lost r-L,., their function very recently. At any rate, this tree also c1.46 does not support the view that intergenic gene conversion or unequal crossover plays an important role to ho- mogenize the member genes. The mouse genome is also known to contain many 1.2 * pseudogenes as well as functional Vn genes. Unfortu- 1.8 nately, there seems to be no systematic study of Vn gene sequences in this organism. However, using GenBank y-1 r-F11.18 sequences available, we again constructed a phylogenetic trees for 39 group A mouse Vn genes (data not shown). This tree also showed a tendency for pseudogenes to I‘ 5.51 v1-27evolve faster, and there was no clear-cut evidence of se- Xenopus 1l.lb quence homogenization.

Discussion 0.1 Concerted Evolution

FIG. 5.-Phylogenetic tree of 10 group A human Vn genes. All As mentioned earlier, multigene families are be- sequences except for Xenopus 11.1b were taken from Shin et al. ( 199 1) lieved to be subject to unequal crossover, which generates and Matsuda et al. ( 1993). The Xenopus gene used here is the one of new duplicate genes or deletes some extant duplicate the closest outgroup gene (see fig. 4). w = Pseudogene. The branch lengths are measured in terms of the number of nucleotide substitutions, genes (Smith et al. 197 1; Smith 1974). If this process with the scales given below the tree. The confidence probabilities of continues, duplicate genes in a multigene family tend interior branches are indicated by asterisks: * , >90%; and * *, >95%.

inconsistent with the view that the Vn genes are subject to frequent (germ-line) gene conversion or unequal crossover events. If intergenic gene conversion or un- yr3.41 equal crossover occurs very often, we would expect that 3.13 ye3.16 both functional and pseudogenes would be mixed and l a that most genes in the genome would have similar nu- 65 3.35 cleotide sequences. The phylogenetic tree in figure 5 does y 3.52 not show this pattern. It is interesting to see that all functional genes have evolved nearly at the same rate and that some functional genes are quite close to each ty 3.63 other. Gojobori and Nei ( 1984) estimated that the rate y 3.54 Iy3.6 of nucleotide substitution for FWs is 1.Ol X lo-‘, 0.65 X 10v9, and 2.63 X 10m9 per site per year per lineage ez3 ,_y3.38 for the first, second, and third codon positions, respec- yl3.37 tively. Therefore, the average rate is 1.43 X 10 -9. If this rate is reliable, even the closest pair of functional genes L,;y ( Hs 1.2 and Hs 1.8 ) seem to have diverged - 12 Mya. The most divergent gene, Hs5.5 1, seems to have diverged from the other five functional genes - 100 Mya. This suggests that many functional Vn genes have persisted for a long time in the human genome. When all group A, B, and C genes are considered together, they have survived for >400 Myr, as mentioned earlier. Figure 6 shows the phylogenetic tree for group C FIG. 6.-Phylogenetic tree of 28 group C human Vu genes. All sequences except for Caiman C3 were taken from Matsuda et al. ( 1993). genes (distance estimates are not shown). The pseu- The Caiman gene used here is the one of the closest outgroup gene dogenes in this group also generally evolve much faster (see fig. 4). The confidence probabilities of interior branches are in- than functional genes. There are a few pseudogenes dicated by asterisks: *, >90%; * *, >95%, and + * *, >99%. 478 Ota and Nei

to have similar nucleotide sequences even in the presence three major problems in her study. First, in the study of of mutation. More recently, intergenic gene conversion evolution of Vu genes, only germ-line sequences should was considered as an additional mechanism acting to be used; but she used many cDNA sequences. Second, homogenize the member genes of a multigene family CDRs evolve rapidly and include many indels. There- (Slightom et al. 1980). These two mechanisms are fore, for the comparison of ds and dN , only closely related thought to be the major causes of concerted evolution sequences should be used. When this is done, ds is nearly that homogenizes the member genes of such gene fam- the same for both CDRs and FWs, and dN > ds in CDRs ilies as RNA genes and histone genes (fig. 7A) (Smith but dN < ds in FWs (Tanaka and Nei 1989). (In the 1974; Ohta 1980; Arnheim 1983). human, ds was higher in CDRs than in FWs in Tanaka The concerted (coincidental) evolution was also and Nei’s study; but the difference was not significant.) invoked to explain the evolution of immunoglobulin Third, there are no data showing that gene conversion diversity (Ohta 1980, 1983), as mentioned earlier. In occurs exclusively in CDRs. Actually, somatic gene con- this case, unequal crossover or gene conversion was re- version occurs in both CDRs and FWs in chicken garded as a mechanism to increase the ( McCormack and Thompson 1990). For the above rea- () at a locus by introducing new variants sons, Ohta’s conclusion does not seem to be supported from different loci. However, when Gojobori and Nei by available data. ( 1984) conducted a phylogenetic analysis of human and Divergent Evolution mouse Vu genes, they noticed that the extent of sequence divergence among different loci is very high and that, if In figure 4, we have seen that the major groups of these genes were subject to concerted evolution, the rate Vn genes in higher vertebrates have been maintained in of unequal crossover or gene conversion must be very the genome for ~400 Myr. This observation is in rough low-two orders of magnitudes lower than that of ri- agreement with the estimate of divergence (300-400 bosomal RNA genes. To explain the difference between Myr) obtained by Gojobori and Nei ( 1984) in a statis- the two gene families, they proposed that, whereas rRNA tical analysis of human and mouse Vn genes (see Ad- genes are subject to purifying selection, Vn genes are dendum in their paper). However, it is in sharp contrast subject to diversifying selection. with the evolution of rRNA genes, where even the hu- Conducting a study on the number of synonymous man and the chimpanzee, which apparently diverged (&) and nonsynonymous (&) substitutions in CDRs - 5 Mya (see Stoneking 1993), do not share the same and FWs, Ohta ( 1992) recently concluded that the group of genes ( Arnheim 1983). In the present paper, amino acid variability in CDRs is primarily caused by this type of long-term persistence of member genes of a gene conversion. This conclusion is based on her obser- multigene family will be called “divergent evolution,” vations that ds is generally higher in CDRs than in FWs and it is schematically represented in figure 7B. and that & was not necessarily higher than ds, even in There are two possible hypotheses to explain this CDRs. She argued that these observations do not support long-term persistence of Vu genes in the vertebrate ge- the hypothesis of diversifying selection but are consistent nome (Gojobori and Nei 1984); one is to assume that with the gene conversion hypothesis. However, there are the rate of occurrence of unequal crossover is low in Vu genes, and the other is the hypothesis that diversifying selection operates among Vu genes. Gojobori and Nei Time ( 1984) estimated the rate of occurrence of unequal crossover by fitting a mathematical theory to the phy- logenetic tree for Vu genes. The rate obtained was - 100 times lower than the estimate for the rRNA multigene family. However, since there is no reason to believe that the rate must be lower in Vu genes than in rRNA genes, they supported the second hypothesis. The recent study of the physical map of Vu genes by Matsuda et al. ( 1993) supports Gojobori and Nei’s ( 1984) view, because group A, B, and C Vu genes are scattered throughout the 0.8- Mb Vn gene region, suggesting that unequal crossover species 1 species 2 species 1 species 2 species 1 sepcies 2 has occurred relatively frequently in the past (also see Tutter and Riblet 1989b). (A) Concerted (B) Divergent (C) Evolution by birth evolution evolution and death process The theoretical basis of the second hypothesis is FIG. 7.-Three different modes of evolution of multigene families. that vertebrate individuals are exposed to a large number 0 = Functional genes; and 0 = pseudogenes. of foreign antigens, and thus, if an individual has many Evolution of Immunoglobulin VH Gene Family 479 different types of Vn genes, it will enhance the of process, when a relatively short period of evolutionary the individual. This means that a new mutant Vu gene time is considered. that encodes an antibody molecule matching for a new From the phylogenetic trees given in figures 5 and type of antigens has selective advantage and that old 6, we previously concluded that concerted evolution does genes that have proved to be useful will be retained in not play an important role in the Vu gene family. This the genome. Indeed, Tanaka and Nei ( 1989) have shown does not mean that there is no homogenization process that the rate of amino acid-altering (nonsynonymous) in the evolution of VH genes. As long as unequal cross- nucleotide substitutions is higher than the rate of syn- over occurs, there must be some chance of homogeni- onymous nucleotide substitution in CDRs of the human zation. We also do not completely rule out the role of and mouse Vn genes. In other words, there is positive intergenic gene conversion. Because gene conversion is Darwinian selection operating for the diversification of an important mechanism of somatic generation of an- Vu genes. tibody diversity in some organisms (e.g., chicken), it Diversifying selection would certainly increase the might have played some role in the evolution of genomic persistence time of different Vn genes in the genome, structure of Vu genes as well. Nevertheless, as far as but can it explain the existence of three major groups currently available data are concerned, the Vn gene of Vu genes that have survived for ~400 Myr? Theo- family appears to have evolved mainly following the retically, it can happen by chance when random occur- scheme of divergent evolution and evolution by the rence of unequal crossover creates new genes or deletes birth-and-death process. existing ones. However, it is more likely that these three Some Remarks groups are specialized to cope with different groups of antigens, though there is no evidence at the present time. It should be emphasized that, although we now have Kirkham et al. ( 1992) showed that each of the mam- a rough picture of the evolution of VH genes in verte- malian clans I-III has a unique FR structure that influ- brates, this picture is based on studies of a limited num- ences the conformation of the antigen-binding site, and ber of species. If more species are studied, we may find they suggested that there might be a differential use of new types of genomic organization of Vu genes. It is Vn clans in the immune response. rather surprising that the genomic structure of the coel- acanth is somewhat similar to that of cartilaginous fishes. Evolution by the Birth-and-Death Process It is also interesting that the rabbit system is similar to In the theory of concerted evolution, it is commonly the chicken system. Although these systems should be assumed that the total number of duplicate genes of a studied in more detail, they suggest that antibody di- gene family remains more or less constant (Ohta 1983 ). versity is generated in many different ways and that some Hughes and Nei ( 1989) and Nei and Hughes ( 199 1, systems may have evolved independently in different 1992) showed that, in the major-histocompatibility- groups of vertebrates. Obviously, more detailed studies complex (MHC) gene family, gene duplication often of the genomic structure of Vn genes are necessary in occurs but that many duplicate genes die out because many different organisms before we understand the gen- of deleterious mutation. However, the number of func- eral pattern of evolution of this multigene family. tional genes remains more or less constant, since the Previously we suggested that the long persistence effects of the birth and death of duplicate genes are bal- of VH genes in vertebrate genomes is aided by diversi- anced out. The dead genes either stay as pseudogenes in fying selection. However, the extent of selective advan- the genome or are eliminated by unequal crossover. They tage of a new mutant Vn gene is probably small (possibly called this the “evolution by the birth-and-death process” l%-2%). The reason for this is that there are organisms (fig. 7C). in which only one or a few Vu genes are functional and Figures 5 and 6 show that evolution by the birth- in which others are either pseudogenes or rarely used. and-death process also occurs in the Vn gene family. If every gene had a unique function that is essential to Matsuda et al. ( 1993) estimated that - 50% of Vu genes the survival of the organism, this type of system would that exist in the human genome are pseudogenes. Sim- never have evolved; somatic mutation and intergenic ilarly, the mouse genome harbors many pseudogenes gene conversion would be too risky and too uneconom- (Rathbun et al. 1989). Since the number of functional ical in this case. The fact that many Vu genes die out in VH genes in the genome is similar for many mammalian the course of evolution also suggests that the adaptive species, these results indicate that nearly the same num- differences between different Vu genes is rather small. ber of functional genes as that of pseudogenes is pro- Nevertheless, this small magnitude of adaptive dif- duced by unequal crossover for a given period of evo- ference (s) would be sufficient to explain the long per- lutionary time. We can therefore conclude that the Vu sistence of Vn genes in the vertebrate genome if the pop- gene family is subject to evolution by the birth-and-death ulation size (N) is sufficiently large, because the 480 Ota and Nei maintenance of Vu genes would be determined mainly HARDING, F. A., C. T. AMEMIYA,R. T. LITMAN, N. COHEN, by Ns. If the theoretical study on frequency-dependent and G. W. LITMAN. 1990a. Two distinct immunoglobulin selection with respect to MHC genes (Takahata and Nei heavy chain isotypes in a primitive, cartilaginous fish, Raja 1990) gives any guidance, it suggests that the persistence erinacea. Nucleic Acids Res. 18:6369-6376. of different Vn genes would be enhanced substantially HARDING, F. A., N. COHEN, and G. W. LITMAN. 1990b. Im- by diversifying selection when Ns > 100. munoglobulin heavy chain gene organization and com- plexity in the skate, Raja erinacea. Nucleic Acids Res. 18: Acknowledgment 1015-1020. HIGGINS, D. G., A. J. BLEASBY,and R. FUCHS. 1992. CLUS- This work was supported by research grants from TAL V: improved software for multiple sequence alignment. the National Institute of Health (GM-20293) and the Comput. Appl. Biosci. 8: 189- 19 1. National Science Foundation (DEB-9 119802) to M.N. HINDS-FREY,K. R., H. NISHIKATA, R. T. LITMAN,and G. W. LITMAN. 1993. Somatic variation precedes extensive di- LITERATURE CITED versification of germline sequences and combinatorial join- ing in the evolution of immunoglobulin heavy chain di- ARNHEIM,N. 1983. Concerted evolution of multigene families. versity. J. Exp. Med. 178:8 15-824. Pp. 38-61 in M. NEI, and R. K. KOEHN, eds. Evolution of HOOD, L., J. H. CAMPBELL,and S. C. R. ELGIN. 1975. The genes and . Sinauer, Sunderland, Mass. organization, expression, and evolution of antibody genes AMEMIYA, C. T., and G. W. LITMAN. 1990. Complete nu- and other multigene families. Annu. Rev. Genet. 9:305- cleotide sequence of an immunoglobulin heavy-chain gene 353. and analysis of immunoglobulin gene organization in a HUGHES, A. L., and M. NEI. 1989. Evolution of the major primitive teleost species. Proc. Natl. Acad. Sci. USA 87: histocompatibility complex: independent origin of non- 811-815. classical class I genes in different groups of mammals. Mol. AMEMIYA,C. T., Y. OHTA, R. T. LITMAN, J. P. RAST, R. N. Biol. Evol. 6:559-579. HAIRE, and G. W. LITMAN. 1993. VH gene organization in HUMPHRIES, C. G., A. SHEN, W. A. KUZIEL, J. D. CAPRA, a relict species, the coelacanth Latimeria chalumnae: evo- F. R. BLATTNER,and P. W. TUCKER. 1988. A new human lutionary implications. Proc. Natl. Acad. Sci. USA 90:666 l- immunoglobulin VH family preferentially rearranged in 6665. immature B-cell tumours. Nature 331:446-449. BAUER, S. R., A. KUDO, and F. MELCHERS. 1988. Structure JUKES, T. H., and C. R. CANTOR. 1969. Evolution of and pre-B lymphocyte restricted expression of the VpreB molecules. Pp. 2 1- 132 in H. N. MUNRO, ed. Mammalian gene in humans and conservation of its structure in other protein metabolism. Academic Press, New York. mammalian species. EMBO J. 7: 11 l- 116. KABAT, E. A., T. T. WV, H. M. PERRY, K. S. GOTTESMAN, BOTHWELL,A. L. M., M. PASKIND, M. RETH, T. IMANISHI- and C. FOELLER. 199 1. Sequences of proteins of immu- KARI, K. RAJEWSKY, and D. BALTIMORE. 198 1. Heavy nological interest, 5th ed. U.S. Department of Health and chain variable region contribution to the NPb family of Human Services, Government Printing Office, Washington, antibodies: somatic mutation evident in a y2a variable re- D.C. gion. Cell 24:625-637. CAROLL, R. L. 1988. Vertebrate paleontology and evolution, KASTURI, K. N., R. MAYER, C. A. BONA, V. E. SCOTT, and Freeman, New York. C. L. SIDMAN. 1990. Germline V genes encode viable CURRIER, S. J., J. L. GALLARDA, and K. L. KNIGHT. 1988. motheaten mouse autoantibodies against thymocytes and Partial molecular genetic map of the rabbit Vu chromo- red blood cells. J. Immunol. 145:2304-23 11. somal region. J. Immunol. 140: 165 l- 1659. KIMURA, M. 1983. The neutral theory of . Du PASQUIER,L. 1993. Phylogeny of B-cell development. Curr. Cambridge University Press, Cambridge. Opin. Immunol. 5: 185- 193. KIRKHAM, P. M., F. MORTARI, J. A. NEWTON, and H. W. GEFTER, M. L., M. N. MARGOLIES, R. I. NEAR, and L. J. SCHROEDER,JR. 1992. Immunoglobulin Vu clan and family WYSOCKI. 1984. Analysis of the anti-azobenzenearsonate identity predicts variable domain structure and may influ- response at the molecular level. Annu. Inst. Pasteur Im- ence antigen binding. EMBO J. 11:603-609. munol. 135C: 17-30. KNIGHT, K. L. 1992. Restricted VH gene usage and generation GHAFFARI, S. H., and C. J. LOBB. 1991. Heavy chain variable of antibody diversity in rabbit. Annu. Rev. Immunol. 10: region gene families evolved early in phylogeny. J. Immunol. 593-6 16. 146:1037-1046. KOKUBU, F., R. LITMAN, M. J. SHAMBLOTT,K. HINDS, and GOJOBORI,T., and M. NEI. 1984. Concerted evolution of the G. W. LITMAN. 1988. Diverse organization of immuno- immunoglobulin Vu gene family. Mol. Biol. Evol. 1:195- globulin Vu gene loci in a primitive vertebrate. EMBO J. 212. 7~3413-3422. HAIRE, R. N., Y. OHTA, R. T. LITMAN, C. T. AMEMIYA,and KUDO, A., and F. MELCHERS. 1987. A second gene, VpreBin G. W. LITMAN. 199 1. The genomic organization of im- the h5 locus of the mouse, which appears to be selectively munoglobulin VH genes in Xenopus laevis shows evidence expressed in pre-B lymphocytes. EMBO J. 6:2267-2272. for interspersion of families. Nucleic Acids Res. 19:306 l- LEE, K. H., F. MATSUDA, T. KINASHI, M. KODAIRA, and T. 3066. HONJO. 1987. A novel family of variable region genes of Evolution of Immunoglobulin VH Gene Family 48 1

the human immunoglobulin heavy chain. J. Mol. Biol. 195: - . 1992. Balanced polymorphism and evolution by the 76 l-768. birth-and-death process in the MHC loci. Pp. 27-38 in K. LI, W. H., T. GOJOBORI, and M. NEI. 198 1. Pseudogenes as TSUJI, M. AIZAWA, and T. SASAZUKI, eds. HLA 199 1. a paradigm of neutral evolution. Nature 292:237-239. Proceedings of the 11th Histocompatibility Workshop and LITMAN, G. W., L. BERGER, K. MURPHY, R. LITMAN, K. Conference. Vol. 2. Oxford University Press, Oxford. HINDS, C. L. JAHN, and B. W. ERICKSON. 1983. Complete OHTA, T. 1980. Evolution and variation of multigene families. nucleotide sequence of an immunoglobulin Vu gene ho- Springer, Berlin. mologue from Caiman, a phylogenetically ancient reptile. -. 1983. On the evolution of multigene families. Theor. Nature 303:349-352. Popul. Biol. 23:2 16-240. LITMAN, G. W., J. P. RAST, M. J. SHAMBLOTT,R. N. HAIRE, -. 1992. A statistical examination of hypervariability in M. HULST, W. ROESS, R. T. LITMAN, K. R. HINDS-FREY, complementarity determining regions of immunoglobulins. A. ZILCH, and C. T. AMEMIYA. 1993. Phylogenetic diver- Mol. Phylogenet. Evol. 1:305-3 11. sification of immunoglobulin genes and the antibody rep- OLLO, R., C. AUFFRAY, J.-L. SIKORAV, and F. ROUGEON. ertoire. Mol. Biol. Evol. 10:60-72. 198 1. Mouse heavy chain variable regions: nucleotide se- MCCORMACK,W. T., L. M. CARLSON,L. W. TJOELKER,and quence of a germ-line Vu gene segment. Nucleic Acids Res. C. B. THOMPSON. 1989. Evolutionary comparison of the 9:4099-4 109. avian IgL locus: combinatorial diversity plays a role in the OLLO, R., J.-L. SIKORAV,and F. ROUGEON. 1983. Structural generation of the antibody repertoire in some avian species. relationships among mouse and human immunoglobulin Int. Immunol. 1:332-34 1. Vu genes in the subgroup III. Nucleic Acids Res. 11:7887- MCCORMACK, W. T., and C. B. THOMPSON. 1990. Chicken 7897. IgL variable region gene conversions display pseudogene RATHBUN,G., J. BERMAN,G. YANCOPOULOS,and F. W. ALT. donor preference and 5’ to 3’ polarity. Genes Dev. 4:548- 1989. Organization and expression of the mammalian 558. heavy-chain variable region locus. Pp. 63-90 in T. HONJO, MCCORMACK,W. T., L. W. TJOELKER,and C. B. THOMPSON. F. W. ALT, and T. H. RABBITTS, eds. Immunoglobulin 199 1. Avian B-cell development: generation of an immu- genes. Academic Press, London. noglobulin repertoire by gene conversion. Annu. Rev. Im- RATHBUN, G. A., F. OTANI, E. C. B. MILNER, J. D. CAPRA, munol. 9:2 19-24 1. MATSUDA, F., K. H. LEE, S. NAKAI, T. SATO, M. KODAIRA, and P. W. TUCKER. 1988. Molecular characterization of S. Q. ZONG, H. OHNO, S. FUKUHARA,and T. HONJO. 1988. the A/J 5558 family of heavy chain variable region gene Dispersed localization of D segments in the human im- segments. J. Mol. Biol. 202:383-395. munoglobulin heavy-chain locus. EMBO J. 7: 1047- 105 1. REYNAUD,C.-A., V. ANQUEZ, H. GRIMAL, and J.-C. WEILL. MATSUDA, F., E. K. SHIN, H. NAGAOKA, R. MATSUMURA, 1987. A hyperconversion mechanism generates the chicken M. HAINO, Y. FUKITA, S. TAKA-ISHI, T. IMAI, J. H. RILEY, light chain preimmune repertoire. Cell 48:379-388. R., ANAND, E. SOEDA,and T. HONJO. 1993. Structure and REYNAUD, C.-A., A. DAHAN, V. ANQUEZ, and J.-C. WEILL. physical map of 64 variable segments in the 3’ 0.8 megabase 1989. Somatic hyperconversion diversifies the single Vn region of the human immunoglobulin heavy-chain locus. gene of the chicken with a high incidence in the D region. Nature Genet. 3:88-94. Cell 59:171-183. MATSUNAGA,T., T. CHEN, and V. T~RMANEN. 1990. Char- RZHETSKY, A., and M. NEI . 1992. A simple method for esti- acterization of a complete immunoglobulin heavy-chain mating and testing minimum-evolution trees. Mol. Biol. variable region germ-line gene of rainbow trout. Proc. Natl. Evol. 9:945-967. Acad. Sci. USA 87:7767-777 1. -. 1993. Theoretical foundation of the minimum-evo- MATSUO, Y., and T. YAMAZAKI. 1989. Nucleotide variation lution method of phylogenetic inference. Mol. Biol. Evol. and divergence in the histone multigene family in Dro- 10:1073-1095. sophila melanogaster. Genetics 122~87-97. SAKANO, H., R. MAKI, Y. KUROSAWA,W. ROEDER, and S MATTHYSSENS,G., and T. H. RABBITTS. 1980. Structure and TONEGAWA. 1980. Two types of somatic recombination multiplicity of genes for the human immunoglobulin heavy are necessary for the generation of complete immunoglob- chain variable region. Proc. Natl. Acad. Sci. USA 77:656 l- ulin heavy-chain genes. Nature 286:676-683. 6565. SCHIFF,C., M. MILILI, and M. FOUGEREAU. 1985. Functional MIYATA,T., and T. YASUNAGA.198 1. Rapidly evolving mouse and pseudogenes are similarly organized and may equally a-globin-related pseudo gene and its evolutionary history. contribute to the extensive antibody diversity of the IgVuII Proc. Natl. Acad. Sci. USA 78:450-453. family. EMBO J. 4: 1225-1230. NEI, M. 1987. Molecular evolutionary genetics. Columbia SCHIFF,C., M. MILILI, I. HUE, S. RUDIKOFF, and M. FOUGER- University Press, New York. EAU. 1986. Genetic basis for expression of the idiotypic NEI, M., and A. L. HUGHES. 199 1. Polymorphism and evo- network. J. Exp. Med. 163:573-587. lution of the major histocompatibility complex loci in SCHROEDER, H. W. JR., J. L. HILLSON, and R. M. PERLMUT- mammals. Pp. 222-247 in R. SELANDER,A. CLARK, and TER. 1990. Structure and evolution of mammalian Vu T. WHITTAM,eds. Evolution at the molecular level. Sinauer, families. Int. Immunol. 2:4 l-50. Sunderland, Mass. SCHROEDER, H. W. JR., M. A. WALTER, M. H. HOFKER, A. 482 Ota and Nei

EBENS,K. W. VAN DIJK, L. C. LIAO, D. W. Cox, E. C. B. THOMPSON,C. B., and P. E. NEIMAN. 1987. Somatic diver- MILNER, and R. M. PERLMUTTER. 1988. Physical linkage sification of the chicken immunoglobulin light chain gene of a human immunoglobulin heavy chain variable region is limited to the rearranged variable gene segment. Cell 48: gene segment to diversity and joining region elements. Proc. 369-378. Natl. Acad. Sci. USA 85:8 196-8200. TONEGAWA,S. 1983. Somatic generation of antibody diversity. SCHWAGER,J., N. BURCKERT, M. COURTET, and L. Du PAS- Nature 302:575-58 1. QUIER. 1989. Genetic basis of the antibody repertoire in TUTTER, A., and R. RIBLET. 1989~. Conservation of an im- Xenopus: analysis of the Vu diversity. EMBO J. 8:2989- munoglobulin variable-region gene family indicates a spe- 3001. cific, noncoding function. Proc. Natl. Acad. Sci. USA 86: SHAMBLOTT,M. J., and G. W. LITMAN. 1989. Genomic or- 7460-7464. ganization and sequences of immunoglobulin light chain -. 1989b. Evolution of the immunoglobulin heavy chain genes in a primitive vertebrate suggest of im- variable region (Ig/.r-V) locus in the genus MUX Immu- munoglobulin gene organization. EMBO J. 8:3733-3739. nogenetics 30:3 15-329. SHIN, E. K., F. MATSUDA,H. NAGAOKA, Y. FUKITA, T. IMAI, VAZQUEZ, M., N. MIZUKI, M. F. FLAJNIK, E. C. MCKINNEY, K. YOKOYAMA,E. SOEDA, and T. HONJO. 199 1. Physical and M. KASAHARA . 1992. Nucleotide sequence of a nurse map of the 3’ region of the human immunoglobulin heavy shark immunoglobulin heavy chain cDNA clone. Mol. Im- chain locus: clustering of autoantibody-related variable munol. 29:1157-l 158. segments in one haplotype. EMBO J. 10:3641-3645. WARR, G. W., D. L. MIDDLETON,N. W. MILLER,L. W. CLEM, SIU, G., E. A. SPRINGER,H. V. HUANG, L. E. HOOD, and S. T. and M. R. WILSON. 199 1. An additional family of Vn se- CREWS. 1987. Structure of the T15 Vu gene subfamily: quences in the channel catfish. Eur. J. Immunogen 18:393- identification of immunoglobulin gene promoter homolo- 397. gies. J. Immunol. 138:4466-447 1. WILSON, M. R., D. MIDDLETON, and G. W. WARR. 199 1. SLIGHTOM,J. L., A. E. BLECHL,and 0. SMITHIES. 1980. Hu- Immunoglobulin Vu genes of the goldfish Carussius au- man fetal Go- and Ay-globin genes: complete nucleotide se- ratus: a re-examination. Mol. Immunol. 28:449-457. quences suggest that DNA can be exchanged between these YAMASAKI-KATAOKA,Y., and T. HONJO. 1987. Nucleotide duplicated genes. Cell 21:627-638. sequences of variable region segments of the immunoglob- SMITH, G. P. 1974. Unequal crossover and the evolution of ulin heavy chain of Xenopus luevis. Nucleic Acids Res. 15: multigene families. Cold Spring Harb. Symp. Quant. Biol. 5888. 38:507-5 13. ZIMMER, E. A., S. L. MARTIN, S. M. BEVERLEY,Y. W. RAN, SMITH, G. P., L. HOOD, and W. M. FITCH. 197 1. Antibody and A. C. WILSON. 1980. Rapid duplication and loss of diversity. Annu. Rev. Biochem. 40:969- 10 12. genes coding for the c1chains of hemoglobin. Proc. Natl. STONEKING,M. 1993. DNA and . Evol. Acad. Sci. USA 77:2158-2162. Anthropol. 2:60-73. TAKAHATA, N., and M. NEI. 1990. Allelic genealogy under overdominant and frequency-dependent selection and JAN KLEIN, reviewing editor polymorphism of major histocompatibility complex loci. Genetics 124:967-978. Received November 5, 1993 TANAKA, T., and M. NEI. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447-459. Accepted January 19, 1994