Evolution by the Birth-And-Death Process in Multigene Families of the Vertebrate Immune System
Total Page:16
File Type:pdf, Size:1020Kb
Proc. Natl. Acad. Sci. USA Vol. 94, pp. 7799–7806, July 1997 Colloquium Paper This paper was presented at a colloquium entitled ‘‘Genetics and the Origin of Species,’’ organized by Francisco J. Ayala (Co-chair) and Walter M. Fitch (Co-chair), held January 30–February 1, 1997, at the National Academy of Sciences Beckman Center in Irvine, CA. Evolution by the birth-and-death process in multigene families of the vertebrate immune system MASATOSHI NEI*, XUN GU, AND TATYANA SITNIKOVA Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802 ABSTRACT Concerted evolution is often invoked to explain the diversity and evolution of the multigene families of major histocompatibility complex (MHC) genes and immunoglobulin (Ig) genes. However, this hypothesis has been controversial because the member genes of these families from the same species are not necessarily more closely related to one another than to the genes from different species. To resolve this controversy, we conducted phylogenetic analyses of several multigene families of the MHC and Ig systems. The results show that the evolutionary pattern of these families is quite different from that of concerted evolution but is in agreement with the birth-and-death model of evolution in which new genes are created by repeated gene duplication and some duplicate genes are maintained in the genome for a long time but others are deleted or become nonfunctional by deleterious mutations. We found little evidence that interlocus gene conversion plays an important role in the evolution of MHC and Ig multigene families. Multigene families whose member genes have the same func- tion are generally believed to undergo concerted evolution that FIG. 1. Two different models of evolution of multigene families. E, Functional gene; F, pseudogene. homogenizes the DNA sequences of the member genes (1–3). A good example of this type of gene family is the cluster of ribosomal RNA genes, where all the member genes (several interlocus gene conversion or recombination plays an important hundred genes) have very similar DNA sequences within role in the generation and maintenance of genetic diversity of species even in nontranscribed spacer regions. For example, MHC and Ig gene families. the member genes of this cluster in humans are more similar To resolve this controversy, it is important to conduct a to one another than to most of the genes in chimpanzees (4). comprehensive study of the evolutionary patterns of immune This high degree of sequence homogeneity within species is system genes. Fortunately, the amount of DNA sequence data for believed to have been achieved by frequent interlocus recom- immune system genes is rapidly increasing, mainly because of the bination or gene conversion (Fig. 1A). genome project that is under way in various organisms. We have Concerted evolution was also invoked to explain the diversity therefore compiled DNA sequence data for immune system and evolution of the multigene families of major histocompati- genes from GenBank and other sources and studied the general bility complex (MHC) genes (5–7) and immunoglobulin (Ig) evolutionary pattern of these genes. In vertebrates there are three genes (8–11). In this case the genetic diversity (polymorphism) major gene families that play an important role in identifying and for a locus or a set of loci is assumed to be generated by removing invading foreign antigens or parasites (virus, bacteria, introduction of new variants from different loci through interlo- and eukaryotic parasites). They are the MHC, T cell receptor cus recombination or gene conversion. However, we have ques- (TCR), and Ig gene families. The function of MHC molecules is tioned these claims of concerted evolution on grounds that the to distinguish between self- and nonself- antigens and present member genes of MHC and Ig gene families from the same foreign peptides to TCRs of T lymphocytes (T cells), whereas the species are not necessarily more closely related to one another role of TCRs is to interact with peptide-bound MHC molecules than to the genes from different species (12–16). For example, and stimulate T cells to kill virus-infected cells or to give signals Ota and Nei (14) showed that the Ig heavy chain variable region for B lymphocytes (B cells) to produce immunoglobulins. Immu- V ( H) genes in humans can be divided into three major groups, and noglobulins are primarily responsible for humoral immunity, these major groups are shared by mice, amphibians, and teleost removing foreign antigens or parasites circulating in the blood- fishes, yet gene duplication apparently occurs quite frequently stream. The gene families encoding MHC, TCR, and Ig mole- and many duplicate genes subsequently become nonfunctional. cules are all evolutionarily related and are composed of several Therefore, they concluded that the evolution of this gene family is characterized by the ‘‘birth-and-death model of evolution’’ (Fig. 1B). However, some authors (10, 17–20) still maintain that Abbreviations: MHC, major histocompatibility complex; Ig, immuno- globulin; TCR, T cell receptor; b2m, b2-microglobulin; Myr, million years. © 1997 by The National Academy of Sciences 0027-8424y97y947799-8$2.00y0 *To whom reprint requests should be addressed. e-mail: nxm2@ PNAS is available online at http:yywww.pnas.org. psu.edu. 7799 Downloaded by guest on September 25, 2021 7800 Colloquium Paper: Nei et al. Proc. Natl. Acad. Sci. USA 94 (1997) separate gene clusters, each of which spans more than one The mouse MHC genes have also been studied extensively. The megabase (Mb) of DNA in the human genome. mouse class Ia genes are not orthologous with the human class Ia The purpose of this paper is to present major findings of our genes (24–26), and therefore different gene symbols are used for recent study on this subject. The immune system is one of the them (Fig. 2). Actually, most different orders of mammals seem most complicated genetic systems in vertebrates, and detailed to have nonorthologous class Ia genes. The number of class Ia results will be published elsewhere. In this paper, we will be genes in mammals is usually 1–3, but there are often a large concerned primarily with the evolution of MHC and VH gene number of class Ib genes. By contrast, class II genes from different families. orders of mammals usually have orthologous relationships, but the genes from birds and amphibians are not orthologous with the EVOLUTION OF MHC GENES mammalian genes, with a few possible exceptions (27). MHC molecules in vertebrates can be divided into two groups; Polymorphism. The hallmark of MHC genes is the ex- class I and class II molecules. The class I MHC molecule consists tremely high degree of polymorphism within loci, the extent of polymorphism being the highest among all vertebrate genetic of an a chain and a b2-microglobulin (b2m). The a chain is loci (28). The mechanism of maintenance of this polymor- encoded by a class I MHC gene, whereas b2m is produced by a gene that lies outside the MHC. The class I a chain has three phism has been debated for the last 30 years, and it still remains extracellular domains (a , a , a ), a transmembrane portion, and controversial (15, 19, 29). The hypotheses proposed to explain 1 2 3 the polymorphism include those of maternal–fetal incompat- a cytoplasmic tail. The a domain associates noncovalently with 3 ibility, mating preference, overdominant selection, frequency- b m. The class II MHC molecule consists of noncovalently 2 dependent selection due to minority advantage, and interlocus associated a and b chains, which are encoded by class II a-chain gene conversion. This problem has been discussed by Hughes A b B ( ) and -chain ( ) loci, respectively. Each chain is composed of and Nei several times (15, 16, 30, 31), and we are not going to two extracellular domains (designated as a1 and a2 in the a chain repeat the discussion here. However, we would like to mention and b1 and b2 in the b chain), a transmembrane portion, and a that in our view the simplest explanation is heterozygote cytoplasmic tail. advantage or overdominant selection. In this hypothesis, het- In humans class I and class II genes are located on chromosome erozygotes for a locus have selective advantage over homozy- 6 and form two separate clusters (Fig. 2). The class I MHC gotes, because they can cope with two different types of consists of three highly expressed and highly polymorphic loci, A, antigens, whereas the latter can deal with only one type of B, and C (classical class I or class Ia loci), and 25–50 nonclassical foreign antigen. Since there are several different functional (class Ib) loci, including pseudogenes. Nondefective class Ib genes MHC loci, heterozygotes for all these loci should have sub- are usually monomorphic and expressed in limited tissues, and stantial selective advantage over homozygotes. Evidence sup- their function is not well understood. So the definition of class Ib porting overdominant selection is also increasing (19, 31). genes is somewhat vague (23). However, the recently discovered In recent years a number of authors (19, 32) presented class Ib gene HLA-HH seems to have some important function, evidence that new alleles can be created by interallelic recom- because mutants of this gene apparently cause the genetic disease bination at the B locus. However, interallelic recombination is hemochromatosis (21). The human class II gene cluster contains powerless in producing new alleles unless there are abundant six major gene regions: DP, DN, DM, DO, DQ, and DR. Each of polymorphic alleles in the population, and the MHC polymor- the DP, DM, DQ, and DR regions consists of at least one a-chain phism seems to be maintained primarily by point mutation and and at least one b-chain functional gene.