Evolution of Multigene Families by Gene Duplication: A Haploid Model
Copyright 1998 by the Genetics Society of America

Hidenori Tachida and Tohru Kuboyama

Department of Biology, Faculty of Science, Kyushu University 33, Fukuoka 812-8581, Japan

Manuscript received October 12, 1997
Accepted for publication May 13, 1998

ABSTRACT

Evolution of multigene families by gene duplication and subsequent diversification is analyzed assuming a haploid model without interchromosomal crossing over. Chromosomes with more different genes are assumed to have higher fitness. Advantageous and deleterious mutations and duplication/deletion also affect the evolution, as in previous studies. In addition, negative selection on the total number of genes (copy number selection) is incorporated in the model. First, a Markov chain approximation is used to obtain formulas for the average numbers of different alleles, genes without pseudogene mutations, and pseudogenes assuming that mutation rates and duplication/deletion rates are all very small. Computer simulation shows that the approximation works well if the products of population size with mutation and duplication/deletion rates are all small compared to 1. However, as they become large, the approximation underestimates gene numbers, especially the number of pseudogenes. Based on the approximation, the following was found: (1) Gene redundancy measured by the average number of redundant genes decreases as advantageous selection becomes stronger. (2) The number of different genes can be approximately described by a linear pure-birth process and thus has a coefficient of variation around 1. (3) The birth rate is an increasing function of population size without copy number selection, but not necessarily so otherwise. (4) Copy number selection drastically decreases the number of pseudogenes.
Available data of mutation rates and duplication/deletion rates suggest much faster increases of gene numbers than those observed in the evolution of currently existing multigene families. Various explanations for this discrepancy are discussed based on our approximate analysis.

Corresponding author: Hidenori Tachida, Department of Biology, Faculty of Science, Kyushu University, 33, Fukuoka, 812-8581, Japan.

Genetics 149: 2147–2158 (August 1998)

Gene duplication with subsequent diversification is considered to have played very important roles in the organismal evolution (Ohno 1970, 1988b, 1991). If a gene is duplicated, the selective constraint becomes less for the extra copy, and it can evolve to have a (slightly) different function, while the original function of the gene is kept in the other copy. Thus, gene duplication with subsequent diversification is one of the simplest ways to acquire a new function and is thought to have been employed many times during evolution. For example, Iwabe et al. (1996) suggested that gene duplications contributed to the compartmentalization of a cell and formation of multicellular organisms by creating genes expressed tissue specifically. Also, homeobox genes (Carroll 1995) and MCM1, AGAMOUS, DEFICIENS, and SRF (MADS)-box genes (Thiessen et al. 1996) were apparently created by ancient gene duplications.

Gene duplications are not restricted to ancient times. Changes of gene number are observed in closely related species in many taxa, and even examples of polymorphisms with regard to copy number within species are known (e.g., Lyckegaard and Clark 1989; Takano et al. 1989; Lange et al. 1990; Neitz and Neitz 1995). Thus, gene duplications are still ongoing evolutionary processes. How such gene duplications and formation of multigene families occur at the population level is an interesting evolutionary problem.

Ohta (1987a,b, 1988a,b) theoretically studied evolution by gene duplication in haploid and diploid models using Monte Carlo simulation. She showed among others that positive selection is necessary to acquire genes with new functions and that the variance of the gene number is generally large. Also the ratio ($R$) of the number of genes with different functions to the number of pseudogenes was proposed to measure effects of positive selection, and an approximate formula to compute it was derived (Ohta 1987b),

$$R = \frac{u_{+} v_{+}}{v_{-}}, \qquad (1)$$

where $v_{+}$ and $v_{-}$ are rates of mutation to a different gene with a new function and of mutation to a pseudogene, respectively, and $u_{+}$ is the fixation probability of an advantageous mutation introduced into a population. Later, Clark (1994) and Walsh (1995) studied different aspects of two gene systems assuming slightly different models.

However, there are questions still to be answered. First, because Ohta's results (Ohta 1987a,b, 1988a,b) were mostly from simulation except for Equation 1, dependency of gene multiplication on parameters like population size, intensity of selection, mutation rate, and duplication/deletion rate was not so clear. As we do not have good estimates for most of these parameters, deriving analytical expressions describing gene multiplication processes is desirable to quantify speed of gene duplication and gene redundancy. Second, negative selection on the total number of genes per chromosome (copy number selection) was not considered in the previous works. A tendency for DNA loss is observed in Drosophila (Petrov et al. 1996), and gene silencing is observed when multiple copies are introduced into plants (Hollick et al. 1997). Thus, there is a possibility of the action of copy number selection. Third, the rate of gene duplication from a single gene and from multiple genes are considered to be different. However, this difference was ignored to simplify the model in Ohta's simulation.

In the present article, as an extension of Ohta's work, a haploid model of gene duplication is analyzed paying attention to the change of gene number. Copy number selection is explicitly incorporated into the model, and the duplication rate from a single gene was assumed to be 0. First, the model is described. Then, a Markov chain approximation of the model is derived, and its behavior is analyzed assuming low mutation and duplication/deletion rates. Results of extensive computer simulation to check the accuracy of the approximate analysis will be described. Approximate formulas for the rate of increase of the number of different genes and gene redundancy were obtained as a function of population size, mutation rates, duplication/deletion rates, and selection coefficients. Copy number selection was shown to be very effective in reducing the number of pseudogenes. Relationships of various haploid models also will be discussed.

MODEL

Definitions of symbols used are summarized in Table 1.

TABLE 1
Definitions of symbols frequently used in the text

Symbol            Meaning
$k$               Number of different alleles in a chromosome
$m_i$             Number of copies with the $i$th allelic state
$n_l$             Number of live genes in a chromosome
$n_p$             Number of pseudogenes in a chromosome
$n_t$             Number of total genes in a chromosome
$N$               Population size
$u_a$             Mutation rate to a different allele
$u_p$             Mutation rate to a pseudogene
$g$               Duplication/deletion rate
$s_a$             Coefficient for advantageous selection
$s_d$             Selection coefficient for copy number selection
$f_{+}$           Fixation probability of a chromosome with one more different allele
$g_{+}$           Fixation probability of a chromosome with one more gene
$g_{-}$           Fixation probability of a chromosome with one less gene
$a$               Rate of increase of the number of different alleles
$p_1(t)$          Probability of the population fixed by chromosomes with $k = 1$
$\tilde{p}_1$     Probability of the population fixed by chromosomes with $n_t = 1$
$R$               Ratio of the number of different alleles to the number of pseudogenes

A random-mating haploid population consisting of $N$ chromosomes is assumed. Generations are discrete, and all chromosomes have two identical genes at generation zero. Each chromosome undergoes mutation, gene duplication/deletion, and random sampling with selection, in this order, in one generation.

We call a gene that never has had pseudogene mutation in its descent a live gene. Each live gene mutates to a new different allele or to a pseudogene with rates $u_a$ or $u_p$, respectively. Here, following the usage of Ohta (1988b), we use the word alleles to denote gene types, although alleles may not be at the same locus. Each new allele is assumed to be different from any other alleles preexisting in the population (Kimura and Crow 1964). Because fitness of a chromosome is a monotone increasing function of the number of different alleles, as described later, $u_a$ is the advantageous mutation rate. Once a gene becomes a pseudogene, it stays a pseudogene.

Duplication or deletion occurs with a rate $g/2$ per gene copy, respectively. If a chromosome has just one gene, no change occurs. Here, we are considering sister-chromatid unequal crossing over as a genetic mechanism for copy number change. If the mechanism for copy number change is duplicative transposition, copy number change can occur even in chromosomes with a single copy.

Finally, $N$ chromosomes of the next generation are sampled from the $N$ chromosomes of the present generation with probabilities proportional to fitness of each chromosome. Let $k$ and $n_t$ be the numbers of different alleles and all genes in a chromosome, respectively. The fitness of the chromosome is zero if $k = 0$. Otherwise, we consider the following three fitness functions, $w(k, n_t)$.

1. Model I (Ohta 1987b),

$$w(k, n_t) = \begin{cases} \exp(-s_d n_t) & (k \ge \bar{k}) \\ \exp[-s_a(\bar{k} - k) - s_d n_t] & (k < \bar{k}), \end{cases} \qquad (2)$$

where $\bar{k}$ is the average number of different alleles per chromosome in the population, and $s_a$ and $s_d$ ($s_a, s_d > 0$) are positive and negative selection coefficients, respectively.

2. Model II (Walsh 1995),