Pseudosampling Method

Proc. Natl. Acad. Sci. USA Vol. 80, pp. 1048-1052, February 1983
Genetics

Selective constraint in protein polymorphism: Study of the effectively neutral mutation model by using an improved pseudosampling method

(population genetics/neutral theory of molecular evolution/polyallelic random drift)

MOTOO KIMURA AND NAOYUKI TAKAHATA

National Institute of Genetics, Mishima, 411 Japan

Contributed by Motoo Kimura, September 30, 1982

ABSTRACT To investigate the pattern of allelic distribution in enzyme polymorphism with special reference to the relationship between the mean ($\bar{H}$) and the variance ($V_H$) of heterozygosity, we used the model of effectively neutral mutations involving multiple alleles in which selective disadvantage of mutant alleles follows a $\Gamma$ distribution. A simulation method was developed that enables us to study efficiently the process of random drift in a multiallelic genetic system and that saves a great deal of computer time. It is an improved version of the pseudosampling-variable (PSV) method [Kimura, M. (1980) Proc. Natl. Acad. Sci. USA 77, 522-526] previously used to simulate random drift in a diallelic system. This method will be useful for simulating many models of population genetics that involve behavior of multiple alleles in a finite population. By using this method, it was shown that, as compared with the model of strictly neutral mutations, the present model gives the reduction of both $\bar{H}$ and $V_H$ and an excess of rare variant alleles. The results were discussed in the light of recent observations on protein polymorphism with special reference to the functional constraint of proteins involved.

Abbreviation: PSV, pseudosampling variable.

From the standpoint of the neutral mutation-random drift hypothesis (1) (or the neutral theory, for short; see refs. 2 and 3 for review), protein polymorphism is a transient phase of molecular evolution (4); therefore, it is expected to be influenced by selective constraint in a similar way as in molecular evolution (5). In other words, the stronger the functional and structural constraints to which a protein is subject, the smaller the probability of an amino acid change by mutation being selectively neutral (i.e., not harmful) and, therefore, the lower the heterozygosity at its locus.

The observation that loci for substrate-specific enzymes are on the average less heterozygous than those for substrate-nonspecific enzymes (6) is consistent with this expectation. Furthermore, Yamazaki (7) and more recently Gojobori (8) showed that substrate-specific enzymes not only have lower mean heterozygosity but also have smaller variance of heterozygosity among species than expected from the standard neutral infinite allele model (9).

In this paper, we intend to show that these observations can be explained by the model of effectively neutral mutations (10) in which selective constraint (negative selection) is incorporated. This model is based on the idea that selective neutrality is the limit when the selective disadvantage becomes indefinitely small (5) and is an extension of Ohta's model (11) in which the selection coefficients against the mutants follow an exponential distribution. These models are ultimately traced to Ohta's hypothesis (12, 13) that very slightly deleterious mutations as well as the strictly neutral mutations play an important role in molecular evolution and polymorphism.

One problem in our approach is that rigorous mathematical study of the model is rather difficult. Previously, Wright derived a general formula for the equilibrium distribution of multiallelic frequencies when mutation, selection, and random drift balance each other (see ref. 14, p. 394, for a review). Later, Watterson (15) and Li (16) developed a method for calculating moments of allele frequencies from Wright's formula. Theoretically, the formula and the method could be applied to treat the present model. It turned out, however, that when the possible number of selectively different alleles involved is large or the product of the effective population size and selection coefficient is large, or both are large, the application of this method is very difficult.

Here, we intend to present a method that can simulate the process of polyallelic random drift very efficiently. This method is an improved version of the pseudosampling method (17) and it contains a device (to be called a "telescoping" method) of sampling multiple alleles. As compared with other methods (18-20), it can treat a multiallelic system much more easily. Furthermore, it has the merit of enabling us to incorporate a suitable adjustment for rare alleles easily (see also ref. 21), which makes the simulation valid over all combinations of parameter values. This adjustment is particularly important when we try to obtain a satisfactory distribution for a whole domain of allele frequencies.

Telescoping method of sampling multiple alleles

The essential idea underlying the PSV method is that we simulate the diffusion model of the process of gene frequency change by generating a sequence of some simple random numbers, rather than faithfully following the binomial or multinomial process by drawing, in each generation, individual gametes from a finite population. For example, in the simplest case of two neutral alleles in a diploid population of size $N$, we generate in each generation a random number from the uniform distribution having the mean 0 and the variance $x(1 - x)/(2N)$, where $x$ is the frequency of one of the alleles at a given moment. Then, this random number is added to $x$ to form the allelic frequency in the next generation. This saves computer time enormously and allows us to simulate easily a process of change in a very large population. Also, we can make many replicate trials without prohibitive computing time. Originally, the PSV method was intended to simulate the diffusion process itself rather than the discrete binomial sampling process ("Fisher-Wright model") for which the diffusion model is usually regarded as an approximation (17).

In the present paper, we shall be concerned with a multiallelic system and assume that there are $K$ possible allelic states ($K > 2$) at a locus in a random-mating diploid population of effective size $N_e$. Let $A_k$ be the $k$th allele ($k = 1, 2, \ldots, K$), whose frequency in the population before sampling is $x_k$. We first sample allele $A_1$ so that its number follows the binomial distribution

$$B(i_1 \mid n, x_1) = \binom{n}{i_1} x_1^{i_1} (1 - x_1)^{n - i_1}, \qquad [1]$$

where $n = 2N_e$ is the total number of gametes that contribute to the next generation and $i_1$ is the number of gametes that contain allele $A_1$ after sampling. Actually, there is no need for the distribution to follow faithfully the binomial distribution, but we simply generate a uniform random number, $U_1$, with mean 0 and variance unity, and substitute the frequency of $A_1$ after sampling (corresponding to $i_1/n$) by

$$x_1' = x_1 + U_1 \sqrt{x_1 (1 - x_1)/n}. \qquad [2]$$

One caution that we need to take here is that when $A_1$ is represented by a very small number of individuals so that $n x_1$ is less than 3, say, then a more exact sampling procedure must be used (see below). A similar caution is also needed when $x_1$ is near unity, so that $n(1 - x_1)$ is small, say less than 3.

Next, we sample $A_2$ so that the number of gametes ($i_2$) containing this allele follows the distribution $B(i_2 \mid n_2, y_2)$, where $n_2 = n - i_1$ is the remainder of the gametes after $A_1$ is removed, and $y_2 = x_2/(1 - x_1)$. Likewise, $A_3$ is sampled so that the number of gametes ($i_3$) containing this allele follows the distribution $B(i_3 \mid n_3, y_3)$, where $n_3 = n - i_1 - i_2$ and $y_3 = x_3/(1 - x_1 - x_2)$. A similar procedure is repeated until all $K - 1$ alleles are sampled. In general, if $i_k$ ($k = 1, 2, \ldots, K - 1$) is the number of $A_k$-bearing gametes in the sample, this number follows

$$B(i_k \mid n_k, y_k), \qquad [3]$$

where $n_k = n - i_1 - i_2 - \cdots - i_{k-1}$ and $y_k = x_k/(1 - x_1 - x_2 - \cdots - x_{k-1})$. (Note: $n_1 = n$ and $y_1 = x_1$ for $k = 1$.) Then, it can be shown that

$$E\{i_k\} = n x_k, \qquad [4]$$

$$\mathrm{Var}\{i_k\} = n x_k (1 - x_k)$$

[...]

we obtain $y_k'$ by

$$y_k' = \eta_k / n_k \quad \text{if } n_k y_k \text{ is small,}$$

or

$$y_k' = 1 - \xi_k / n_k \quad \text{if } n_k (1 - y_k) \text{ is small,} \qquad [6b]$$

where $\eta_k$ and $\xi_k$ are Poisson random numbers with the mean $n_k y_k$ and $n_k (1 - y_k)$, respectively (see also ref. 21).

The approximations given either by Eqs. 6a or 6b are satisfactory when $n_k$ is reasonably large, say larger than 20. If, on the other hand, $n_k \le 20$, we must resort to a more exact procedure, and in this paper, we adopt the following method: we generate $n_k$ random variates that follow the uniform distribution in the range (0, 1), and count the number of such variates that happen to fall in the range $(0, y_k)$. This number, $i_k$, is a binomial random number, so we let

$$y_k' = i_k / n_k \quad (\text{if } n_k \le 20), \qquad [6c]$$

from which we obtain $x_k'$ by using Eq. 5b.

The advantage of the telescoping method in saving computer time comes from the fact that we can obtain each of the $y_k'$ values (and therefore $x_k'$, the frequency of $A_k$, by using Eq. 5b) by generating only one random number when $n_k$ is larger than 20. In addition, the accuracy and reliability are ensured by incorporating adjustments 6b and 6c for treating rare variant alleles. These adjustments are particularly pertinent if we note that, for $y_k \approx 0$ or 1, application of Eq. 6a frequently produces $y_k'$ values either negative or larger than 1, thus creating a possibility of serious bias in a simulation experiment. Such adjustments were not incorporated in the treatments by Kimura (17) and by Maruyama and Takahata (22). Later, the problem was treated in a heuristic manner by Maruyama and Nei (19) and Takahata (20). Their heuristic treatment is concerned with allele frequency changes due to mutation; thus, in the absence of mutation, their simulation methods of random drift are not satisfactory for treating the changes of rare alleles.
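As an illustration only, one generation of the telescoping procedure might be sketched in Python as follows. Eqs. 5b and 6a are cited above but not reproduced in this excerpt, so the forms used here are assumptions: Eq. 6a is taken to be the analogue of Eq. 2 applied to $y_k$ and $n_k$, and Eq. 5b is taken to rescale $y_k'$ by the frequency mass not yet assigned. The function names, the rounding of $n_k$ to an integer, the use of the threshold 3 for a "small" expected count, and the Knuth-style Poisson sampler are likewise choices made for this sketch, not details taken from the paper.

```python
import math
import random

SMALL_COUNT = 3.0   # "small" expected count; the text suggests "say less than 3"
EXACT_LIMIT = 20    # at or below this n_k, count uniform variates directly (Eq. 6c)
SQRT3 = math.sqrt(3.0)  # uniform on (-sqrt(3), sqrt(3)) has mean 0 and variance 1


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_conditional(rng, n_k, y_k):
    """Return y_k', the sampled frequency of allele k among the n_k remaining gametes."""
    if y_k <= 0.0:
        return 0.0
    if y_k >= 1.0:
        return 1.0
    if n_k <= EXACT_LIMIT:
        # Eq. 6c: count how many of n_k uniform variates fall in (0, y_k).
        return sum(1 for _ in range(n_k) if rng.random() < y_k) / n_k
    if n_k * y_k < SMALL_COUNT:
        # Eq. 6b, rare allele: Poisson count with mean n_k * y_k.
        return min(poisson(rng, n_k * y_k), n_k) / n_k
    if n_k * (1.0 - y_k) < SMALL_COUNT:
        # Eq. 6b, nearly fixed allele: Poisson count of the complement.
        return 1.0 - min(poisson(rng, n_k * (1.0 - y_k)), n_k) / n_k
    # Assumed form of Eq. 6a (by analogy with Eq. 2). With both expected counts
    # at least 3, this uniform step cannot leave the interval (0, 1).
    return y_k + rng.uniform(-SQRT3, SQRT3) * math.sqrt(y_k * (1.0 - y_k) / n_k)


def telescoping_step(freqs, n, rng):
    """Advance K allele frequencies (summing to 1) by one generation of drift."""
    new_freqs = []
    remaining_old = 1.0  # 1 - x_1 - ... - x_{k-1}
    remaining_new = 1.0  # 1 - x_1' - ... - x_{k-1}'
    for x_k in freqs[:-1]:
        n_k = round(n * remaining_new)  # gametes still unassigned (rounding is an assumption)
        if n_k > 0 and remaining_old > 0.0:
            y_k = min(1.0, max(0.0, x_k / remaining_old))
            x_new = sample_conditional(rng, n_k, y_k) * remaining_new  # assumed Eq. 5b
        else:
            x_new = 0.0
        new_freqs.append(x_new)
        remaining_old -= x_k
        remaining_new -= x_new
    new_freqs.append(max(0.0, remaining_new))  # allele K takes what is left
    return new_freqs


if __name__ == "__main__":
    rng = random.Random(1)
    freqs = [0.5, 0.3, 0.15, 0.05]
    for _ in range(100):
        freqs = telescoping_step(freqs, n=2000, rng=rng)
    print(freqs)
```

With two alleles the loop runs once and the sketch reduces to the diallelic scheme of Eq. 2 plus the rare-allele adjustments. Only one random number is drawn per allele whenever $n_k$ exceeds 20, which is the source of the saving in computer time described above.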
