Genealogy of Neutral Genes and Spreading of Selected Mutations in a Geographically Structured Population
Copyright © 1991 by the Genetics Society of America
Genetics 129: 585-595 (October, 1991)

Naoyuki Takahata
National Institute of Genetics, Mishima 411, Japan, and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, Pennsylvania 16802

Manuscript received February 13, 1991
Accepted for publication June 19, 1991

ABSTRACT

In a geographically structured population, the interplay among gene migration, genetic drift and natural selection raises intriguing evolutionary problems, but the rigorous mathematical treatment is often very difficult. Therefore several approximate formulas were developed concerning the coalescence process of neutral genes and the fixation process of selected mutations in an island model, and their accuracy was examined by computer simulation. When migration is limited, the coalescence (or divergence) time for sampled neutral genes can be described by the convolution of exponential functions, as in a panmictic population, but it is determined mainly by migration rate and the number of demes from which the sample is taken. This time can be much longer than that in a panmictic population with the same number of breeding individuals. For a selected mutation, the spreading over the entire population was formulated as a birth and death process, in which the fixation probability within a deme plays a key role. With limited amounts of migration, even advantageous mutations take a large number of generations to spread. Furthermore, it is likely that these mutations which are temporarily fixed in some demes may be swamped out again by non-mutant immigrants from other demes unless selection is strong enough. These results are potentially useful for testing quantitatively various hypotheses that have been proposed for the origin of modern human populations.

In this paper I attempt to provide a theoretical basis for understanding the origin of modern humans (Homo sapiens). The study of human paleontology appears always to revolve around this enigma. Although a variety of hypotheses have been put forward (e.g., see SMITH and SPENCER 1987; LEWIN 1988; MELLARS and STRINGER 1989), they have one feature in common: based on fossil evidence the first demonstrable migration of Homo erectus from Africa to Europe, Asia and Australia took place 1.0-1.5 million years ago. What has been extensively debated is whether all living populations had a recent origin in the Late Pleistocene, some hundred thousand years ago, or whether they evolved in many different regions from local archaic populations of H. erectus. There are two extreme hypotheses, the candelabra and the Noah's Ark (HOWELLS 1976). The candelabra assumes no migration and parallel evolution of modern H. sapiens in several regional localities at the same time. The Noah's Ark, on the other hand, assumes the complete replacement of populations in the Old World by anatomically modern H. sapiens from Africa. There can be many possibilities between the two extremes. One such is a modified version of the candelabra, called the multiregional hypothesis (WOLPOFF, ZHI and THORNE 1987; WOLPOFF 1989), which allows continuous but presumably infrequent gene exchanges between different populations. In short, the existing hypotheses for the origin of modern H. sapiens differ essentially in the role and extent of migration which might have occurred during the Middle and Upper Pleistocene. The problem thus appears to be one that can be quantified by population genetics. In this paper I shall derive several mathematical formulas which I believe are relevant to the problem.

The model of population structure used in this paper is WRIGHT'S (1931) island model, except that the population consists of a finite number of demes or colonies (MARUYAMA 1970a; CROW and MARUYAMA 1971). In the first part, the ancestral relationships of neutral genes at a locus sampled from such a structured population are studied. The total coalescence time (or the time to an ancestral gene from which all in the sample are descended) is of particular interest in relation to intrapopulational gene genealogy inferred from DNA sequences (e.g., CANN, STONEKING and WILSON 1987; SATTA and TAKAHATA 1990; HORAI 1991; VIGILANT et al. 1991). Recently, the study of coalescence in a subdivided population was initiated (TAKAHATA 1988) and the general mathematical framework is now available (NOTOHARA 1990). Yet, it appears very difficult to derive explicit solutions except for some special cases. It is therefore important to develop appropriate approximation methods. Such an approach, as it turns out, leads to a simple but surprisingly accurate description of the ancestry of neutral genes in a structured population.

In the second part, the fixation process of a favorable mutation is studied. Of interest is the probability that a new mutation fixed in one deme will spread through the whole population and the time this requires. For such genes to be important in modern human evolution, they must spread within a reasonably short time period. Since the human population was to some extent structured, it is worth investigating how rapidly fixation can take place in a subdivided population. Although some indirect approaches to this problem were developed by SLATKIN (1981) (see also LANDE 1979; SLATKIN 1976), the present formulas seem to be in better agreement with simulation results.

COALESCENCE OF NEUTRAL GENES

The population considered here consists of $L$ demes each of which has effective size $N$ (WRIGHT 1931; MARUYAMA 1970a). There are $NL$ diploid individuals in total. The per generation migration rate is denoted by $m$, and when emigration occurs from one deme, the $L - 1$ remaining demes are equally likely to receive the immigrants. The average fraction of immigrants in a recipient deme from a donor is $m/(L - 1)$ every generation. Assume that $n_i$ genes are sampled from the $i$th deme, but $n_i$ may be 0 for some demes (no samples). The total numbers of demes and genes sampled are $r$ ($r \le L$) and $n = \sum_i n_i$. In this section, two situations, low and high migration limits, are treated separately. Generations are measured backward in time, and evolutionary events are described accordingly. Throughout this paper coalescence always refers to an event at which a pair of sampled genes trace back to the most recent common ancestral gene.

Low migration limit: When migration is limited, it is most likely that orthologous genes within each deme coalesce to or are descended from a common ancestor within the deme. It follows that as time goes back there must be a time ($T_{n_i}$) at which all genes sampled from the $i$th deme are descended from a single ancestral gene that existed also in this deme (Figure 1). By definition, $T_{n_i} = 0$ if only one gene is sampled from the $i$th deme ($n_i = 1$), and immediately before $T_{n_i}$ there was a single lineage. Denote by $T_n$ the maximum value of $T_{n_i}$ among the sampled demes. Then, $T_n$ generations ago, there were $r$ distinct lineages of all sampled genes, each of which is represented singly in a deme. Such an ancestral lineage is called a singleton and $r$ specifies the previous coalescence. A key assumption is that $T_n$ is much shorter than the waiting time for a migration to occur. In fact, the inequality $4N \ll 1/m$ must hold for the low migration limit ($4Nm \ll 1$), since the expected coalescence time in an isolated deme is bounded by $4N$ (KINGMAN 1982) and the mean waiting time for a migration is $1/m$ (TAKAHATA 1988; NOTOHARA 1990): the coalescence and migration processes are decoupled.

FIGURE 1. Coalescence process in a structured population with limited migration. Horizontal lines crossing thick lines (deme boundaries) indicate migration events. $T_n$ is the maximum value of coalescence times for $n$ genes sampled from $r$ ($\le L$) demes without any migration. Immediately before $T_n$, there are $r$ (= 3) ancestral lineages for the sample. In further tracing back their ancestors, migration is necessarily involved. If there are $j$ ($2 \le j \le r$) genes singly represented in demes, two of them must come from the same deme in which they diverged. This waiting time $T_j$ is given approximately by Equation 2 with $r = j$. When $m$ (migration rate) is small, the waiting time for a coalescence within a deme can be ignored.

Once $r$ singletons for the ancestry of sampled genes are achieved, it takes a long time for them to change their residing demes by gene migration and make further coalescences possible. Denote by $T_r$ the waiting time until the $r$ singletons have changed their residence such that a pair of them come from the same deme for the first time (Figure 1). If this happens, the coalescence of these two lineages is assured in that deme, reducing the number of distinct ancestral lineages by one. Denote by $K_r$ the number of migration events during $T_r$ generations. The value of $K_r$ is a random variable and the probability of $K_r = k$ ($k = 1, 2, \ldots$) follows a geometric distribution

$$P(K_r = k) = (1 - a_r)^{k-1} a_r \tag{1}$$

in which $a_r = (r - 1)/(L - 1)$. Since $a_r$ is the probability that a pair of genes come from the same deme by a single migration event, $K_r$ is geometric with parameter $a_r$. As mentioned, the waiting time for migration of a gene is exponentially distributed with mean $1/m$. For $r$ genes, the time until the $k$th migration occurs is gamma distributed, each inter-migration interval having mean $1/(rm)$ (COX 1962; FELLER 1970).
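The argument around Equation 1 can be checked numerically. The following is a minimal sketch, not the simulation used in the paper: it assumes that migration events among the $r$ singletons arrive as a Poisson process with rate $rm$ and that each event independently produces a doublet with probability $a_r = (r - 1)/(L - 1)$. The function names `simulate_waiting_time` and `summarize`, and the parameter values, are hypothetical and chosen only for illustration.

```python
import random


def simulate_waiting_time(r, L, m, rng):
    """Simulate one realization of (K_r, T_r).

    Assumptions (illustrative only): each of the r singleton lineages
    migrates at rate m, so events arrive at total rate r*m, and each
    event brings two lineages into the same deme with probability
    a_r = (r - 1) / (L - 1), as in Equation 1.
    """
    a_r = (r - 1) / (L - 1)
    k, t = 0, 0.0
    while True:
        t += rng.expovariate(r * m)   # waiting time to the next migration
        k += 1
        if rng.random() < a_r:        # this migration creates a doublet
            return k, t


def summarize(r, L, m, reps=20000, seed=1):
    """Return the sample means of K_r and T_r over `reps` replicates."""
    rng = random.Random(seed)
    total_k, total_t = 0, 0.0
    for _ in range(reps):
        k, t = simulate_waiting_time(r, L, m, rng)
        total_k += k
        total_t += t
    return total_k / reps, total_t / reps


if __name__ == "__main__":
    r, L, m = 3, 10, 0.001            # hypothetical parameter values
    mean_k, mean_t = summarize(r, L, m)
    a_r = (r - 1) / (L - 1)
    print("mean K_r:", mean_k, "geometric mean 1/a_r:", 1 / a_r)
    print("mean T_r:", mean_t, "1/(r*m*a_r):", 1 / (r * m * a_r))
```

Under these assumptions the mean of $K_r$ is $1/a_r$ and, by the standard property that a geometric sum of exponential intervals is itself exponential, the mean of $T_r$ is $(L - 1)/\{r(r - 1)m\}$. The sketch ignores the coalescence time within a deme, in line with the low migration limit.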
However, since not all migrations result in a pair of genes (a doublet), it is necessary to

[Figure: Coalescence (4 singletons)]

$$P_d = \left\{ \frac{2(r - 2)}{r(L - 1)} + \frac{(r - 2)(L - r + 1)}{r(L - 1)} \right\} \frac{rM}{1 + rM} = \frac{(r - 2)(L - r + 3)M}{(L - 1)(1 + rM)}$$

The last possibility occurs when there is one deme in which three genes reside or when there are two demes each of which contains two genes.
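The simplification in the expression for $P_d$ rests on the identity $2(r - 2) + (r - 2)(L - r + 1) = (r - 2)(L - r + 3)$. The sketch below checks with exact rational arithmetic that the expanded and compact forms agree. It assumes that the common factor multiplying the braces is $rM/(1 + rM)$, which is a reading of the damaged typesetting rather than something the excerpt states outright. $M$ is treated as an arbitrary positive number because its definition falls outside this excerpt, and the function names are hypothetical.

```python
from fractions import Fraction


def p_d_expanded(r, L, M):
    """Expanded form: two terms over r(L - 1), times rM / (1 + rM).

    The factor rM / (1 + rM) is an assumption about the garbled source.
    """
    term_a = Fraction(2 * (r - 2), r * (L - 1))
    term_b = Fraction((r - 2) * (L - r + 1), r * (L - 1))
    return (term_a + term_b) * (r * M) / (1 + r * M)


def p_d_compact(r, L, M):
    """Compact form: (r - 2)(L - r + 3) M / ((L - 1)(1 + rM))."""
    return (r - 2) * (L - r + 3) * M / ((L - 1) * (1 + r * M))


def forms_agree(max_L=12):
    """Compare both forms exactly over a small grid of r, L and M."""
    test_values = [Fraction(1, 100), Fraction(1, 2), Fraction(3)]
    for L in range(3, max_L):
        for r in range(2, L + 1):
            for M in test_values:
                if p_d_expanded(r, L, M) != p_d_compact(r, L, M):
                    return False
    return True


if __name__ == "__main__":
    print("expanded and compact forms agree:", forms_agree())
```

Using `Fraction` avoids floating-point rounding, so agreement on the grid reflects the algebra rather than numerical tolerance.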