3 Effective Population Size

36 3 EFFECTIVE POPULATION SIZE 3 Effective population size In the first two chapters we have dealt with idealized populations. The two main assump- tions were that the population has a constant size and the population mates panmictically. These ideal populations are good to start with because they allow us to derive some impor- tant results. However, natural populations are usually not panmictic and the population size may not be constant over time. Nevertheless, we can often still use the theory that we have developed. The trick is that we describe a natural population as if it is an ideal one by adjusting some parameters, in this case the population size. This is the idea of the effective population size which is the topic of this section. Human Population Size Example As an example, we will analyse a dataset from Hammer et al. (2004). The dataset, which may be found in the file TNFSF5.nex, contains data from different human populations: Africans, Europeans, Asians and Native Americans. It contains 41 sequences from 41 males, from one locus, the TNFSF5 locus. TNFSF5 is a gene and the sequences are from the introns of this gene, so there should be no constraint on these sequences, in other words every mutation should be neutral. The gene lies on the X-chromosome in a region with high recombination. What that means for the data will become clearer later in the course. Exercise 3.1. Import the data in DNASP and determine θbπ per site and θbS per site using all 41 sequences. 2 As you have seen in Section 2, both θbπ and θbS are estimators of the model parameter θ = 4Nµ where µ is the probability that a site mutates in one generation. However, the TNFSF5 locus is on the X-chromosome and for the X-chromosome males are haploid. Therefore the population of X-chromosomes can be seen as a population of 1.5N haploids (instead of 2N haploids for autosomes) and therefore in this case θbπ and θbS are estimators of 3Nµ. The reason that Hammer et al. (2004) looked at X-chromosomes is mainly because the sequencing is relatively easy. Males have only one X-chromosome, so you don’t have to worry about polymorphism within one individual (more about polymorphism within an individual in Section 4). The mutations in these data are single nucleotide polymorphisms. SNPs are frequently used to determine θbπ and θbS per site. Their (per site) mutation rate is estimated to be µ = 2 · 10−8 by comparing human and chimpanzee sequences. Exercise 3.2. Recall Section 1. Assume that the divergence time of chimpanzees and humans is 10MY with a generation time of 20 years and the mutation rate is 2 · 10−8 per nucleotide per generation.. 1. What is the expected percentage of sites that are different (or what is the divergence) between human and chimp? 3.1 The concept 37 2. Both θbπ and θbS are estimators for 3Nµ and both can be directly computed from the data. What estimate of N do you get, when using the estimated θb values from the last exercise? 3. There are about 6 · 109 people on earth. Does the human population mate panmictically? Is the population constant over time? Can you explain why your estimate of N is so different from 6 · 109? 2 The number of 6 · 109 people on earth is referred to as the census population size. This section is about a different notion of population size which is called the effective population size. 3.1 The concept Before we start with calculations using effective population sizes we introduce what they are. We use the following philosophy: Let • be some measurable quantity that relates to the strength of genetic drift in a population. This can be e.g. the rate of loss of heterozygosity or the probability of identity by descent. Assume that this quantity has been measured in a natural population. Then the effective size Ne of this population is the size of an ideal (neutral panmictic constant-size equilibrium) Wright-Fisher population that gives rise to the same value of the measured quantity •. To be specific, we call Ne the •-effective population size. In other words, the effective size of a natural population is the size of the ideal population such that some key measure of genetic drift is identical. With an appropriate choice of this measure we can then use a model based on the ideal population to make predic- tions about the natural one. Although a large number of different concepts for an effective population size exist, there are two that are most widely used. The identity-by-descent (or inbreeding) effective population size One of the most basic consequences of a finite population size - and thus of genetic drift - is that there is a finite probability for two randomly picked individuals in the offspring generation to have a common ancestor in the parent generation. This is the probability of identity by descent, which translates into the single-generation coalescence probability of two lines pc,1 in the context of the coalescent. For the ideal Wright-Fisher model with 2N (haploid) individuals, we have pc,1 = 1/2N. Knowing pc,1 in a natural population, we can thus define the identity-by-descent effective population size (i) 1 Ne = . (3.1) 2pc,1 38 3 EFFECTIVE POPULATION SIZE We will see in the next chapter that the degree of inbreeding is one of the factors that (i) (i) influences Ne . For historic reasons, Ne is therefore usually referred to as inbreeding effective population size. Since all coalescent times are proportional to the inverse coalescent probability, they are directly proportional to the inbreeding effective size. One also says (i) that Ne fixes the coalescent time scale. The variance effective population size Another key aspect about genetic drift is that it leads to random variations in the allele frequencies among generations. Assume that p is the frequency of an allele A in an ideal Wright-Fisher population of size 2N. In Section 2, we have seen that the number of A alleles in the next generation, 2Np0, is binomially distributed with parameters 2N and p, and therefore 1 p(1 − p) Var [p0] = Var[2Np0] = . WF (2N)2 2N For a natural population where the variance in allele frequencies among generations is known, we can therefore define the variance effective population size as follows p(1 − p) N (v) = . (3.2) e 2Var[p0] As we will see below, the inbreeding and variance effective sizes are often identical or at least very similar. However, there are exceptions and then the correct choice of an effective size depends on the context and the questions asked. Finally, there are also scenarios (e.g. changes in population size over large time scales) where no type of effective size is satisfactory. We then need to abandon the most simple ideal models and take these complications explicitly into account. Loss of heterozygosity As an application of the effective-population-size concept, let us study the loss of heterozygosity in a population. Heterozygosity H can be defined as the probability that two alleles, taken at random from a population are different at a random site (or locus). Suppose that the heterozygosity in a natural population in generation 0 is H0. We can ask, what is the expected heterozygosity in generation t = 1, 2, 3, if we assume no new mutation (i.e. we only consider the variation that is already present in generation 0). In particular, for t = 1, we find 1 1 1 H1 = (i) 0 + 1 − (i) H0 = 1 − (i) H0. (3.3) 2Ne 2Ne 2Ne Indeed, if we take two random alleles from the population in generation 1, the probability 1 that they have the same parent in generation 0 is (i) . When this is the case they have 2Ne probability 0 to be different at any site because we ignore new mutations. With probability 3.2 Examples 39 1 1 − (i) they have different parents in generation 0 and these parents have (by definition) 2Ne probability H0 to be different. By iterating this argument, we obtain 1 t Ht = 1 − (i) · H0 2Ne for the heterozygosity at time t. This means that, in the absence of mutation, heterozy- 1 gosity is lost at rate (i) every generation and depends only on the inbreeding effective 2Ne population size. Estimating the effective population size For the Wright-Fisher model, we have seen in Section 2 that the expected number of segregating sites S in a sample is proportional to the mutation rate and the total expected length of the coalescent tree, E[S] = µ[L]. The tree-length L, in turn, is a simple function (i) of the coalescent times, and thus of the inbreeding effective population size Ne . Under (i) the assumption of (1) the infinite sites model (no double hits), (2) a constant Ne over the generations (constant coalescent probability), and (3) a homogeneous population (equal calescent probability for all pairs) we can therefore estimate the effective population size from polymorphism data if we have independent knowledge about the mutation rate (e.g. (i) from divergence data). In particular, for a sample of size 2, we have E[S2] = 4Ne µ and thus E[S ] N (i) = 2 . e 4µ In a sample of size n, we can estimate the expected number of pairwise differences to be (i) Eb[S2] = θbπ (see (2.6)) and obtain the estimator of Ne from polymorphism data as θbπ Nb(i) = . e 4µ A similar estimate can be obtained from Watterson’s estimator θbS, see Eq.

3 Effective Population Size

Population Size and the Rate of Evolution

Transformations of Lamarckism Vienna Series in Theoretical Biology Gerd B

An Unusually Low Microsatellite Mutation Rate in Dictyostelium Discoideum,An Organism with Unusually Abundant Microsatellites

Adaptive Tuning of Mutation Rates Allows Fast Response to Lethal Stress In

Estimating Effective Population Size Or Mutation Rate Using the Frequencies of Mutations of Various Classes in a Sample of DNA Sequences

The Mutation Rate and Cancer (Tumorigenesis͞evolution͞selection͞mutator Phenotype͞genomic Instability)

The Wayward Dog: Is the Australian Native Dog Or Dingo a Distinct Species?

The Genetic Basis of Mutation Rate Variation in Yeast

Molecular Lamarckism: on the Evolution of Human Intelligence

Dancing with DNA and Flirting with the Ghost of Lamarck

Understanding the Introduction of Genetic Variations by Random

Heterozygous Colon Cancer-Associated Mutations of SAMHD1 Have Functional Significance