A Fundamental Relationship Between Genotype Frequencies and Fitnesses
Copyright © 2008 by the Genetics Society of America
DOI: 10.1534/genetics.108.093518
Genetics 180: 1087–1093 (October 2008)

Joseph Lachance¹
Graduate Program in Genetics, Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11794-5222

¹ Author e-mail: [email protected]

Manuscript received July 3, 2008
Accepted for publication August 7, 2008

ABSTRACT

The set of possible postselection genotype frequencies in an infinite, randomly mating population is found. Geometric mean heterozygote frequency divided by geometric mean homozygote frequency equals two times the geometric mean heterozygote fitness divided by geometric mean homozygote fitness. The ratio of genotype frequencies provides a measure of genetic variation that is independent of allele frequencies. When this ratio does not equal two, either selection or population structure is present. Within-population HapMap data show population-specific patterns, while pooled data show an excess of homozygotes.

What patterns of genetic variation are possible within a population, and how does natural selection affect these patterns? R. A. Fisher remarked "it is often convenient to consider a natural population not so much as an aggregate of living individuals but as an aggregate of gene ratios" (Fisher 1953, p. 515). This mathematical abstraction allows key questions in evolutionary genetics to be addressed. A population of diploid individuals can be characterized by a set of genotype frequencies ($P_{AA}$, $P_{AB}$, $P_{BB}$, etc.). This population genetic state is represented by a point in genotype frequency space, where each dimension corresponds to the frequency of a particular genotype. As genotype frequencies change over time, evolving populations explore genotype frequency space (Rice 2004).

However, not every possibility can be realized. Populations are constrained to a restricted set of genotype frequencies. Trivially, genotype frequencies must sum to one. Mendelian segregation and patterns of mating further restrict the set of possible genotype frequencies. For example, in a randomly mating population it is unlikely that every individual will be the same heterozygous genotype. Natural selection also influences patterns of genetic variation, as high-fitness genotypes are found at higher frequencies than neutral expectations. What genotype frequencies can one expect to find, and how does genotype-specific fitness influence this? Any equation summarizing the set of all possible population genetic states must contain frequency and fitness terms for every genotype. Subsequently, genotype frequency data can be used to infer a ratio of genotypic fitnesses.

While mathematical descriptions exist for loci with two segregating alleles (Cannings and Edwards 1968), such formulations are lacking for arbitrary numbers of segregating alleles. Here, a general equation describing the set of possible postselection genotype frequencies is derived. Much like how the Hardy–Weinberg principle describes population genetic states in the absence of selection, this novel equation describes population genetic states in the presence of selection. In the context of genotype-frequency space, this is a multidimensional surface, the curvature of which is influenced by natural selection (Figure 1). Evolution involves adaptive walks toward regions of high mean fitness on this surface (Wright 1932; Ewens 1989; Edwards 2000). The set of possible genotype frequencies is analogous to the ecological concept of a fundamental niche (Hutchinson 1957) and the Ramachandran diagrams of biochemistry (Ramachandran et al. 1963). The former describes the full range of environmental conditions under which an organism can exist, while the latter describes the possible conformations of dihedral angles for a polypeptide. In each case, valid regions of parameter space are described.

MODEL

A standard single-locus model of theoretical population genetics is considered (diploidy, autosomal inheritance, random mating, and infinite population size). Fitnesses are assumed to be constant and frequency independent. If there are $n$ segregating alleles at a single locus, $n(n+1)/2$ different genotypes are possible, of which $n$ are homozygous and $n(n-1)/2$ are heterozygous. Thus, genotype-frequency space spans $n(n+1)/2$ dimensions. Under random mating, each point in allele-frequency space maps to a single point in genotype-frequency space. Consequently, the surface of possible genotype frequencies is $n-1$ dimensional. The recursion equations of classical population genetics give genotype frequency in the present generation ($P_{ij}$) as a function of genotype fitness ($w_{ij}$) and allele frequencies in the past generation ($p_i$).

Figure 1. De Finetti diagrams describing the set of possible genotype frequencies for two segregating alleles. The solid line represents genotype frequencies that satisfy the equation $P_{ij}^{*} w_{ii}^{*} = 2 P_{ii}^{*} w_{ij}^{*}$. Stable equilibria are solid circles, and unstable equilibria are open circles. Hardy–Weinberg proportions are denoted by a dashed line. (A) Neutrality (F = 2). (B) Overdominance (F > 2). (C) Underdominance (F < 2). (D) Directional selection of a dominant advantageous allele (F > 2). (E) Directional selection with multiplicative dominance (F > 2). (F) Directional selection of a recessive advantageous allele (F < 2).

Derivation of genotypic ratio: Subsequent to mating, but prior to selection, genotype frequencies are found in Hardy–Weinberg proportions. Postselection homozygote frequencies are equal to $P_{ii} = p_i^{2} w_{ii} / \bar{w}$, while postselection heterozygote frequencies are equal to $P_{ij} = 2 p_i p_j w_{ij} / \bar{w}$ (Rice 2004). Mean fitness ($\bar{w}$) equals the weighted sum of all genotype fitnesses. It is useful to algebraically manipulate these recursion equations so that a ratio of genotype frequency to genotype fitness is on the left-hand side and a ratio of allele frequencies to mean fitness is on the right-hand side. Subsequently, terms for multiple genotypes can be multiplied.

A natural division of genotypes involves homozygotes and heterozygotes. Every allele has a corresponding homozygous genotype, and the product of all homozygote ratios is

$$\frac{\prod_{i=1}^{n} P_{ii}}{\prod_{i=1}^{n} w_{ii}} = \frac{\prod_{i=1}^{n} p_i^{2}}{\bar{w}^{\,n}}. \tag{1}$$

Since all terms in the above equation are positive, each side of Equation 1 can be raised to the $(n(n-1)/2)$th power:

$$\frac{\left(\prod_{i=1}^{n} P_{ii}\right)^{n(n-1)/2}}{\left(\prod_{i=1}^{n} w_{ii}\right)^{n(n-1)/2}} = \frac{\prod_{i=1}^{n} p_i^{\,n(n-1)}}{\bar{w}^{\,n(n(n-1)/2)}}. \tag{2}$$

Every allele also can be found in heterozygous genotypes, and the product of all heterozygote ratios is

$$\frac{\prod_{i=1,\,j>i}^{n} P_{ij}}{\prod_{i=1,\,j>i}^{n} w_{ij}} = 2^{\,n(n-1)/2}\, \frac{\prod_{i=1}^{n} p_i^{\,n-1}}{\bar{w}^{\,n(n-1)/2}}. \tag{3}$$

Moving the constant term to the left-hand side and raising every term of Equation 3 to the $n$th power,

$$\frac{\left(\prod_{i=1,\,j>i}^{n} P_{ij}\right)^{n}}{2^{\,n(n(n-1)/2)} \left(\prod_{i=1,\,j>i}^{n} w_{ij}\right)^{n}} = \frac{\prod_{i=1}^{n} p_i^{\,n(n-1)}}{\bar{w}^{\,n(n(n-1)/2)}}. \tag{4}$$

Note that the right-hand sides of Equations 2 and 4 are identical. Further algebraic manipulation and the transitive property of equality (where $A = B$ and $B = C$ imply $A = C$) allow a single equation containing every genotypic term to be derived:

$$\frac{\left(\prod_{i=1,\,j>i}^{n} P_{ij}\right)^{n}}{\left(\prod_{i=1}^{n} P_{ii}\right)^{n(n-1)/2}} = 2^{\,n(n(n-1)/2)}\, \frac{\left(\prod_{i=1,\,j>i}^{n} w_{ij}\right)^{n}}{\left(\prod_{i=1}^{n} w_{ii}\right)^{n(n-1)/2}}. \tag{5}$$

Since every term in the above equation is positive, Equation 5 can be simplified by taking the $n(n(n-1)/2)$th root of both sides of the equation. This root is the product of the number of homozygote and heterozygote states:

$$\frac{\sqrt[n(n-1)/2]{\prod_{i=1,\,j>i}^{n} P_{ij}}}{\sqrt[n]{\prod_{i=1}^{n} P_{ii}}} = 2\, \frac{\sqrt[n(n-1)/2]{\prod_{i=1,\,j>i}^{n} w_{ij}}}{\sqrt[n]{\prod_{i=1}^{n} w_{ii}}}. \tag{6}$$

Note that the geometric mean of $n$ numbers is the $n$th root of their product. In the absence of assortative mating, patterns of genetic variation reduce to a surprisingly elementary equation. The geometric mean heterozygote frequency divided by the geometric mean homozygote frequency equals two times the geometric mean heterozygote fitness divided by the geometric mean homozygote fitness. Denoting geometric means with asterisks,

$$\frac{P_{ij}^{*}}{P_{ii}^{*}} = 2\, \frac{w_{ij}^{*}}{w_{ii}^{*}}. \tag{7}$$

Description of the genotypic ratio: The above genotypic ratio equation is marked by multiple axes of

TABLE 1
MATLAB simulations confirm analytic theory

Selection           Alleles  Population size  Expected F  Observed F             Observed f
Overdominant        2        100,000          2.2         2.1981 (2.1 × 10^-4)   −0.0472 (1.1 × 10^-5)
Underdominant       2        100,000          1.8         1.8011 (1.5 × 10^-4)   0.0508 (2.1 × 10^-5)
Neutral             2        100,000          2           2.0005 (1.4 × 10^-4)   −0.0001 (8.9 × 10^-6)
Stochastic fitness  2        100,000          >2          2.0318 (6.9 × 10^-2)   −0.0037 (4.3 × 10^-3)
Stochastic fitness  3        100,000          >2          2.0010 (2.8 × 10^-2)   0.0013 (7.8 × 10^-4)
Stochastic fitness  4        100,000          <2          1.9898 (1.5 × 10^-2)   0.0016 (3.0 × 10^-4)
Directional         2        1,000            2.0976      2.1639 (6.8 × 10^-2)   −0.0195 (8.9 × 10^-4)
Directional         2        10,000           2.0976      2.0989 (6.3 × 10^-3)   −0.0149 (1.3 × 10^-4)
Directional         2        100,000          2.0976      2.1013 (6.2 × 10^-4)   −0.0156 (2.7 × 10^-5)
Directional         3        100,000          2.0646      2.0681 (1.3 × 10^-3)   −0.0115 (1.2 × 10^-5)
Directional         4        100,000          2.0482      2.0467 (2.6 × 10^-3)   −0.0097 (9.4 × 10^-6)

Simulations were run for 100 generations and mean and variance of F were computed (with variance in observed F within parentheses).