Population Genetics
Joe Felsenstein
GENOME 453, Winter 2004
Godfrey Harold Hardy (1877-1947) and Wilhelm Weinberg (1862-1937)
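A Hardy-Weinberg calculation

Genotype counts:        5 AA    2 Aa    3 aa
Genotype frequencies:   0.50    0.20    0.30

Gamete frequencies:
  A: 0.50 + (1/2) 0.20 = 0.6
  a: (1/2) 0.20 + 0.30 = 0.4

Random union of gametes:
             0.6 A      0.4 a
  0.6 A      0.36 AA    0.24 Aa
  0.4 a      0.24 Aa    0.16 aa

Result: 0.36 AA, 0.48 Aa, 0.16 aa.
These genotypes again produce gametes in the proportions 0.6 A (all of AA plus 1/2 of Aa) and 0.4 a (1/2 of Aa plus all of aa).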
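The calculation above can be sketched in a few lines of Python. This is only an illustration using the slide's numbers; the function names `gamete_frequencies` and `random_union` are made up for this sketch.

```python
def gamete_frequencies(f_AA, f_Aa, f_aa):
    """Return (p, q), the frequencies of A and a among gametes."""
    p = f_AA + 0.5 * f_Aa   # AA gives only A; Aa gives A half the time
    q = 0.5 * f_Aa + f_aa   # aa gives only a; Aa gives a half the time
    return p, q


def random_union(p, q):
    """Return frequencies (AA, Aa, aa) after random union of gametes."""
    return p * p, 2 * p * q, q * q


# Genotype counts from the slide: 5 AA, 2 Aa, 3 aa.
counts = {"AA": 5, "Aa": 2, "aa": 3}
total = sum(counts.values())
p, q = gamete_frequencies(counts["AA"] / total,
                          counts["Aa"] / total,
                          counts["aa"] / total)
offspring = random_union(p, q)

# The offspring genotypes give back the same gamete frequencies.
p_next, q_next = gamete_frequencies(*offspring)
print(p, q, offspring, p_next, q_next)
```

Feeding the offspring frequencies back into `gamete_frequencies` returns the same gene frequencies, which is the sense in which the proportions are an equilibrium.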
Calculating the gene frequency (two ways)

Suppose that we have 200 individuals: 83 AA, 62 Aa, 55 aa

Method 1. Calculate what fraction of gametes bear A:

  Genotype   Number   Genotype frequency   Gametes
  AA           83          0.415           all A
  Aa           62          0.31            1/2 A, 1/2 a
  aa           55          0.275           all a

  A: 0.415 + (1/2) 0.31 = 0.57
  a: (1/2) 0.31 + 0.275 = 0.43
Method 2. Calculate what fraction of genes in the parents are A:

  Genotype   Number   A's   a's
  AA           83     166     0
  Aa           62      62    62
  aa           55       0   110
  Total       200     228   172

  A: 228 / 400 = 0.57
  a: 172 / 400 = 0.43
  (228 + 172 = 400 gene copies in all)
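As a quick check that the two methods agree, here is a minimal Python sketch using the counts from the slide. The function names are hypothetical, chosen only for this illustration.

```python
counts = {"AA": 83, "Aa": 62, "aa": 55}


def freq_A_from_genotype_frequencies(counts):
    """Method 1: fraction of gametes bearing A."""
    n = sum(counts.values())
    return counts["AA"] / n + 0.5 * counts["Aa"] / n


def freq_A_from_gene_counts(counts):
    """Method 2: fraction of gene copies in the parents that are A."""
    copies_of_A = 2 * counts["AA"] + counts["Aa"]   # 166 + 62 = 228
    all_copies = 2 * sum(counts.values())           # 400
    return copies_of_A / all_copies


method1 = freq_A_from_genotype_frequencies(counts)
method2 = freq_A_from_gene_counts(counts)
print(method1, method2)
```

Both functions do the same arithmetic in a different order, so they must give the same answer for any set of counts.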
The process of natural selection at one locus

[Figure, built up over successive slides: a life cycle alternating between gametes and zygotes across generations (gametes, zygotes, gametes, zygotes, gametes, ...). Caption text: "genotypes are lethal in this case".]
A numerical example of natural selection

Initial gene frequency of A = 0.2

  Genotypes:                                       AA       Aa       aa
  Relative fitnesses (assume these are
    viabilities):                                  1        1        0.7
  Initial genotype frequencies among newborns
    (from Hardy-Weinberg):                         0.04     0.32     0.64
  Survivors (frequency x relative viability):      0.04     0.32     0.448    Total: 0.808
  Genotype frequencies among the survivors
    (divide by the total):                         0.0495   0.396    0.554

Gene frequency among the survivors:
  A: 0.0495 + 0.5 x 0.396 = 0.2475
  a: 0.554 + 0.5 x 0.396 = 0.7525

Genotype frequencies among newborns of the next generation:
                                                   0.0613   0.3725   0.5663

The algebra of natural selection
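  Genotype:             AA             Aa              aa
  Frequency:            $p^2$          $2pq$           $q^2$
  Relative fitnesses:   $w_{AA}$       $w_{Aa}$        $w_{aa}$
  After selection:      $p^2 w_{AA}$   $2pq\,w_{Aa}$   $q^2 w_{aa}$
  (Note that these don't add up to 1.)

New gene frequency is then (adding up A bearers and dividing by everybody):

\[
p' = \frac{p^2 w_{AA} + \tfrac{1}{2}\,2pq\,w_{Aa}}{p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}}
   = \frac{p\,(p\,w_{AA} + q\,w_{Aa})}{p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}}
   = p\,\frac{\bar{w}_A}{\bar{w}}
\]

Here $\bar{w}_A = p\,w_{AA} + q\,w_{Aa}$ is the mean fitness of A, and $\bar{w} = p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}$ is the mean fitness of everybody.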
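The recursion for the new gene frequency can be tried out directly. The sketch below is an illustration, not part of the original slides; the function name `select` is made up, and the inputs are those of the numerical example (fitnesses 1, 1, 0.7 and an initial frequency of A of 0.2).

```python
def select(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection at one locus.

    Returns (new frequency of A, mean fitness of everybody).
    """
    q = 1 - p
    w_A = p * w_AA + q * w_Aa                                  # mean fitness of A
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa     # mean fitness
    return p * w_A / w_bar, w_bar


p_new, w_bar = select(0.2, 1.0, 1.0, 0.7)
print(round(w_bar, 3), round(p_new, 4))
```

The mean fitness returned is the "Total" of the survivors row in the numerical example, and the new frequency of A matches the value obtained there by dividing through by that total.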
Population Genetics – p.27/47 Is weak selection effective?
Suppose (relative) fitnesses are: So in this example each change of AA Aa aa a to A multiplies the fitness 1 2 by (1+s), so that it increases it
(1+s) 1+s 1 by a fraction s. A x (1+s) x (1+s)
The time for gene frequency change, in generations, turns out to be: change of gene frequencies 0.5 s 0.01 − 0.1 0.1 − 0.5 0.5 − 0.9 0.9 − 0.99 1 3.46 3.17 3.17 3.46 0.1 25.16 23.05 23.05 25.16
0.01 240.99 220.82 220.82 240.99 of frequency gene 0 0.001 2399.09 2198.02 2198.02 2399.09 generations
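The slide does not say how the table was computed. One way to get numbers like these, offered here as an assumption: with fitnesses $(1+s)^2$, $1+s$, 1 the ratio $p/q$ is multiplied by exactly $(1+s)$ each generation, so the number of generations needed is the log of the change in that ratio divided by $\log(1+s)$. The function name `generations` is hypothetical.

```python
import math


def generations(p_start, p_end, s):
    """Generations for the frequency of A to go from p_start to p_end,

    assuming multiplicative fitnesses (1+s)^2 : (1+s) : 1, under which
    the odds p/(1-p) grow by a factor (1+s) per generation.
    """
    def odds(p):
        return p / (1 - p)

    return math.log(odds(p_end) / odds(p_start)) / math.log(1 + s)


for s in (1, 0.1, 0.01, 0.001):
    row = [generations(a, b, s)
           for a, b in ((0.01, 0.1), (0.1, 0.5), (0.5, 0.9), (0.9, 0.99))]
    print(s, [round(t, 2) for t in row])
```

Under this assumed formula the entries checked agree with the table to two decimals, except that the 0.1 to 0.5 entry for s = 0.001 comes out near 2198.3 rather than the tabulated 2198.02. The main pattern is unaffected: the time scales roughly as 1/s, so weak selection is slow but still effective.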
Population Genetics – p.28/47 An experimental selection curve
Population Genetics – p.29/47 Rare alleles occur mostly in heterozygotes
This shows a population in Hardy−Weinberg equilibrium at gene frequencies of 0.9 A : 0.1 a
Genotype frequencies: 0.81 AA : 0.18 Aa : 0.01 aa Note that of the 20 copies of a, 18 of them, or 18 / 20 = 0.9 of them are in Aa genotypes
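The same count can be done for any gene frequency. A small sketch, assuming Hardy-Weinberg proportions; the function name is made up for this illustration.

```python
def share_of_a_in_heterozygotes(q):
    """Fraction of all copies of allele a that sit in Aa individuals,

    assuming Hardy-Weinberg genotype frequencies.
    """
    p = 1 - q
    copies_in_Aa = 2 * p * q * 1   # Aa has frequency 2pq, one copy of a each
    copies_in_aa = q * q * 2       # aa has frequency q^2, two copies of a each
    return copies_in_Aa / (copies_in_Aa + copies_in_aa)


for q in (0.5, 0.1, 0.01):
    print(q, share_of_a_in_heterozygotes(q))
```

The ratio simplifies to p, the frequency of the other allele, so the rarer a is, the larger the share of its copies found in heterozygotes.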
Population Genetics – p.30/47 Overdominance and polymorphism
AA Aa aa 1 − s 1 1 − t
when A is rare, most A's are in Aa, and most a's are in aa
The average fitness of A−bearing genotypes is then nearly 1
The average fitness of a−bearing genotypes is then nearly 1−t
So A will increase in frequency when rare
when a is rare, most a's are in Aa, and most A's are in AA
The average fitness of a−bearing genotypes is then nearly 1
The average fitness of A−bearing genotypes is then nearly 1−s
So a will increase in frequency when rare
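The argument above can be checked numerically by iterating the one-locus selection recursion from both ends. In this sketch the values s = 0.2 and t = 0.1 are hypothetical, chosen only for illustration, and the interior equilibrium t / (s + t) is the standard result for this fitness scheme rather than something stated on the slide.

```python
def next_p(p, w_AA, w_Aa, w_aa):
    """Frequency of A after one generation of viability selection."""
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / w_bar


def iterate(p, fitnesses, generations):
    """Apply next_p repeatedly and return the final frequency of A."""
    for _ in range(generations):
        p = next_p(p, *fitnesses)
    return p


s, t = 0.2, 0.1                      # hypothetical selection coefficients
fitnesses = (1 - s, 1.0, 1 - t)      # AA, Aa, aa
p_hat = t / (s + t)                  # interior equilibrium (standard result)

from_rare_A = iterate(0.01, fitnesses, 2000)
from_rare_a = iterate(0.99, fitnesses, 2000)
print(p_hat, from_rare_A, from_rare_a)
```

Whichever allele starts out rare increases, and both runs settle at the same intermediate frequency, so both alleles are kept in the population.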
[Figure: axis showing gene frequency of A, from 0 to 1.]

Underdominance and unstable equilibrium
  AA       Aa    aa
  1 + s    1     1 + t

When A is rare, most A's are in Aa, and most a's are in aa
  The average fitness of A-bearing genotypes is then nearly 1
  The average fitness of a-bearing genotypes is then nearly 1 + t
  So A will decrease in frequency when rare

When a is rare, most a's are in Aa, and most A's are in AA
  The average fitness of a-bearing genotypes is then nearly 1
  The average fitness of A-bearing genotypes is then nearly 1 + s
  So a will decrease in frequency when rare
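The "nearly 1" and "nearly 1 + t" statements can be made concrete by computing the average fitness of the genotypes each allele finds itself in. This is a sketch under random mating, with hypothetical values s = 0.2 and t = 0.1; the function name is made up for this illustration.

```python
def allele_mean_fitnesses(p, w_AA, w_Aa, w_aa):
    """Return (w_A, w_a): mean fitness of the genotypes carrying each allele,

    assuming random mating, so an A copy is paired with A with
    probability p and with a with probability q.
    """
    q = 1 - p
    w_A = p * w_AA + q * w_Aa
    w_a = p * w_Aa + q * w_aa
    return w_A, w_a


s, t = 0.2, 0.1                      # hypothetical selection coefficients
fitnesses = (1 + s, 1.0, 1 + t)      # AA, Aa, aa

w_A_rare, w_a_common = allele_mean_fitnesses(0.01, *fitnesses)   # A is rare
w_A_common, w_a_rare = allele_mean_fitnesses(0.99, *fitnesses)   # a is rare
print(w_A_rare, w_a_common)
print(w_a_rare, w_A_common)
```

In both cases the rare allele has the lower mean fitness, so it declines; which allele wins depends on which side of the unstable equilibrium the population starts.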
[Figure: axis showing gene frequency of A, from 0 to 1.]

Fitness surfaces (adaptive landscapes)
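[Figure: two panels plotting mean fitness $\bar{w}$ against gene frequency p, from 0 to 1. Left panel: Overdominance, with a stable equilibrium. Right panel: Underdominance, with an unstable equilibrium. Arrows under each panel show the direction of gene frequency changes.]

Is all for the best in this best of all possible worlds?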
Can you explain the underdominance result in terms of rare alleles being mostly in heterozygotes?
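The fitness surfaces can be traced numerically by evaluating mean fitness over a grid of gene frequencies. A minimal sketch, with hypothetical fitness values (s = 0.2, t = 0.1) and a made-up function name:

```python
def mean_fitness(p, w_AA, w_Aa, w_aa):
    """Mean fitness of the population at gene frequency p of A."""
    q = 1 - p
    return p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa


grid = [i / 1000 for i in range(1001)]   # p = 0.000, 0.001, ..., 1.000

overdominant = (0.8, 1.0, 0.9)     # 1 - s, 1, 1 - t  (hypothetical values)
underdominant = (1.2, 1.0, 1.1)    # 1 + s, 1, 1 + t  (hypothetical values)

# Where is the surface highest (overdominance) or lowest (underdominance)?
p_at_peak = max(grid, key=lambda p: mean_fitness(p, *overdominant))
p_at_valley = min(grid, key=lambda p: mean_fitness(p, *underdominant))
print(p_at_peak, p_at_valley)
```

With these values both the peak and the valley sit near p = 1/3, which is t / (s + t). Under overdominance the population climbs to the interior peak; under underdominance it moves away from the interior valley toward whichever end is uphill, even if that end is the lower of the two.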
Genetic drift

[Figure: gene frequency (0 to 1) plotted against time in generations (0 to 11).]

Distribution of gene frequencies with drift
[Figure: distributions of gene frequency (0 to 1) among populations, shown at successive times.]

Note that although the individual populations wander, their average hardly moves (not at all when we have infinitely many populations).
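A small simulation shows the same behaviour. The slides do not say which model was used; this sketch assumes the Wright-Fisher model, in which each generation's gene copies are drawn at random from the previous generation. The population size, number of generations, number of populations and random seed are all arbitrary choices.

```python
import random


def drift(p0, n_genes, generations, rng):
    """Gene frequency path in one population under pure drift.

    Assumes a Wright-Fisher model: each of the n_genes copies in the
    next generation is A with probability equal to the current frequency.
    """
    p = p0
    path = [p]
    for _ in range(generations):
        copies_of_A = sum(1 for _ in range(n_genes) if rng.random() < p)
        p = copies_of_A / n_genes
        path.append(p)
    return path


rng = random.Random(453)   # fixed seed so the run is repeatable
finals = [drift(0.5, 40, 10, rng)[-1] for _ in range(1000)]

mean_final = sum(finals) / len(finals)
spread = max(finals) - min(finals)
print(mean_final, spread)
```

Individual populations end up far apart, while the average over many populations stays close to the starting frequency of 0.5.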
A cline (named by Julian Huxley)

[Figure: gene frequency (0 to 1) plotted against geographic position, with curves for no migration, some migration, and more migration.]
A famous common-garden experiment

Clausen, Keck and Hiesey’s (1949) common-garden experiment in Achillea lanulosa

Heavy metal

House sparrows
Mutation Rates

Coat color mutants in mice. From Schlager G. and M. M. Dickie. 1967. Spontaneous mutations and mutation rates in the house mouse. Genetics 57: 319-330

  Locus        Gametes tested   No. of mutations   Rate
  Nonagouti          67,395             3           4.4 × $10^{-6}$
  Brown             919,619             3           3.3 × $10^{-6}$
  Albino            150,391             5          33.2 × $10^{-6}$
  Dilute            839,447            10          11.9 × $10^{-6}$
  Leaden            243,444             4          16.4 × $10^{-6}$
  Total           2,220,376            25          11.2 × $10^{-6}$
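Each rate in the table is the number of mutations divided by the number of gametes tested. A short sketch that redoes the division from the tabulated counts (the dictionary layout is just a convenience for this illustration):

```python
# (gametes tested, number of mutations), copied from the table above.
data = {
    "Nonagouti": (67_395, 3),
    "Brown": (919_619, 3),
    "Albino": (150_391, 5),
    "Dilute": (839_447, 10),
    "Leaden": (243_444, 4),
}

# Mutation rate per gamete for each locus.
rates = {locus: mutations / gametes
         for locus, (gametes, mutations) in data.items()}

for locus, rate in rates.items():
    print(f"{locus:10s} {rate * 1e6:6.1f} x 10^-6")
```

For Brown, Albino, Dilute and Leaden this reproduces the tabulated rates. For Nonagouti the same division gives roughly 44.5 × $10^{-6}$ rather than the tabulated 4.4 × $10^{-6}$, so that row is worth checking against the original paper.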
Mutation rates in humans
Forward vs. back mutations

Why mutants inactivating a functional gene will be more frequent than back mutations

[Figure: the gene]
12 places can mutate to nonfunctionality
only one place can mutate back to function
function can sometimes be restored by a "second site" mutation, too
A sequence space

For sequences of length 1000, there are 3 × 1000 = 3000 "neighbors" one step away in sequence space. But there are $4^{1000}$ sequences, which is about $10^{602}$ in all! No two of them are more than 1000 steps apart. Hard to draw such a space.

How do we ever evolve? Wouldn't it be impossible to find one of the tiny fraction of possible sequences that would be even marginally functional? The answer seems to be that the sequences are clustered. An example of such clustering is the English language, as illustrated by a popular word game:

  W O R D
  W O R E
  G O R E
  G O N E
  G E N E

But the word BCGH cannot be made into an English word.

Here too, only a tiny fraction of all 456,976 four-letter words are English words. But they are clustered, so that it is possible to "evolve" from one to another through intermediates.
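The counts quoted above are easy to verify, and the word game can be checked mechanically. A minimal sketch; the helper `one_step_apart` is a made-up name, and it only checks that consecutive words differ at one position, not that they are English words.

```python
SEQ_LENGTH = 1000

neighbors = 3 * SEQ_LENGTH               # 3 alternative bases at each site
total_sequences = 4 ** SEQ_LENGTH        # exact integer
digits = len(str(total_sequences))       # number of decimal digits
four_letter_strings = 26 ** 4            # all four-letter strings


def one_step_apart(a, b):
    """True if two equal-length words differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


ladder = ["WORD", "WORE", "GORE", "GONE", "GENE"]
ladder_ok = all(one_step_apart(a, b) for a, b in zip(ladder, ladder[1:]))

print(neighbors, digits, four_letter_strings, ladder_ok)
```

A number with 603 decimal digits is of order $10^{602}$, which matches the figure on the slide.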
Mutation as an evolutionary force

If we have two alleles A and a, and the mutation rate from A to a is $10^{-6}$ and the mutation rate back is the same,

[Figure: vertical axis from 0 to 1 (0.5 marked); horizontal axis from 0 to 1 million generations.]
Mutation is critical in introducing new alleles but is very slow in changing their frequencies
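How slow is "very slow"? A sketch of the two-allele mutation recursion, assuming equal forward and back rates of $10^{-6}$ as on the slide. The starting frequency of 1.0 is an assumption made for this illustration, and the function names are hypothetical.

```python
def after_mutation(p, u, v):
    """Frequency of A after one generation of mutation.

    u is the mutation rate from A to a, v the rate from a back to A.
    """
    return p * (1 - u) + (1 - p) * v


def frequency_at(t, p0, u):
    """Closed form for equal rates u = v.

    From p' - 1/2 = (p - 1/2)(1 - 2u), the distance from 1/2 shrinks
    by a factor (1 - 2u) every generation.
    """
    return 0.5 + (p0 - 0.5) * (1 - 2 * u) ** t


u = 1e-6
p = 1.0                      # assumed starting frequency of A
for _ in range(1000):        # step-by-step check of the closed form
    p = after_mutation(p, u, u)

p_million = frequency_at(10**6, 1.0, u)
print(p, frequency_at(1000, 1.0, u), p_million)
```

Even after a million generations the frequency has covered most, but not all, of the distance to the equilibrium of 0.5. A thousand generations change it by only about one part in a thousand.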
Estimation of a human mutation rate

By an equilibrium calculation. Huntington’s disease. Dominant. Does not express itself until after age 40. 1/100,000 of people of European ancestry have the gene. Reduction in fitness maybe 2%.

If allele frequency is q, then 2q(1 − q) of everyone are heterozygotes. 0.02 of these die. Half of each heterozygote's copies are the Huntington’s allele. So as the frequency of people with the gene is ≈ 1/100,000, the fraction of all gene copies that are mutant copies eliminated by selection is 0.00001 × 1/2 × 0.02 ≈ $10^{-7}$. If we are at equilibrium between mutation and selection, this is also the fraction of copies that have a new mutation.

Similar calculations can be done with recessive alleles.
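The equilibrium estimate for Huntington’s disease is a product of three numbers from the slide. Written out as a sketch (the variable names are chosen for this illustration):

```python
carrier_frequency = 1 / 100_000   # fraction of people who have the gene
mutant_share = 0.5                # half of a heterozygote's copies are mutant
fitness_reduction = 0.02          # fraction of carriers lost to selection

# Fraction of all gene copies that are mutant copies removed each generation.
eliminated = carrier_frequency * mutant_share * fitness_reduction

# At mutation-selection equilibrium, new mutations replace what is removed.
mutation_rate_estimate = eliminated
print(mutation_rate_estimate)
```

The estimate is only as good as its inputs: the 2% reduction in fitness is described on the slide as a rough guess, and the result scales in direct proportion to it.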
How it was done

This projection was produced using the prosper style in LaTeX, using LaTeX to make a .dvi file, using dvips to turn this into a PostScript file, using ps2pdf to make a PDF file, and displaying the slides in Adobe Acrobat Reader.
Result: nice slides using freeware.