21 March, 2016: Population genetics I
529053 Evolutionary Genomics Ari Löytynoja / [email protected] Background reading
This book is an OK primer on population genetics in the genomics era
- focuses on coalescent theory and SNP data
- unfortunately contains lots of typos
Population genetics
Definition
- studies the distribution of allele frequencies in populations and their changes over time
- effects considered:
  - natural selection, genetic drift, mutation and gene flow
  - recombination, population subdivision and population structure
- allows inferring past events as well as predicting future ones
History
- fundamental work by Haldane, Wright and Fisher in the first half of the 20th century
- recent development: coalescent theory by Kingman in the 1980s
  - suitable for SNP data
  - computationally highly efficient
Population genetics basics (1)
Allele
- one of the alternative forms of a gene or of the same genetic locus
- used to be a visible gene product (e.g. blond vs. red hair)
- now typically a SNP (e.g. rs1805007(C) vs. rs1805007(T))
Alleles at a "genetic locus" do not need to be functional
- in many studies we are interested in neutral variation: we can then exclude natural selection, focus on genetic drift and gene flow, and e.g. infer historical events of populations
- rs1805007 is associated e.g. with 'Skin sensitivity to sun', 'Hair color', 'Non-melanoma skin cancer' and 'Freckles': it may not be entirely neutral
The genome provides millions of variable loci, the majority of them neutral
Inferring the presence of function for a locus/allele is of special interest
Population genetics basics (2)
Population model
Theoretical models assume a simplified population model
The most commonly used model is the Wright-Fisher model. It assumes:
- haploid population
- no sex
- constant population size
The Wright-Fisher model (WFM) can be generalised:
- diploid population
- panmictic, random mating
- variable population size
The WFM gives a good approximation for more complex populations
Wright-Fisher model
Evolution of an idealised population [figure sequence: generations 1, 2, 3 and 10]
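The generation-by-generation picture can be sketched in a few lines of Python. This is a minimal illustration, assuming the haploid, constant-size model described above; the function name `wright_fisher_lineages` and the founder-label bookkeeping are hypothetical choices made for this sketch, not something taken from the slides.

```python
import random

def wright_fisher_lineages(n_individuals, n_generations, seed=None):
    """Count surviving founder lineages in a haploid Wright-Fisher population.

    Hypothetical helper for illustration: every individual of generation 1
    carries its own label, and each later generation is formed by letting
    every offspring copy the label of one randomly chosen parent.
    """
    rng = random.Random(seed)
    # Generation 1: all labels are distinct.
    population = list(range(n_individuals))
    surviving = [len(set(population))]
    for _ in range(n_generations - 1):
        # Each offspring picks one parent uniformly at random, with replacement.
        population = [rng.choice(population) for _ in range(n_individuals)]
        surviving.append(len(set(population)))
    return surviving

# Number of distinct founder lineages left in generations 1..10.
counts = wright_fisher_lineages(n_individuals=10, n_generations=10, seed=1)
print(counts)
```

The count can only stay the same or decrease, because a lineage that leaves no offspring in one generation cannot reappear later.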
Population genetics basics (3)
Population size
One central parameter in population genetics is population size, abbreviated as N
Population size defines
- how quickly variation is lost (forwards)
- how much frequencies change per generation (now)
- how quickly a sample coalesces to its most recent common ancestor, MRCA (backwards)
Population size is measured in 'units' of a WFM population
- known as the effective population size, Ne
- can be very different from the census population size
- some violations of the WFM can be corrected for
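How quickly variation is lost can be made concrete with a standard result that the slides do not state explicitly. Assuming the haploid WFM with $N$ individuals, and writing $H_t$ for the expected heterozygosity (the probability that two randomly chosen gene copies differ) in generation $t$, a symbol introduced here for illustration:

```latex
% Two copies share a parent in the previous generation with probability 1/N,
% so heterozygosity shrinks by the factor (1 - 1/N) every generation.
H_{t+1} = \left(1 - \frac{1}{N}\right) H_t
\qquad\Longrightarrow\qquad
H_t = H_0 \left(1 - \frac{1}{N}\right)^{t} \approx H_0 \, e^{-t/N}
```

For a diploid population of $N$ individuals, $N$ is replaced by $2N$ gene copies. With $N = 10$ about 10% of the remaining variation is lost per generation; with $N = 10\,000$ the loss is about 0.01% per generation.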
Random sampling from one generation to the next causes
- loss of variation
- change of allele frequencies
This is known as genetic drift; it affects small populations more heavily
Population genetics basics (4)
Genetic drift
At every locus, variation is eventually lost and one allele becomes fixed
- at non-neutral loci, selection affects the chances of fixation
- variation is lost much more rapidly in small populations
- in small populations genetic drift prevails over selection and even harmful alleles may become fixed
Variation once lost is lost forever
- a population bottleneck reduces variation, and population recovery cannot bring it back
- new variation is created by mutations
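The claim that small populations lose variation faster can be checked with a small simulation. The sketch below assumes a neutral locus with two alleles in a haploid Wright-Fisher population (no selection, no mutation); the function names, the starting frequency of 0.5 and the population sizes of 10 and 100 are illustrative assumptions, not values from the slides.

```python
import random

def generations_until_fixed_or_lost(pop_size, rng, start_freq=0.5):
    """Generations until one of two neutral alleles is fixed (hypothetical helper)."""
    count = round(pop_size * start_freq)  # copies of the focal allele
    generations = 0
    while 0 < count < pop_size:
        freq = count / pop_size
        # Binomial sampling: each offspring inherits the focal allele
        # with probability equal to its current frequency.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        generations += 1
    return generations

def mean_time(pop_size, replicates=200, seed=0):
    """Average time to loss of variation over independent replicates."""
    rng = random.Random(seed)
    total = sum(generations_until_fixed_or_lost(pop_size, rng)
                for _ in range(replicates))
    return total / replicates

small = mean_time(10)    # small population
large = mean_time(100)   # ten times larger population
print(small, large)
```

In runs of this sketch the larger population should keep both alleles for roughly ten times as many generations, in line with the expectation that the time scale of drift is proportional to population size.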
Genetic drift - gingers conquering a population in ten generations!
Population genetics basics (3)
Coalescence time
- samples from small populations coalesce faster and have a more recent MRCA
- conversely: Ne can be defined by the coalescence time
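The link between population size and coalescence time can be written out for the simplest case of two lineages. The derivation below is a standard result that assumes the haploid WFM with $N$ individuals; $T_2$ is a symbol introduced here for the number of generations until the two lineages find a common ancestor.

```latex
% Going back one generation, the second lineage picks the same parent
% as the first with probability 1/N, so T_2 is geometrically distributed.
P(T_2 = t) = \left(1 - \frac{1}{N}\right)^{t-1} \frac{1}{N},
\qquad
E[T_2] = N
```

Read in the other direction, an observed mean pairwise coalescence time gives an estimate of the effective population size. For a diploid population of $N$ individuals the expectation is $2N$ generations.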
Population genetics basics (5)
Effective population size
Some violations of the WFM that can be corrected for
- variation in population size:
  - sizes of 10, 100, 50, 80, 20 and 500 in successive generations
  - Ne = 30.8
- non-equal sex ratio:
  - 80 individuals of one sex + 20 of the other = 100
  - Ne = 64
- variation in reproductive success
- self-fertilisation
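The two numeric examples can be reproduced with the standard textbook corrections. The slides do not state the formulas, so the sketch below assumes them: the harmonic mean of the per-generation sizes for fluctuating population size, and 4·Nm·Nf / (Nm + Nf) for an unequal sex ratio. The function names are hypothetical.

```python
def ne_fluctuating_size(sizes):
    """Effective size as the harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)

def ne_unequal_sex_ratio(n_males, n_females):
    """Effective size of a population with unequal numbers of breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)

ne_sizes = ne_fluctuating_size([10, 100, 50, 80, 20, 500])
ne_sexes = ne_unequal_sex_ratio(80, 20)
print(round(ne_sizes, 1))  # 30.8
print(ne_sexes)            # 64.0
```

The harmonic mean is dominated by the smallest values, which is why a single generation of size 10 pulls Ne far below the arithmetic mean of about 127.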
There are roughly 8 million Holstein cattle in the USA
- theoretical Ne is approximately 80 and declining
Coalescence theory
Problems of classical population genetics theory
- no analytical solutions for practical problems
  - typically, data are simulated under different assumptions
  - parameter values producing results similar to the observed data are considered 'good'
- typical questions concern large populations over many generations
  - forward simulations of full populations are time-consuming
  - we 'sample' only a tiny subset of the total
- coalescence models only consider what happens to the sample
Simulation of full population (N=100) vs. sample (n=5) only
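Simulating only the sample can be sketched as follows. This is a minimal illustration, assuming the standard continuous-time (Kingman) approximation for a haploid population, which is reasonable when the sample is much smaller than the population; the function name `simulate_tmrca` and the replicate count are hypothetical choices, while N = 100 and n = 5 follow the figure caption.

```python
import random

def simulate_tmrca(sample_size, pop_size, rng):
    """Time (in generations) until a sample reaches its MRCA (hypothetical helper).

    Only the sampled lineages are tracked; the rest of the population
    is never simulated.
    """
    lineages = sample_size
    time = 0.0
    while lineages > 1:
        # With k lineages there are k(k-1)/2 pairs, and each pair
        # coalesces at rate 1/pop_size per generation.
        rate = lineages * (lineages - 1) / 2.0 / pop_size
        time += rng.expovariate(rate)
        lineages -= 1  # one coalescence event merges two lineages
    return time

rng = random.Random(42)
times = [simulate_tmrca(sample_size=5, pop_size=100, rng=rng)
         for _ in range(10_000)]
mean_tmrca = sum(times) / len(times)
print(mean_tmrca)
```

Under these assumptions the expected time to the MRCA is 2N(1 - 1/n), i.e. 160 generations here, and each replicate needs only n - 1 = 4 random draws instead of a forward simulation of all 100 individuals over every generation.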