21 March, 2016: I

529053 Evolutionary Genomics Ari Löytynoja / [email protected] Background reading

This book is an OK primer to population genetics in the genomics era
- focus on and SNP data
- unfortunately, it contains lots of typos

Population genetics

Definition
- studies distributions & changes of allele frequencies in populations over time
- effects considered:
  - natural selection, genetic drift, mutation and gene flow
  - recombination, population subdivision and population structure
- allows inferring past events as well as predicting the future

History
- fundamental work by Haldane, Wright and Fisher in the first half of the 20th century
- recent development: coalescent theory by Kingman in the 1980s
  - suitable for SNP data
  - computationally highly efficient

Population genetics basics (1)

Allele
- one of the alternative forms of a gene or of the same genetic locus
- used to be a visible gene product (e.g. blond vs. red hair)
- now typically a SNP (e.g. rs1805007(C) vs. rs1805007(T))

Alleles at a “genetic locus” do not need to be functional
- in many studies we are interested in neutral variation: we can then exclude natural selection, focus on genetic drift and gene flow, and e.g. infer historical events of populations
- rs1805007 is associated e.g. with ‘Skin sensitivity to sun’, ‘Hair color’, ‘Non-melanoma skin cancer’ and ‘Freckles’: it may not be entirely neutral

The genome provides millions of variable loci, the majority of them neutral

Inferring the presence of function for a locus/allele is of special interest
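Population genetics works with allele frequencies. As a minimal sketch, they could be counted at one biallelic SNP from diploid genotype calls as below. The function `allele_frequencies` and the genotype strings are hypothetical toy inputs, not real rs1805007 data.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the frequency of each allele among diploid genotype calls.

    Each genotype is a two-character string such as "CT"; both characters
    are counted, so n individuals contribute 2n allele copies.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

# Hypothetical genotype calls at one biallelic SNP (toy data, not real samples)
sample = ["CC", "CT", "CC", "TT", "CC"]
print(allele_frequencies(sample))  # {'C': 0.7, 'T': 0.3}
```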

Population genetics basics (2)

Population model
Theoretical models assume a simplified population model. The most commonly used one is the Wright-Fisher model (WFM). It assumes:
- haploid population
- no sex
- constant population size

The WFM can be generalised to:
- diploid population
- panmictic, random mating
- variable population size

The WFM gives a good approximation for more complex populations

Wright-Fisher model
Figure: evolution of an idealised population, shown at generations 1, 2, 3 and 10
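The idealised population in the figure can be mimicked with a short simulation. This is a minimal sketch, assuming the basic haploid WFM with constant size: every individual starts with its own label, and each offspring copies a parent chosen uniformly at random from the previous generation. The function name `wright_fisher_lineages` is hypothetical.

```python
import random

def wright_fisher_lineages(n, generations, seed=None):
    """Simulate a haploid Wright-Fisher population of constant size n.

    Each individual starts with its own label (its own "allele"). In every
    generation, each of the n offspring copies the label of a parent drawn
    uniformly at random, with replacement, from the previous generation.
    Returns the number of distinct labels left after each generation.
    """
    rng = random.Random(seed)
    population = list(range(n))
    distinct = [len(set(population))]
    for _ in range(generations):
        population = [rng.choice(population) for _ in range(n)]
        distinct.append(len(set(population)))
    return distinct

print(wright_fisher_lineages(10, 10, seed=1))
```

The number of distinct labels can only stay the same or decrease from one generation to the next, which mirrors how variation disappears in the figure.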

Population genetics basics (3)

Population size
One central parameter in population genetics is population size, abbreviated as N. Population size defines
- how quickly variation is lost (forwards)
- how much frequencies change per generation (now)
- how quickly a sample coalesces to its MRCA, the most recent common ancestor (backwards)

Population size is measured in ‘units’ of the WFM population
- known as the effective population size, Ne
- can be very different from the census population size
- some violations of the WFM can be corrected for
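For the basic haploid WFM without mutation or selection, the three roles of N listed above correspond to standard results, sketched here (for a diploid population, N is replaced by 2N): the binomial sampling variance of the allele frequency p, the expected decay of heterozygosity H (the probability that two randomly drawn gene copies differ), and the geometric waiting time T_2 for two gene copies to coalesce.

```latex
% Haploid Wright-Fisher model, population size N, no mutation or selection
% p_t: allele frequency, H_t: heterozygosity, T_2: pairwise coalescence time
\begin{align*}
  \operatorname{Var}(p_{t+1} \mid p_t) &= \frac{p_t(1 - p_t)}{N}
    && \text{frequency change per generation (now)} \\
  \mathbb{E}[H_t] &= H_0 \left(1 - \frac{1}{N}\right)^{t}
    && \text{loss of variation (forwards)} \\
  \Pr(T_2 = t) &= \frac{1}{N}\left(1 - \frac{1}{N}\right)^{t-1},
    \quad \mathbb{E}[T_2] = N
    && \text{coalescence (backwards)}
\end{align*}
```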

In a finite population, random sampling of parents in every generation causes
- loss of variation
- change of allele frequencies

This is known as genetic drift, and it affects small populations more heavily

Population genetics basics (4)

Genetic drift
At every locus, variation is eventually lost and one allele becomes fixed
- at non-neutral loci, selection affects the chances of fixation
- variation is lost much more rapidly in small populations
- in small populations, genetic drift prevails over selection and even harmful alleles may get fixed

Variation once lost is lost forever
- a population bottleneck reduces variation, and population recovery cannot bring it back
- new variation is created by mutations
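The claim that variation is lost much more rapidly in small populations can be checked with a small neutral simulation. This is a sketch under stated assumptions: a haploid WFM, two alleles starting at frequency 0.5, no selection or mutation, and allele counts resampled binomially each generation. The function names are hypothetical.

```python
import random

def generations_until_fixation(n, p0, rng):
    """Resample allele counts binomially in a haploid Wright-Fisher population
    of size n, starting at frequency p0, until one allele is fixed.
    Returns (generations elapsed, True if the focal allele was fixed)."""
    count = round(p0 * n)
    generations = 0
    while 0 < count < n:
        p = count / n
        # each of the n offspring inherits the focal allele with probability p
        count = sum(rng.random() < p for _ in range(n))
        generations += 1
    return generations, count == n

def mean_time_to_fixation(n, p0=0.5, replicates=200, seed=0):
    """Average number of generations until one of the two alleles is fixed."""
    rng = random.Random(seed)
    times = [generations_until_fixation(n, p0, rng)[0] for _ in range(replicates)]
    return sum(times) / replicates

for n in (10, 100):
    print(n, mean_time_to_fixation(n))
```

For this setting, diffusion theory predicts a mean time of about 2N ln 2 ≈ 1.4N generations until one allele is fixed, so the average should grow roughly tenfold from N = 10 to N = 100.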

Genetic drift - gingers conquering a population in ten generations!

Population genetics basics (3)

Coalescence time
- small populations coalesce faster, to a more recent MRCA
- conversely, Ne can be defined by the coalescence time
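Looking backwards, the same model gives the coalescence time. A minimal sketch, assuming a haploid WFM of constant size N: two gene copies each pick a parent uniformly at random every generation until they pick the same one, which happens with probability 1/N, so the waiting time averages N generations. Reading that relation in reverse is the sense in which Ne can be defined by the coalescence time. The function names are hypothetical.

```python
import random

def pairwise_coalescence_time(n, rng):
    """Trace two gene copies backwards in a haploid Wright-Fisher population
    of size n. Each generation both lineages pick a parent uniformly at
    random; they coalesce when they pick the same parent."""
    t = 0
    while True:
        t += 1
        if rng.randrange(n) == rng.randrange(n):
            return t

def mean_coalescence_time(n, replicates=2000, seed=0):
    """Average pairwise coalescence time over many independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(n, rng) for _ in range(replicates))
    return total / replicates

for n in (10, 100):
    print(n, mean_coalescence_time(n))  # averages should lie close to n
```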

Population genetics basics (5)

Effective population size
Some violations of the WFM that can be corrected for:
- variation in population size:
  - census sizes over six generations: 10, 100, 50, 80, 20, 500
  - Ne = 30.8 (the harmonic mean of the census sizes)
- non-equal sex ratio:
  - 80 + 20 = 100
  - Ne = 4 × 80 × 20 / 100 = 64
- variation in reproductive success
- self-fertilisation
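The two numerical examples can be reproduced with the standard formulas: the harmonic mean of the census sizes over generations, and Ne = 4·Nm·Nf / (Nm + Nf) for Nm breeding males and Nf breeding females. This is a minimal sketch; the function names are hypothetical.

```python
def ne_harmonic_mean(sizes):
    """Ne over generations with fluctuating census size: the harmonic mean."""
    return len(sizes) / sum(1 / n for n in sizes)

def ne_sex_ratio(males, females):
    """Ne of a diploid population with unequal numbers of breeding males and females."""
    return 4 * males * females / (males + females)

print(round(ne_harmonic_mean([10, 100, 50, 80, 20, 500]), 1))  # 30.8
print(ne_sex_ratio(80, 20))  # 64.0
```

The harmonic mean is dominated by the smallest values, which is why a single generation of 10 individuals pulls Ne far below the arithmetic mean of about 127.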

Effective population size

There are roughly 8 million Holstein cattle in the USA

- theoretical Ne is approximately 80 and declining

Coalescence theory

Problems of classical population genetics theory
- no analytical solutions for practical problems
- typically, data are simulated under different assumptions
- parameter values producing results similar to the observed data are considered ‘good’

- typical questions concern large populations over many generations
- forward simulations of full populations are time-consuming
- we only ‘sample’ a tiny subset of the total

- coalescence models only consider what happens to the sample

Simulation of full population (N=100) vs. sample (n=5) only
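The difference can be sketched in code: instead of simulating all N individuals forwards, only the n sampled lineages are traced backwards until they reach their MRCA, which costs about n random draws per generation rather than N. This is a minimal sketch, assuming a haploid WFM of constant size; in this discrete-generation version more than two lineages can occasionally merge in the same generation, whereas Kingman's coalescent, an approximation for large N, allows only pairwise mergers. The function name `sample_genealogy` is hypothetical.

```python
import random

def sample_genealogy(sample_size, pop_size, seed=None):
    """Trace only the sampled lineages backwards in a haploid Wright-Fisher
    population, instead of simulating all pop_size individuals forwards.

    Each generation back, every remaining lineage picks a parent uniformly at
    random among pop_size individuals; lineages picking the same parent merge.
    Returns a list of (generation, lineages remaining) at each merger, ending
    when a single lineage (the MRCA) is left.
    """
    rng = random.Random(seed)
    lineages = sample_size
    generation = 0
    events = []
    while lineages > 1:
        generation += 1
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        if len(parents) < lineages:
            lineages = len(parents)
            events.append((generation, lineages))
    return events

# Sample of n=5 from a population of N=100, as in the figure
print(sample_genealogy(5, 100, seed=2))
```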