23rd Oct 2015

Population Genetics I (Introduction + Neutral Theory)

Gurinder Singh “Mickey” Atwal
Center for Quantitative Biology

Summary and definitions

• Basic definitions/concepts

PART 1
• Neutral theory of single loci

PART 2
• Haplotype analyses

DNA Sequence Variation: Single Nucleotide Polymorphisms

CAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGC C T T CCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTG GTGAGTGGATCCATTGGAAGG C GCAGGCCACCACCCCGACCCCAACCCCAGCCCCCTAGCAGAGACCTGTGGGAAGCGAAAA

TTCATGGGACTGACTTTCTGCTCTTGTCTTTCAGACTTCCTGAAAACAACGTTCTGGTAAGGAC A CAAGGGTTGGGCTGGGACCTGGAGGGCTGGGGGGGCTGGGGGGCTGGGACCTGGTCCTCC G A A TGACTGCTCTTTTCACCCATCTACAGC TCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGC T TGTCCCCGGACGATATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGAG C ATGCCAGAGGCTGCTCCCCGCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCT

GCACCAGCCCCCTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCA

GCTACGGTTTCCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACG

Part of human p53 (exons 2-4) • Chromosome 17

Correlations in Genomic Studies

1. Correlations amongst alleles: many possible correlation statistics (D, D′, r², δ, Q)

GENOTYPE GCTCCCCGCGTGGCCCCTGCACC

2. Genotype-phenotype correlations: many possible tests of association (χ², Fisher exact, Cochran-Armitage)

PHENOTYPE e.g. onset of cancer, apoptosis rates

Goal of Population Genetics

• Understand forces that produce and maintain inherited genetic variation

• Forces
  – Mutation
  – Recombination
  – Natural Selection
  – Population Structure
  – Random birth/death (drift)

Hardy-Weinberg Law

• Consider 2 alleles (A, a) with frequencies p and q
• Allele frequency of A = p
• Allele frequency of a = q = 1-p
• Randomly mating, large diploid population with no mutation, migration, selection or drift

Genotype                    AA       Aa       aa

Hardy-Weinberg frequency    $p^2$    $2pq$    $q^2$
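
The Hardy-Weinberg table can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the genotype counts are hypothetical and chosen only for illustration.

```python
def hardy_weinberg(p):
    """Expected (AA, Aa, aa) genotype frequencies for allele frequency p of A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate p from genotype counts (each AA individual carries two A alleles)."""
    total = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * total)


if __name__ == "__main__":
    counts = (360, 480, 160)  # hypothetical genotype counts, illustration only
    p = allele_frequency(*counts)
    expected = [freq * sum(counts) for freq in hardy_weinberg(p)]
    print("p =", p)
    print("expected counts under HW:", expected)
```

Comparing observed counts with the expected counts (for example with a χ² test) is one common way to look for deviations from Hardy-Weinberg equilibrium.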

• Only a few rounds of random mating are needed to reach HW equilibrium. (How many exactly for hermaphrodite and dioecious populations?)
• Fast time scale

• Deviation from HW equilibrium mainly due to
  – Strong Selection
  – Inbreeding
  – Population Subdivision
  – Genotyping Errors

Population Subdivision

Genotype     AA                           Aa                 aa

Frequency    $p^2(1-F_{ST}) + pF_{ST}$    $2pq(1-F_{ST})$    $q^2(1-F_{ST}) + qF_{ST}$

• Wahlund effect
• The effect gets bigger the more the subpopulations differ

• $0 \le F_{ST} \le 1$

Genotype     AA                         Aa                aa

Frequency    $p^2(1-F_I) + pF_I$        $2pq(1-F_I)$      $q^2(1-F_I) + qF_I$

• The effect gets bigger the more related the individuals in the population are
• $0 \le F_I \le 1$
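
The two genotype tables above (population subdivision and inbreeding) share the same algebraic form, with a single coefficient F standing in for F_ST or F_I. The sketch below is an illustration under that reading; the function names are hypothetical, and the estimator in the second function is simply the heterozygote row of the table solved for F.

```python
def genotype_frequencies(p, F):
    """(AA, Aa, aa) frequencies for allele frequency p and coefficient F (F_ST or F_I)."""
    if not (0.0 <= p <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("p and F must both lie between 0 and 1")
    q = 1.0 - p
    AA = p * p * (1.0 - F) + p * F
    Aa = 2.0 * p * q * (1.0 - F)
    aa = q * q * (1.0 - F) + q * F
    return AA, Aa, aa


def f_from_heterozygotes(observed_het, p):
    """Solve observed_het = 2pq(1 - F) for F (requires 0 < p < 1)."""
    return 1.0 - observed_het / (2.0 * p * (1.0 - p))


if __name__ == "__main__":
    for F in (0.0, 0.1, 1.0):
        print(F, genotype_frequencies(0.3, F))
```

With F = 0 the Hardy-Weinberg proportions are recovered; with F = 1 there are no heterozygotes at all, which is the extreme form of the heterozygote deficit.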

• What happens when we consider a finite population size?

• Allele frequencies can change even if there is no natural selection.

Evolution of a neutral mutant allele

Wright-Fisher Process

[Figure: N individuals (2N alleles) followed over discrete generations; a mutation creates a derived allele, which later goes extinct.]
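
The Wright-Fisher process can be simulated directly. This is a minimal sketch assuming the standard formulation, in which each of the 2N allele copies in the next generation picks its parent copy uniformly at random; the function names, seeds and parameter values are hypothetical. It follows a single new neutral mutation until it is lost or fixed, and estimates how often fixation happens.

```python
import random


def wright_fisher_step(count, n_alleles, rng):
    """Binomial resampling: each copy in the next generation is derived with prob count/n_alleles."""
    p = count / n_alleles
    return sum(1 for _ in range(n_alleles) if rng.random() < p)


def run_until_absorbed(n_individuals, rng):
    """Follow one new mutation (1 copy among 2N) until loss or fixation.

    Returns (fixed, generations_elapsed).
    """
    n_alleles = 2 * n_individuals
    count, generations = 1, 0
    while 0 < count < n_alleles:
        count = wright_fisher_step(count, n_alleles, rng)
        generations += 1
    return count == n_alleles, generations


def fixation_fraction(n_individuals, replicates, seed=0):
    """Fraction of independent new mutations that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(run_until_absorbed(n_individuals, rng)[0] for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    N = 10  # small hypothetical population so the run finishes quickly
    print("simulated:", fixation_fraction(N, 20000), "neutral expectation:", 1 / (2 * N))
```

Most runs end in extinction within a few generations. Under neutrality the simulated fixation fraction should come out close to 1/(2N), up to sampling noise.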

Stochastic birth/death process (Moran model)

[Figure: ancestral and derived alleles undergoing individual birth and death events along a continuous time axis.]

• Overlapping generations
• Distribution of time to replication

Evolution of a neutral mutant allele

[Figure: a mutation creates a derived allele that drifts to fixation in a population of N individuals.]

Kimura diffusion theory

[Figure, labelled "DIFFUSION": allele frequency (0 to 1) versus time in generations (0 to 12).]

Natural selection is more effective in larger populations; genetic drift dominates in smaller populations

[Figure: genetic drift versus Darwinian evolution as a function of N, the population size.]

Neutral drift

[Figure: allele frequency versus generations/time, with an interval marked ~4N.]

• Most new mutations are eventually lost
• Only a small fraction (1/2N) eventually fixate in the population

Neutral Molecular Evolution

$r = u$

where r is the substitution rate and u is the mutation rate.
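
One way to see why the population size drops out is sketched here. It assumes a diploid population of N individuals (2N allele copies) and takes u to be the neutral mutation rate per allele copy per generation. Then 2Nu new mutations arise each generation, and each one eventually fixes with probability 1/2N:

```latex
r \;=\; \underbrace{2Nu}_{\text{new mutations per generation}}
  \;\times\;
  \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
  \;=\; u
```

Larger populations produce more mutations, but each mutation is proportionally less likely to fix, so the two factors of N cancel.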

• Rate of new fixations equals the mutation rate and does not depend on N
• Implies the substitution rate is constant
• Gives a molecular clock for neutral molecular evolution
• Molecular divergence between 2 species should be proportional to the number of generations since their most recent common ancestor

Effective Population Size, Neff

$$\frac{1}{N_{\mathrm{eff}}} = \frac{1}{T}\left(\frac{1}{N_1} + \frac{1}{N_2} + \cdots + \frac{1}{N_T}\right)$$

• Discrete time steps
• $N_i$ = population at time step i
• T = total time steps
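
The formula above is a harmonic mean of the population sizes over time, so it is dominated by the smallest values. A minimal sketch follows; the function name and the bottleneck sizes are hypothetical and only illustrate the effect.

```python
def effective_population_size(sizes):
    """Harmonic mean of population sizes N_1..N_T over T discrete time steps."""
    if not sizes:
        raise ValueError("need at least one time step")
    if any(n <= 0 for n in sizes):
        raise ValueError("population sizes must be positive")
    total_steps = len(sizes)
    return total_steps / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    history = [10000, 10000, 100, 10000, 10000]  # hypothetical one-step bottleneck
    print("arithmetic mean:", sum(history) / len(history))
    print("effective size :", effective_population_size(history))
```

A single bottleneck generation pulls the effective size far below the arithmetic mean, which is why a population that has expanded recently can still have a small Neff.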

Human Population Expansion

• Neff ~ 10,000 (European HapMap)
• Nonadiabatic expansion

Heterozygosity, H

• Heterozygosity, H: probability that 2 alleles drawn at random are different

• Homozygosity, G = 1 - H

• E.g. if biallelic, then $H = 2p(1-p)$ and $G = p^2 + (1-p)^2$

Heterozygosity decay

• Wright-Fisher: $H_t = H_0 \exp\left(-\frac{t}{N}\right)$
• Moran: $H_t = H_0 \exp\left(-\frac{2t}{N^2}\right)$
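
The two decay laws can be compared numerically. The sketch below rests on assumptions that the slides do not state. N is taken to be the number of gene copies. In the Wright-Fisher model t counts generations, and in the Moran model t counts single birth/death events. The per-step factors (1 - 1/N) and (1 - 2/N²) are the standard expected-heterozygosity recursions for these two models. All function names are hypothetical.

```python
import math


def wf_heterozygosity(h0, n, generations):
    """Expected H after Wright-Fisher generations: factor (1 - 1/n) per generation."""
    return h0 * (1.0 - 1.0 / n) ** generations


def moran_heterozygosity(h0, n, events):
    """Expected H after Moran birth/death events: factor (1 - 2/n^2) per event."""
    return h0 * (1.0 - 2.0 / n ** 2) ** events


def wf_exponential(h0, n, t):
    """Continuous approximation H0 * exp(-t/n)."""
    return h0 * math.exp(-t / n)


def moran_exponential(h0, n, t):
    """Continuous approximation H0 * exp(-2t/n^2)."""
    return h0 * math.exp(-2.0 * t / n ** 2)


if __name__ == "__main__":
    h0, n, generations = 0.5, 1000, 500
    events = generations * n // 2  # n/2 Moran events per Wright-Fisher generation
    print(wf_heterozygosity(h0, n, generations), wf_exponential(h0, n, generations))
    print(moran_heterozygosity(h0, n, events), moran_exponential(h0, n, events))
```

Setting the number of Moran events to N/2 times the number of Wright-Fisher generations makes the two curves coincide: the models differ only in how time is counted.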

Different microscopic models are equivalent up to a rescaling of time

Mutation-Drift Balance

• Drift decreases H
• Mutation increases H
• The two forces cancel out to give an equilibrium level of variation in the population

Heterozygosity: $H = \frac{4Nu}{1 + 4Nu}$

Homozygosity: $G = \frac{1}{1 + 4Nu}$
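
The equilibrium formulas depend on N and u only through the product 4Nu. A minimal sketch for exploring the two regimes follows; the function names and the mutation rates in the example are hypothetical, illustrative values, not taken from the slides.

```python
def equilibrium_heterozygosity(n, u):
    """H = 4Nu / (1 + 4Nu) at mutation-drift balance."""
    theta = 4.0 * n * u
    return theta / (1.0 + theta)


def equilibrium_homozygosity(n, u):
    """G = 1 / (1 + 4Nu), so that G = 1 - H."""
    return 1.0 / (1.0 + 4.0 * n * u)


if __name__ == "__main__":
    n = 10000  # effective size of the order quoted for humans
    for u in (1e-8, 2.5e-5, 1e-2):  # hypothetical per-generation mutation rates
        print("4Nu =", 4 * n * u, "H =", equilibrium_heterozygosity(n, u))
```

When 4Nu is much less than 1, H is approximately 4Nu and the population is nearly monomorphic; when 4Nu is much greater than 1, H approaches 1.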

• Time scale of mutations ~ 1/u
• Time scale of drift ~ 4N
• Remember, drift eliminates variation and mutations create variation

• If 4N << 1/u, the population is mostly devoid of variation
• If 4N >> 1/u, the population has much variation

[Figure: two panels, labelled 4uN >> 1 and 4uN << 1.]

Human SNP frequency distribution

Distribution of allele frequencies in Chromosome 1 - 180 Northern European samples (HapMap consortium), non-coding (intergenic)

[Figure: empirical data; x-axis: allele frequency.]

Coalescent

[Figure: genealogy traced backwards in time from 22 individuals in the present through 18, 16, 14, 12, 9, 8, 8, 7, 7, 5, 5, 3, 3, 3, 2 and 2 ancestors to 1 ancestor.]

Bifurcating Tree

$P(\text{pair coalesce}) = \frac{1}{2N}$

After t generations?

$P(k \text{ coalesce to } k-1) = \frac{k(k-1)}{4N}$
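
The coalescence probabilities above translate into expected waiting times. The sketch below is a hedged illustration and assumes that the per-generation probabilities are small (k much smaller than N), so that at most one coalescence happens per generation and the waiting time with k lineages is roughly geometric with mean 4N/(k(k-1)). The function names are hypothetical.

```python
def prob_pair_not_coalesced(t, n):
    """Probability that two lineages are still distinct after t generations."""
    return (1.0 - 1.0 / (2.0 * n)) ** t


def expected_waiting_time(k, n):
    """Expected generations spent with k lineages: 1 / (k(k-1)/4N)."""
    return 4.0 * n / (k * (k - 1))


def expected_tmrca(sample_size, n):
    """Expected time to the most recent common ancestor of the whole sample."""
    return sum(expected_waiting_time(k, n) for k in range(2, sample_size + 1))


if __name__ == "__main__":
    N = 10000
    for k in (22, 10, 5, 2):
        print(k, "lineages:", expected_waiting_time(k, N), "generations")
    print("expected TMRCA for 22 samples:", expected_tmrca(22, N))
```

The intervals lengthen as lineages merge: the final pair alone waits 2N generations on average, and the total approaches 4N for large samples.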

Many different trees can produce the present population!

[Figure: tree running back in time from the present to the most recent common ancestor (MRCA).]

Properties of coalescent

• Random tree with random coalescent interval times ~ Wright-Fisher model

• Time to coalescence gets longer the further we go back in time

• The larger the population size, the slower the rate of coalescence

Mutation?

[Figure: genealogy from the most recent common ancestor (MRCA) to the present, built up over several slides, with mutations (marked *) placed on branches. Present-day sequences:

TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC]

Efficient computer simulations of neutral mutation

1. Generate random genealogy of individuals back in time

2. Superimpose mutation
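
The two steps above can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes exponential waiting times with rate k(k-1)/(4N) per generation and an infinite-sites mutation model, in which every mutation creates a new segregating site. The parameter mu (mutations per lineage per generation, which must be positive) and all function names are hypothetical.

```python
import random


def simulate_coalescent(sample_size, n_individuals, mu, seed=None):
    """Simulate one neutral genealogy with mutations.

    Returns (tmrca, sites), where each site is the frozenset of sample indices
    that carry the derived allele at that site. mu must be > 0.
    """
    rng = random.Random(seed)
    lineages = [frozenset([i]) for i in range(sample_size)]
    tmrca = 0.0
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        # Step 1: time back to the next coalescence among k lineages.
        wait = rng.expovariate(k * (k - 1) / (4.0 * n_individuals))
        # Step 2: drop mutations on each branch as a Poisson process of rate mu.
        for descendants in lineages:
            elapsed = rng.expovariate(mu)
            while elapsed < wait:
                sites.append(descendants)
                elapsed += rng.expovariate(mu)
        tmrca += wait
        # Merge two randomly chosen lineages into their common ancestor.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [lin for i, lin in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
    return tmrca, sites


def to_sequences(sample_size, sites):
    """Encode each sampled sequence as a string of 0 (ancestral) and 1 (derived)."""
    return ["".join("1" if i in site else "0" for site in sites)
            for i in range(sample_size)]


if __name__ == "__main__":
    tmrca, sites = simulate_coalescent(6, 1000, 1e-3, seed=1)
    print("TMRCA (generations):", tmrca)
    for seq in to_sequences(6, sites):
        print(seq)
```

Only the ancestors of the sample are tracked, never the whole population, which is what makes this approach cheaper than running the Wright-Fisher process forwards in time.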