23rd Oct 2015
Population Genetics I (Introduction + Neutral Theory)
Gurinder Singh “Mickey” Atwal
Center for Quantitative Biology

Summary and definitions
PART 1
• Basic definitions/concepts
• Neutral theory of single loci
PART 2
• Natural Selection
• Haplotype analyses

DNA Sequence Variation: Single Nucleotide Polymorphisms
CAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGC C T T CCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTG GTGAGTGGATCCATTGGAAGG C GCAGGCCACCACCCCGACCCCAACCCCAGCCCCCTAGCAGAGACCTGTGGGAAGCGAAAA
TTCATGGGACTGACTTTCTGCTCTTGTCTTTCAGACTTCCTGAAAACAACGTTCTGGTAAGGAC A CAAGGGTTGGGCTGGGACCTGGAGGGCTGGGGGGGCTGGGGGGCTGGGACCTGGTCCTCC G A A TGACTGCTCTTTTCACCCATCTACAGC TCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGC T TGTCCCCGGACGATATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGAG C ATGCCAGAGGCTGCTCCCCGCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCT
GCACCAGCCCCCTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCA
GCTACGGTTTCCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACG
Part of human p53 gene (exons 2-4)
• Chromosome 17
• Figure legend: EXONS / INTRONS

Correlations in Genomic Studies
1. Correlations amongst alleles: many possible correlation statistics ($D$, $D'$, $r^2$, $\delta$, $Q$)
GENOTYPE: GCTCCCCGCGTGGCCCCTGCACC
2. Genotype-phenotype correlations: many possible tests of association ($\chi^2$, Fisher exact, Cochran-Armitage)
PHENOTYPE: e.g. onset of cancer, apoptosis rates

Goal of population genetics
• Understand forces that produce and maintain inherited genetic variation
• Forces
  – Mutation
  – Recombination
  – Natural Selection
  – Population Structure
  – Random birth/death (drift)

Hardy Weinberg Law
• Consider 2 alleles (A, a)
• Allele frequency of A = $p$
• Allele frequency of a = $q = 1 - p$
• Randomly mating, large diploid population with no mutation, migration, selection, or drift

Genotype                    AA       Aa       aa
Hardy-Weinberg frequency    $p^2$    $2pq$    $q^2$
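The Hardy-Weinberg frequencies above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names and the genotype counts are made up for illustration.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A estimated from observed genotype counts."""
    total = n_AA + n_Aa + n_aa
    # Each AA individual carries two copies of A, each Aa individual one.
    return (2 * n_AA + n_Aa) / (2 * total)


# Hypothetical genotype counts, chosen only for illustration.
counts = (360, 480, 160)
p_hat = allele_frequency(*counts)
expected_counts = [f * sum(counts) for f in hardy_weinberg(p_hat)]
print(p_hat, expected_counts)
```

Comparing `expected_counts` with the observed counts (for instance with a $\chi^2$ test) is one common way to look for deviations from Hardy-Weinberg proportions.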
• Only a few rounds of random mating are needed to reach HW equilibrium. (How many exactly for hermaphrodite and dioecious populations?)
• Fast time scale
• Deviation from HW equilibrium mainly due to
  – Strong Selection
  – Inbreeding
  – Population Subdivision
  – *Genotyping Errors*

Population Subdivision
Genotype     AA                           Aa                 aa
Frequency    $p^2(1-F_{ST}) + pF_{ST}$    $2pq(1-F_{ST})$    $q^2(1-F_{ST}) + qF_{ST}$

• Wahlund effect
• Effect gets bigger the more the subpopulations differ
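The Wahlund effect can be illustrated by pooling subpopulations that are each in Hardy-Weinberg equilibrium. The sketch below is an illustration under stated assumptions, not the lecturer's code: subpopulations are equally sized, $F_{ST}$ is taken as the variance of the allele frequency across subpopulations divided by $\bar{p}\bar{q}$, and all function names and the example frequencies are hypothetical.

```python
def subdivided_genotype_freqs(p, f_st):
    """Genotype frequencies (AA, Aa, aa) for mean allele frequency p and a given F_ST."""
    q = 1.0 - p
    return (p * p * (1 - f_st) + p * f_st,
            2 * p * q * (1 - f_st),
            q * q * (1 - f_st) + q * f_st)


def pooled_genotype_freqs(sub_p):
    """Pool equally sized subpopulations, each in Hardy-Weinberg equilibrium."""
    n = len(sub_p)
    return (sum(p * p for p in sub_p) / n,
            sum(2 * p * (1 - p) for p in sub_p) / n,
            sum((1 - p) ** 2 for p in sub_p) / n)


def f_st_from_subpops(sub_p):
    """Variance of allele frequency across subpopulations over p_bar * q_bar.

    Assumes the mean allele frequency lies strictly between 0 and 1.
    """
    n = len(sub_p)
    p_bar = sum(sub_p) / n
    variance = sum((p - p_bar) ** 2 for p in sub_p) / n
    return variance / (p_bar * (1 - p_bar))


# Two hypothetical subpopulations with very different allele frequencies.
sub_p = [0.2, 0.8]
p_bar = sum(sub_p) / len(sub_p)
f_st = f_st_from_subpops(sub_p)
pooled = pooled_genotype_freqs(sub_p)
formula = subdivided_genotype_freqs(p_bar, f_st)
print(f_st, pooled, formula)
```

With these definitions the pooled frequencies agree with the table's formula, and the pooled heterozygote frequency falls below $2\bar{p}\bar{q}$, which is the heterozygote deficit the Wahlund effect refers to.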
• $0 \le F_{ST} \le 1$

Inbreeding

Genotype     AA                     Aa              aa
Frequency    $p^2(1-F_I) + pF_I$    $2pq(1-F_I)$    $q^2(1-F_I) + qF_I$

• Effect gets bigger the more related the population
• $0 \le F_I \le 1$

• What happens when we consider a finite population size?
• Allele frequencies can change even if there is no natural selection.

Evolution of a neutral mutant allele
Wright-Fisher Process
[Figure: N individuals (2N alleles) per generation; a mutation creates a derived allele on an ancestral background, and the derived allele eventually goes extinct.]

Stochastic birth/death process (Moran model)
[Figure: individual birth and death events along a time axis.]
• Overlapping generations
• Distribution of time to replication

Evolution of a neutral mutant allele
[Figure: N individuals; a mutation creates a derived allele that eventually reaches fixation.]

DIFFUSION
Kimura diffusion theory
[Figure: allele frequency (0 to 1) against time in generations.]

• Natural Selection is more effective in larger populations
• Genetic Drift dominates in smaller populations
[Figure: axis of population size N, running from Genetic Drift to Darwinian evolution.]

Neutral drift
[Figure: allele frequency against generations/time, with a time scale of ~4N generations marked.]
• Most new mutations are eventually lost
• Only a small fraction (1/2N) eventually reach fixation in the population

Neutral Molecular Evolution
$r = u$ (substitution rate = mutation rate)
• Rate of new fixations equals the mutation rate and does not depend on N: with $u$ the mutation rate per gene copy per generation, $2Nu$ new mutations arise each generation and each fixes with probability $1/2N$, so $r = 2Nu \cdot \frac{1}{2N} = u$
• Implies the substitution rate is constant
• Gives a molecular clock for neutral molecular evolution
• Molecular divergence between 2 species should be proportional to the number of generations since their most recent common ancestor

Effective Population Size, $N_{eff}$
$\frac{1}{N_{eff}} = \frac{1}{T}\left(\frac{1}{N_1} + \frac{1}{N_2} + \dots + \frac{1}{N_T}\right)$
• Discrete time steps
• $N_i$ = population size at time step $i$
• $T$ = total number of time steps

Human Population Expansion
• $N_{eff} \sim 10{,}000$ (European HapMap)
• Nonadiabatic expansion

Heterozygosity, H
• Probability that 2 alleles drawn at random are different
• Homozygosity, $G = 1 - H$
• E.g. if biallelic then $H = 2p(1-p)$ and $G = p^2 + (1-p)^2$

Heterozygosity decay
• Wright-Fisher: $H_t = H_0 \exp\left(-\frac{t}{N}\right)$
• Moran: $H_t = H_0 \exp\left(-\frac{2t}{N^2}\right)$
(As written, $N$ here counts gene copies; $t$ is measured in generations for Wright-Fisher and in single birth/death events for Moran.)
Different microscopic models are equivalent up to a rescaling of time.

Mutation-Drift Balance
• Drift decreases H
• Mutation increases H
• The two forces cancel out to give an equilibrium level of variation in the population
Heterozygosity: $H = \frac{4Nu}{1 + 4Nu}$
Homozygosity: $G = \frac{1}{1 + 4Nu}$
• Time scale of mutations ~ $1/u$
• Time scale of drift ~ $4N$
• Remember, drift eliminates variation and mutations create variation
• If $4N \ll 1/u$, the population is mostly devoid of variation
• If $4N \gg 1/u$, the population has much variation
[Figure: two panels, $4\mu N \gg 1$ and $4\mu N \ll 1$, where $\mu$ denotes the mutation rate $u$.]

Human SNP frequency distribution
[Figure: distribution of allele frequencies on Chromosome 1 in 180 Northern European samples (HapMap consortium), non-coding (intergenic) sites; empirical data plotted against allele frequency.]

Coalescent
[Figure: genealogy traced back in time from the present. Number of ancestors at successive steps back: 22 individuals (present), then 18, 16, 14, 12, 9, 8, 8, 7, 7, 5, 5, 3, 3, 3, 2, 2, 1.]

Bifurcating Tree
• P(pair coalesces in the previous generation) = $1/2N$
• After t generations?
• P(k lineages coalesce to k-1) = $\binom{k}{2}\frac{1}{2N} = \frac{k(k-1)}{4N}$
• Many different trees can produce the present population!
[Figure: tree running from the most recent common ancestor (MRCA) to the present.]

Properties of coalescent
• Random tree with random coalescent interval times ~ Wright-Fisher model
• Time to coalescence gets longer the further we go back in time
• The larger the population size, the slower the rate of coalescence

Mutation?
[Figure, built up over several slides: mutations, marked with asterisks in the original, are placed on the branches between the MRCA and the present, producing the present-day sequences:]
TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC

Efficient computer simulations of neutral mutation
1. Generate a random genealogy of individuals back in time
2. Superimpose mutations on the genealogy
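
The two steps above can be sketched as follows. This is a minimal illustration, not the lecturer's implementation. It assumes a continuous-time approximation in which k lineages wait an exponentially distributed time with rate k(k-1)/4N per generation before two of them coalesce. It also assumes a mutation rate `mut_rate` per sequence per generation, with every mutation hitting a new site. The function and parameter names are hypothetical.

```python
import random


def random_genealogy(n_samples, pop_size, rng):
    """Step 1: build a random genealogy backwards in time.

    Returns (tmrca, branches); each branch is a pair
    (set of present-day samples below the branch, branch length in generations).
    """
    lineages = [frozenset([i]) for i in range(n_samples)]
    start = {lin: 0.0 for lin in lineages}  # time at which each lineage begins
    branches = []
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until any of the k(k-1)/2 pairs coalesces.
        time += rng.expovariate(k * (k - 1) / (4.0 * pop_size))
        a, b = rng.sample(lineages, 2)  # the pair that coalesces
        for child in (a, b):
            branches.append((child, time - start[child]))
            lineages.remove(child)
        parent = a | b
        start[parent] = time
        lineages.append(parent)
    return time, branches


def superimpose_mutations(branches, mut_rate, rng):
    """Step 2: drop mutations along each branch at rate mut_rate per generation.

    Returns one entry per mutation: the set of samples that inherit it.
    """
    sites = []
    if mut_rate <= 0:
        return sites
    for descendants, length in branches:
        position = rng.expovariate(mut_rate)  # exponential gaps give Poisson counts
        while position < length:
            sites.append(descendants)
            position += rng.expovariate(mut_rate)
    return sites


rng = random.Random(1)
tmrca, branches = random_genealogy(n_samples=6, pop_size=10_000, rng=rng)
sites = superimpose_mutations(branches, mut_rate=1e-4, rng=rng)
print(f"TMRCA = {tmrca:.0f} generations, {len(sites)} segregating sites")
```

Only the ancestors of the sample are tracked, not the whole population forward in time, which is what makes this approach efficient. Each entry of `sites` lists which sampled sequences carry the derived allele at that site.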