Population Genetics I


Population genetics I
21 March, 2016
529053 Evolutionary Genomics, Ari Löytynoja

Background reading

[Figure: cover of the recommended book; title not preserved in the extracted text]
This book is an OK primer to population genetics in the genomics era
- focus on coalescent theory and SNP data
- unfortunately it contains lots of typos

Population genetics

Definition
- studies the distribution of allele frequencies in populations and how they change over time
- effects considered:
  - natural selection, genetic drift, mutation and gene flow
  - recombination, population subdivision and population structure
- allows inferring past events as well as predicting future ones

History
- fundamental work by Haldane, Wright and Fisher in the first half of the 20th century
- recent development: coalescent theory by Kingman in the 1980s
  - suitable for SNP data
  - computationally highly efficient

Population genetics basics (1): Allele

Allele
- one of the alternative forms of a gene or of the same genetic locus
- used to be a visible gene product (e.g. blond vs. red hair)
- now typically a SNP (e.g. rs1805007(C) vs. rs1805007(T))

Alleles in a "genetic locus" do not need to be functional
- in many studies we are interested in neutral variation: we can then exclude natural selection and focus on genetic drift and gene flow, and e.g. infer historical events of populations
- rs1805007 is associated with e.g. 'Skin sensitivity to sun', 'Hair color', 'Non-melanoma skin cancer' and 'Freckles': it may not be entirely neutral

The genome provides millions of variable loci, the majority of them neutral.
Inferring the presence of function for a locus/allele is of special interest.

Population genetics basics (2): Population model

Theoretical models assume a simplified population model.
The most commonly used model is the Wright-Fisher model. It assumes:
- haploid population
- no sex
- constant population size

The Wright-Fisher model (WFM) can be generalised:
- diploid population
- panmictic, random mating
- variable population size

The WFM gives a good approximation for more complex populations.

Population genetics basics (2): Wright-Fisher model

[Figure sequence: evolution of an idealised population, shown at generations 1, 2, 3 and 10]

Population genetics basics (3): Population size

One central parameter in population genetics is population size, abbreviated as N.

Population size defines
- how quickly variation is lost (forwards)
- how much frequencies change per generation (now)
- how quickly a sample coalesces to its most recent common ancestor, MRCA (backwards)

Population size is measured in 'units' of a WFM population
- known as effective population size, Ne
- can be very different from census population size
- some violations of WFM can be corrected for

[Figure: loss of variation and change of allele frequencies. This is known as genetic drift; it affects small populations more heavily]

Population genetics basics (4): Genetic drift

At every locus, variation is eventually lost and one allele becomes fixed
- in non-neutral loci, selection affects chances of fixation
- variation is lost much more rapidly in small populations
- in small populations genetic drift prevails over selection and even harmful alleles may get fixed

Variation once lost is lost forever
- a population bottleneck reduces variation, and population recovery cannot bring it back
- new variation is created by mutations

[Figure: genetic drift - gingers conquering a population in ten generations!]
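
The drift behaviour described above can be illustrated with a small forward simulation. This is only a minimal sketch, assuming the haploid Wright-Fisher model from the slides, a single locus with two neutral alleles, and no mutation. The function name `wright_fisher` and its parameters are illustrative and do not come from the lecture.

```python
import random


def wright_fisher(n_individuals, p0, generations, seed=None):
    """Track the frequency of one allele in a haploid Wright-Fisher population.

    n_individuals: constant population size N (assumption: haploid, no sex)
    p0: starting frequency of the tracked allele
    generations: number of generations to simulate
    seed: optional seed so that a run can be repeated
    Returns a list with the allele frequency in each generation,
    starting with generation 0.
    """
    rng = random.Random(seed)
    count = round(p0 * n_individuals)
    freqs = [count / n_individuals]
    for _ in range(generations):
        p = count / n_individuals
        # Every individual of the new generation picks its parent uniformly
        # at random, so the new allele count is a binomial sample.
        count = sum(1 for _ in range(n_individuals) if rng.random() < p)
        freqs.append(count / n_individuals)
    return freqs


if __name__ == "__main__":
    for n in (10, 1000):
        trajectory = wright_fisher(n, 0.5, 50, seed=1)
        print(n, [round(f, 2) for f in trajectory[::10]])
```

Running it with a small and a large N side by side should show larger per-generation jumps in the small population. Once the frequency reaches 0 or 1 it stays there, because the sketch has no mutation to create new variation.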
Population genetics basics (3): Coalescence time

- small populations coalesce faster, giving a more recent MRCA
- conversely: Ne is defined by coalescence time

Population genetics basics (5): Effective population size

Some violations of WFM that can be corrected for
- variation in population size (Ne is the harmonic mean of the sizes):
  - 10, 100, 50, 80, 20, 500
  - Ne = 30.8
- non-equal sex ratio:
  - 80 + 20 = 100
  - Ne = 64
- variation in reproductive success
- self-fertilisation

There are roughly 8 million Holstein cattle in the USA
- theoretical Ne is approximately 80 and declining

Coalescence theory

Problems of classical population genetics theory
- no analytical solutions for practical problems
  - typically, data are simulated under different assumptions
  - parameter values producing results similar to observed data are considered 'good'
- typical questions are about large populations over many generations
  - forward simulations of full populations are time consuming
- in practice we 'sample' only a tiny subset of the total population
  - coalescence models only consider what happens to the sample

[Figure: simulation of full population (N=100) vs. sample (n=5) only]
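
The Ne values quoted on the effective population size slide, and the claim that small populations coalesce faster, can be checked with a few lines of arithmetic. The slides give only the results, so the formulas below are an assumption on our part: they are the standard textbook corrections (harmonic mean over generations; 4·Nm·Nf / (Nm + Nf) for unequal sex ratio) and the expected time to the MRCA in a haploid Wright-Fisher population. All function names are illustrative.

```python
def ne_fluctuating(sizes):
    """Effective size under fluctuating population size.

    Assumption: harmonic mean of the per-generation sizes.
    """
    return len(sizes) / sum(1.0 / n for n in sizes)


def ne_sex_ratio(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females.

    Assumption: the standard correction 4 * Nm * Nf / (Nm + Nf).
    """
    return 4.0 * n_males * n_females / (n_males + n_females)


def expected_tmrca(sample_size, n_e):
    """Expected number of generations until a sample reaches its MRCA.

    Assumption: haploid Wright-Fisher population of size n_e, in which
    k lineages wait on average n_e / (k choose 2) generations before
    two of them coalesce.
    """
    return sum(n_e / (k * (k - 1) / 2) for k in range(2, sample_size + 1))


if __name__ == "__main__":
    print(round(ne_fluctuating([10, 100, 50, 80, 20, 500]), 1))  # 30.8
    print(ne_sex_ratio(80, 20))                                  # 64.0
    print(expected_tmrca(5, 100))
    print(expected_tmrca(5, 1000))
```

The first two calls reproduce the slide values 30.8 and 64. The last two show that, under these assumptions, the expected coalescence time of a sample scales linearly with Ne, which is why Ne can conversely be defined from coalescence time.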