An Introduction to Population Genetics
Lecture 1, 23rd Jan, 2nd week
An introduction to population genetics
Gil McVean

What is population genetics?

Like so many branches of biology, what we think of today as population genetics would hardly be recognised by the founding fathers of the discipline. If you had been studying population genetics 80 years ago, you would probably have been working on the heritability of microscopic traits in Drosophila or developing efficient crossing schemes for agricultural breeding; 30 years ago you might have been analysing levels of protein polymorphism and population differentiation. These days, if you work in population genetics you are more likely to be interested in using DNA sequence variation to map disease mutations in humans, or sites of adaptive evolution in viral genomes.

But of course there is a link between all three types of study: the genetics of variation. Broadly speaking, population genetics can be defined as the study of the genetical basis of naturally occurring variation, with the aim of describing and understanding the evolutionary forces that create variation within species and that lead to differences between species.

For example, this picture represents sequence-level variation in a human gene called LPL, thought perhaps to play some role in hereditary heart disease. The types of questions we might want to ask of the data are:

A) Can we detect a link between sequence variation and a predisposition to heart disease?
B) What does variation in this gene tell us about the history of humans?
C) Can we detect the influence of natural selection on the recent history of the gene?

In turn, such questions raise other, more technical, but still critical issues, such as:

A) How many individuals should I sequence from?
B) How much sequence should I collect from each?
C) What is the best way to sample individuals in order to answer my question?

These questions are not easy to address, and of course they are not independent of one another.
What I hope to achieve during this course is to give you some understanding of how to begin answering such questions, and a feeling for the underlying theoretical models and methods.

Some definitions

This lecture is meant to be an introduction to the subject of population genetics. One of the most important things is to know what a population geneticist means by terms you are already familiar with, because it is more than likely that the two meanings are not the same.

The most important term is the gene. To the molecular geneticist this means an open reading frame and all the associated regulatory elements. The classical geneticist’s view is only slightly different, but its starting point is the phenotype, not the genotype. A geneticist would call a gene a region of a chromosome to which they can map a mutation. Long before DNA was understood to be the material of heredity, geneticists were talking about genes. To the evolutionary biologist, a gene is also defined by its ability to recombine, but in the evolutionary sense. The best definition of what a gene means to an evolutionary biologist comes from GC Williams, who talks about a region of genetic material sufficiently small that it is not broken up by recombination, and which can therefore be acted upon in a coherent manner by natural selection. This is not the most concise definition, but the essence is that the gene is the unit of selection.

Copyright: Gilean McVean, 2001

There are other terms which it is important to understand. An allele can be defined as one of two or more possible forms of a gene. Alleles can exist naturally, or may be induced by mutagenesis. The key point is that alleles are mutually exclusive.

The final term of key importance is polymorphism. Loosely, this means any gene or locus for which multiple forms exist in nature.
However, because we can never sample every organism in a species, a more practicable, but arbitrary, definition is used: a gene is considered polymorphic if the most common allele is at a frequency of less than 99%.

Historical developments in the understanding of the genetic basis of variation

The field of population genetics was created almost exactly 100 years ago, prompted by the rediscovery of Mendel’s laws of inheritance. But to understand the importance of this discovery it is important to go back even further, to the experiments of Mendel, and of course, to Charles Darwin. Although Mendel didn’t realise it, his discovery that certain traits of seed coat and colour in peas are inherited in a particulate manner was critical to the widespread acceptance of Darwin and Wallace’s theory of evolution by natural selection. In its most simplified form, Darwin’s theory consists of just three statements:

A) Organisms produce too many offspring
B) Heritable differences exist in traits influencing the adaptation of an organism to its environment
C) Organisms that are better adapted have a higher chance of survival

The problem was that, under the form of inheritance Darwin envisaged, differences between organisms would be rapidly diluted through mating. In particular, he envisaged a form of blending inheritance in which offspring were intermediate between parental forms. Mendel’s discovery that traits could be inherited in a discrete manner of course changed that view, though it was not until de Vries, Correns, and Tschermak von Seysenegg independently rediscovered both the phenomenon, and consequently Mendel’s work, that this was acknowledged.

The idea that traits can be inherited in such a simple manner is extremely powerful. Following de Vries and the others, many different traits showing such simple patterns of inheritance were rapidly described. In humans, for example, the best known are traits such as the ABO blood group and albinism.
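The dilution problem that blending inheritance posed for Darwin can be made concrete with a small simulation. The sketch below is not from the lecture: it assumes random mating, offspring that are exactly the average of their two parents, and no new variation entering the population. The function names and the normally distributed starting values are hypothetical. Under these assumptions the trait variance is expected to roughly halve every generation.

```python
import random
import statistics


def blend_generation(population, rng):
    """Return the next generation under blending inheritance.

    Each offspring is the exact average of two parents drawn at random
    (with replacement) from the current population.
    """
    n = len(population)
    return [
        (rng.choice(population) + rng.choice(population)) / 2.0
        for _ in range(n)
    ]


def variance_trajectory(n_individuals=10000, generations=5, seed=1):
    """Track the trait variance over several generations of blending."""
    rng = random.Random(seed)
    # Hypothetical starting trait values: standard normal, variance near 1.
    population = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    variances = [statistics.pvariance(population)]
    for _ in range(generations):
        population = blend_generation(population, rng)
        variances.append(statistics.pvariance(population))
    return variances


if __name__ == "__main__":
    for generation, variance in enumerate(variance_trajectory()):
        print(generation, round(variance, 4))
```

With particulate inheritance, by contrast, alleles are passed on intact, so this automatic erosion of variation does not occur.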
However, while the discovery of particulate inheritance solved one problem, it created an even greater one. The problem was that many geneticists, de Vries among them, came to understand the genetic nature of variation simply in terms of large, discrete differences: the difference between round and wrinkly peas, or the difference between pink and white flowers. But the Darwinian view is one of gradualism: there exists a continuum of variation on which natural selection can act. De Vries was the first to use the term mutation, and by mutation he meant changes in genetic material that led to large differences in phenotype. On the other hand, naturalists and systematists were developing a coherent view of evolution by natural selection that rested almost entirely upon the notion of small changes. The views of saltationists like Goldschmidt, with his ‘hopeful monsters’, and empiricists such as Dobzhansky seemed to be almost entirely incompatible.

The solution is of course that the gradual, quantitative differences of the neo-Darwinians are in fact composed of the cumulative effects of many different loci, each behaving in a Mendelian, particulate fashion. By studying patterns of inheritance of bristle number in Drosophila, Morgan was able to show (1915) that even minute differences can behave in a Mendelian fashion. Similar results were found by Jennings in Paramecium. Also important were the artificial selection experiments of Castle and Sturtevant on quantitative traits in rats and Drosophila, which showed that selection acting on genes of small effect is effective. Nilsson-Ehle (1909), working on pigmentation in wheat kernels, showed that the additive contribution of just a few loci (three in his case) could generate an apparently continuous distribution of phenotype. In short, the link had been made between Mendelian inheritance and Darwin’s theory of evolution by natural selection.
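To see how a handful of Mendelian loci can produce a near-continuous distribution, as in the three-locus wheat example above, consider the following minimal sketch. It rests on an idealised model that is an assumption here, not a description of Nilsson-Ehle's actual analysis: two parental lines fixed for opposite alleles at every locus, unlinked loci, equal and purely additive effects, and no environmental noise. The function name is hypothetical.

```python
from math import comb


def f2_phenotype_counts(n_loci=3):
    """Count F2 genotype combinations by number of 'plus' alleles.

    Assumes a cross between two lines fixed for opposite alleles at every
    locus, unlinked loci, and equal additive effects. Each of the
    2 * n_loci allele copies in an F2 individual is then a 'plus' allele
    with probability 1/2, so the class sizes are binomial coefficients
    out of 4 ** n_loci equally likely combinations.
    """
    copies = 2 * n_loci
    return [comb(copies, k) for k in range(copies + 1)]


if __name__ == "__main__":
    counts = f2_phenotype_counts(3)
    total = sum(counts)
    for k, count in enumerate(counts):
        # Crude text histogram of the expected F2 distribution.
        print(f"{k} plus alleles: {count:2d}/{total} " + "#" * count)
```

For three loci this gives seven phenotypic classes in the ratio 1:6:15:20:15:6:1, already bell-shaped. Adding more loci, or any environmental variation, blurs the classes further towards a continuous distribution.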
The population genetics of continuous variation

The first major contribution of theoretical population genetics to the understanding of natural variation arose from the discovery that Mendelian inheritance could underlie apparently continuous traits. In 1918, RA Fisher published a paper demonstrating how the phenotypic variation in a trait, and correlations between relatives, could be used to partition variation into genetic and environmental components, and also how the genetic component could be further partitioned into terms representing additive, dominance and epistatic contributions across loci. This finding, along with earlier work on quantitative theories of inbreeding, had two important consequences. First, it naturally gave rise to a method for estimating the genetic contribution to variation for any quantitative trait. Second, it provided a means of predicting the effect of any artificial selection regime, as practiced by agriculturalists, and of course a framework within which to develop more efficient methods of breeding crops and animals with more desirable qualities and quantities.

Traits affected by multiple loci are called polygenic traits, but the term multifactorial is often used in order to emphasise the importance of environmental influences. Multifactorial traits can be further broken down into three types:

A) Continuous traits: height, birth weight, milk yield
B) Meristic traits: bristle number in Drosophila
C) Discrete traits with continuous liability: e.g. polygenic disease, threshold traits

Fisher’s results provided a means of directly estimating the contribution of genetics to variation in the phenotype, a quantity which is termed heritability. There are two formulations of heritability, one known as “narrow sense” heritability, the other as “broad sense” heritability. “Narrow-sense heritability” is defined through the resemblance between parents and offspring for some trait.
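The partition of variation described above is usually written in the following standard notation of quantitative genetics. The symbols are not taken from the lecture itself, and the simple additive form assumes no covariance or interaction between genotype and environment:

$$ V_P = V_G + V_E, \qquad V_G = V_A + V_D + V_I $$

where $V_P$ is the phenotypic variance, $V_G$ and $V_E$ are the genetic and environmental components, and $V_A$, $V_D$ and $V_I$ are the additive, dominance and epistatic (interaction) components. In this notation the two heritabilities are the ratios

$$ H^2 = \frac{V_G}{V_P} \quad \text{(broad sense)}, \qquad h^2 = \frac{V_A}{V_P} \quad \text{(narrow sense)} $$

It is the additive component that is passed from parent to offspring in a predictable way, which is why the narrow-sense quantity is tied to the resemblance between parents and offspring.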
For example, narrow-sense heritability can be estimated by plotting offspring value (y) against mid-parent value (x) and fitting a linear model. The linear coefficient b in the model y = c + bx is estimated by

$$ b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} $$

and the relationship between b and the heritability, written $h^2$, is given by

$$ b = h^2 $$

What is the importance of this number? There are two ways this can be approached.
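As a minimal sketch of the mid-parent regression just described, the slope b = Cov(x, y) / Var(x) can be computed directly. The function name is hypothetical and the numbers are made up purely for illustration; they are not real measurements.

```python
def regression_slope(x, y):
    """Least-squares slope b = Cov(x, y) / Var(x).

    Here x holds mid-parent values and y the corresponding offspring
    values; under the model in the text, b estimates h^2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variance; the (n - 1) factors cancel in the ratio.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    return cov_xy / var_x


if __name__ == "__main__":
    # Made-up illustrative values, not real data.
    mid_parent = [1.0, 2.0, 3.0, 4.0, 5.0]
    offspring = [2.0, 2.5, 3.0, 3.5, 4.0]
    print("estimated h^2:", regression_slope(mid_parent, offspring))
```

In this toy data set the offspring deviate from the mean by exactly half as much as their mid-parent value, so the estimated slope is 0.5.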