POPULATION GENETICS LECTURE NOTES 2016
Training course in Quantitative Genetics and Genomics
Biosciences East and Central Africa-International Livestock Research Institute (BecA-ILRI) Hub
Nairobi, KENYA
May 30-June 10, 2016

POPULATION AND QUANTITATIVE GENETICS
GENOME ORGANIZATION AND GENETIC MARKERS
SELECTION THEORY
BREEDING STRATEGIES

Samuel E Aggrey, PhD
Professor
Department of Poultry Science
Institute of Bioinformatics
University of Georgia
Athens, GA 30602, USA
[email protected]

Preface

These lecture notes were written to cover parts of Population Genetics, Quantitative Genetics and Molecular Genetics for postgraduate students, and also to serve as a refresher for field geneticists. The course material is not a textbook and is not meant to be copied, duplicated or sold. This text is unedited and I am solely responsible for all conceptual mistakes, grammatical errors and typos. Genetics is a life-long course and cannot be covered in a few lectures. Because of time constraints, only selected parts of population, quantitative and molecular genetics will be covered. The course will cover some of the causes of evolutionary change in allele frequency between generations, such as natural selection and gene flow, and some aspects of Quantitative and Molecular Genetics.

To those men who have kept us awake for over two centuries, and who I believe will continue to do so for many more centuries!

POPULATION GENETICS

The study of the genetic composition of biological populations, and of the changes in genetic composition that result from the operation of various factors, including (a) natural selection, (b) genetic drift, (c) mutation and (d) gene flow.

Population
A group of breeding individuals.

Genetic composition
1. The number of alleles at a locus
2. The frequency of alleles at a locus
3. The frequency of genotypes at a locus
4. Transmission of alleles from one generation to the next

Single locus: Locus A with two alleles, A1 and A2. With P, H and Q denoting the frequencies of the genotypes A1A1, A1A2 and A2A2, the allele frequencies are:

p = P + ½H
q = Q + ½H

Derivation of the Hardy-Weinberg principle

Ideal population
1. There are two sexes, and the population consists of sexually mature individuals.
2. All matings between a male and a female are equally probable (independent of the distance between mates, their genotypes and their ages).
3. The population is large, and the actual frequency of each mating is equal to its Mendelian expectation.
4. Meiosis is fair. We assume that there is no segregation distortion, no gamete competition, and no difference in the developmental ability of eggs or the fertilizing ability of sperm.
5. All matings produce the same number of offspring, on average. Thus, the frequency of a particular genotype in the pool of newly formed zygotes is:
   ∑ (frequency of mating) × (frequency of genotype produced from that mating)
   Frequency (A1A1 in zygotes) = P² + ½PH + ½PH + ¼H² = (P + ½H)² = p²
   Frequency (A1A2) = 2pq
   Frequency (A2A2) = q²
6. Generations do not overlap.
7. There is no difference among genotype groups in the probability of survival.
8. There is no migration, mutation, drift or selection.

Hardy-Weinberg Law

In a large random mating population, in the absence of mutation, migration, selection and random drift, allele frequency remains the same from generation to generation. Furthermore, there is a simple relationship between allele frequency and genotypic frequency.

Why is the Hardy-Weinberg principle so important? Is there any population anywhere in the world or in outer space that satisfies all the assumptions?

Evolutionary forces acting within populations violate at least one of these assumptions, and departures from Hardy-Weinberg proportions are one way in which we detect those forces and estimate their magnitude. The most significant evolutionary factors are selection (natural or artificial), non-random mating and gene flow.

Fig. 1 shows the relationship between allele frequency and the three genotypic frequencies for a population in Hardy-Weinberg proportions:
1. The heterozygote is the most common genotype at intermediate allele frequencies.
2. One of the homozygotes is the most common genotype when the allele frequency is not intermediate.
3. The heterozygote is the most common genotype only when q is between ⅓ and ⅔, that is, over only ⅓ of the range of allele frequencies.
4. When q is between 0 and ⅓, A1A1 is the most common genotype, and when q is between ⅔ and 1, A2A2 is the most common.
5. The maximum frequency of the heterozygote occurs when q = 0.5. This can be shown directly by setting the derivative of the H-W heterozygosity, 2pq = 2q(1 − q), equal to zero and solving for q:

$$\frac{d[2q(1-q)]}{dq} = 2 - 4q = 0, \qquad q = 0.5$$

Here, we assume that the generations are non-overlapping, i.e. the parents die after producing progeny, and the progeny then become the next parental generation.

Testing for deviation from Hardy-Weinberg Equilibrium

Departure from Hardy-Weinberg equilibrium can be tested in a sample of individuals scored for their genotypes. The genetic model provided by Hardy-Weinberg generates the expected genotypic frequencies at equilibrium, so we can compare the observed genotype numbers with those expected under Hardy-Weinberg proportions. The chi-square test of goodness of fit and the likelihood ratio test can both be used to test for departure, or lack thereof, from Hardy-Weinberg equilibrium. The chi-square test is an approximation to the likelihood ratio test.

To perform a chi-square goodness of fit test, we first estimate the allele frequencies from the observed genotype numbers, then use them to generate the expected genotype numbers. We can compute the chi-square statistic as:

$$X^2 = \sum \frac{(O - E)^2}{E}$$

where O and E are the observed and expected numbers of a particular genotype, and the sum is taken over the n genotypic classes. From the calculated value of X² and the table value of X², we can obtain the probability that a deviation of the observed numbers from the expected numbers at least this large arises by chance. The degrees of freedom used to determine the significance of the X² value are equal to the number of genotypic classes, n, minus one, then minus the number of parameters estimated from the data.
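The goodness-of-fit calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch for a two-allele locus: the function name `hardy_weinberg_chi_square` and its argument names are assumptions made for this example, not part of the course material, and the counts passed in at the end are those of the B/b plumage example used in these notes.

```python
def hardy_weinberg_chi_square(n11, n12, n22):
    """Chi-square goodness-of-fit statistic for a two-allele locus.

    n11, n12, n22 are the observed NUMBERS (not frequencies) of the
    genotypes A1A1, A1A2 and A2A2. Hypothetical helper for illustration;
    it assumes both alleles are present, so no expected number is zero.
    """
    n = n11 + n12 + n22
    # Allele frequencies estimated from the data: p = P + H/2, q = 1 - p
    p = (n11 + 0.5 * n12) / n
    q = 1.0 - p
    # Expected genotype numbers under Hardy-Weinberg proportions
    expected = (p * p * n, 2.0 * p * q * n, q * q * n)
    observed = (n11, n12, n22)
    # X^2 = sum over genotypic classes of (O - E)^2 / E
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = classes - 1 - number of estimated parameters (here one: p)
    df = len(observed) - 1 - 1
    return p, q, expected, x2, df


# Counts from the B/b plumage example in these notes
p, q, expected, x2, df = hardy_weinberg_chi_square(290, 496, 214)
print(p, q, expected, x2, df)
```

The function returns the statistic and the degrees of freedom only. Comparing X² with a table value (or computing a p-value) is left to the reader, so the sketch needs nothing beyond the Python standard library.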
One degree of freedom is always lost because we use the data to estimate the allele frequency. We can use the chi-square distribution to test whether the value of X² is too large to be the result of sampling error. In doing so we are performing a one-tailed test. The chi-square expression for two alleles is given as:

$$X^2 = \frac{(N_{11} - \hat{p}^2 N)^2}{\hat{p}^2 N} + \frac{(N_{12} - 2\hat{p}\hat{q} N)^2}{2\hat{p}\hat{q} N} + \frac{(N_{22} - \hat{q}^2 N)^2}{\hat{q}^2 N}$$

where N11, N12 and N22 are the observed numbers of the genotypes A1A1, A1A2 and A2A2, and N is the total number of individuals.

An alternative way to measure the difference between observed and expected frequencies is to calculate the standardized deviation of the observed heterozygote frequency, H, from its Hardy-Weinberg expectation, which provides the fixation index or, more generally, the inbreeding coefficient, F:

$$F = \frac{2pq - H}{2pq} = 1 - \frac{H}{2pq}$$

It can be shown that

$$X^2 = F^2 N$$

For two alleles, the chi-square goodness of fit test for Hardy-Weinberg proportions is equivalent to the test for inbreeding, F = 0. However, F is unstable as the expected (E) value approaches zero, and is therefore not useful for rare and very common alleles. For E = 0 and O > 0, F = −∞, and for E = 0 and O = 0, F is undefined. Deviation from Hardy-Weinberg proportions can also be tested using the likelihood ratio test, which is described in most statistical texts.

The B/b locus is responsible for plumage color in chickens found in the Rift Valley. The B allele expresses black plumage, which is completely dominant over the b allele for brown plumage.

Phenotype   Genotype   Observed number   Expected number
Black       BB         290               p̂²N = 289.444
Black       Bb         496               2p̂q̂N = 497.112
Brown       bb         214               q̂²N = 213.444
Total                  1,000             1,000

P = 290/1000 = 0.29; H = 496/1000 = 0.496; Q = 214/1000 = 0.214; P + H + Q = 1.0
p̂ = P + ½H = 0.29 + ½(0.496) = 0.538; q̂ = Q + ½H = 0.214 + ½(0.496) = 0.462; p̂ + q̂ = 1.0

Note: Chi-square is allergic to fractions and ratios, but really likes integers! That is, compute X² from numbers of individuals, not from frequencies.

$$X^2 = \frac{(290 - 289.444)^2}{289.444} + \frac{(496 - 497.112)^2}{497.112} + \frac{(214 - 213.444)^2}{213.444} = 0.0050$$

The table value of X² at p = 0.05 with 1 degree of freedom is 3.84.
Since the calculated X² is lower than the table value of X², we conclude that the data do not deviate significantly from Hardy-Weinberg proportions.

$$F = 1 - \frac{H}{2pq} = 1 - \frac{0.496000}{0.497112} = 0.002237$$

$$X^2 = F^2 N = 0.0050$$

Extension of Hardy-Weinberg's Law: Multiple Alleles

Let us consider a single locus with three alleles, A1, A2 and A3, with frequencies p, q and r, respectively.

Hardy-Weinberg frequencies for three autosomal alleles at a single locus

Allele (frequency)   A1 (p)        A2 (q)        A3 (r)
A1 (p)               A1A1   p²     A1A2   pq     A1A3   pr
A2 (q)               A2A1   qp     A2A2   q²     A2A3   qr
A3 (r)               A3A1   rp     A3A2   rq     A3A3   r²

Genotype   Frequency        Number
A1A1       p²               N11
A1A2       pq + pq = 2pq    N12
A1A3       pr + pr = 2pr    N13
A2A2       q²               N22
A2A3       qr + qr = 2qr    N23
A3A3       r²               N33
TOTAL      1.0              N

Please note that p + q + r = 1, and that the key to solving a multiple-allele problem is to break it down so that it resembles a two-allele problem.

$$f(A_3A_3) = r^2 = \frac{N_{33}}{N}, \qquad r = \sqrt{\frac{N_{33}}{N}}$$

From here, let's reduce the problem to a two-allele locus involving the allele A3. Under H-W, the genotypes A2A2, A2A3 and A3A3 have the combined expected frequency

$$q^2 + 2qr + r^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$$

From basic algebra, (a + b)² = a² + 2ab + b². This implies (q + r)² = q² + 2qr + r². Therefore:

$$(q + r)^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$$

$$q + r = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$$

$$q = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} - \sqrt{\frac{N_{33}}{N}}$$

Since p + q + r = 1, then p = 1 − (q + r):

$$p = 1 - \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$$

The ABO blood group in humans is determined by three alleles, A, B and O.
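The square-root reduction derived above for three alleles can be sketched as follows. This is a minimal illustration only: the function name `three_allele_frequencies` and its argument names are assumptions made for this example, and the counts used in the call are hypothetical numbers constructed to be in exact Hardy-Weinberg proportions for p = 0.5, q = 0.3 and r = 0.2. They are not data from the course.

```python
import math


def three_allele_frequencies(n22, n23, n33, n_total):
    """Estimate p, q, r at a three-allele locus assumed to be in H-W proportions.

    n22, n23, n33 are the numbers of A2A2, A2A3 and A3A3 individuals and
    n_total is the total sample size N. Hypothetical helper for illustration.
    """
    # r^2 = N33 / N, so r is the square root of the A3A3 frequency
    r = math.sqrt(n33 / n_total)
    # (q + r)^2 = (N22 + N23 + N33) / N
    q_plus_r = math.sqrt((n22 + n23 + n33) / n_total)
    q = q_plus_r - r
    # p + q + r = 1
    p = 1.0 - q_plus_r
    return p, q, r


# Hypothetical counts built from p = 0.5, q = 0.3, r = 0.2 with N = 1000:
# N22 = q^2 N = 90, N23 = 2qr N = 120, N33 = r^2 N = 40
p, q, r = three_allele_frequencies(90, 120, 40, 1000)
print(p, q, r)
```

Only the combined count N22 + N23 + N33 and the count N33 enter the calculation, which is why the method still works when A2A2 and A2A3 cannot be told apart phenotypically.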