Distribution of Genes in Populations
Virginia Pallante, M.S.

REQUIRED READING

Nussbaum, RL, McInnes, RR, and Willard HF (2007) Thompson and Thompson Genetics in Medicine, 7th edition. Saunders: Philadelphia. Chapter 9, pp. 192-204.

Complete problems 3 and 6-11 from Chapter 9.

LECTURE OBJECTIVES

1. To introduce the binomial distribution and discuss its application in family and population studies.
2. To define gene and genotype frequencies and the Hardy-Weinberg equilibrium.
3. To illustrate estimation of gene and genotype frequencies of autosomal and X-linked traits.
4. To discuss the importance of population genetics in genetic counseling and forensics.
5. To discuss factors that cause departure from Hardy-Weinberg equilibrium.

STUDENT OBJECTIVES:

1. Understand the terms gene frequency and genotype frequency.
2. Use the binomial distribution to derive the probabilities of affected offspring in sibships of different size from various mating types.
3. Use the principles of the Hardy-Weinberg Law to derive genotype frequencies.
4. Estimate gene frequencies for autosomal and X-linked diseases, given their population frequency.
5. Include genotype frequencies in recurrence risk calculations.
6. Understand the assumptions required for the maintenance of Hardy-Weinberg equilibrium.
7. Define the coefficient of inbreeding.

IMPORTANT TERMS

Gene pool
Genotype frequency
Hardy-Weinberg equilibrium
Gene flow
Stratification
Inbreeding (consanguinity)
Fitness
Gene/allele frequency
Binomial distribution
Selection
Assortative mating
Coefficient of inbreeding
Heterozygote advantage

I. Useful Definitions

A. Gene pool refers to all of the alleles at a given locus in a given population (their number is twice the number of individuals, since each person carries two copies).

B. Gene or allele frequency refers to the prevalence of an allele in the gene pool. Thus, for a population of five individuals with genotypes AA, AA, AA, AA and Aa, the gene pool contains 10 alleles, one of which is a: the gene/allele frequency of a is 10%, and the gene/allele frequency of A is 90%.

C. Genotype frequency refers to the prevalence of a genotype in the population of genotypes. In the example above, the genotype frequency of AA is 80%; of Aa, 20%; and of aa, 0%.
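As a minimal sketch of definitions B and C, the hypothetical Python helper below counts alleles and genotypes in a list of genotype strings. It assumes each genotype is written as two letters and that heterozygotes are always written as "Aa".

```python
from collections import Counter


def allele_and_genotype_frequencies(genotypes):
    """Return (allele_freqs, genotype_freqs) for genotypes written as two-letter strings."""
    n_people = len(genotypes)
    # Each person contributes two alleles, so the gene pool has 2 * n_people alleles.
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    allele_freqs = {a: count / (2 * n_people) for a, count in allele_counts.items()}
    # Genotype frequencies are counted per person, not per allele.
    genotype_counts = Counter(genotypes)
    genotype_freqs = {g: count / n_people for g, count in genotype_counts.items()}
    return allele_freqs, genotype_freqs


alleles, genotypes = allele_and_genotype_frequencies(["AA", "AA", "AA", "AA", "Aa"])
print(alleles)    # {'A': 0.9, 'a': 0.1}
print(genotypes)  # {'AA': 0.8, 'Aa': 0.2}
```

Genotypes that do not occur (here aa) are simply absent from the result, which corresponds to a frequency of 0%.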

II. The Binomial Distribution and Its Applications

A. The binomial distribution describes the frequency of the possible combinations of two mutually exclusive outcomes in n trials.

B. The binomial formula can be used to derive the distribution of:

1. Males and females within sibships of different size.

2. Affected offspring of various mating types, within sibships of different size.

3. Genotypes within the population, given gene frequencies. (e.g., How many carriers of the cystic fibrosis mutant allele are there if the population gene frequency is 0.022?)

4. In each case, we must know the probability of the occurrence of each outcome. These probabilities are denoted p and q. We assume that the two outcomes are the only possible outcomes, so that p + q = 1.

Table 1
Type of random trial | Mutually exclusive outcome 1 (probability p) | Mutually exclusive outcome 2 (probability q)
Sex of child | Boy (1/2) | Girl (1/2)
Affection status for a child from an Aa x Aa mating (autosomal recessive disease) | Affected (1/4) | Unaffected (3/4)
Cystic fibrosis genotype of an individual in the European Caucasian population | Allele A (population gene frequency of normal allele = 0.98) | Allele a (population gene frequency of mutant allele = 0.02)

C. For n trials, the probabilities of all possible combinations of the two outcomes are given by the terms of the expansion of $(p + q)^n$; because $p + q = 1$, these probabilities sum to 1.

1. For a sibship of size two (n = 2), the probabilities of the sex outcomes are $p^2$ (two boys) + $2pq$ (one boy, one girl) + $q^2$ (two girls).

When $p = q = 1/2$:
probability (2 boys) = $p^2 = 0.25$
probability (1 boy, 1 girl) = $2pq = 0.50$
probability (2 girls) = $q^2 = 0.25$

2. Similarly, for a sibship of size two (n = 2), the probabilities of affection status outcomes (for an AR trait, from an Aa x Aa mating) are $p^2$ (two affected) + $2pq$ (one affected, one unaffected) + $q^2$ (two unaffected).

When $p = 1/4$ and $q = 3/4$:
probability (2 affected) = $p^2 = 1/16$
probability (1 affected, 1 unaffected) = $2pq = 6/16$
probability (2 unaffected) = $q^2 = 9/16$

3. Finally, for individuals, who receive one maternal and one paternal allele at the CF locus (n = 2), the probabilities of the genotypes are $p^2$ (genotype AA) + $2pq$ (genotype Aa) + $q^2$ (genotype aa).

If $p = 0.98$ and $q = 0.02$:
probability (AA) = $p^2 \approx 0.96$
probability (Aa) = $2pq \approx 0.04 = 1/25$
probability (aa) = $q^2 = 0.0004 = 1/2500$

Note: In all cases, $p^2 + 2pq + q^2 = 1$. The probabilities of all possible outcomes sum to 1.
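The expansion of $(p + q)^n$ can be checked numerically. This sketch (the function name is hypothetical) returns the probability of exactly k type-1 outcomes for k = 0 through n. Note that it lists k = 0 first, so the order is the reverse of the $p^2 + 2pq + q^2$ expansion written above; exact fractions avoid rounding.

```python
from fractions import Fraction
from math import comb


def binomial_distribution(n, p):
    """Return [P(k = 0), ..., P(k = n)]: k outcomes of type 1 in n independent trials."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


# Affected children (k = 0, 1, 2) in a sibship of two from an Aa x Aa mating, p = 1/4.
sibship = binomial_distribution(2, Fraction(1, 4))
print(sibship)       # [Fraction(9, 16), Fraction(3, 8), Fraction(1, 16)]
print(sum(sibship))  # 1
```

The same function covers larger sibships; for example, `binomial_distribution(3, Fraction(1, 2))` gives the probabilities of 0, 1, 2 and 3 boys among three children.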

III. Hardy-Weinberg Equilibrium

A. The Hardy-Weinberg equilibrium states that gene frequencies are expected to stay the same from one generation to the next IF all of the following conditions are met:

1. Mating is random.
2. No selection for or against a phenotype occurs.
3. No migration into or out of the population occurs.
4. No new mutations occur at the genetic locus.

(See the discussion of these assumptions in section VI.)

B. Specifically, for a locus with two alternative alleles, A and a, assume the population gene frequencies are p and q, respectively. The possible genotypes are AA, Aa, and aa, with population frequencies of $p^2$, $2pq$, and $q^2$, respectively.

C. The Hardy-Weinberg Law states that the population frequencies of these genotypes will remain the same from one generation to the next if the above-mentioned conditions are met.

1. The frequency of the AA genotype will be $p^2$.
2. The frequency of the Aa genotype will be $2pq$.
3. The frequency of the aa genotype will be $q^2$.
4. The mathematical derivation of this law is shown in Appendix A.
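A short sketch of why the genotype frequencies stop changing: assuming an effectively infinite population in which gametes combine at random and none of the four conditions above is violated, one round of mating turns any starting genotype frequencies into $p^2$, $2pq$, $q^2$, and later rounds leave them unchanged. The function name and the starting population are hypothetical.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating (random union of gametes)."""
    p = f_AA + f_Aa / 2  # frequency of allele A in the gametes
    q = f_aa + f_Aa / 2  # frequency of allele a in the gametes
    return (p * p, 2 * p * q, q * q)


gen0 = (0.5, 0.0, 0.5)         # hypothetical population with no heterozygotes
gen1 = next_generation(*gen0)  # (0.25, 0.5, 0.25): H-W proportions after one generation
gen2 = next_generation(*gen1)  # (0.25, 0.5, 0.25): unchanged thereafter
print(gen1, gen2)
```

The allele frequencies (p = q = 0.5) are the same in every generation; only the genotype frequencies shift, and only once.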

IV. Uses of the Hardy-Weinberg Equilibrium

The estimation of gene frequencies is important, since it allows us to estimate the frequency of carriers, the probability of mating types, and the recurrence risk of genetic disorders.

A. Autosomal recessive

1. Assume all aa genotypes have the disorder. AA and Aa do not have the disorder. q is the frequency of a (the mutant allele) and p is the frequency of A.

2. The genotype frequency of aa is $q^2$.

3. The frequency of the disorder in the population (the incidence) directly estimates $q^2$.

4. q is therefore estimated as the square root of the frequency of the disorder.

5. Example: In U.S. Caucasians, the frequency of cystic fibrosis is 1/2500, so $q^2 = 1/2500$. Thus, $q = 0.02$ and $p = 1 - 0.02 = 0.98$. The frequency of carriers is $2pq = 2 \times 0.02 \times 0.98 \approx 0.04$, which is about 1/25.

6. Note: for rare recessive diseases, q is very small and p is nearly equal to 1. Thus, the frequency of carriers is approximately 2q.
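The autosomal recessive steps above fit in a few lines of Python. This sketch (hypothetical function name) assumes, as in step 1, that every aa individual is affected and that the population is in H-W equilibrium; it reports both the exact carrier frequency $2pq$ and the $2q$ shortcut from step 6.

```python
from math import sqrt


def autosomal_recessive_estimates(incidence):
    """Estimate q and the carrier frequency from the incidence of an AR disorder."""
    q = sqrt(incidence)         # the incidence estimates q^2
    p = 1 - q
    carriers_exact = 2 * p * q  # H-W heterozygote frequency
    carriers_approx = 2 * q     # shortcut for rare alleles (p close to 1)
    return q, carriers_exact, carriers_approx


q, exact, approx = autosomal_recessive_estimates(1 / 2500)  # cystic fibrosis example
print(round(q, 4), round(exact, 4), round(approx, 4))  # 0.02 0.0392 0.04
```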

B. Autosomal dominant

1. Assume that all AA and Aa individuals have the disorder and that aa individuals are unaffected. Here q is the frequency of A (the mutant allele) and p is the frequency of a.

2. The disease frequency is equal to $2pq + q^2$.

3. If q is quite small and p quite large (near 1), the majority of affected individuals are heterozygotes ($2pq \gg q^2$), and the disease frequency is approximately equal to $2q$.

4. Consequently, an estimate of q is given by the disease frequency divided by 2.

5. Example: the frequency of achondroplasia is 1/20,000; thus $q \approx 1/40{,}000$.
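A sketch comparing the shortcut in step 4 with the exact solution of $2pq + q^2 = 1 - p^2$ = incidence, under the same assumptions as step 1 (the function name is hypothetical). For a rare disorder such as achondroplasia the two estimates agree closely.

```python
from math import sqrt


def autosomal_dominant_allele_frequency(incidence):
    """Return (approximate, exact) frequency q of a dominant mutant allele."""
    approx = incidence / 2           # assumes nearly all affected people are heterozygotes
    exact = 1 - sqrt(1 - incidence)  # solves 2pq + q^2 = 1 - p^2 = incidence for q
    return approx, exact


approx, exact = autosomal_dominant_allele_frequency(1 / 20000)  # achondroplasia example
print(approx, exact)  # the two estimates agree to about five significant figures
```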

C. X-linked dominant

For an X-linked dominant trait, males have only one copy of the gene, so the frequency of the disease in males is the frequency of the disease allele. From female data, the gene frequency may be estimated in the same way as for autosomal dominant traits.

Note: remember that the values of p and q do not differ in males and females. What differs is the way in which they are computed from male and female disease frequencies.

D. X-linked recessive

For an X-linked recessive trait, males again have only one copy of the gene, so the disease frequency in males is the frequency of the disease allele (q). In females, the disease frequency will be the square of the disease frequency in males ($q^2$), which can be almost zero for rare X-linked recessive traits. Note in the table below how the incidence of disease differs in males and females.

Table 2: The incidence of X-linked recessive disorders in males and females (assuming no selective disadvantage)

Incidence in males ($q$) | 0.01 | 0.05 | 0.1 | 0.2
Incidence in females ($q^2$) | 0.0001 | 0.0025 | 0.01 | 0.04
Ratio males : females | 100 to 1 | 20 to 1 | 10 to 1 | 5 to 1
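The values in Table 2 follow directly from the rule that male incidence is $q$ and female incidence is $q^2$. The sketch below (hypothetical function name) regenerates the rows, assuming H-W proportions and no selection.

```python
def x_linked_recessive_incidence(q):
    """Return (male incidence, female incidence, male:female ratio), assuming no selection."""
    males = q        # hemizygous males: one X, so incidence equals the allele frequency
    females = q * q  # females must carry the mutant allele on both X chromosomes
    return males, females, males / females  # the ratio simplifies to 1/q


for q in (0.01, 0.05, 0.1, 0.2):
    m, f, ratio = x_linked_recessive_incidence(q)
    print(f"q = {q}: males {m}, females {f:.4f}, ratio {ratio:.0f} to 1")
```

Because the ratio is $1/q$, the rarer the allele, the more strongly the disorder is concentrated in males.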

V. Detecting Departures from H-W Equilibrium

If we can count all genotypes in a population, then we can test whether the observed genotype frequencies are significantly different from those expected under Hardy-Weinberg equilibrium. For example, for the sickle cell gene (S = sickle mutation, A = normal allele), the following are the observed and expected numbers of each genotype in one population survey:

Table 3
Genotype | Observed number | Expected number (1) | Ratio observed/expected
SS | 29 | 187 | 0.155
SA | 2,993 | 2,673 | 1.12
AA | 9,365 | 9,527 | 0.983
Total | 12,387 | 12,387 |

(1) Observed frequency of the S allele = $[(29 \times 2) + 2{,}993]/(2 \times 12{,}387) = 0.123$. Expected number of S/S: $(0.123)^2 \times 12{,}387 = 187$.

These frequencies indicate that H-W equilibrium does not hold. There appears to be an overabundance of heterozygotes and a large underrepresentation of S/S homozygotes.
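The text does not name a specific significance test; one common choice is a chi-square goodness-of-fit test, and the sketch below assumes that choice. It estimates the S allele frequency from the sample, computes H-W expected counts, and sums $(O - E)^2/E$. Because it uses the unrounded allele frequency (about 0.1232) rather than 0.123, its expected counts differ slightly from Table 3. The function name is hypothetical.

```python
def hardy_weinberg_chi_square(n_SS, n_SA, n_AA):
    """Compare observed genotype counts with H-W expectations (one locus, two alleles)."""
    n = n_SS + n_SA + n_AA
    p_S = (2 * n_SS + n_SA) / (2 * n)  # allele frequency estimated from the sample
    p_A = 1 - p_S
    expected = (p_S**2 * n, 2 * p_S * p_A * n, p_A**2 * n)
    observed = (n_SS, n_SA, n_AA)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p_S, expected, chi2


p_S, expected, chi2 = hardy_weinberg_chi_square(29, 2993, 9365)
# 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom;
# the 5% critical value of chi-square with 1 df is about 3.84.
print(round(p_S, 3), [round(e) for e in expected], round(chi2, 1))
```

For these data the statistic is far larger than 3.84, consistent with the conclusion that H-W equilibrium does not hold.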

VI. Factors Contributing to Departures from Hardy-Weinberg Equilibrium

A. Non-random mating, including stratification, assortative mating and consanguinity.

1. Stratification describes mate selection that is (fully or partially) restricted to a given subgroup within the population. Examples are African Americans, Caucasians, Native Americans and Hispanics in the U.S. Stratification may result in:

a. Markedly different gene frequencies in different subgroups. (Example: increased frequency of familial hypercholesterolemia in French Canadians)

b. Increased homozygosity in the population. (Example: Tay-Sachs disease in Ashkenazi Jews)

2. Assortative mating describes selection of mates based on phenotype. Examples: congenital deafness and achondroplasia. Assortative mating may result in:

a. Increased homozygosity in the population.
b. Increased risks to the offspring of assortative matings, if the mutation is the same in both parents.

3. Consanguinity refers to inbreeding, that is, matings between relatives. The offspring of consanguineous matings have an increased risk of homozygosity. A disproportionate share of cases of rare recessive disorders results from such matings.

a. Consanguinity is measured by the coefficient of inbreeding: the probability that an offspring is homozygous at a locus because it has received both alleles from the same ancestral source.
b. Coefficients of inbreeding become smaller as the relationship becomes more distant.

c. The higher the coefficient of inbreeding, the greater the risk that two rare recessive alleles will come together in a homozygote.


Table 4: Coefficients of relationship and inbreeding plus risk to offspring for autosomal recessive disease
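Table 4 is not reproduced here. As a sketch of the underlying arithmetic, the code below uses two standard population-genetics results that this handout does not state explicitly: the path-counting formula for the coefficient of inbreeding F, and the probability of an aa child, $q^2 + Fpq$. The function names and the allele frequency q = 1/100 are hypothetical, and the output is not taken from Table 4.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Path-counting method: F = sum of (1/2)**(n1 + n2 + 1) over common ancestors.

    n1 and n2 are the numbers of generations from each parent up to a common ancestor.
    Assumes the common ancestors are not themselves inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


def recessive_risk(q, F):
    """P(aa) in a child: q^2 by chance plus F*p*q from alleles identical by descent."""
    p = 1 - q
    return q * q + F * p * q


# First cousins share two grandparents, each 2 generations above both cousins.
F_first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])
print(F_first_cousins)                                    # 1/16
print(recessive_risk(Fraction(1, 100), F_first_cousins))  # 23/32000
print(recessive_risk(Fraction(1, 100), 0))                # 1/10000
```

For q = 1/100, first-cousin parentage raises the risk from 1/10,000 to 23/32,000, roughly a seven-fold increase, which illustrates point c above.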

B. Selection against mutant alleles

If genotypes differ in their likelihood of reproducing successfully, allele frequencies change and departure from H-W equilibrium will occur.

1. The fitness of a particular genotype (denoted f) may be described on a scale of 0 (no reproduction) to 1 (average reproduction). Medical advances may alter the fitness of a particular genotype.

2. $f = 1 - s$, where s is the coefficient of selection.

3. Selection against autosomal dominant alleles is much stronger than against autosomal recessive alleles, since recessive alleles are not expressed in heterozygotes.

4. For X-linked recessive disorders, males are more exposed to selective pressures than are females. Example: males affected with Duchenne muscular dystrophy (DMD) do not reproduce, so the mutant gene is transmitted through carrier females.

5. Gene frequencies are reduced by reduced fitness, but restored through new mutations. H-W equilibrium can be maintained if the alleles lost in one generation through selection are replaced in the next generation through mutation. This is known as mutation-selection equilibrium.

a. If an autosomal dominant disease is a genetic lethal (i.e., everyone carrying the mutant allele either dies or does not reproduce), then every case must arise from a new mutation. The mutation frequency of the abnormal allele can then be calculated as half the frequency of the disease. (This calculation assumes that affected individuals are heterozygotes. If there are "A" affected people in a population of size "N", then there are "A" mutant alleles in an allele population of size "2N". The disease frequency is A/N and the mutation frequency is A/(2N).) Mutation frequencies for dominant diseases range from $10^{-6}$ to $10^{-4}$.

b. For an X-linked recessive disease that is genetically lethal in males, the frequency of new mutations is 1/3 the frequency of the disease in males. This is because 1/3 of all X chromosomes, and hence 1/3 of all mutant X-linked alleles, occur in males, and those in affected males are lost in every generation. (The other 2/3 occur on the two X chromosomes of females; they are typically not lost, since carrier females transmit the mutant allele.) If the disease frequency in males is equal to q, then the mutant alleles lost each generation amount to $q/3$ of all X chromosomes in the population. If the disease frequency does not change, the mutant alleles that are lost must be replaced by new mutations in the next generation. Example: Duchenne muscular dystrophy.
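Both mutation-rate rules (a and b above) reduce to one division each. This sketch assumes the disease is a genetic lethal and that the population is at mutation-selection equilibrium; the function names are hypothetical, and the incidence values are hypothetical numbers chosen only to show the arithmetic.

```python
from fractions import Fraction


def ad_lethal_mutation_rate(disease_frequency):
    """Genetic-lethal AD disease: each affected person carries one new mutant allele."""
    return disease_frequency / 2  # A new mutant alleles among 2N alleles = (A/N) / 2


def xlr_lethal_mutation_rate(male_incidence):
    """Genetic-lethal X-linked recessive disease, at mutation-selection equilibrium."""
    # N males and N females carry 3N X chromosomes. Affected males (q * N of them) lose
    # q * N mutant alleles per generation, which new mutations must replace:
    # mu * 3N = q * N, so mu = q / 3.
    return male_incidence / 3


# Hypothetical incidences, chosen only to illustrate the arithmetic.
print(ad_lethal_mutation_rate(Fraction(1, 50000)))  # 1/100000
print(xlr_lethal_mutation_rate(Fraction(1, 3000)))  # 1/9000
```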

C. Selection for heterozygotes.

In some cases, heterozygotes have a selective advantage over both homozygotes. Example: sickle cell anemia, in which heterozygous carriers are relatively protected against malaria. Such an advantage can maintain both alleles in the gene pool.

D. Genetic drift refers to changes in gene frequencies due to the random sampling of a small number of genotypes. Drift can occur when a small subgroup separates from a larger population. If a founder of the new subgroup carries a very rare allele, the allele may reach a high frequency, or even become fixed, in the new subgroup. This is known as the founder effect.

E. Gene flow refers to the diffusion of genes across racial or ethnic barriers. Gene flow typically accompanies migration.

F. An increase or decrease in mutation rate (due to changes in environmental conditions) can also cause departures from H-W equilibrium.

KEY WORDS

Gene frequency, carrier frequency, Hardy-Weinberg equilibrium, assortative mating, selection.

APPENDIX A

Derivation of the Hardy-Weinberg Equilibrium

1. Assume that at a given locus there are two alternative alleles, A and a, which respectively have population gene frequencies p and q.

2. Since humans are diploid organisms, their genotypes may be AA, Aa, or aa. These genotypes have frequencies $p^2$, $2pq$, and $q^2$.

3. Now, we must calculate the frequency of genotypes in the offspring. To do this, we need to know (a) the frequency of mating types and (b) the proportions of different offspring genotypes that a mating combination will produce. For example, if the father has genotype AA (frequency $f_F = p^2$) and the mother has genotype Aa (frequency $f_M = 2pq$), the frequency of these matings will be $f_F \times f_M = 2p^3q$. Half the offspring will be AA (frequency $0.5 \times 2p^3q = p^3q$) and half will be Aa (frequency $0.5 \times 2p^3q = p^3q$).

a. Create a table of all possible mating types.
b. Add up the offspring genotype frequencies from the AA column of the table.
c. Repeat step b for the Aa and aa genotypes.
d. Simplify the results.

Table 5: (Table 7-4 in text)
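Steps a through d can be automated. In this sketch (hypothetical function name), the parental generation is assumed to be in H-W proportions; the loop enumerates every ordered father x mother combination, weights each offspring genotype by the mating frequency and the Mendelian transmission probabilities, and sums the results. Because ordered pairs are used, reciprocal matings such as AA x Aa and Aa x AA appear as separate terms; the AA father x Aa mother term is the $2p^3q$ of step 3.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_frequencies(p):
    """Sum offspring genotype frequencies over all mating types of an H-W parental generation."""
    q = 1 - p
    parents = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    transmits_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}
    offspring = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    # Ordered (father, mother) pairs: 9 combinations, e.g. AA father x Aa mother = 2p^3 q.
    for father, mother in product(parents, repeat=2):
        mating = parents[father] * parents[mother]
        a_f, a_m = transmits_A[father], transmits_A[mother]
        offspring["AA"] += mating * a_f * a_m
        offspring["Aa"] += mating * (a_f * (1 - a_m) + (1 - a_f) * a_m)
        offspring["aa"] += mating * (1 - a_f) * (1 - a_m)
    return offspring


p = Fraction(3, 10)
result = offspring_genotype_frequencies(p)
print(result == {"AA": p**2, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2})  # True
```

Any value of p between 0 and 1 gives the same conclusion: the offspring generation again has frequencies $p^2$, $2pq$, $q^2$.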

APPENDIX B

Application of Probability I: Genetic Counseling

Q. Albinism is an autosomal recessive disease that produces a lack of skin and hair pigmentation. The frequency of the gene or allele for albinism in a population is known to be 1/190. A woman with albinism marries an unrelated and unaffected man with no family history of albinism. What is the risk of albinism in their offspring?

A. This problem can be answered by using (1) the principles of Hardy Weinberg equilibrium to derive the frequency of carriers of the mutant allele for albinism and (2) the multiplication rule of probability. Specifically, we must calculate:

P(male is a carrier) x P(carrier male transmits the mutant allele) x P(affected female transmits the mutant allele)

Since:

P(male is a carrier) is approximately 2q = 2/190 = 1/95,

P(carrier male transmits the mutant allele) = 1/2, and

P(affected female transmits the mutant allele) = 1,

The overall probability is 1/95 x 1/2 x 1 = 1/190.
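The same multiplication can be done with exact fractions. This sketch also repeats the calculation with the exact carrier frequency $2pq$ instead of the $2q$ approximation, to show that the approximation changes the answer only slightly.

```python
from fractions import Fraction

q = Fraction(1, 190)                         # albinism allele frequency given in the question
p_male_carrier = 2 * q                       # rare-allele approximation of 2pq
risk = p_male_carrier * Fraction(1, 2) * 1   # carrier father passes a; affected mother always does
print(risk)                                  # 1/190

exact_risk = 2 * (1 - q) * q * Fraction(1, 2)  # same calculation with the exact 2pq
print(float(1 / exact_risk))                   # about 191
```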

Application of Probability II: Forensics

In the New York case of People vs. Shi Fu Huang, the defendant was indicted for second-degree murder. Blood samples that did not match the victim's were found at the scene of the crime. When Mr. Huang was taken into custody three days after the murder, a one-inch cut was noted on his forearm. A DNA Southern blot analysis of two loci (“A” and “B”) was conducted on the blood sample from the crime scene and on Mr. Huang's blood. This analysis revealed that the genotype at the “A” locus was 7/8 and at the “B” locus was 21/31 in both blood samples.

The prosecution argued that, based on allele frequencies in the U.S. Caucasian population (shown below), the probability of Mr. Huang’s DNA matching the perpetrator’s DNA was 2.4 in 100 million if Mr. Huang did not commit the crime.

The defendant objected to the admission of DNA evidence at the trial. He argued that the data used to support the statistical probability of the DNA match was inadequate. His objection was sustained and the DNA results were not admitted as evidence. Mr. Huang was eventually acquitted.

Q. After the trial, allele frequencies for the “A” and “B” loci were determined on a large sample of individuals from mainland China (also shown below). If these frequencies had been available at the time of the trial, do you think the outcome would have been the same? Why?

Allele frequencies
Locus | Allele | U.S. Caucasians | China
A | 7 | 0.02 | 0.22
A | 8 | 0.001 | 0.11
B | 21 | 0.01 | 0.1
B | 31 | 0.03 | 0.39

A. The principles of Hardy-Weinberg equilibrium and the multiplication rule of probability can be used to determine the probability that Mr. Huang was innocent but had the genotype 7/8, 21/31, given that he was either part of the U.S. Caucasian population or the Chinese population.

This probability can be computed as

P(7/8 in a given population) x P(21/31 in a given population), which equals

$(2 \times 0.02 \times 0.001) \times (2 \times 0.01 \times 0.03) = 2.4 \times 10^{-8}$ for U.S. Caucasians, and

$(2 \times 0.22 \times 0.11) \times (2 \times 0.10 \times 0.39) \approx 0.0038$ for Chinese individuals.

Thus, nearly 4 out of every 1000 Chinese individuals are expected to have the genotype of the perpetrator. DNA evidence alone would not be sufficient to convict Mr. Huang.
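A sketch of the forensic calculation, using the allele frequencies from the table above. The allele labels such as "A7" (allele 7 at locus A) and the function name are hypothetical shorthand. Like the text, the code applies the multiplication rule across loci, which assumes the two loci are independent, and it assumes H-W proportions within each population.

```python
# Allele frequencies from the table above; "A7" means allele 7 at locus A.
ALLELE_FREQS = {
    "U.S. Caucasians": {"A7": 0.02, "A8": 0.001, "B21": 0.01, "B31": 0.03},
    "China": {"A7": 0.22, "A8": 0.11, "B21": 0.10, "B31": 0.39},
}


def profile_probability(freqs, genotype):
    """Multiply H-W genotype frequencies across loci (assumes independent loci)."""
    prob = 1.0
    for allele1, allele2 in genotype:
        f1, f2 = freqs[allele1], freqs[allele2]
        prob *= f1 * f1 if allele1 == allele2 else 2 * f1 * f2  # p^2 or 2pq
    return prob


perpetrator = [("A7", "A8"), ("B21", "B31")]
for population, freqs in ALLELE_FREQS.items():
    print(population, profile_probability(freqs, perpetrator))
```

The roughly 150,000-fold difference between the two results comes entirely from the choice of reference population.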

Application of Probability III: Genetic Counseling

Q. Tay-Sachs disease, an autosomal recessive condition, has an incidence of about 1 in 3000 in the Jewish population. A healthy man of Jewish ancestry presents for care because his sister died from Tay-Sachs at age three years, and he is concerned about the chance for him and his wife (also of Jewish ancestry) to have a child with Tay-Sachs disease. The remainder of the family history on both sides is negative for other individuals with Tay-Sachs. What is the chance for this couple to have a child affected with Tay-Sachs disease?

A. The principles of the Hardy-Weinberg equilibrium and the multiplication rule of probability may be used to solve this problem. To determine the risk of any child of this couple to be affected with Tay-Sachs disease, you will need to determine the chance for each of the parents to be a carrier of Tay-Sachs and the chance for an affected child when both parents are carriers. This may be represented as

P(father is carrier) x P(mother is carrier) x P(affected child | both parents carriers)

Since the man's sister had Tay-Sachs disease, an autosomal recessive condition, the mother and father of this man and his sister are both presumed to be carriers. A child of two carriers is AA, Aa, or aa in the ratio 1:2:1; because we know the man is unaffected with Tay-Sachs disease, the aa outcome is excluded, so the chance for him to be a carrier is 2/3 (not ½).

So, P(father is carrier) = 2/3.

We may use the H-W equilibrium to determine the chance of the wife being a carrier of Tay-Sachs from the incidence of Tay-Sachs in the population. Recall that for autosomal recessive conditions the frequency of carriers is approximately 2q, where q is the frequency of the disease allele, and q may be determined from the square root of the incidence. So:

incidence = 1/3000 = $q^2$, thus $q = \sqrt{1/3000} \approx 1/55$
frequency of carriers $\approx 2q = 2(1/55) \approx 1/28$

So, P(mother is carrier) = 1/28.

When both parents are carriers of an autosomal recessive condition, the chance for a child to be affected is ¼.

So, P(affected child | both parents carriers) = 1/4.

The overall probability is P(father is carrier) x P(mother is carrier) x P(affected child | both parents carriers) = 2/3 x 1/28 x 1/4 = 1/168.
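The Tay-Sachs calculation as a short sketch with exact fractions, using the rounded values from the text. The second part repeats it without rounding q or the carrier frequency, which shifts the answer from 1 in 168 to about 1 in 164; the difference comes only from rounding.

```python
from fractions import Fraction
from math import sqrt

p_man_carrier = Fraction(2, 3)    # unaffected sib of an affected child: Aa:AA = 2:1
p_wife_carrier = Fraction(1, 28)  # 2q with q = sqrt(1/3000), about 1/55, rounded as in the text
p_affected = Fraction(1, 4)       # Aa x Aa mating
risk = p_man_carrier * p_wife_carrier * p_affected
print(risk)  # 1/168

# Same calculation without rounding q or the carrier frequency.
q = sqrt(1 / 3000)
risk_unrounded = float(p_man_carrier) * (2 * q) * float(p_affected)
print(round(1 / risk_unrounded))  # 164
```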