PopGen for WindowsTM Ver. 1.0 (Beta)

Jouni Aspi & Jaakko Lumme (1995) University of Oulu Department of Genetics Contents:

Page Contents: 2 What is PopGen 3 General description of the model 3

How to run the program? 4 Using Simulation Models 5 1. Selection 6 2. Genetic Drift 7 F-statistics 8 3. Migration 10 Stepping stones model 10 4. Hardy & Weinberg equation 11 Subdivided populations 12 Hardy-Weinberg with selfing 13 Sex differences in allele frequencies 14 What is PopGen?

PopGen is a simulation program designed to clarify various population genetic events. It has been programmed by Jouni Aspi using Microsoft's Visual Basic (ver. 3.0). The code is based on previous GW-Basic and QuickBasic programs by Jaakko Lumme and Jouni Aspi.

General description of the model

With PopGen you can simulate some deterministic and stochastic population genetic processes in a simple one locus, two allele system. There are two alleles A1 and A2. Frequency of allele A1 is p and frequency of allele A2 is q. Genotypes of individuals and their frequencies are:

Genotype A1A1 A1A2 A2A2 Frequency p2 2pq q2

With the models of PopGen you can study how these allele frequencies are affected by:

1. Genetic drift 2. Selection 3. Migration (stepping stone)

You can also study sample sizes you need to detect significant deviations from Hardy Weinberg proportions, when:

1. The studied population is divided to two subpopulations 2. Allele frequencies are not similar in different sexes 3. The mating is not random, but there is some inbreeding in the population How to run the program?

From Windows Program Manager select File from the Menubar and then Run. Then write a:popgen to the command line and press Enter.

Figure 1. The opening window of the program

Just click the window and you should be in the main window Figure 2. The main window of the program

You choose your model from the main window. To select the model choose Model from the Menubar and the desired model. The selected model is written in to the heading of the window (in the beginning it is Drift).

Generally you can change the parameters of the model by writing them to the text boxes or by using horizontal scroll bar to change them. You can get Help anytime by pressing F1-key or choosing Help and PopGen Help from the menubar of the Main Window. Using of simulation models

1. Selection

Natural selection is the driving force of evolution. By this simple one locus model you can study how selection changes the allele frequencies. This model is deterministic, it doesn't take into account the effect of genetic drift. You can only vary the fitnesses of different genotypes. Normally to the best genotype are given fitness one and the other fitnesses are relative to that.

Write the fitness values in text boxes of the frame titled: Fitness values of the genotypes or use horizontal scrollbars to change them. Then press OK button and the mean relative fitness of the population with respect of the frequency of allele A1 (p) will be drawn in the Fitness function picture box. The magnitude of a change in p with respect to its frequency during one generation will also be drawn in the Delta p frame.

Figure 3. The main window while running the Selection model To change the initial allele frequency of A1 allele (p), change Initial p from the frame titled Population parameters. You can also change the number of generations, but you can not change the population. In the Selection model of PopGen the population size is indefinite.

To start simulation click Start simulation button.

The allele frequency of A1 allele will be drawn in the frame on the bottom of the screen. The final frequency of the latest simulation is also written to the frame.

You can clear the bottom frame by clicking the Clear button.

2. Genetic drift

This stochastic model demonstrates the chance fluctuations in allele frequencies. Such fluctuations occur particularly in small populations as a result of random sampling among gametes. Like in the Selection model you can change (1) Initial frequency of A (p) and (2) The number of generations, but you can also define (3) the number of individuals in the studied populations.

If you are studying only the effects of random drift set all the fitness values as 1. To start simulation click Start simulation button. The program will simulate the fate of 100 populations (unless you stop the simulation by clicking the Stop simulation button) by starting from the give Initial p value. The number of the simulation will be shown on the bottom right corner of the window. The allele frequency of each simulated population will be shown in the bottom frame. The distribution of final frequencies will be shown in the frame on the upper right corner.

You can also study the joint effects of selection and drift by varying the fitnesses of different genotypes. Figure 4. The main window while running the Drift model

F-statistics

When you have finished your simulation, you can choose F-statistics from the Menubar. F-statistics is a statistical method to describe population subdivision. In this case it describes the accumulated differences between the simulated populations, which has started from the same initial p value. Because population subdivision entails an inbreeding-like effect, it is traditional to measure the effects in terms of decrease in the proportion of heterozygous genotypes. A subdivided population has three levels of complexity i.e. (1). Individual organisms (I), (2). Subpopulations (S), and (3). Total population The heterozygosity can also be measured in these three levels:

HI = the observed average heterozygosity of an individual in subpopulations

HS = the expected average heterozygosity of an individual in subpopulations

HT = the expected average heterozygosity of an individual in total population with no population subdivision

FIS is the INBREEDING COEFFICIENT. It measures the reduction in heterozygosity of an individual due to non-random mating within its subpopulation. Thus,

H S - H I F IS = H S

Because the mating within subpopulations in our simulations is random, the overall Fis should be near zero.

FST is the FIXATION INDEX. It is the reduction in heterozygosity of a subpopulation due to the random genetic drift. Thus,

HT - H S F ST = HT

It quantifies the effects of population subdivision and is always positive.

FIT is the OVERALL INBREEDING COEFFICIENT. It includes a contribution due to actual non-random mating within subpopulations (Fis) and another contribution due to the subdivision itself (Fst). Thus,

HT - H I F IT = HT

FIT measures the reduction in heterozygosity of an individual relative to the total population. 3. Migration

Migration, which refers to the movement of the individuals between subpopulations, is the "glue" that holds subpopulations together, that sets a limit to how much genetic divergence can occur. If the subpopulations are finite in size, then genetic drift may result in random differences among them even with migration.

Stepping-Stone Model

If migration is restricted to adjacent populations, then movement forms a stepping-stone pattern. The stepping-stone model can be one- or two-dimensional. The current simulation model is one-dimensional:

┌───┐ m ┌───┐ m ┌───┐ m ┌───┐ m ┌───┐ │ N │ -> │ N │ -> │ N │ -> │ N │ -> │ N │ └───┘ <- └───┘ <- └───┘ <- └───┘ <- └───┘

N = population size of the island m = migration rate between the islands

In the Migration: stepping stone model of PopGen you can change:

(1) Population size of an island (N) (2) initial frequency of A1 (p) (3) Migration rate (m)

The PopGen model demonstrates fluctuations in allele frequencies in the subpopulations. They are presented in the frame of the upper right corner titled p in subpopulations.

The model also shows the change in total heterozygosity due to population subdivision. The program calculates and draws the expected heterozygosity (2pq = 1-p²-q²) and the expected minus observed heterozygosity in the frame on the bottom during one hundred generations (unless you stop the simulations from by clicking the Stop simulation button). The program also shows the mean p of all populations in the bottom frame. Figure 5. The main window while running the Migration: stepping stones model

In the Migration: stepping stones model of PopGen you can also study the joint effects of drift, selection and migration by varying the fitnesses of different genotypes.

4. Hardy & Weinberg equation

With PopGen you can also simulate how large samples are generally needed to detect statistically significant deviations from expected Hardy & Weinberg proportions, when:

1. The studied population is divided into two subpopulations 2. Allele frequencies are not similar in different sexes 3. The mating is not random, but there is some inbreeding in the population To select the desired model choose Model from the Menubar. The selected model is written in the frame on the upper left corner.

Subdivided populations (Wahlund effect) Figure 6. The Hardy-Weinberg window while running the Wahlund effect model

This model simulates the situation when the population you have sampled consist in reality of two populations with different allele frequencies.

Write sample size and allele frequencies in different populations in text boxes (or use horizontal scrollbars to change the optional values), and click the Start sampling button. The expected genotype frequencies in different populations are drawn in the picture box in right upper corner (titled Genotype frequencies). The program writes the number of different genotypes drawn randomly from the different population in the text boxes in the frame titled Number of individuals sampled. They are also drawn in the Genotype frequency picture box as circles. The program writes the sampled and also the a priori (expected if allele frequency of A1 is the mean of the two populations) and expected genotype frequencies in the frame on the bottom left. A priori, expected and sampled allele frequencies are also given. Note that the two latter are always the same.

The program also calculates the test value (G2) and probability of a log- likelihood test given the null hypothesis that there are no differences between the sampled and expected genotype frequencies.

Hardy-Weinberg with selfing

With this model you can simulate the situation in which pairing of gametes is not random, because of some inbreeding in the population. You can change the selfing rate (between 0-1), allele frequency of A1 (p) and sample size.

The program draws the expected genotype frequencies in different sexes in the picture box in right upper corner (titled Genotype frequencies). The program writes the number of different genotypes, drawn from the population with the given selfing rate, in the text boxes in the frame titled Number of individuals sampled. They are also drawn in the Genotype frequency picture box as circles. The program writes the sampled and also the a priori (expected if allele frequency of A1 is the given) and expected genotype frequencies in the frame on the bottom left. A priori, expected and sampled allele frequencies are also given. Note that the two latter are always the same. Finally, the program also calculates the test value (G2) and probability of a log-likelihood test given the null hypothesis that there are no differences between the sampled and expected genotype frequencies. Figure 7. The Hardy-Weinberg window while running Hardy-Weinberg with selfing model

Sex differences in allele frequencies

With this model you can simulate the situation in which allele frequencies in different genotypes are different. You can change the allele frequencies of A1 (p) in females and males and sample size.

The program draws the expected genotype frequencies in different sexes in the picture box in right upper corner (titled Genotype frequencies). The program writes the number of different genotypes drawn from the population in the text boxes in the frame titled Number of individuals sampled. They are also drawn in the Genotype frequency picture box as circles. The program writes the sampled and also the a priori (expected if allele frequency of A1 is the given) and expected genotype frequencies in the frame on the bottom left. A priori, expected and sampled allele frequencies are also given. Note that the two latter are always the same. Finally, the program calculates the test value (G2) and probability of a log-likelihood test given the null hypothesis that there are no differences between the sampled and expected genotype frequencies.

Figure 8. The Hardy-Weinberg window while running Sex differences in allele

frequencies model

Acknowledgement: We thank David Clancy for his comments on this program.