MISAT - Microsatellite analysis by maximum likelihood Copyright 1997 (c) by Rasmus Nielsen. Any injury or loss due to the use of this software is not the responsibility of the author. This software is provided "as is" without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.

What does this program do?
This program estimates θ = 4Nμ (N = population size, μ = mutation rate) by maximum likelihood for a microsatellite locus. It can also be used to test the one-step mutation model against a multi-step mutation model by a likelihood ratio test. It uses data from a microsatellite population sample; that is, it assumes that the data were obtained by random sampling from a randomly mating population.

How does the program work?
The program estimates the likelihood surface of the fundamental population genetic parameter θ for a microsatellite locus by a Markov chain recursion method. It thereby provides a maximum likelihood estimate of θ and an approximate confidence interval for θ. It can also estimate the joint likelihood surface for θ and a parameter p, the proportion of multi-step mutations, so that the hypothesis of no multi-step mutations can be tested by a likelihood ratio test. For details of the estimation procedure, see Nielsen, R. 1997. A likelihood approach to microsatellite population samples. Genetics. Because the program uses a Monte Carlo method to estimate the likelihood surface, the estimation procedure may be very time consuming. Likewise, the likelihood values obtained are estimates that may deviate from the true likelihood values. In the current version of the program, runs through the Markov chains are truncated, so an error in the third decimal place may occur. If you need greater precision in the estimated likelihood values, please contact the author.

How do you use the program?
To use the program, you first need to create an infile (see below). Place the infile in the same directory as MISAT and start the program. The program first prompts you for the name of the infile. You are then asked for the type of the locus: enter 2 for a dimer locus, 4 for a tetramer locus, and so on. Finally, you are asked about five options:

1. Gridsize?
2. Use moments estimate for theta0?
3. Number of runs through Markov chain?
4. Estimate proportion of multi-step mutations?
5. Adaptive runs?

Option 1: The gridsize determines the number of θ values on a grid at which the program estimates the likelihood. The default value is 40, which is sufficient in most cases. Increasing the gridsize slows down the estimation procedure. However, the running time does not increase linearly with the number of gridpoints, because an importance sampling scheme is used to estimate the likelihood for many values of θ at the same time: a single initial value of θ (θ0) is used to drive the Markov chain simulations.
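The importance sampling idea behind Option 1 can be illustrated on a toy problem. The sketch below is not MISAT's Markov chain recursion: it assumes a hypothetical model in which two genes with coalescence time T ~ Exp(1) differ by k ~ Poisson(θT) mutations, chosen because its exact likelihood is known and the estimates can be checked. One set of draws generated under θ0 is reweighted to give the relative likelihood L(θ)/L(θ0) at every grid point at once. The function names are illustrative only.

```python
import math
import random


def relative_likelihood_grid(k, theta0, grid, runs=100_000, seed=1):
    """Estimate L(theta) / L(theta0) for every theta in `grid` from one set of draws at theta0.

    Hypothetical toy model (NOT MISAT's Markov chain recursion): two genes whose
    coalescence time is T ~ Exp(1), separated by k ~ Poisson(theta * T) mutations.
    """
    rng = random.Random(seed)
    # One set of simulations, driven by theta0: draw T from its posterior given k,
    # which for this toy model is Gamma(shape = k + 1, rate = 1 + theta0).
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    draws = [rng.gammavariate(k + 1, 1.0 / (1.0 + theta0)) for _ in range(runs)]
    estimates = {}
    for theta in grid:
        # Importance weight P(k, T | theta) / P(k, T | theta0); the k! and e^-T terms cancel.
        scale = (theta / theta0) ** k
        total = sum(math.exp(-(theta - theta0) * t) for t in draws)
        estimates[theta] = scale * total / runs
    return estimates


def exact_relative_likelihood(k, theta0, theta):
    """Closed form for the toy model, where L(theta) = theta**k / (1 + theta)**(k + 1)."""
    return (theta / theta0) ** k * ((1.0 + theta0) / (1.0 + theta)) ** (k + 1)


if __name__ == "__main__":
    grid = [2.0, 3.0, 4.0, 6.0]
    est = relative_likelihood_grid(k=3, theta0=3.0, grid=grid, runs=20_000)
    for theta in grid:
        print(theta, round(est[theta], 4), round(exact_relative_likelihood(3, 3.0, theta), 4))
```

In this toy model the importance weights have infinite variance for θ <= (θ0 - 1)/2, so grid points far below θ0 get unreliable estimates. In a simplified setting, this shows why a driving value θ0 close to the maximum likelihood estimate helps (Option 2).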

Options 2: 0 is the value of  that is used to drive the simulations. The closer this value is to the true maximum likelihood estimate of , the better the estimation procedure will perform. The default value in the program is the method of moments estimate under the one-step model.

Option 3: The number of runs through the Markov chain determines the variance of the likelihood estimates: more runs give a smaller variance. For most data sets the default of 100,000 runs is sufficient. However, for large data sets (more than 50 genes) more runs through the Markov chain should be performed.

Option 4: Estimating the proportion of multi-step mutations is useful for testing the one-step mutation model (see Nielsen 1997). However, the procedure is extremely time consuming and can only be recommended if you have access to one or more very fast computers that can be dedicated to the Markov chain estimation. There are two reasons for this. First, each run through the Markov chain is much slower under the multi-step model. Second, at least 10 times as many runs are required to estimate the extra parameter (p). If you choose to estimate the proportion of multi-step mutations, you should use a lower value of θ0 than recommended for the one-step mutation model.
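Once the maximized log likelihoods under the one-step model (p = 0) and the multi-step model are available, the likelihood ratio test takes only a few lines. This is a hedged sketch: the statistic is 2(ln L_multi - ln L_one), and because p = 0 lies on the boundary of the parameter space, a common approximation for its null distribution is a 50:50 mixture of a point mass at 0 and a χ² distribution with one degree of freedom; `boundary=False` switches to the plain χ²₁ reference. Which reference distribution Nielsen (1997) recommends is not stated here, and the function name is illustrative.

```python
import math


def lrt_one_step(loglik_one_step, loglik_multi_step, boundary=True):
    """Likelihood ratio test of the one-step model (p = 0) against the multi-step model.

    Returns (statistic, p_value). With boundary=True the p-value uses a 50:50
    mixture of chi-square(0) and chi-square(1); otherwise plain chi-square(1).
    """
    # Monte Carlo error can make the estimated difference slightly negative; clamp to 0.
    stat = max(0.0, 2.0 * (loglik_multi_step - loglik_one_step))
    if stat == 0.0:
        return stat, 1.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))
    return stat, (0.5 * p_chi2_1 if boundary else p_chi2_1)
```

Because both log likelihoods are Monte Carlo estimates, a statistic near the 5% critical value (about 2.71 for the mixture, 3.84 for χ²₁) should be interpreted with caution.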

Option 5: When option 5 is chosen, the program continuously updates the value of θ0. This option should be chosen in most cases.

If you do not want to estimate the proportion of multi-step mutations or test the one-step mutation model, and your data set is of small or moderate size, you will in most cases not need to change any of the options.

Creating the infile
The infile should contain the data from one population sample at one locus. It should consist of two columns: the first column contains the amplified fragment sizes of the alleles, and the second column contains the count of each allele. The file should end with a 0 (zero). Example:

22 1
24 9
28 4
30 23
32 2
36 4
0

In the above sample there is 1 copy of size 22, 9 copies of size 24, and so on. You can use any text editor to create the infile. If you use a word processor such as MS-Word that saves formatted files by default, remember to save the file as plain text (ASCII on a PC).
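A minimal Python reader for the infile format described above: whitespace-separated (fragment size, count) pairs ending with 0. It accepts the pairs one per line or run together on one line; whether MISAT itself is this lenient is not stated, so keep to the two-column layout for real runs. The function names `parse_infile` and `read_infile` are hypothetical.

```python
def parse_infile(text):
    """Parse (fragment_size, count) pairs terminated by a single 0."""
    tokens = text.split()
    alleles = []
    i = 0
    while i < len(tokens):
        size = int(tokens[i])
        if size == 0:
            # The terminating 0 marks the end of the data.
            return alleles
        if i + 1 >= len(tokens):
            raise ValueError(f"allele size {size} has no count")
        alleles.append((size, int(tokens[i + 1])))
        i += 2
    raise ValueError("infile is missing the terminating 0")


def read_infile(path):
    """Read and parse an infile saved as plain text."""
    with open(path) as fh:
        return parse_infile(fh.read())
```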

Interpretation of the output
The output of the program is a likelihood surface, which is stored in a new file, ‘likesurface’, created by the program. The maximum likelihood estimate of θ is the value of θ for which the largest (least negative) log likelihood value is obtained.
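A minimal sketch for reading off the estimate from a likelihood surface. It assumes, hypothetically, that each data line of ‘likesurface’ starts with θ followed by its log likelihood; the file's actual layout is not described here. The approximate 95% interval uses the common rule of keeping grid values of θ whose log likelihood lies within 1.92 (half the 5% χ²₁ critical value of 3.84) of the maximum; the program's own interval may be computed differently. Function names are illustrative.

```python
def read_surface(path):
    """Hypothetical reader: assumes each data line starts with theta and log likelihood."""
    points = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            try:
                points.append((float(fields[0]), float(fields[1])))
            except (IndexError, ValueError):
                continue  # skip header or blank lines
    return points


def mle_and_interval(surface, drop=1.92):
    """Return (theta_hat, lower, upper) from (theta, loglik) grid points.

    theta_hat: grid value with the largest (least negative) log likelihood.
    lower, upper: smallest and largest grid values of theta whose log
    likelihood lies within `drop` units of the maximum.
    """
    points = sorted(surface)
    if not points:
        raise ValueError("empty likelihood surface")
    theta_hat, best = max(points, key=lambda point: point[1])
    inside = [theta for theta, loglik in points if loglik >= best - drop]
    return theta_hat, min(inside), max(inside)
```

The interval is only as fine as the grid (Option 1), and its end points inherit the Monte Carlo error in the likelihood estimates.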

Please report any bugs to Rasmus Nielsen at e-mail: [email protected]