Heredity (2003) 90, 459–467 © 2003 Nature Publishing Group All rights reserved 0018-067X/03 $25.00 www.nature.com/hdy

Mapping viability loci using molecular markers

L Luo and S Xu Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA

In genetic mapping experiments, some molecular markers often show distorted segregation ratios. We hypothesize that these markers are linked to some viability loci that cause the observed segregation ratios to deviate from Mendelian expectations. Although statistical methods for mapping viability loci have been developed for line-crossing experiments, methods for viability mapping in outbred populations have not been developed yet. In this study, we develop a method for mapping viability loci in outbred populations using a full-sib family as an example. We develop a maximum likelihood (ML) method that uses the observed marker genotypes as data and the proportions of the genotypes of the viability locus as parameters. The ML solutions are obtained via the expectation–maximization algorithm. Application and efficiencies of the method are demonstrated and tested using a set of simulated data. We conclude that mapping viability loci can be accomplished using statistical techniques similar to those used in quantitative trait locus mapping for quantitative traits.
Heredity (2003) 90, 459–467. doi:10.1038/sj.hdy.6800264

Keywords: EM algorithm; four-way cross; maximum likelihood; segregation distortion

Correspondence: S Xu, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA. E-mail: xu@.ucr.edu
Received 7 June 2002; accepted 17 January 2003

Introduction

The genetic consequence of selection is the change in frequencies of the genes affecting fitness. The process of evolution is reflected by the dynamic change of gene frequencies by selection and other evolutionary agents. Fitness is a complicated trait, which can be decomposed into many fitness components (Falconer and Mackay, 1996; Hartl and Clark, 1997). Therefore, the genetic variance of fitness is considered to be controlled by the segregation of multiple genes. Fitness behaves like a quantitative trait. It responds to natural selection with a response equal to the genetic variance of fitness (Fisher, 1958). To study fitness, it is important to explore the change of gene frequency at individual loci. However, only in very limited situations, for example, where allozyme markers are available, can we evaluate natural selection on individual loci. In most situations, we do not know what the genes are and where in the genome the genes are located.

With the rapid development of molecular technology, large amounts of molecular data are now available, which provide a great opportunity to estimate the effects and locate the chromosomal positions of loci responsible for complicated traits, for example, quantitative traits. The technology is now called quantitative trait locus (QTL) mapping. Since fitness is just another complicated trait with a polygenic background, a similar technology can be applied to map loci determining variation in fitness.

Although it does not seem easy to map fitness loci, statistical methods of mapping QTL can be adopted (Lander and Botstein, 1989). Fu and Ritland (1994a,b) first utilized a QTL mapping approach to map viability (a fitness component) loci under the maximum likelihood (ML) framework. Mitchell-Olds (1995) also proposed a similar ML method for viability mapping in F2 families. Recently, Vogl and Xu (2000) investigated a Bayesian method to map viability loci in a backcross family. All the aforementioned methods deal with line-crossing experiments that require inbred lines. Inbred lines, however, may not be available for many species, such as humans, large animals and trees (Hedrick and Muona, 1990). Mapping viability loci may be more relevant to natural populations than to line crosses. This is analogous to the situation where mapping QTLs is more relevant to breeding populations than to designed line crosses. However, it is easier to map QTLs in line-crossing experiments because we can control the genetic background and environments. After QTL are mapped in line crosses, the results may be extended to natural populations or used to find homologous loci in closely related species. Similarly, viability loci may be mapped in line crosses and the inference later extended to natural populations. In this study, we attempt to map viability loci directly in outbred populations. Full-sib families are the simplest outbred populations. Although not necessarily natural populations, they are one step closer to natural populations than are line crosses.

The fitness of a genotype at a locus is the average fitness of all individuals bearing this genotype. If we assign the fitness for the 'best' genotype a value of one, the selection coefficient for an arbitrary genotype is defined as the reduction in fitness from this maximum value. Therefore, we only describe the measurement of fitness (rather than the selection coefficient) in subsequent discussion. Viability is only one of many components of fitness. Fecundity is another important component. In this study, however, we focus only on loci responsible for viability selection, assuming that all surviving individuals have an equal fecundity.

We develop a model of viability mapping that uses a full-sib family derived from the mating of two unrelated outbred parents. A full-sib family contains four different alleles at a single locus, rather than two as is usually assumed in inbred line crosses. Mapping in a full-sib family requires the general rule of allelic transmission from parents to children and thus the algorithm can be extended to pedigree analysis. The method can be directly applied to fitness analysis for open-pollinated plants.

Theory and methods

Genetic model of fitness

Consider a single viability locus and a full-sib family. Denote the genotypes of the sire (paternal parent) and dam (maternal parent) by $A_1^s A_2^s$ and $A_1^d A_2^d$, respectively. Mating between the two parents will generate progenies each with one of the four possible genotypes: $\{A_1^s A_1^d,\ A_1^s A_2^d,\ A_2^s A_1^d,\ A_2^s A_2^d\}$. Under the assumption of Mendelian segregation, the four genotypes will have an equal frequency, that is, $\tfrac{1}{4}$. If this locus is subject to viability selection, we will observe two or more genotypes with frequencies different from Mendelian expectations.

To model viability selection, we define the underlying frequencies of the four genotypes in the progeny by a vector $w = [w_{11}\ w_{12}\ w_{21}\ w_{22}]$ for $0 \le w_{kl} \le 1$, $\sum_{kl} w_{kl} = 1$ and $k, l = 1, 2$. These frequencies are now defined as the relative fitness of the four genotypes. This is a little different from the usual definition of relative fitness in which the maximum fitness is set to one and the rest expressed as reduced values relative to one. Deviation of $w$ from the Mendelian vector $w_0 = [\tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}\ \tfrac{1}{4}]$ reflects the intensity of viability selection.

The fitness of a genotype can be decomposed into the product of the fitness of the two alleles that make up the genotype and a deviation reflecting the interaction between the two alleles, called the dominance effect, that is,

$$w_{kl} = w_k^s w_l^d + \delta_{kl} \tag{1}$$

where $w_k^s$ and $w_l^d$ denote the relative fitness of the $k$th allele of the sire and the $l$th allele of the dam, respectively, and $\delta_{kl}$ is the dominance effect. This partitioning of the fitness is important because we can separate gametic selection from zygotic selection using statistical technology. Note that there are four possible genotypes in the progeny, but after the decomposition we have eight parameters. Therefore, we must impose some restrictions on the parameters to make the model estimable. We take restrictions similar to those used in the four-way cross model (Xu, 1998) and define three new independent parameters:

$$\begin{aligned} w^s &= w_{11} + w_{12} - w_{21} - w_{22} \\ w^d &= w_{11} - w_{12} + w_{21} - w_{22} \\ \delta &= w_{11} w_{22} - w_{12} w_{21} \end{aligned} \tag{2}$$

It is interesting to know that the fitness values of the four genotypes can be expressed as functions of the three independent parameters, as shown below:

$$\begin{aligned} w_{11} &= \tfrac{1}{4}(1 + w^s)(1 + w^d) + \delta \\ w_{12} &= \tfrac{1}{4}(1 + w^s)(1 - w^d) - \delta \\ w_{21} &= \tfrac{1}{4}(1 - w^s)(1 + w^d) - \delta \\ w_{22} &= \tfrac{1}{4}(1 - w^s)(1 - w^d) + \delta \end{aligned} \tag{3}$$

This model is important in hypothesis tests and computer simulations that will be discussed in later sections.

ML estimation

We first assume that the four alleles of the viability locus in the parents are distinguishable and the genotypes are observable. Suppose that we sample $n$ individuals from the full-sib family in question. Let us define

$$y_j = [y_{j(11)}\ y_{j(12)}\ y_{j(21)}\ y_{j(22)}] \quad \text{for } j = 1, \ldots, n$$

where $y_{j(kl)} = 1$ and $y_{j(k'l')} = 0$ for all $(k', l') \ne (k, l)$ if individual $j$ takes genotype $A_k^s A_l^d$. We now have the data, $y$, and the parameter, $w$, which allow the construction of the log likelihood:

$$L_c(w) = \sum_{j=1}^{n} \sum_{k=1}^{2} \sum_{l=1}^{2} y_{j(kl)} \ln(w_{kl}) \tag{4}$$

The ML estimate of $w$ is simply

$$\hat{w}_{kl} = \frac{1}{n} \sum_{j=1}^{n} y_{j(kl)} \tag{5}$$

for $k, l = 1, 2$.

In fact, the genotype of a viability locus cannot be observed and we must use markers to infer the genotype. Unless the viability locus is located exactly at a fully informative marker, inference will be subject to error. The amount of error depends on the distances of the viability locus from marker loci, the level of marker polymorphism and the genotypes of the markers. As a result of the error, we are not certain about the actual genotype of the viability locus for each individual, even though we can observe the marker genotypes. The viability locus can take any one of the four genotypes, but with a different probability for each genotype given the marker information. Define the four conditional probabilities of the viability locus genotypes given the markers by $p_j = [p_{j(11)}\ p_{j(12)}\ p_{j(21)}\ p_{j(22)}]$ for $0 \le p_{j(kl)} \le 1$ and $\sum_{k=1}^{2} \sum_{l=1}^{2} p_{j(kl)} = 1$. This is a typical problem of missing values in statistics where we can use the expectation–maximization (EM) algorithm to solve for the MLE. The actual incomplete-data log likelihood is

$$L(w) = \sum_{j=1}^{n} \ln\left( \sum_{k=1}^{2} \sum_{l=1}^{2} p_{j(kl)} w_{kl} \right) \tag{6}$$

where the missing data $y$ have been 'integrated out'. There are several ways to solve for the MLE, but we take the EM algorithm (Dempster et al, 1977).

First, we choose an initial value $w^{(0)}$ and calculate the expectation of $y_{j(kl)}$ conditional on $w = w^{(0)}$:

$$E[y_{j(kl)}] = \hat{y}_{j(kl)} = \frac{p_{j(kl)}\, w_{kl}^{(0)}}{\sum_{k'=1}^{2} \sum_{l'=1}^{2} p_{j(k'l')}\, w_{k'l'}^{(0)}} \tag{7}$$
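The incomplete-data log likelihood in equation (6) and the conditional expectation in equation (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `log_likelihood` and `e_step` and the three hypothetical individuals are not from the paper, and genotypes are ordered as (11, 12, 21, 22).

```python
import math

def log_likelihood(p, w):
    """Incomplete-data log likelihood L(w), equation (6).

    p: one list [p11, p12, p21, p22] per individual (marker-based probabilities)
    w: genotype frequencies [w11, w12, w21, w22]
    """
    return sum(math.log(sum(pk * wk for pk, wk in zip(p_j, w))) for p_j in p)

def e_step(p, w):
    """Conditional expectations of y_j(kl) given w, equation (7)."""
    posteriors = []
    for p_j in p:
        weighted = [pk * wk for pk, wk in zip(p_j, w)]
        total = sum(weighted)
        posteriors.append([x / total for x in weighted])
    return posteriors

# Hypothetical marker-based probabilities for three individuals (assumed values).
p = [
    [1.0, 0.0, 0.0, 0.0],      # genotype inferred with certainty
    [0.5, 0.5, 0.0, 0.0],      # sire allele known, dam allele ambiguous
    [0.25, 0.25, 0.25, 0.25],  # markers carry no information
]
w0 = [0.25, 0.25, 0.25, 0.25]  # Mendelian starting value w^(0)
y_hat = e_step(p, w0)
```

With a Mendelian starting value the conditional expectations simply equal the marker-based probabilities; they start to differ from `p` once `w` moves away from `w0`.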

The conditional expectation in equation (7) is also called the posterior probability of $y_{j(kl)}$. We have now completed the expectation step (E-step). The maximization step (M-step) is simply to replace $y_{j(kl)}$ in equation (5) by the conditional expectation,

$$w_{kl}^{(1)} = \frac{1}{n} \sum_{j=1}^{n} \hat{y}_{j(kl)} \tag{8}$$

This concludes the first iteration of the EM algorithm. The iteration continues until convergence at the $t$th iteration and the MLE takes $\hat{w} = w^{(t)}$. According to the invariance property of MLE, we have

$$\begin{aligned} \hat{w}^s &= \hat{w}_{11} + \hat{w}_{12} - \hat{w}_{21} - \hat{w}_{22} \\ \hat{w}^d &= \hat{w}_{11} - \hat{w}_{12} + \hat{w}_{21} - \hat{w}_{22} \\ \hat{\delta} &= \hat{w}_{11} \hat{w}_{22} - \hat{w}_{12} \hat{w}_{21} \end{aligned} \tag{9}$$

The EM algorithm provides a convenient way to solve for the MLE, but it does not automatically give the asymptotic variance–covariance matrix of $\hat{w}$, which must be obtained separately through some additional computation (Louis, 1982). This is the drawback of the EM algorithm compared to Fisher's scoring method, which automatically provides an asymptotic variance–covariance matrix for the MLE. However, Fisher's scoring method requires calculation of the information matrix, which is not easy in the missing value problem. In practice, we can use the bootstrap method (Efron, 1979) to assess the variance–covariance matrix. The bootstrap method is computationally demanding, but it is executed only once after convergence has been reached and only on the positions that show significant evidence of viability selection.

Hypothesis test

Recall that the conditional probability of the viability locus genotype is calculated from marker information with the assumption that the location of the viability locus relative to the markers is known. Therefore, the hypothesis test on the effects of the viability locus is actually a conditional test given the position of the viability locus. If the test is not significant, we will conclude that the current position being tested does not segregate for a viability locus. To test the overall hypothesis of no viability selection, we need to scan the entire genome (multiple tests). The null hypothesis (no viability selection) will not be rejected if none of the locus-specific tests is significant. We will discuss the overall test later and now focus on the test of an individual locus.

The first null hypothesis is $H_0$: $w = w_0$, which tests no segregation distortion for the locus of interest. The test statistic is $\lambda = -2[L(w_0) - L(\hat{w})]$, where $L(w_0) = n \ln(\tfrac{1}{4}) = -1.3863n$. Under the null hypothesis, $\lambda$ will approximately follow a $\chi^2$ distribution with three degrees of freedom.

If this null hypothesis is rejected, we can further test the significance of each component. The null hypothesis that the two alleles carried by the sire have identical fitness is formulated by $H_s$: $w^s = 0$, $w^d \ne 0$, $\delta \ne 0$. The test statistic for $H_s$ is $\lambda_s = -2[L(\hat{w}_s) - L(\hat{w})]$, where $L(\hat{w}_s)$ is the log likelihood value obtained by maximizing $L(w)$ under the restriction of $w^s = (w_{11} + w_{12}) - (w_{21} + w_{22}) = 0$, which is achieved by using the Lagrange multiplier. A more intuitive and easier way to enforce the restriction is to make the substitutions $w_{12} = \tfrac{1}{2} - w_{11}$ and $w_{22} = \tfrac{1}{2} - w_{21}$, which reduces the number of parameters to two, $w_{11}$ and $w_{21}$. The EM solutions of these two parameters are

$$\hat{w}_{11} = \frac{1}{2} \cdot \frac{\sum_{j=1}^{n} \hat{y}_{j(11)}}{\sum_{j=1}^{n} \hat{y}_{j(11)} + \sum_{j=1}^{n} \hat{y}_{j(12)}} = \frac{1}{n} \sum_{j=1}^{n} \hat{y}_{j(11)}$$

and

$$\hat{w}_{21} = \frac{1}{2} \cdot \frac{\sum_{j=1}^{n} \hat{y}_{j(21)}}{\sum_{j=1}^{n} \hat{y}_{j(21)} + \sum_{j=1}^{n} \hat{y}_{j(22)}} = \frac{1}{n} \sum_{j=1}^{n} \hat{y}_{j(21)}$$

because the denominators equal $n/2$ due to the restrictions. The MLE of the remaining parameters are $\hat{w}_{12} = \tfrac{1}{2} - \hat{w}_{11}$ and $\hat{w}_{22} = \tfrac{1}{2} - \hat{w}_{21}$. Under $H_s$, $\lambda_s$ will approximately follow a $\chi^2$ distribution with one degree of freedom.

The null hypothesis that the two alleles carried by the dam have identical fitness is formulated by $H_d$: $w^d = 0$, $w^s \ne 0$, $\delta \ne 0$, where the test statistic for $H_d$ is $\lambda_d = -2[L(\hat{w}_d) - L(\hat{w})]$, with $L(\hat{w}_d)$ being the log likelihood value obtained by maximizing $L(w)$ under the restriction of $w^d = (w_{11} + w_{21}) - (w_{12} + w_{22}) = 0$. The EM solutions of the parameters are

$$\hat{w}_{11} = \frac{1}{2} \cdot \frac{\sum_{j=1}^{n} \hat{y}_{j(11)}}{\sum_{j=1}^{n} \hat{y}_{j(11)} + \sum_{j=1}^{n} \hat{y}_{j(21)}} = \frac{1}{n} \sum_{j=1}^{n} \hat{y}_{j(11)}$$

and

$$\hat{w}_{12} = \frac{1}{2} \cdot \frac{\sum_{j=1}^{n} \hat{y}_{j(12)}}{\sum_{j=1}^{n} \hat{y}_{j(12)} + \sum_{j=1}^{n} \hat{y}_{j(22)}} = \frac{1}{n} \sum_{j=1}^{n} \hat{y}_{j(12)}$$

The MLE of the remaining parameters are $\hat{w}_{21} = \tfrac{1}{2} - \hat{w}_{11}$ and $\hat{w}_{22} = \tfrac{1}{2} - \hat{w}_{12}$. Again, under $H_d$, $\lambda_d$ will approximately follow a $\chi^2$ distribution with one degree of freedom.

The null hypothesis that the dominance effect is absent is formulated as $H_\delta$: $\delta = 0$, $w^s \ne 0$, $w^d \ne 0$. Let us define

$$a = \frac{1}{n} \left( \sum_{j=1}^{n} \hat{y}_{j(11)} + \sum_{j=1}^{n} \hat{y}_{j(12)} \right) \quad \text{and} \quad b = \frac{1}{n} \left( \sum_{j=1}^{n} \hat{y}_{j(11)} + \sum_{j=1}^{n} \hat{y}_{j(21)} \right)$$

The MLE under this restriction are $\hat{w}_{11} = ab$, $\hat{w}_{12} = a(1 - b)$, $\hat{w}_{21} = (1 - a)b$ and $\hat{w}_{22} = (1 - a)(1 - b)$. Again, the test statistic for $H_\delta$ is $\lambda_\delta = -2[L(\hat{w}_\delta) - L(\hat{w})]$, which follows approximately a $\chi^2$ distribution with one degree of freedom.

Genome scanning

To scan viability loci for the entire genome, we need to move the putative position from one end to the other end of the genome. The genotype of each chromosome position for each individual is inferred from marker information, that is, $p_{j(kl)} = \Pr(y_{j(kl)} = 1 \mid I_M)$, where $I_M$ stands for marker information. For outbred populations, not all markers are fully informative. Therefore, we adopted the multipoint method developed by Rao and Xu (1998) to infer the probabilities of viability loci. This multipoint method is identical to that of Kruglyak and Lander (1995) when the linkage phases of the parents are known.
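The EM iteration of equations (7) and (8), together with the likelihood ratio tests of $H_0$ and $H_\delta$ at a single putative position, can be sketched as follows. This is a minimal illustration rather than the authors' program: the function names, the convergence tolerance and the toy data are assumptions, and the restricted M-step implements only the no-dominance case ($\hat{w}_{11} = ab$, and so on) described in the Hypothesis test section.

```python
import math

def log_likelihood(p, w):
    """Incomplete-data log likelihood L(w), equation (6)."""
    return sum(math.log(sum(pk * wk for pk, wk in zip(p_j, w))) for p_j in p)

def m_step_full(q):
    """Unrestricted M-step, equation (8): w is the mean posterior probability."""
    return list(q)

def m_step_no_dominance(q):
    """M-step under delta = 0: each w_kl is the product of its two margins."""
    a = q[0] + q[1]  # mean posterior probability of sire allele 1
    b = q[0] + q[2]  # mean posterior probability of dam allele 1
    return [a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]

def em_fit(p, m_step=m_step_full, tol=1e-10, max_iter=1000):
    """Iterate E- and M-steps from the Mendelian start until w stops changing."""
    n = len(p)
    w = [0.25] * 4
    for _ in range(max_iter):
        # E-step: average the posterior probabilities of equation (7).
        q = [0.0] * 4
        for p_j in p:
            weighted = [pk * wk for pk, wk in zip(p_j, w)]
            total = sum(weighted)
            for i in range(4):
                q[i] += weighted[i] / (total * n)
        w_new = m_step(q)
        converged = max(abs(x - y) for x, y in zip(w, w_new)) < tol
        w = w_new
        if converged:
            break
    return w

def lrt(p, w_restricted, w_full):
    """Likelihood ratio statistic -2[L(restricted) - L(full)]."""
    return -2.0 * (log_likelihood(p, w_restricted) - log_likelihood(p, w_full))

# Hypothetical family of 100 progeny (assumed data), five with an ambiguous dam allele.
p = ([[1, 0, 0, 0]] * 38 + [[0, 1, 0, 0]] * 19 + [[0, 0, 1, 0]] * 19
     + [[0, 0, 0, 1]] * 19 + [[0.5, 0.5, 0, 0]] * 5)
w_hat = em_fit(p)
lambda_0 = lrt(p, [0.25] * 4, w_hat)                          # test of H0, 3 df
lambda_delta = lrt(p, em_fit(p, m_step_no_dominance), w_hat)  # test of H_delta, 1 df
```

Passing the M-step as a function keeps the E-step shared between the full and restricted fits; the restrictions for $H_s$ and $H_d$ could be added in the same way.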

Table 1 Parameter values used in the simulation experiments

| Parameter | Additive (A), High | Additive (A), Medium | Additive (A), Low | Dominance (D), High(−) | Dominance (D), Low | Dominance (D), High(+) | Both A and D |
|---|---|---|---|---|---|---|---|
| $w^s$ | 0.300 | 0.200 | 0.100 | 0.000 | 0.000 | 0.000 | 0.150 |
| $w^d$ | 0.300 | 0.200 | 0.100 | 0.000 | 0.000 | 0.000 | 0.150 |
| $\delta$ | 0.000 | 0.000 | 0.000 | −0.150 | 0.050 | 0.150 | 0.100 |
| $w_{11}$ | 0.4225 | 0.3600 | 0.3025 | 0.100 | 0.300 | 0.400 | 0.4306 |
| $w_{12}$ | 0.2275 | 0.2400 | 0.2475 | 0.400 | 0.200 | 0.100 | 0.1444 |
| $w_{21}$ | 0.2275 | 0.2400 | 0.2475 | 0.400 | 0.200 | 0.100 | 0.1444 |
| $w_{22}$ | 0.1225 | 0.1600 | 0.2025 | 0.100 | 0.300 | 0.400 | 0.2806 |
| $s_{11}$ | 0.0000 | 0.0000 | 0.0000 | 0.750 | 0.0000 | 0.000 | 0.0000 |
| $s_{12}$ | 0.4615 | 0.3333 | 0.1818 | 0.000 | 0.3333 | 0.750 | 0.6647 |
| $s_{21}$ | 0.4615 | 0.3333 | 0.1818 | 0.000 | 0.3333 | 0.750 | 0.6647 |
| $s_{22}$ | 0.7100 | 0.5555 | 0.3305 | 0.750 | 0.0000 | 0.000 | 0.3483 |

In our study, we focus on developing the genetic model of viability mapping rather than the multipoint method. Therefore, we assume that the parental marker linkage phases are known without error. This assumption holds very well when the family size is sufficiently large because the true linkage phases can be easily recovered using marker information of the progeny.

To find the optimal location of the viability locus on the chromosome, we test all putative positions. However, the chromosome is a continuous linear structure, and there are an infinite number of putative positions. As usually done in interval mapping (Lander and Botstein, 1989), we scan the whole chromosome from one end to the other by evaluating a position every one or two cM. The likelihood ratio test statistic is then plotted against the chromosomal position to form a test statistic profile. The MLE of the position of the viability locus is the position where the peak occurs. The critical value used for declaring at least one viability locus on the entire genome with a type I error rate of $\alpha = 0.05$ is found using the permutation test (Churchill and Doerge, 1994).

Monte Carlo simulation

We simulated one chromosome of length 100 cM with 11 markers evenly spaced. The two alleles of each parent at each locus were randomly assigned from five distinguishable alleles (randomly selecting two out of five). This generates markers with a range of information content. A single viability locus was simulated at position 25 cM, that is, between markers 3 and 4. The following factors were considered in the simulations: the mode of viability selection, the intensity of viability selection and the sample size of the mapping population. The purpose of the simulation was not to compare the relative efficiencies of different methods for viability mapping (since there are no other methods to compare), nor to investigate the range of parameter values where the method works best. Instead, we simply attempted to demonstrate that the method works well and the test statistic behaves as expected. From this simulation study, we try to validate our method and program of viability mapping.

The mode of viability selection was investigated under three models: an additive model, a dominance model and a combination of both additive and dominance. For the additive model, we set $\delta = 0$ and $w^s = w^d = 0.1, 0.2, 0.3$. From these parameters, the fitness values of the four genotypes were generated. Under the dominance model, we set $w^s = w^d = 0$ and $\delta = -0.15, 0.05, 0.15$. We also investigated one model with both the additive and dominance effects, that is, $w^s = w^d = 0.15$ and $\delta = 0.1$. From the three effects of the viability locus, we use equation (3) to calculate the actual fitness values of the four possible genotypes. For example, when $w^s = w^d = 0.15$ and $\delta = 0.1$, the four fitness values are

$$\begin{aligned} w_{11} &= \tfrac{1}{4}(1 + 0.15)(1 + 0.15) + 0.1 = 0.4306 \\ w_{12} &= \tfrac{1}{4}(1 + 0.15)(1 - 0.15) - 0.1 = 0.1444 \\ w_{21} &= \tfrac{1}{4}(1 - 0.15)(1 + 0.15) - 0.1 = 0.1444 \\ w_{22} &= \tfrac{1}{4}(1 - 0.15)(1 - 0.15) + 0.1 = 0.2806 \end{aligned}$$

Following the conventional notation of natural selection, we calculate the selection coefficient for the fitness of genotype $A_k^s A_l^d$ using $s_{kl} = 1 - w_{kl}/w_{\max}$ (Hartl and Clark, 1997). These selection coefficients were used to determine whether an individual with genotype $A_k^s A_l^d$ should be deleted from the mapping population (Table 1).

Three different sample sizes under each of the above models were investigated, $n = 50, 100, 200$. The estimated location of the putative viability locus under each analysis took the position where the peak of the test statistic profile occurred. The simulation was replicated 100 times under each setting. The means and standard deviations of the 100 replicates were used to evaluate the performance of each parameter combination.

The empirical statistical power for each setting was calculated as the percentage of the replicates (out of 100 simulations) with the highest (overall) test statistic (along the chromosome) greater than the empirical critical value. The expected standard error for the empirical power is $\sqrt{\beta(1 - \beta)/N_r}$, where $N_r$ is the number of replicates. For example, if the true power is $1 - \beta = 0.8$, the standard error is 0.04, which is reasonably small. The critical value was obtained by simulating an additional 1000 samples under the null hypothesis.
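The step from the three effects to the genotype fitness values (equation (3)) and on to the selection coefficients $s_{kl} = 1 - w_{kl}/w_{\max}$ can be sketched as below. The function names are illustrative assumptions. The `survives` helper shows one plausible way to delete individuals (discard with probability $s_{kl}$); the text does not spell out the exact deletion rule, so treat it as a guess.

```python
import random

def fitness_from_effects(ws, wd, delta):
    """Equation (3): [w11, w12, w21, w22] from the effects w^s, w^d and delta."""
    return [
        0.25 * (1 + ws) * (1 + wd) + delta,
        0.25 * (1 + ws) * (1 - wd) - delta,
        0.25 * (1 - ws) * (1 + wd) - delta,
        0.25 * (1 - ws) * (1 - wd) + delta,
    ]

def selection_coefficients(w):
    """s_kl = 1 - w_kl / w_max, so the fittest genotype has s = 0."""
    w_max = max(w)
    return [1 - x / w_max for x in w]

def survives(genotype_index, s, rng):
    """Assumed deletion rule: keep the individual with probability 1 - s_kl."""
    return rng.random() >= s[genotype_index]

# Combined additive and dominance model from the text.
w = fitness_from_effects(0.15, 0.15, 0.1)
s = selection_coefficients(w)

# Thin a hypothetical batch of Mendelian progeny (genotype indices 0..3).
rng = random.Random(1)
progeny = [rng.randrange(4) for _ in range(1000)]
survivors = [g for g in progeny if survives(g, s, rng)]
```

Rounded to four decimals, `w` and `s` should match the "Both A and D" column of Table 1.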

The highest test statistics of the 1000 null samples were ranked from the lowest to the highest. The empirical critical value took the 95th percentile of the distribution of the null samples.

The test statistic profile of a single replicate for the combined additive and dominance model ($w^s = w^d = 0.15$ and $\delta = 0.1$) with sample size $n = 100$ is demonstrated in Figure 1 (the dotted line). From the test statistic profile, we can see that the viability locus has been identified at position 23 cM, very close to the true position (25 cM). The estimated effects for this particular run are $\hat{w}_{11} = 0.5902$, $\hat{w}_{12} = 0.03290$, $\hat{w}_{21} = 0.03410$ and $\hat{w}_{22} = 0.3528$. These estimated fitness values were converted into $\hat{w}^s = 0.2361$, $\hat{w}^d = 0.2385$ and $\hat{\delta} = 0.2071$ using equation (2). The estimated effects are larger than the simulated effects, but maintain the same trend. The deviations are not larger than expected considering the sampling errors with $n = 100$. The average test statistic profile of the 100 replicates for this setting is shown in Figure 1 (solid line), clearly showing the expected property of the test statistic profile for QTL mapping.

Figure 1 Likelihood ratio test statistic profiles for the combined A/D model ($w^s = w^d = 0.15$ and $\delta = 0.1$) with sample size $n = 100$. The simulated position of the viability locus is located at position 25 cM (indicated by the solid bar). The solid line is the average profile of 100 replicates, the dotted line is the profile of a randomly picked single run from the 100 replicates and the dashed horizontal line is the threshold value for the test statistic at $\alpha = 0.05$. Axes: likelihood ratio test statistic against map position (cM).

The empirical critical values appear to be quite independent of the sample size and they are about 14.5 at $\alpha = 0.05$ and about 18.0 at $\alpha = 0.01$. These empirical critical values are clearly larger than $\chi^2_{3,0.95} = 7.815$ and $\chi^2_{3,0.99} = 11.34$. Therefore, we used the empirical critical values to declare significance.

Means and standard deviations of the estimated parameters for various genetic models are given in Table 2 for $n = 50$, Table 3 for $n = 100$ and Table 4 for $n = 200$. The results do follow the expected trends: the viability locus location is more accurately estimated as the sample size and the selection intensity increase. When the sample size is small, the estimated position of the locus is severely biased towards the center of the chromosome. Besides these general trends, we found that the additive models are more sensitive to the intensity of selection. Across the different levels of parameters (high, medium and low), the accuracy of both the estimated viability locus location and the estimated parameters varies more under the additive models than under the dominance models or the combined additive and dominance (A/D) model. Overall, the A/D model gives the highest accuracy of estimation.

The empirical statistical powers under various genetic models and sample sizes are given in Table 5. The powers are quite low for the small sample size ($n = 50$) and are reasonably high when the sample size reaches 200. These observations are the same as those expected in the more usual QTL mapping studies.
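The conversion of the single-replicate fitness estimates into effects through equation (2), as reported for the run shown in Figure 1, can be checked with a short sketch. The function name `effects_from_fitness` is an assumption; the input values are the estimates quoted in the text.

```python
def effects_from_fitness(w):
    """Equation (2): sire effect, dam effect and dominance from [w11, w12, w21, w22]."""
    w11, w12, w21, w22 = w
    ws = w11 + w12 - w21 - w22
    wd = w11 - w12 + w21 - w22
    delta = w11 * w22 - w12 * w21
    return ws, wd, delta

# Estimated fitness values of the single replicate reported in the text.
w_hat = [0.5902, 0.03290, 0.03410, 0.3528]
ws_hat, wd_hat, delta_hat = effects_from_fitness(w_hat)
```

Up to rounding in the last reported digit, the three results should agree with the values 0.2361, 0.2385 and 0.2071 quoted above.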

Table 2 Means and standard deviations (in parentheses) of estimated parameter values for the EM algorithm with sample size 50

| Parameter | Additive (A), High | Additive (A), Medium | Additive (A), Low | Dominance (D), High(−) | Dominance (D), Low | Dominance (D), High(+) | Both A and D |
|---|---|---|---|---|---|---|---|
| cMA | 31.01 (18.35) | 39.38 (26.66) | 46.43 (31.52) | 27.22 (5.96) | 50.8 (34.59) | 26.94 (8.63) | 30.16 (20.01) |
| $w^s$ | 0.3219 (0.1768) | 0.2171 (0.2039) | 0.1254 (0.1882) | 0.0047 (0.1594) | 0.0399 (0.1942) | −0.0327 (0.1702) | 0.1841 (0.1567) |
| $w^d$ | 0.3064 (0.1685) | 0.2814 (0.1877) | 0.0844 (0.2195) | −0.0362 (0.1465) | −0.0034 (0.1829) | 0.0076 (0.1874) | 0.1902 (0.1526) |
| $\delta$ | 0.0042 (0.0474) | −0.0098 (0.0516) | 0.0129 (0.0532) | −0.1515 (0.0297) | 0.0391 (0.0639) | 0.1494 (0.0354) | 0.1052 (0.0411) |
| $w_{11}$ | 0.4392 (0.0919) | 0.3828 (0.0875) | 0.3231 (0.0965) | 0.0967 (0.0550) | 0.3036 (0.0816) | 0.3985 (0.0827) | 0.4623 (0.0771) |
| $w_{12}$ | 0.2318 (0.0832) | 0.2358 (0.0972) | 0.2496 (0.0899) | 0.4158 (0.0735) | 0.2264 (0.0896) | 0.0953 (0.0666) | 0.1398 (0.0579) |
| $w_{21}$ | 0.2240 (0.0842) | 0.2680 (0.0885) | 0.2291 (0.0877) | 0.3952 (0.0653) | 0.2047 (0.0861) | 0.1154 (0.0520) | 0.1429 (0.0623) |
| $w_{22}$ | 0.1250 (0.0553) | 0.1335 (0.0721) | 0.2182 (0.0843) | 0.1125 (0.0526) | 0.2853 (0.1052) | 0.4110 (0.0838) | 0.2751 (0.0732) |

cMA: the estimated location of the viability locus.

Table 3 Means and standard deviations (in parentheses) of estimated parameter values for the EM algorithm with sample size 100

| Parameter | Additive (A), High | Additive (A), Medium | Additive (A), Low | Dominance (D), High(−) | Dominance (D), Low | Dominance (D), High(+) | Both A and D |
|---|---|---|---|---|---|---|---|
| cMA | 26.52 (11.58) | 30.43 (20.09) | 45.13 (33.50) | 26.05 (3.61) | 37.63 (26.76) | 25.91 (3.30) | 28.79 (12.13) |
| $w^s$ | 0.3135 (0.1098) | 0.2241 (0.1094) | 0.0945 (0.1325) | 0.0001 (0.1081) | −0.0035 (0.1428) | −0.0223 (0.1025) | 0.1486 (0.1189) |
| $w^d$ | 0.3036 (0.1235) | 0.2105 (0.1322) | 0.1129 (0.1345) | 0.0103 (0.1003) | −0.0002 (0.1370) | −0.0083 (0.1098) | 0.1542 (0.1221) |
| $\delta$ | 0.0013 (0.0343) | −0.0056 (0.0317) | −0.0037 (0.0443) | −0.1521 (0.0216) | 0.0514 (0.0386) | 0.1470 (0.0236) | 0.1013 (0.0302) |
| $w_{11}$ | 0.4320 (0.0651) | 0.3679 (0.0679) | 0.3030 (0.0677) | 0.1030 (0.0365) | 0.3027 (0.0607) | 0.3920 (0.0521) | 0.4356 (0.0654) |
| $w_{12}$ | 0.2297 (0.0555) | 0.2491 (0.0560) | 0.2492 (0.0677) | 0.4020 (0.0471) | 0.2005 (0.0625) | 0.1017 (0.0315) | 0.1437 (0.0473) |
| $w_{21}$ | 0.2247 (0.0512) | 0.2423 (0.0482) | 0.2584 (0.0634) | 0.4071 (0.0537) | 0.2022 (0.0611) | 0.1088 (0.0361) | 0.1465 (0.0441) |
| $w_{22}$ | 0.1234 (0.0463) | 0.1505 (0.0456) | 0.1993 (0.0595) | 0.0978 (0.0291) | 0.3045 (0.0665) | 0.4074 (0.0536) | 0.2842 (0.0530) |

cMA: the estimated location of the viability locus.

Table 4 Means and standard deviations (in parentheses) of estimated parameter values for the EM algorithm with sample size 200

| Parameter | Additive (A), High | Additive (A), Medium | Additive (A), Low | Dominance (D), High(−) | Dominance (D), Low | Dominance (D), High(+) | Both A and D |
|---|---|---|---|---|---|---|---|
| cMA | 26.80 (6.43) | 29.91 (15.26) | 35.43 (27.34) | 25.98 (1.84) | 34.43 (22.34) | 25.89 (1.85) | 26.53 (4.34) |
| $w^s$ | 0.2986 (0.0643) | 0.1862 (0.0837) | 0.1125 (0.0907) | 0.0021 (0.0765) | −0.0095 (0.0933) | −0.0050 (0.0801) | 0.1528 (0.0728) |
| $w^d$ | 0.3024 (0.0681) | 0.2177 (0.0898) | 0.1040 (0.1009) | 0.0081 (0.0651) | 0.0068 (0.0894) | −0.0074 (0.0659) | 0.1539 (0.0717) |
| $\delta$ | 0.0017 (0.0201) | 0.0019 (0.0203) | −0.0008 (0.0246) | −0.1479 (0.0169) | 0.0498 (0.0276) | 0.1517 (0.0148) | 0.1004 (0.0167) |
| $w_{11}$ | 0.4255 (0.0366) | 0.3639 (0.0385) | 0.3073 (0.0432) | 0.1058 (0.0229) | 0.3004 (0.0472) | 0.3998 (0.0372) | 0.4339 (0.0340) |
| $w_{12}$ | 0.2262 (0.0337) | 0.2316 (0.0413) | 0.2514 (0.0412) | 0.3977 (0.0365) | 0.1972 (0.0382) | 0.1000 (0.0195) | 0.1449 (0.0262) |
| $w_{21}$ | 0.2281 (0.0315) | 0.2473 (0.0359) | 0.2471 (0.0444) | 0.4007 (0.0348) | 0.2054 (0.0426) | 0.0988 (0.0236) | 0.1454 (0.0301) |
| $w_{22}$ | 0.1250 (0.0237) | 0.1619 (0.0315) | 0.1990 (0.0390) | 0.1007 (0.0239) | 0.3018 (0.0402) | 0.4061 (0.0347) | 0.2806 (0.0309) |

cMA: the estimated location of the viability locus.

The results of these simulations have verified the derivations of our methods and the computer programs; more importantly, they have demonstrated that viability locus mapping can be accomplished following the usual approach of QTL mapping.

Discussion

The fitness considered here is a special fitness component, the viability, which relates to the change of gene frequencies in the current generation where the mapping individuals are collected. Another major fitness component is the fecundity, that is, the number of progenies produced by the individual of interest. Fecundity is also related to the change of gene frequencies, but it affects the gene frequencies in the next generation. Fecundity is measured quantitatively and thus mapping fecundity loci can be directly accomplished using standard QTL mapping approaches. Therefore, we only focused on the statistics of mapping viability loci in this study.

The ultimate result of viability selection in a population is the change in gene frequencies, but if we concentrate
on one particular family or pedigree, the result of viability selection is the deviation of allelic segregation from the expected Mendelian ratio. The non-Mendelian segregation of a viability locus causes deviation from Mendelian segregation for markers linked to the viability locus. The viability considered in this study is defined in the adult stage (genotype). However, the statistics developed allow us to separate the gametic selection from zygotic selection. The maternal and paternal allelic effects represent the gametic selection and the dominance effect represents the zygotic selection.

Table 5 Empirical statistical powers (%) under type I error rates of 0.05 and 0.01

| Sample size | Type I error | Additive (A), High | Additive (A), Medium | Additive (A), Low | Dominance (D), High(−) | Dominance (D), Low | Dominance (D), High(+) | Both A and D |
|---|---|---|---|---|---|---|---|---|
| 50 | 0.05 | 52 | 36 | 14 | 82 | 12 | 86 | 57 |
| 50 | 0.01 | 29 | 18 | 2 | 68 | 2 | 64 | 37 |
| 100 | 0.05 | 82 | 47 | 13 | 100 | 19 | 99 | 86 |
| 100 | 0.01 | 74 | 22 | 2 | 98 | 12 | 98 | 68 |
| 200 | 0.05 | 100 | 84 | 27 | 100 | 40 | 100 | 100 |
| 200 | 0.01 | 99 | 68 | 10 | 100 | 24 | 100 | 100 |

Table 6 The definitions of fitness parameters for an outbred population with F founder alleles

| Paternal | Maternal $w_1$ | Maternal $w_2$ | $\cdots$ | Maternal $w_F$ | Row sum |
|---|---|---|---|---|---|
| $w_1$ | $w_{11} = w_{1\cdot} w_{\cdot 1} + \delta_{11}$ | $w_{12} = w_{1\cdot} w_{\cdot 2} + \delta_{12}$ | $\cdots$ | $w_{1F} = w_{1\cdot} w_{\cdot F} + \delta_{1F}$ | $w_{1\cdot} = \sum_{k=1}^{F} w_{1k}$ |
| $w_2$ | $w_{21} = w_{2\cdot} w_{\cdot 1} + \delta_{21}$ | $w_{22} = w_{2\cdot} w_{\cdot 2} + \delta_{22}$ | $\cdots$ | $w_{2F} = w_{2\cdot} w_{\cdot F} + \delta_{2F}$ | $w_{2\cdot} = \sum_{k=1}^{F} w_{2k}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$ |
| $w_F$ | $w_{F1} = w_{F\cdot} w_{\cdot 1} + \delta_{F1}$ | $w_{F2} = w_{F\cdot} w_{\cdot 2} + \delta_{F2}$ | $\cdots$ | $w_{FF} = w_{F\cdot} w_{\cdot F} + \delta_{FF}$ | $w_{F\cdot} = \sum_{k=1}^{F} w_{Fk}$ |
| Column sum | $w_{\cdot 1} = \sum_{k=1}^{F} w_{k1}$ | $w_{\cdot 2} = \sum_{k=1}^{F} w_{k2}$ | $\cdots$ | $w_{\cdot F} = \sum_{k=1}^{F} w_{kF}$ | $w_{\cdot\cdot} = \sum_{kl} w_{kl} = 1$ |

The purpose of the simulation studies is to demon-

by our simulation studies where we converted the fitness values into selection coefficients by setting $w_{\max} = 1$ and expressed the selection coefficients as $s_{kl} = 1 - w_{kl}/w_{\max}$. The estimated fitness values are very close to the true values simulated. In fact, researchers often convert the relative fitness into selection coefficients as we did in the simulations and investigate the magnitudes of the selection coefficients. In natural populations, people often concentrate on the biallelic situation with only three genotypes: $A_1A_1$, $A_1A_2$ and $A_2A_2$. Using the selection coefficients, researchers are able to investigate the degree of