| INVESTIGATION

Estimating Relatedness in the Presence of Null

Kang Huang,* Kermit Ritland,† Derek W. Dunn,* Xiaoguang Qi,* Songtao Guo,* and Baoguo Li*,1 *Shaanxi Key Laboratory for Animal Conservation, and College of Life Sciences, Northwest University, Xi’an, Shaanxi 710069, China, and †Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada ORCID ID: 0000-0002-8357-117X (B.L.)

ABSTRACT Studies of genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. However, with the presence of null alleles, an observed genotype can represent one of several possible true genotypes. This results in biased estimates of relatedness. As the numbers of marker loci are often limited, loci with null alleles cannot be abandoned without substantial loss of statistical power. Here, we show how loci with null alleles can be incorporated into six estimators of relatedness (two novel). We evaluate the performance of various estimators before and after correction for null alleles. If the frequency of a null is ,0.1, some estimators can be used directly without adjustment; if it is .0.5, the potency of estimation is too low and such a should be excluded. We make available a software package entitled PolyRelatedness V1.6, which enables researchers to optimize these estimators to best fit a particular data set.

KEYWORDS relatedness coefficient; null alleles; method-of-moment; maximum likelihood

AIRWISE relatedness is central to studies of population site (Brookfield 1996). These incorrect genotypes can cause se- Pgenetics, quantitative genetics, behavioral ecology, and rious problems in the estimation of genetic relationships. For sociobiology (e.g., Mattila et al. 2012). The relatedness coefficient instance, in parentage analysis, genotyping error can mistakenly between individuals can be calculated from a known pedigree reject the true father due to an observed lack of shared alleles (Karigl 1981). In the absence of pedigree information, this co- with the offspring (Blouin 2003).Similarly,whenpairwisege- efficient can be estimated by using genetic markers. netic relatedness is estimated, genotyping error can cause true When codominant genetic markers are used to estimate re- relatives to be omitted. This is the subject of this article. latedness, some genotyping errors may occur. For example, in The first two error types can be eliminated by multiple-tube polymerase chain reaction-based markers, allelic dropout, false methods because the errors appear at random (Taberlet et al. allele, and null allele errors are common, especially with micro- 1996; He et al. 2011). Genotyping errors due to null alleles satellites. In allelic dropout, thelowqualityoftheDNAtemplate cannot be resolved in this manner. Null alleles are pervasive in inhibits the amplification of one of the two alleles in a heterozy- markers and many studies report null alleles (e.g., gote, causing the heterozygote to be observed as a homozygote Kokita et al. 2013). Solutions for dealing with null alleles are (Taberlet et al. 1996). False alleles are artificial alleles produced limited, but when null alleles are detected, they can be elimi- during amplification, resulting in homozygotes mistyped as het- nated by redesigning the primers or by developing new micro- erozygotes (Taberlet et al. 1996). Null alleles are alleles that satellite loci. However, both solutions involve additional time and cannot be detected because of , often within the primer material costs and are not freely available (Wagner et al. 2006). There are a number of estimators that have been developed fi Copyright © 2016 by the Genetics Society of America using different assumptions, each with varying ef ciencies under doi: 10.1534/genetics.114.163956 different conditions. A single estimator cannot fulfill the require- Manuscript received March 16, 2015; accepted for publication October 17, 2015; ments of all studies. In general, estimators can be classified into published Early Online October 21, 2015. Supporting information is available online at www.genetics.org/lookup/suppl/ two categories: (i) method of momentand (ii) maximum likelihood. doi:10.1534/genetics.114.163956/-/DC1. Method-of-moment estimators equate sample moments 1Corresponding author: Shaanxi Key Laboratory for Animal Conservation, and College of Life Sciences, Northwest University, Xi’an, Shaanxi 710069, China. with unobservable population moments and are generally E-mail: [email protected] unbiased but they are not optimal in terms of statistical

Genetics, Vol. 202, 247–260 January 2016 247 efficiency (Huang et al. 2015a). These estimators have been a population is large, outbred, and panmictic; (ii) the locus is developed to estimate only the relatedness coefficient (e.g., autosomal and inheritance Mendelian; and (iii) the loci used Queller and Goodnight 1989; Li et al. 1993; Loiselle et al. are unlinked and are not in linkage disequilibrium (i.e., Lynch 1995; Ritland 1996) or both relatedness and four- coef- and Ritland 1999; Wang 2002; Milligan 2003). Under these ficients simultaneously (i.e., Lynch and Ritland 1999; Wang assumptions, the presence of alleles and genotypes is random 2002; Thomas 2010). and concurs with the Hardy–Weinberg equilibrium. Thus, the The maximum-likelihood approach models the probability probability of occurrence for a genotype (i.e., Wang 2002; of observing a specific pairwise allele pattern, given two Milligan 2003) or a genotype conditioned on another genotype “higher-order” coefficients (f and D) and allele frequencies (i.e., Lynch and Ritland 1999) can be expressed as a function of (Milligan 2003; Anderson and Weir 2007). By searching the D and u. Subsequently, linear algebra methods (method-of- parameter space for values of f and D that maximize the prob- moment estimators) or numerical algorithms (maximum- ^ ability of the genotype pattern observed, maximum-likelihood likelihood estimators) can be used to solve D and f^: values can be determined for these parameters. The presence of null alleles can berealizedintwoways:(i)the Here, we modify four existing relatedness estimators observed pairwise genotypes will become either more or less (Lynch and Ritland 1999; Wang 2002; Anderson and Weir similar than their true values, and (ii) the sum of the frequencies 2007; Thomas 2010). We also present two new estimators of visible alleles becomes unified, but the ratio among frequen- based on two of these previously published works (Lynch and cies of visible alleles is unchanged. These two effects can cause Ritland 1999; Wang 2002) to estimate relatedness coeffi- an over- or underestimation of the relatedness coefficient. cients using codominant markers with null alleles. First, we Some methods can estimate the frequency of null alleles (e.g., compare the performances of all six estimators before and Brookfield 1996; Vogl et al. 2002; van Oosterhout et al. 2004; after correction, to evaluate the influence of null alleles on Kalinowski et al. 2006; Hall et al. 2012; Dabrowski et al. 2013). each estimator under different frequencies of null alleles. We thus model the probability that a genotype pair (Anderson Second, we calculate the minimum number of loci needed and Weir 2007) or a genotype pattern (Lynch and Ritland to achieve a predefined criterion of statistical power. Finally, 1999) or a genotype pair pattern (Wang 2002; Thomas 2010) we simulate a finite population with inbreeding and genetic is observed under the assumption that the true frequencies of drift to imitate natural conditions and evaluate the efficiency alleles are already known. Here, we describe briefly six estima- of each estimator. We make available the software entitled tors and give a correction of null alleles for each estimator. polyrelatedness V1.6, to help other researchers calculate and Data availability compare these estimators and simulation functions. File S1 contains (i) derivations of Expressions (3a), (3b), (9a) ^ Theory and Modeling (9b), (12), (14a) and (14b), (ii) derivations of ^r: and D of corrected Lynch and Ritland’s (1999) estimator, (iii) der- In this article, we adopt the identical-by-descent (IBD) defi- ivations of biases of method-of-moment estimators before nition of relatedness, which is the probability that an allele and after correction, and (iv) a summary of expressions of sampled from one individual at a locus is IBD to an allele from method-of-moment estimators. another individual. This definition is often asymmetric in inbred populations (the two reciprocal relatedness coeffi- cients from different individuals may be unequal), with the Lynch and Ritland’s (1999) Estimator geometric mean being equivalent to Wright’s (1921) defini- ’ tion in diploids (Huang et al. 2015b). The genetic correlation Lynch and Ritland s (1999) estimator models the proba- between a pair of individuals (x and y) sampled from an bility of an observed proband genotype conditioned on a outbred population has three modes: (i) each allele in x is reference genotype. To classify the degree of similarity for a IBD to one allele in y, (ii) a single allele in x is IBD to one allele pair of genotypes, this estimator employs a similarity index fi in y, and (iii) no alleles in x are IBD to any allele in y. If the denoted by S and de ned by probabilities of occurrences of the first and second events are 8 > 2 2 ; denoted as D (four-gene coefficient) and f (two-gene coeffi- <> 1ifAiAi AiAi or AiAj AiAj 3=4if A A 2 A A ; cient), respectively, the relatedness coefficient r between in- S ¼ i i i j (2) > 1=2if A A 2 A A ; dividuals x and y can be expressed as :> i j i k 0 otherwise; f r ¼ D þ : (1) where A denotes the ith allele, and A ; A ; and A are differ- 2 i i j k ent. By modeling the probability of each observed genotype For example, in an outbred population, parent–offspring pairs (i.e., Gy) when the reference genotype (i.e., Gx) is homozy- share an IBD allele with a probability of 1, so f ¼ 1 and gous or heterozygous, the following systems of linear equa- D ¼ 0; full sibs share one or two IBD alleles with the proba- tions with D and f as the unknowns are established (for the bility 1=2or1=4; so f ¼ 1=2 and D ¼ 1=4: Most relatedness derivation, see Supporting Information, File S1): for the case estimators are assumed to have the following conditions: (i) of homozygotes,

248 K. Huang et al. " # " # 2 Lynch and Ritland (1999) introduced the Kronecker oper- PrðGy ¼ Ai AijGx ¼ Ai AiÞ p ¼ i d : fi d ¼ ð ¼ j ¼ Þ ator ab This is de ned as ab 1ifAa and Ab are identical- Pr Gy Ai Ax Gx Ai Ai 2pi px " # by-state; otherwise dab ¼ 0: Using the Kronecker operator, 1 2 p2 p 2 p2 D b D^ þ i i i ; the expressions of rxy and xy in homozygous and heterozy- f 22pi px px 2 2pi px gous reference genotypes can be unified to write as (3a) ðd þ d Þ þ ðd þ d Þ 2 ^ ¼ pa bc bd pb ac ad 4papb; for the case of heterozygotes, rxy ð1 þ dabÞðpa þ pbÞ 2 4papb 2 3 2 3 2 PrðGy ¼ AiAijGx ¼ AiAjÞ p 6 7 6 i 7 2p p 2 p ðd þ d Þ 2 p ðd þ d Þþd d þ d d 6 PrðG ¼ A A jG ¼ A A Þ 7 6 2 7 D^ ¼ a b a bc bd b ac ad ac bd ad bc; 6 y j j x i j 7 6 pj 7 xy 6 7 6 7 ð1 þ dabÞð1 2 pa 2 pbÞþ2papb 6 PrðGy ¼ AiAjjGx ¼ AiAjÞ 7¼6 2p p 7 6 7 6 i j 7 4 ð ¼ j ¼ Þ 5 4 5 Pr Gy AiAx Gx AiAj 2pipx ð ¼ j ¼ Þ where pa and pb denote the frequencies of the two alleles in Pr Gy AjAx Gx AiAj 2pjpx 2 3 the reference genotype. The locus-specific weights w and wD 2 2 1 2 2 r pi pi pi ^ ^ 6 2 7 are defined by the inverses of the variances of rxy and Dxy; 6 7 ^ 6 2 2 1 2 2 7 respectively. However, the two variances (^rxy and Dxy) are 6 pj pj pj 7 6 2 7 D 6 7 D functions with r and as the independent variables, in which þ 6 2 1 þ 1 2 7 : 61 2 pipj 2 pi 2 pj 2 pipj7 r and D are the quantities we are attempting to estimate. Due 6 7 f 6 7 to a lack of a priori information, Lynch and Ritland (1999) 6 22 p p 1 p 2 2 p p 7 4 i x 2 x i x 5 assumed that the two individuals are nonrelatives and then 2 1 2 ^ D^ 2 pjpx 2 px 2 pjpx used the variances of rxy and xy to weight the locus. The (3b) weights wr and wD of each locus are given by

Here pi and pj denote the allele frequencies of Ai and Aj; re- 1 ð1 þ dabÞðpa þ pbÞ 2 4papb spectively, and px is the sum of the frequencies of visible w ¼ ¼ ; r Varð^r Þ 2p p alleles that do not appear in the reference genotype. Expres- xy a b ð þ d Þð 2 2 Þþ sion (3a) or (3b) is simply written as the matrix equation ¼ 1 ¼ 1 ab 1 pa pb 2papb: wD ^ VarðD Þ 2papb P ¼ E þ MD; (4) xy b Db ^ D^ which is the general form of all moment estimators that we The weighted averages of rxy and xy (denoted rxy and xy) subsequently describe, where the elements of E are the probabil- for all loci used in the estimation can be calculated by aver- ities that certain genotype patterns are observed conditional on aging the estimates of all loci by using the inverses of the two zero relatedness, the elements of the sum of E plus the first col- variances (wr and wD). Because there is no particular reason ^ D^ umn (or second column) of M are the probabilities that there are to choose the individual x or y as a reference, ryx and yx can two pairs of IBD alleles (or one pair of IBD alleles) between x and also be calculated by using another individual as a reference. fi ^ Db y,andD is the column vector consisting of the higher-order coef- The nal estimates r and are the arithmetic means of both ^ parameters: ficients D and f.TheestimateDxy can be solved from the formula ^ T 21 21 T 21 ^ D ¼ðM V MÞ M V ðP 2 EÞ; (5) ^rxy þ ^ryx ^r ¼ ; 2 where P^ is the observed value of P; and V is the variance– covariance matrix of P; assuming x and y are nonrelatives. For D^ þ D^ M xy yx the case of homozygotes, because is a square matrix with Db ¼ : order two, it can be seen from Equation 4 or Equation 5 that 2 D^ ¼ M21ðP^ 2 EÞ: Moreover, under the assumption that x and When a null allele (i.e., A ) is present, the heterozygote A A y are nonrelatives, V can be used to find the weighted least- y i y will be mistakenly identified as A9A9 (where 9 represents the squares solution of Equation 4. i i corresponding quantity of with the presence of null alleles), The elements of V are whereas the genotype AyAy will remain undetected and ¼ 2 2 ¼ 2 ð 6¼ Þ; regarded as a negative result (no bands detected from electro- Vii Ei Ei and Vij EiEj i j (6) phoresis). Negative results may also be obtained due to other where Ei is the ith element of E; and i; j ¼ 1; 2 for the case of reasons (e.g., operational errors), with null alleles being indis- homozygotes, as well as i; j ¼ 1; 2; 3; 4; 5 for the case of het- tinguishable from homozygotes. Therefore the unobservable erozygotes. Then, using Equation 5, we can calculate the genotype should be discarded from the probability of the ob- ^ analytical solution Dxy: On the other hand, ^rxy can be solved served genotype pattern. These facts can be summarized by the from Equation 1. potential genotype patterns of

Estimating Relatedness with Null Alleles 249 X or equivalently, Pr G y9 G9x; Gy 6¼ AyAy ¼ Pr Gx ¼ gxijG9x PrðGy ¼ gyjjGx ; i j X c c c c c ¼ g ; G 6¼ A A Þ; P9 3j þ 1j 2 3j ; 2j 2 3j D : xi y y y i d d d d d j 3j 1j 3j 2j 3j where gxi and gyj are the ith and jth possible genotypes of x P9 and y, respectively; PrðGx ¼ gxijG9xÞ is the probability that gxi It is now clear that can be expressed approximatively as P9 E þ M D; E M is the true reference genotype on condition that G9x is ob- * * where the matrices * and * are as fol- served; and lows: for the case of homozygotes, "# 2 þ = 2 2 pi 2pipy 1 py PrðGy ¼ gyjjGx ¼ gxi; Gy 6¼ AyAyÞ E* ¼ and 2p p = 1 2 p2 2 i x y 3 ð þ Þð 2 Þþ ð þ Þ is the probability that G is g given G ¼ g and assuming that 1 pi pi py 2 py 2py py 2pi = y yj x xi 6 7 9 ¼ 9 9 9 ¼ 9 9; M ¼ 4 ð þ Þð 2 Þ 52 ½E ; E ; Gy is observable. For example, if G y A iA i and G x A iA i * pi 2py 2 py * * letting r be PrðG9y ¼ A9iA9i Gx9 ¼ A9iA9i; Gy 6¼ AyAyÞ; then 0 pipxð2 2 pyÞþ2pxpy = ð pi þ 2pyÞð2 2 pyÞ (9a) r ¼ ¼ j 9 ¼ 9 9 ¼ j Pr Gx AiAi G x A iA i Pr Gy AiAi or AiAy Gx for the case of heterozygotes, ¼ ; 6¼ þ ¼ j 9 ¼ 9 9 2 3 AiAi Gy AyAy Pr Gx A iAy Gx A iA i Pr Gy 2 þ = 2 2 2 3 p 2pipy 1 py ð þ Þ= 6 i 7 0 pi py 2 6 p2 þ 2p p = 1 2 p2 7 6 ð þ Þ= 7 ¼ A A or A A jG ¼ A A ; G 6¼ A A p þ 1 p 6 j j y y 7 6 0 pj py 2 7 i i i y x i y y y y i 2 y E ¼ 6 ð Þ= 2 2 7 M ¼ 6 ð þ Þ= 7 2 ½E ; E ; * 6 2pipj 1 py 7 and * 6 1 pi pj 2 7 * * 6 2 7 4 = 5 2 þ ð 2 f 2 DÞþð þ Þf þ D 4 ð2p pxÞ= 1 2 p 5 0 px 2 pi pi 2 pipy 1 pi py i y ¼ ð Þ= 2 2 0 px=2 þ 2 2 ð 2 f 2 DÞþf þ D 2pjpx 1 py pi 2py 1 py 1 (9b) 2 1 2py p þ 2pipy ð1 2 f 2 DÞþ pi þ py f þ D þ i 2 : in which ½E*; E* is the partitioned matrix with two columns p þ 2p 1 2 p2 ð1 2 f 2 DÞþ 1 2 1 p f þ D i y y 2 y and every column consisting of E* [the derivation of ex- pressions (9a) and (9b) is shown in File S1]. Replacing E by Denote P9 for the corresponding column vector of P on the E*andM by M*inEquation5,thefollowingformula left side of expression (3a) or (3b), with the presence of null follows, 9 alleles and Gy being visible [e.g., P1 for homozygous reference i h 21 individual is PrðG ¼ A9A9 G ¼ A9A9; G 6¼ A A Þ]. It is clear 2 2 y i i x i i y y y D^ ¼ ðM*ÞTðV*Þ 1M* ðM*ÞTðV*Þ 1 P^92E* ; from the previous example that the ith element P9i of P9 can be written as where P^9 is the observed value of P9; and the elements in V* [refer to system (6) of equalities] are X c D þ c f þ c ð1 2 f 2 DÞ 9 ¼ 1j 2j 3j ; Pi D þ f þ ð 2 f 2 DÞ (7) d1j d2j d3j 1 * ¼ * 2 * 2 * ¼ 2 * * ð 6¼ Þ: j Vii Ei Ei and Vij Ei Ej i j ¼ ; ; where ckj and dkj (k 1 2 3) are polynomials with pi and py Applying the above formula, an estimate of Dxy can be calcu- ¼ ; ¼ 2 þ ; ^ as the variables [such as c11 pi c12 pi pipy and lated. Moreover, using Equation 1 and the elements of Dxy; a ¼ð þ Þð 2 2Þ D^ d13 pi 2py 1 py ]. Unfortunately, although can be less biased ^rxy of a single locus can be found. The weights ^ solved from expression (7) and Equation 5, the estimator across loci are still the inverses of the variances of ^rxy and Dxy: D ^ becomes more biased because the relationship between The symbolic expressions of the variances for ^rxy and Dxy are and P9 is not linear. As an alternative, the following approx- complex and are presented in File S1. imate form of P9i for the estimation in expressions (3a) and (3b) can be used: Novel Estimator A X c c c fi ’ P9 1j D þ 2j f þ 3j ð1 2 f 2 DÞ : (8) This estimator is a modi cation of Lynch and Ritland s (1999) i d d d j 1j 2j 3j estimator, but it does not directly use a probability matrix to solve the corresponding system of linear equations. Instead, By extracting D and u, expression (8) can be rewritten as the method-of-moment approach is used. The definition of the similarity index S is stated in expression (2). The similar- " # X ity index S as a random variable determines a probability c3j c1j c3j c2j c3j P9 þ 2 D þ 2 f mass function whose values are listed as follows: for the case i d d d d d j 3j 1j 3j 2j 3j of homozygotes,

250 K. Huang et al. 2 3 2 3 2 3 ð ¼ Þ 2 2 2 2 2 where S9 is the moment vector consisting of the first and Pr S 1 pi 1 pi pi pi D 4 ð ¼ = Þ 5 ¼4 5 þ4 2 2 5 ; second moments of S9: Then, the estimate D^ can be calcu- Pr S 3 4 2 pipx 2 pipx px 2 pipx f xy PrðS ¼ 1=2Þ 0 00 lated from the following formula: 2 D^ ¼ðCM*Þ 1 S92CE* : for the case of heterozygotes, 2 3 2 3 On the other hand, ^rxy can be calculated from Equation 1. The PrðS ¼ 1Þ 2pipj 4 ð ¼ = Þ 5 ¼ 4 2 þ 2 5 locus-specific weights are still the inverses of the variances of Pr S 3 4 pi pj ^ ð ¼ = Þ ^r and D ; and the remaining procedure is the same as Lynch Pr S 1 2 2pxðpi þ pjÞ xy xy 2 3 ’ 1 and Ritland s (1999) estimator. 1 2 2pipj ðpi þ pjÞ 2 2pipj 6 2 7 þ 6 2 2 2 2 1 ð þ Þ 2 2 2 2 7 Wang’s (2002) Estimator 4 pi pj 2 pi pj pi pj 5 ’ 22pxðpi þ pjÞ px 2 2pxðpi þ pjÞ Wang s (2002) estimator also uses the similarity index S as definedinexpression(2),butdiffersfromLynchand D : Ritland’s (1999) estimator. Wang’s(2002)estimatordoes f not use reference and proband individuals and is obtained These two systems of functions can be unified as P ¼ E þ MD: by modeling the probability of each similarity index ob- ¼ ; The symbol S is used to denote the moment vector consisting served. For example, for the event S 1 the probability ð ¼ Þ fi of the first and second moments of S. Then the relationship Pr S 1 is de ned by between S and D can be described as X PrðS ¼ 1Þ¼ PrðG ¼ A A ; G ¼ A A Þ S ¼ CE þ CMD; (10) x i i y i i i where X þ ð ¼ ; ¼ Þ Pr Gx AiAj Gy AiAj = = i6¼j C ¼ 13412 : 19=16 1=4 ¼ 2 2 þ 2 2 2 D 2a2 a4 1 2a2 a4 Expression (10) is actually a system of linear equations with two unknowns (i.e., D and f) and two equations. þ 2 2 2 f; a2 2a2 a4 With the presence of null alleles, using the same approx- P imation as before, the matrices E and M in expression (10) are ¼ m; ¼ ; ; ; ¼ Pwhere am i pi m 1 2 3 4 (especially a1 1 because modified and written as the following matrices E* and M*: for ¼ i pi 1). Generally, the following system of linear equa- the case of homozygotes, tions can be established, 2 3 2 þ 2 2 p 2pipy 1 p 2 3 2 3 2 3 6 i y 7 E* ¼ 4 2 2 5 and PrðS ¼ 1Þ l1 1 2 l1 a2 2 l1 2 pipx 1 py 4 5 4 5 4 5 PrðS ¼ 3=4Þ ¼ l2 þ 2l2 2a2 2 2a3 2 l2 0 2 3 PrðS ¼ 1=2Þ l3 2l3 1 2 3a2 þ 2a3 2 l3 1 pið pi þ pyÞð2 2 pyÞþ2pyð py þ 2piÞ = 6 7 D 6 ðp þ 2p Þð2 2 p Þ 7 M ¼ 6 i y y 7 2 ½E ; E ; 3 ; * 4 5 * * f 0 pi pxð2 2 pyÞþ2pxpy = ðpi þ 2pyÞð2 2 pyÞ 00 (12) l ¼ 2 2 ; l ¼ 2 ; l ¼ 2 2 2 for the case of heterozygotes, where 1 2a2 a4 2 4a3 4a4 and 3 4a2 4a2 8a3 þ 8a4 [the derivation of expression (12) is shown in File S1]. 2 3 ^ 2p p 1 2 p2 Expression (12) can also be expressed as Equation 4, and D 6 i j y 7 6 2 þ þ 2 þ 2 2 7 can be solved from Equation 5. However, the symbolic solu- E* ¼ 4 p 2pjpy p 2pipy 1 p 5 and ^ j i y tions of ^r and D are long (see Wang 2002, for details). The 2p ðp þ p Þ 1 2 p2 ’ 2 x i j 3 y weighting of Wang s (2002) estimator differs from that of fi ’ 1 ðpi þ pjÞ=2 most other estimators. The locus-speci c weight in Wang s 6 7 (2002) estimator is defined by the inverse of the expected M* ¼ 4 0 ðpi þ pjÞ=ð2 þ pyÞ 5 2 ½E*; E*: value of the similarity index and is used to calculate the 0 px ; ; 2 weighted averages of Pi ai and ai and to simultaneously D^: ð Þ 2 Substituting S9 for S; E* for E; and M* for M in expression solve The expected value E S is given by 2a2 a3 (Li (10), we now derive an approximate expression as et al. 1993; Wang 2002). If null alleles appear, expression (12) will be modified. For S9 CE* þ CM*D; (11) example, for the event S9 ¼ 1; this is given as

Estimating Relatedness with Null Alleles 251 0 1 Pr S9 ¼ 1 ½ð f1 2 f3ÞD þðf2 2 f3Þf þ f3 X X þ p f@b p 2 p2A y 1 i i X i6¼y i6¼y ¼ Pr G9x ¼ A9iA9i; G9y ¼ A9iA9i i6¼y ¼ 4p ð1 2 f 2 DÞðb b 2 b Þþp f b2 2 b X y 1 2 3 y 1 2 þ Pr Gx9 ¼ A9iA9j; Gy9 ¼ A9iA9j i6¼y ¼ 4pyðb1b2 2 b3Þ 2 4pyðb1b2 2 b3ÞD j6¼y h i j6¼i 2 þ py b 2 b2 2 4pyðb1b2 2 b3Þ f: X 1 ¼ PrðGx ¼ AiAi; Gy ¼ AiAiÞ i6¼y The result of this calculation can also be expressed as X r ¼ 4p ðb b 2 b Þ hy 1 2 3 i þ PrðGx ¼ AiAy; Gy ¼ AiAyÞ þ 2 ð 2 Þ; 2 2 2 ð 2 Þ D: i6¼y 4py b1b2 b3 py b1 b2 4py b1b2 b3 X þ ð ¼ ; ¼ Þ Pr Gx AiAy Gy AiAi It is now clear that the probabilities of S9 ¼ 1; 3=4; and 1=2 6¼ i y can be unified as the formula X þ ð ¼ ; ¼ Þ 21 Pr Gx AiAi Gy AiAy P9 ¼½ðf12f3ÞD þðf22f3Þf þ f3 E9 þ M9D ; (13) i6¼y X where þ ð ¼ ; ¼ Þ; Pr Gx AiAj Gy AiAj 2 3 2 2 i6¼y 2b 2 b þ 4p b þ 4p b 4 2 4 y 2 y 3 5 j6¼y E9 ¼ 4b1b3 2 4b4 þ 8pyb1b2 2 8pyb3 ; (14a) 6¼ 2 2 2 2 þ j i 4b1b2 8b1b3 4b2 8b4 where ðf 2 f ÞD þðf 2 f Þf þ f is the probability that 1 3 2 3 3 2 3 the genotypes of both individuals are visible, in which b2 þ 2p b b b þ p2b þ 3p b h i ¼ 2 2; ¼ 2 2ð 2 Þ; ¼ 2: 6 1 y 1 1 2 y 1 y 2 7 f1 1 py f2 1 py 2 py and f3 f1 From the M9 ¼ 4 2 þ 2 2 5 2 E9; E9 : 02b1b2 2b3 2pyb1 2pyb2 previous example, the probability for S9 being equal to a 3 0 b 2 3b1b2 þ 2b3 specific value is the sum of the probabilities of a series of 1 (14b) genotype pairs. To denoteP such a probability by symbols, m; we let bm be the sum i6¼ypi then bm is the counterpart of The derivation of expressions (14a) and (14b) is shown ^ am when null alleles are present. In general, am in File S1. Similar to expression (7), the estimation D ; differs slightly from bm and their relationship is solved by expression (13) is heavily biased. As an alter- ¼ þ m: r describedP as am bm py For instance, let be the sum native, the approximate form of Equation 4 is used for ð ¼ ; ¼ Þ: i6¼y Pr Gx AiAy Gy AiAj Then estimation, j6¼y 6¼ j i P9 E* þ M*D; (15) X X r ¼ð 2 f 2 DÞ 2 þ f 1 4pi pjpy pipjpy where i6¼y i6¼y 2 j6¼y j6¼y E* ¼ f 1E9 and 32 3 6¼ 6¼ j i ! j i ! f 21 b2 þ 2p b f 21 b b þ p2b þ 3p b 6 1 1 y 1 2 1 2 y 1 y 2 7 X X X X 6 7 M ¼ 6 21 2 þ 2 2 7 2 ½E ; E : ¼ ð 2 f 2 DÞ 2 þ f * 0 f2 2b1b2 2b3 2pyb1 2pyb2 * * 4py 1 pi pj py pi pj 4 5 6¼ 6¼ 6¼ 6¼ 21 3 2 þ i y j y i y j y 0 f2 b1 3b1b2 2b3 j6¼i j6¼i X X (16) ¼ ð 2 f 2 DÞ 2ð 2 Þ þ f ð 2 Þ 4py 1 pi b1 pi py pi b1 pi Replacing E by E* and M by M* in Equation 5, the following i6¼y ! i6¼y X X formula is obtained, ¼ ð 2 f 2 DÞ 2 2 3 4py 1 b1 pi pi ^ T 21 21 T 21 ^ i6¼y i6¼y D ¼½ðM*Þ ðV*Þ M* ðM*Þ ðV*Þ P92E* ;

252 K. Huang et al. where P^9 is the observed value of P9 and the elements in V* are * ¼ * 2 * 2 * ¼ 2 * * ð 6¼ Þ: Vii Ei Ei and Vij Ei Ej i j

Now, the estimate D^ can be calculated from the above for- mula, which has a smaller bias. There are more parameters ^ ^ ^ that should be weighted, including b (i ¼ 1; 2; 3; 4), b2; b3; i 1 1 Figure 1 Modes of identity-by-descent between two diploids. In each ^2; ^ ; ^2; P^9: b2 py py and The weight of each locus is the inverse of plot, the two top circles represent the two alleles in one individual, the expected value of S for nonrelatives, which is given by whereas the bottom circles represent the alleles in the second individual. The lines indicate alleles that are identical-by-descent. 3 1 E S9 ¼ 1; ; E9 4 2 also be applied to regulate this estimator [refer to expression p ð6b b þ 4b 2 2b Þþ2b2b 2 b b ¼ y 1 2 2 3 1 2 1 3: (17) (15)]. The weighting method is the same as that of Wang; the 2 2 2 1 py inverse of the expected value of the similarity index is calcu- lated by Equation 17, while the inverse of the variance is obtained from the formula VarðXÞ¼EðX2Þ 2 E2ðXÞ: Thomas’s (2010) Estimator Thomas’s (2010) estimator is a modification of Wang’s Novel Estimator B ’ (2002) estimator. In Thomas s (2010) estimator, the two This estimator is also based on Wang’s (2002) method and ¼ = ¼ = events S 3 4 and S 1 2 are combined into a single event uses the moment vector consisting of the first and second ¼ = (i.e., the event S 3 4 or 1/2). Then, the second and third moments of the similarity index S to solve D^: The probability P; E; M ’ rows of and in Wang s (2002) estimator are combined mass function of S is stated in expression (12), which can also P; E; M ’ as the second row of and in Thomas s (2010) estima- be denoted as Equation 4. Hence, the relationship between tor, respectively. This results in the moment vector S and D can be established [which is D^ ð ¼ Þ 2 2 identical to expression (10)], and can be solved from Equa- P ¼ Pr S 1 ; E ¼ 2a2 a4 ; 2 tion 5. The weight of each locus is defined by the inverse of PrðS ¼ 3=4 or 1=2Þ 4a2 2 4a 2 4a3 þ 4a4 2 the expected value of the similarity index, and the weighting 1 a and M ¼ 2 2 ½E; E: scheme follows that of Wang (2002). 012 a 2 With the presence of null alleles, the probability mass function of S9 is identical to expression (13), where E9 and Now, the estimate D^ can be solved from Equation 5. The M9 are listed in expressions (14a) and (14b). With the same weighting scheme differs from that of Wang’s (2002) estima- ^ approximation as expression (15), expression (11) is estab- tor. In Thomas’s (2010) estimator, for each locus, ^r and D are lished, where E* and M* are given in expression (16). There- obtained by Equations 1 and 5, respectively. The averages of ^r fore, D^ can be solved from Equation 5, and the weight across and D^ for all loci are obtained from the following two ap- loci is defined by expression (2). proaches: (i) the inverse of the expected value of the similar- ity index is identical to that of both Wang (2002) and Li et al. (1993) and (ii) the inverses of the sampling variances of^r and Maximum-Likelihood Estimators ^ D are equal to those of Lynch and Ritland’s (1999) estimator. Jacquard (1972) described nine IBD modes, denoted by With the presence of null alleles, the probabilities of D1; D2; ...; D9 (see Figure 1), that summarize the possible S9 ¼ 1; 3=4; and 1/2 can also be expressed as expression 2 IBD relationships among the set of four alleles possessed by (13); namely P9 ¼½ðf 2f ÞD þðf 2f Þf þ f 1ðE9 þ M9DÞ; 1 3 2 3 3 two diploids. The probability that a pair of individuals is in E9 M9 P where the matrices and are D ; 9 D ¼ ; D D IBD mode Di is denoted by i so i¼1 i 1 and 7 and 8 are equivalent to D and f, respectively. Because IBD alleles 2b2 2 b þ 4p2b þ 4p b E9 ¼ 2 4 y 2 y 3 ; appear in the same individual, D1; D2; ...; D6 are IBD modes of 2 þ 2 2 2 2 þ 8pyb1b2 8pyb3 4b1b2 4b1b3 4b2 4b4 inbreeding. Therefore, the relatedness coefficient is given by b2 þ 2p b b b þ p2b þ 3p b 1 M9 ¼ 1 y 1 1 2 y 1 y 2 2 E9; E9 : r ¼ 2D1 þ D3 þ D5 þ D7 þ D8: (18) 2 2 2 þ 3 2 02pyb1 2pyb2 b1b2 b1 In an outbred population, Equation 18 is reduced to Equation 1. This result is similar to Wang’s (2002) estimator, the solution There are also nine IBS modes, denoted by S1; S2; ...; S9; being overly biased if D^ is solved directly. The approximation which are interpreted by using different definitions of iden- method for the correction of Wang’s (2002) estimator can tical-by-state (identified by separated lines in Figure 1).

Estimating Relatedness with Null Alleles 253 Table 1 Probability of patterns of identity-in-state given modes of identity-by-descent in the presence of null alleles IBD mode

IBS mode Allelic state D1 D2 D3 D4 D5 D6 D7 D8 D9 S A A ; A A y p y p2 y y p y y p2 y y p y y p2 y y p y y p y y2p2 1 i i i i 1 i 1 i 1 2 i 1 3 i 1 2 i 1 3 i 1 3 i 1 5 i 1 3 i S2 Ai Ai; Aj Aj 0 y1pipj 0 y1y4pipj 0 y1y3pipj 0 y1pipj py y1y3y4pi pj 2 2 S3 Ai Ai; Ai Aj 00y1pi pj 2y1pi pj 00 0y1y2pipj 2y1y3pi pj S4 AiAi ; Aj Ak 00 02y1pi pjpk 00 0 02y1y3pipj pk 2 2 S5 Ai Aj; Ai Ai 00 0 0 y1pipj 2y1pi pj 0 y1y2pipj 2y1y3pi pj S6 AjAk; AiAi 00 0 0 02y1pipj pk 002y1y3pipj pk 2 2 S7 Ai Aj; Ai Aj 0000002y1pipj y1pipj ðpi þ pj Þ 4y1pi pj 2 S8 AiAj ; Ai Ak 000000 0y1pipj pk 4y1pi pjpk S9 AiAj ; AkAl 000000 0 04y1pi pjpkpl

Unlike method-of-moment estimators, the maximum- Simulations and Comparisons likelihood estimator models the probability of an observed We simulated four applications: (i) the effect of null alleles IBS mode conditioned on each IBD mode (for details see (how ^r biases the result before adjustment), (ii) the reliability table 1 in Milligan 2003). of the estimators after correction (bias of the adjusted ^r), (iii) The single-locus likelihood L is the probability that the the efficiencies of the corrected estimators, and (iv) the ro- genotype pair has been observed given a value of D; and L bustness of the corrections (populations under nonideal con- can be described by the formula ditions were simulated, and the statistical behaviors of all X9 estimators before and after correction were compared). fi L ¼ PrðS; DiÞDi: We used Monte Carlo simulations for the rst three cases. i¼1 For each dyad, the alleles of the first individual were randomly associated into genotypes according to the allele frequencies. For multilocus estimation, the global likelihood is the product of Subsequently, the other genotype was then obtained by using the whole single-locus likelihoods. Maximum-likelihood estima- the condition of the first genotype and their relationships (D tors assign implicitlyP the weights for each locus. The parameter and f). For each locus, we generated a random number t that D 9 D ¼ ; # D # ; space is in which i¼1 i 1 0 i 1 while in outbred was uniformly distributed from 0 to 1: (i) if 0 # t # D; the fi populations, the rst sixP kinds of IBD alleles cannot appear in an second genotype at this locus would be equal to the first; (ii) 6 D ¼ : Dð 2 D 2 fÞ individual. Therefore i¼1 i 0 Aconstraint,4 1 if D , t # D þ f; one allele was randomly copied from the , f2; in diploids in an outbred population was proposed by first genotype, and the other was randomly generated Thompson (1976). This constraint is based on the assumption according to the allele frequency; and (iii) if D þ f , t # 1; that two individuals have fathers that may be relatives and moth- both alleles of the second genotype were randomly generated ers that may also be relatives, but the parents of each individual according to the allele frequency. are unrelated. This constraint can slightly reduce the bias and wasappliedbyAndersonandWeir(2007),butnotinMilligan’s (2003) estimator. In addition, Anderson and Weir (2007) model Before Correction the probability of patterns of identity-in-state given modes of identity-by-descent in structured populations. We used both observed genotype and allele frequency for With the presence of null alleles, the data in Milligan’s estimation and simulated four levels of null allele frequency: ¼ : ; (2003) table 1 should be modified as we present here in py 0 1 0.3, 0.5, and 0.6. For each level, the visible allele frequency was drawn from a triangular distribution, in which Table 1. This shows that y2 ¼ pi þ py; y3 ¼ pi þ 2py; 2 ; ; ...; y ¼ p þ 2p ; and y ¼ p2 þ 3p p þ p2; with y 1 being the allele frequency followed 1 2 n proportions. 4 j y 5 i i y y 1 – probability that a pair of genotypes is observed. Each coeffi- Four relationships were simulated, including parent off- spring, full sibs, half sibs, and nonrelatives, and six estimators cient in Table 1 thus needs to be multiplied by y1; where were compared, including that of Lynch and Ritland (1999) 2 y 1 ¼ðD þ D þ D Þð1 2 p ÞþD ð12p Þ2 (L&R), the novel estimator A (NA), Wang (2002) (WA), 1 1 3 5 y 2 y þðD þ D Þð 2 Þ 2 2 Thomas (2010) (using the inverse of variance for weighting, 4 6 1 py 1 py TH), the novel estimator B (NB), and the estimator of þ D 2 2 þ D 2 2 þ 3 þ D 2 2 2: 7 1 py 8 1 2py py 9 1 py Anderson and Weir (2007) (A&W). It is noteworthy that, be- (19) cause we were unable to apply both corrections (null alleles and structured populations) simultaneously, the population struc- Wagner et al. (2006) also applied similar corrections to the ture parameter u for the A&W estimator was set to zero in the estimator of the maximum-likelihood method, but failed to simulation (see Anderson and Weir 2007). For moment estima- remove the probabilities for unobservable genotype patterns tors, bias was obtained for a single locus by 6 million simula- and to apply Thompson’s (1976) constraint. tions. Because the final estimate is usually a weighted average of

254 K. Huang et al. Figure 2 Bias of ^r before correction as a function of the number of visible alleles at loci with triangular distributed allele frequency. The data that have null alleles with a frequency of 0.1, 0.3, 0.5, or 0.6 are shown in the first to fourth columns. The six estimators compared are as follows: Lynch and Ritland (1999) (L&R), novel estimator A (NA), Wang (2002) (WA), Thomas (2010) (TH), novel estimator B (NB), and the Anderson and Weir (2007) estimator (A&W). Results were obtained from 6 million pairs for four relation- ships by Monte Carlo simulations except the A&W estimator, including parent–offspring (“—”), full sibs (“––”), half sibs (“–”), and nonrelatives (“⋯”).

the estimates from all loci (except the WA estimator), increasing increases. For a highly polymorphic locus at low null allele the number of loci does not reduce bias. Therefore, a single frequencies, the biases of the WA and NB estimators are also locusissufficient to show an effect of null alleles on bias. In small. contrast,biaswasobtainedfor60,000locifrom100simulations The bias curves in Figure 2 can be divided into three cat- for the A&W estimator. This estimator is asymptotically unbi- egories: (i) L&R and NA; (ii) WA, TH, and NB; and (iii) A&W. ased. Therefore, there are two sources of bias with the presence The biases of L&R and NA estimators are highly independent of null alleles: (i) the original bias, present even in normal loci, of the number of observed alleles because the bias of L&R is and (ii) the bias brought about by the presence of null alleles. not a function of n9, while those of WA, TH, and NB estima- Wefocusedonthesecondtypeofbiasandusedseverallocito tors are dependent on n9: Because the NA estimator is a mod- eliminate the firsttypeofbias.TheresultsshowninFigure2are ification of the L&R estimator, and because the TH and NB a function of the observed number n9 of alleles, with n9 being estimators are both modifications of the WA estimator, the simulated from 2 to 15. resulting curves are similar within the same category. We The L&R and NA estimators cope relatively well with null present the bias curves of method-of-moment estimators in alleles, because their biases are smaller at low frequencies of File S1. For the A&W estimator, ^r is calculated by a numerical null alleles (Figure 2). The bias of the L&R estimator remains algorithm. Therefore an analytical solution cannot be relatively unchanged as n9 increases, as does that of the A&W obtained. estimator. Because the bias of each of these two estimators is small, each can be used directly without adjustment at low After Correction null allele frequencies. However, each begins to show a larger bias as py increases and reaches 0.15 at py ¼ 0:6: The NA The configurations for this application are identical to those estimator performs worse than L&R at low levels of null al- previously described. Although bias cannot be completely leles, but improves at high levels of null alleles when n9.5: eliminated, it reduces to a negligible level after correction The other three method-of-moment estimators (WA, TH, and (Figure 3). The biases are ,0.003, 0.015, 0.02, and 0.04 NB) have similar bias curves. The bias of ^r varies as n9 in- when the null allele frequency = 0.1, 0.3, 0.5, and 0.6, re- creases, is negative at lower n9; and begins to increase as n9 spectively. At n9 ¼ 2; both the NA and WA estimators

Estimating Relatedness with Null Alleles 255 Figure 3 Bias of ^r after correction as a function of the number of alleles at loci with a triangular allele fre- quency distribution. Four levels of null allele frequency are shown in each column. Six estimators are com- pared, as shown in Figure 2. Results were obtained from 60 million pairs for four relationships by Monte Carlo simulations except the A&W estimator, including parent-offspring (“—”), full-sibs (“––”), half-sibs (“–”) and nonrelatives (“⋯”).

encounter a singular matrix problem (see Huang et al. 2014) likely relationship between two randomly selected individuals. and cannot yield a valid estimate, so the corresponding value Therefore, we simulated 10 million dyads, with nonrelatives, is missing or highly negative. Full sibs yield the largest bias, half sibs, full sibs, and parent–offspring contributing to 70%, while the biases of the other relationships are relatively 10%, 10%, and 10% of these dyads, respectively. The results small. Similarly, the biases of corrected method-of-moment as a function of n9 are shown in Figure 5. estimators are also presented in File S1, where the biases of The potency of a locus is an inverse function of py (Figure 4) but the WA, TH, and NB estimators are identical and do not de- increases with n9; reaching an asymptote around n9 ¼ 13: Locus pend on n9: power is low for high frequencies of null alleles. For example, at least 200 loci should be used when py ¼ 0:7: The curve for py ¼ 0 is close to that of p ¼ 0:1 and is thus omitted from Figure 4. Efficiency of Estimators y When estimating relatedness with 20 , the Multiple multiallelic loci were used in all estimations to increase biases of the A&W estimator are the highest, while the MSEs reliability. Less variance suggests a more precise estimation, so of the A&W estimator are the lowest (Figure 5). For method-of- variance can be a criterion of efficiency of estimators. However, moment estimators, the WA, TH, and NB estimators are least bias cannot be excluded completely. We thus used the mean biased, with the MSE curves of these estimators being similar. squareerror(MSE)toevaluatethe efficiencyofestimators,where MSEð^rÞ¼Bias2ð^rÞþVarð^rÞ: To demonstrate the efficiency of Finite Populations corrected estimators in multilocus estimations, we performed two simulations: (i) the minimum number of loci needed to Although all of these estimators performed reasonably well reach a mean square error ,0.01 (the inverse of the resulting under their given assumptions, real situations often diverge value was defined as the potency of a locus and is shown to be a from ideal conditions. For example, allele frequencies are function of n9) (Figure 4) and (ii) the bias and MSE of these obtained from relatively few individuals, and other population estimators were compared when estimating relatedness with 20 effects such as self-fertilization, inbreeding, and genetic drift microsatellites. In natural populations, nonrelative is the most may occur.

256 K. Huang et al. Figure 4 The minimum number of loci by using multiple loci to reach MSEð^rÞ , 0:01 for the four relationships shown in Figure 2. The six esti- mators are the same as in Figure 1. For each estimator, four frequencies of null alleles were simulated: 0.1 (—), 0.3 (––), 0.5 (––), 0.6 (⋯), and 0.7 (•••), and the observed allele frequency follows the triangular distri- bution. Results were obtained from 1 million Monte Carlo simulations.

We followed Toro et al. (2011) by simulating a small popu- lation founded from 20 unrelated and outbred individuals, whose genotypes were randomly generated according to the genotypic frequencies under the Hardy–Weinberg equilibrium. Twenty loci with four visible alleles were simulated. In the foun- der generation, two levels of the expected frequencies of null Figure 5 The bias and MSE of ^r estimated from 20 loci with a triangular alleles were simulated (py ¼ 0:1or0.3),theexpectedfrequen- cies of these visible alleles being drawn from triangular distribu- allele frequency distribution. Ten million dyads were simulated, with non- relatives, half sibs, full sibs, and parent–offspring contributing to 70%, tions. Ten discrete generations of 20 individuals were produced, 10%, 10%, and 10% of dyads, respectively. The frequency of null alleles with the sex ratio at each generation fixed at 1 : 1. The parents was simulated at four levels (py ¼ 0:1; 0.3, 0.5, and 0.6), with each row of each individual were randomly selected from the previous showing a single level. The results of six estimators are compared: L&R generation (some individuals may not have reproduced), result- (—), NA (––), WA (=), TH (D), NA (∘), and A&W ( 3 ). ing in a data set of 200 individuals and 20,100 dyads. The true kinship coefficientwascalculatedfromthepedigreebyarecur- this bias to a low level. Second, a comparison of the number of sive algorithm (Karigl 1981), and the truep Wrightffiffiffiffiffiffiffiffiffiffiffiffi ’s relatedness loci that is needed to achieve the same accuracy was per- was obtained using the formula r ¼ uxy = uxx uyy (equation 12 formed to evaluate the impact of null alleles on estimator in Huang et al. 2015a), where uxy is the kinship coefficient be- efficiency.This comparison also demonstrated the potency of a tween x and y,anduxx is the kinship coefficient between x and different locus and will help researchers to choose an optimal itself. The frequencies of null and visible alleles were estimated estimator to address their specific research questions. Finally, by a maximum-likelihood estimator, using an EM algorithm a small, inbred population prone to genetic drift was simu- (Kalinowski et al. 2006). The results of the statistical analysis lated to mimic real populations. fi from 100 replications including bias, root MSE (RMSE), coef - Effect of null alleles b cient of correlation (R), and the slope ( 1) and the intercept b ( 0) of regression of the estimated relatedness on the true re- Null alleles can affect the estimates of relatedness in two ways. latedness are shown in Table 2. First, they can result in mistyping of true genotypes and Most of the statistics improved after correction, with the change the similarity among genotypes. Given a pair of ge- corrected estimator becoming more stable as py increases. notypes, the observed similarity index is under- or overesti- There were some exceptions, for instance the L&R estimator mated for genotypes such as AiAy; AjAy and AiAy; AiAi; because became more biased, with the bias of the corrected estimator these genotypes are instead observed to be Ai9A9i; Aj9Aj9 and being negative and close to 2r: A9iAi9; Ai9Ai9: These can reduce the accuracy of estimations. Second, the overestimation of observed allele frequencies can also affect the overall estimation. Nonrelatives share IBS Discussion alleles only by chance. However, the inflation of observed We examined the performance of six estimators in coping with allele frequency can cause overestimation of the probability null alleles under various types of pairwise relationship. First, that two nonrelatives share IBS alleles and may give an estimator bias before and after correction for null alleles was inaccurate, reduced estimate. Figure 2 shows the null allele compared. Although some bias remained, correction reduced bias for each estimator. The estimators of Wang (2002) and

Estimating Relatedness with Null Alleles 257 Table 2 Statistics of br in a finite population Before correction After correction py Model Bias RMSE R b1 b0 Bias RMSE R b1 b0 0.1 L&R 20.106 0.231 0.362 0.979 –0.104 –0.108 0.235 0.360 0.992 –0.107 NA –0.118 0.248 0.390 1.134 –0.133 –0.108 0.247 0.392 1.161 –0.126 WA –0.209 0.349 0.436 1.639 –0.280 –0.169 0.305 0.433 1.484 –0.223 TH –0.242 0.380 0.432 1.695 –0.319 –0.116 0.292 0.441 1.594 –0.182 NB –0.206 0.347 0.437 1.639 –0.277 –0.113 0.280 0.436 1.504 –0.169 ML –0.023 0.145 0.415 0.805 –0.001 –0.013 0.153 0.411 0.849 0.004 0.3 L&R –0.106 0.247 0.355 1.039 –0.111 –0.108 0.247 0.361 1.055 –0.114 NA –0.124 0.264 0.372 1.144 –0.140 –0.107 0.259 0.377 1.178 –0.127 WA –0.378 0.518 0.435 2.040 –0.493 –0.210 0.334 0.433 1.505 –0.266 TH –0.449 0.586 0.444 2.210 –0.583 –0.118 0.306 0.451 1.716 –0.197 NB 20.371 0.511 0.436 2.032 –0.485 –0.115 0.291 0.431 1.544 –0.175 ML –0.022 0.146 0.407 0.804 –0.001 –0.008 0.157 0.402 0.864 0.006

Thomas (2010) have reduced performance in the presence of cannot hold under any conditions. Therefore, an unbiased null alleles, as does the novel estimator B. For null allele estimator for expression (7) or (13) does not exist. For frequencies 0.5, all estimators have a large bias. py , 0:3; the bias is manageable. Why are these estimators still biased? The second source of bias is that the genotypic frequencies deviate from expected values. This problem is inherent to the These estimators were corrected after we modeled the prob- L&R and NA estimators (the bias increases as n9 decreases). ability of genetic relationships in the presence of null alleles. These estimators model the probability of an observed geno- However, bias still existed, although it was not high. There are type occurrence relative to a reference genotype. With null two sources of bias in moment estimators. alleles, the observed genotypic frequencies of two related The first source is due to the approximation [(e.g. Expressions individuals are drawn from different distributions. As an ex- (8), (11) and (15)] described in the Theory and Modeling section ample, Table 3 lists the distribution of parent–offspring re- and is present in all moment estimators. When f or D is not latedness, with the second column and bottom row showing equaltozeroorone,biasarisesbecausetheapproximationis the distributions of the reference and proband genotypes, notequivalenttotheoriginalform(e.g., the bias for full sibs was respectively. However, without a priori information, the esti- the largest in Figure 3, while the biases for both parent–off- mator cannot determine which is the reference genotype, and spring and nonrelatives were nearly zero). However, the unbi- bias cannot thus be avoided. ased estimator for expression (7) or (13) does not exist, because For the maximum-likelihood estimator, there are no such ^ there is an inverse of Pi in the r; and Pi is a binomial distribution problems. However,the maximum-likelihood estimator is still if only one locus is used for estimation. Consider a simpler form biased because the estimate is limited to the parameter space, as follows (by using only one parameter x): with negative relatedness being unobtainable. Therefore, a bias for relationships with D lying at the edge of the param- þ ð ¼ Þ¼a1x b1; eter space is always present. Pr X 1 þ c1x d1 The potency of loci with null alleles þ ð ¼ Þ¼a2x b2: Pr X 2 When the frequencies of null alleles are high enough, loci with c2x þ d2 null alleles can be identified. Although bias can be reduced, Assume that there is an unbiased estimator fðXÞ; then the loss of efficiency for loci with null alleles cannot be E½ fðXÞ ¼ x; so recovered. For example, for py ¼ 0:5 and 0.7, only 56% and 26% of loci, respectively, can give a valid estimate for fð1ÞPrðX ¼ 1Þþfð2ÞPrðX ¼ 2Þ¼x: nonrelatives. In simulations, the MSE is used to evaluate the efficiency of ð þ Þ Multiplying both sides of the last equality by c2x d2 loci. The inverse of the minimum number of loci that enables ð þ Þ; c1x d1 and acknowledging the representations of MSEð^rÞ , 0:01 is shown in Figure 4, which describes the po- ð ¼ Þ ð ¼ Þ Pr X 1 and Pr X 2 above, it follows that tency of a single locus with null alleles. Approximately 19, 25, 34, 100, and 200 polymorphic loci should be used to fulfill ð Þð þ Þð þ Þþ ð Þð þ Þð þ Þ f 1 a1x b1 c2x d2 f 2 a2x b2 c1x d1 this requirement for null allele frequencies being 0.1, 0.3, ¼ xðc2x þ d2Þðc1x þ d1Þ: 0.5, 0.6, and 0.7, so each locus contributes 5%, 4%, 3%, 1%, and 0.5% to reliability (Figure 4). In addition, the po- The index of x is not equal in the two sides of the above tency of each normal locus is approximately that of py ¼ 0:1: equation, but x is a consistent parameter, and the equation Loci showing higher potency (e.g., .3%) can be used for

258 K. Huang et al. Table 3 The distribution of genotypes between parents (Gx9) and that has superior performance under all conditions and using all offspring (Gy9) with the presence of null alleles, assuming there are metrics. The RMSE depended on the allele frequency, the num- three uniformly distributed alleles and one of them is a null allele ber of alleles and loci, and the type of relationship between PrðGy9 G9x Þ individuals. The simulations and comparisons we present here

Gx9 PrðGx9Þ A19A91 A19A29 A92A92 will probably help researchers choose the most suitable estima- tor for their particular application. Furthermore, for applications A9A9 3=85=81=41=8 1 1 for specific genetic conditions, we show that it is possible to A91A92 1=41=41=41=4 A92A9223=81=81=45=8 identify a single optimal estimator. To make this easier, we have PrðGy9Þ 4=11 3=11 4=11 made available a free software package, “PolyRelatedness,” that provides a simulation function to help researchers evaluate the performance of each estimator under their given conditions. estimation after correction. For loci displaying low potency, For novice users, we recommend the A&W estimator be- they should be discarded to search for new polymorphic loci cause of its robustness. However, there are at least two situ- fi or new primers should be designed. We suggest that loci with ations when the A&W estimator cannot be used. The rst one null allele frequencies .0.5 should not be typed, even if is dealing with tied values in the data set. Some of the esti- regulated estimators are used. This is because the informa- mates of the A&W estimator lie at the edge of parameter tion provided by these loci is less than half that of normal loci. space, which results in many estimates of 0 or 0.5. If the Even if they have been typed, it does not matter if they are estimated relatedness between two kinds of dyads is com- fi used for estimation because they contribute little to the MSE. pared with a rank-sum test, it is dif cult to determine the ranks of these ties. The second one is the problem of bias, Finite populations because the maximum-likelihood estimator cannot be used In our fourth application, nonideal conditions were simulated. for applications that required a bias-free analysis. For exam- The corrected estimators had improved statistical perfor- ple, the relationship between bee or ant workers within the mance and were more robust at different levels of allele same colony is either half sibs (r ¼ 0:25) or full sibs frequency. The L&R, NA, and A&W estimators performed well (r ¼ 0:75) if there is only a single queen inside this colony, ^2 : for py , 0:3 (Figure 2), so the statistical performances of the so the ratio of full sibs can be calculated by 2r 0 5ifan corrected estimators were not improved. For the WA, TH, and unbiased estimator is used (Huang et al. 2015b). In this case, NB estimators, there was large negative bias (,20:1) for the L&R estimator can be used as an alternative because the smaller values of n9 (Figure 2); their corrected estimators MSEs generated for nonrelatives are lower than other were therefore less biased and more accurate (lower RMSE). method-of-moment estimators (Figure 5), the most likely re- There was a negative bias near 2r for corrected estimators lationship between two randomly selected individuals being (Table 2). For panmictic populations, the estimator gives an nonrelatives in medium to large populations. expected estimate of zero. However, we found negative bias in populations that included relatives. Two randomly se- Acknowledgments lected individuals in such a population have an expected re- latedness estimate of zero but their true expected relatedness We thank Stephanie T. Chen for language corrections and is 0.111. This results in negative bias. Ruo-du Wang for helpful comments on the mathematics. We also thank the associate editor, Bret Payseur, and two Kinship estimators anonymous reviewers for helpful comments on an earlier We also modeled the kinship estimators of Loiselle et al. (1995) version of the manuscript. This study was supported by the and Ritland (1996) for null alleles. We found that these estima- National Nature Science Foundation of China (31130061, tors cannot incorporate null alleles because they model the 31501872, 31270441, and 31470455). event that one pair of alleles is sampled from two individuals. K.H. and B.L. designed the project, K.H. wrote the program Because null alleles cannot be observed and sampled, this and draft, S.G. and X.Q. provided the tools and data, and K.R. would instead be a resampling of alleles if null alleles were and D.W.D. checked the model and edited the manuscript. sampled. However, without D and f, the probabilities cannot be modeled through a single kinship coefficient. We performed a simple simulation and found that the expected similarity index Literature Cited fi – and kinship coef cient were different in full sibs and parent Anderson, A. D., and B. S. Weir, 2007 A maximum-likelihood offspring in which null alleles were present, even though the method for the estimation of pairwise relatedness in structured kinship coefficients were identical in these two relationships. populations. Genetics 176: 421–440. Blouin, M. S., 2003 DNA-based methods for pedigree reconstruc- Selection of estimators tion and kinship analysis in natural populations. Trends Ecol. Evol. 18: 503–511. Although the A&W estimator performs well in simulations, there Brookfield, J. F. Y., 1996 A simple new method for estimating null are some conditions under which others perform better accord- allele frequency from heterozygote deficiency. Mol. Ecol. 5: ing to specific metrics. Therefore, there is no single estimator 453–455.

Estimating Relatedness with Null Alleles 259 Dabrowski, M. J., M. Pilot, M. Kruczyk, M. Zmihorski, H. M. Umer Lynch, M., and K. Ritland, 1999 Estimation of pairwise related- et al., 2013 Reliability assessment of null allele detection: in- ness with molecular markers. Genetics 152: 1753–1766. consistencies between and within different methods. Mol. Ecol. Mattila, A. L. K., A. Duplouy, M. Kirjokangas, R. Lehtonen, P. Rastas Resour. 14: 361–373. et al., 2012 High genetic load in an old isolated butterfly pop- Hall,N.,L.Mercer,D.Phillips,J.Shaw,andA.D.Anderson, ulation. Proc. Natl. Acad. Sci. USA 109: E2496–E2505. 2012 Maximum likelihood estimation of individual inbreeding Milligan, B. G., 2003 Maximum-likelihood estimation of related- coefficients and null allele frequencies. Genet. Res. 94: 151–161. ness. Genetics 163: 1153–1167. He, G., K. Huang, S. T. Guo, W. H. Ji, X. G. Qi et al., Queller, D. C., and K. F. Goodnight, 1989 Estimating relatedness 2011 Evaluating the reliability of microsatellite genotyping using genetic markers. Evolution 43: 258–275. from low-quality DNA templates with a polynomial distribution Ritland, K., 1996 Estimators for pairwise relatedness and individ- model. Chin. Sci. Bull. 56: 2523–2530. ual inbreeding coefficients. Genet. Res. 67: 175–185. Huang, K., K. Ritland, S. T. Guo, M. Shattuckn, and B. G. Li, Taberlet, P., S. Griffin, B. Goossens, S. Questiau, V. Manceau et al., 2014 A pairwise relatedness estimator for polyploids. Mol. 1996 Reliable genotyping of samples with very low DNA quan- Ecol. Resour. 14: 734–744. tities using PCR. Nucleic Acids Res. 24: 3189–3194. Huang, K., S. T. Guo, M. R. Shattuck, S. T. Chen, X. G. Qi et al., Thomas, S. C., 2010 A simplified estimator of two and four gene 2015a A maximum-likelihood estimation of pairwise related- relationship coefficients. Mol. Ecol. Resour. 10: 986–994. ness for autopolyploids. Heredity 114: 133–142. Thompson, E. A., 1976 A restriction on the space of genetic rela- Huang, K., K. Ritland, S. T. Guo, D. W. Dunn, D. Chen et al., tionships. Ann. Hum. Genet. 40: 201–204. 2015b Estimating pairwise relatedness between individuals Toro, M. Á., L. A. García-Cortés, and A. Legarra, 2011 A note on with different levels of ploidy. Mol. Ecol. Resour. 15: 772–784. the rationale for estimating genealogical coancestry from mo- Jacquard, A., 1972 Genetic information given by a relative. Bio- lecular markers. Genet. Sel. Evol. 43: 27. metrics 28: 1101–1114. van Oosterhout, C., W. F. Hutchinson, D. P. Wills, and P. Shipley, Kalinowski, S. T., A. P. Wagner, and M. L. Taper, 2006 Ml-relate: a 2004 Micro-checker: software for identifying and correcting computer program for maximum likelihood estimation of relat- genotyping errors in microsatellite data. Mol. Ecol. Notes 4: edness and relationship. Mol. Ecol. Notes 6: 576–579. 535–538. Karigl, G., 1981 A recursive algorithm for the calculation of iden- Vogl, C., A. Karhu, G. Moran, and O. Savolainen, 2002 High res- tity coefficients. Ann. Hum. Genet. 45: 299–305. olution analysis of mating systems: inbreeding in natural pop- Kokita, T., S. Takahashi, and H. Kumada, 2013 Molecular signa- ulations of Pinus radiata. J. Evol. Biol. 15: 433–439. tures of lineage-specific adaptive evolution in a unique sea ba- Wagner, A. P., S. Creel, and S. T. Kalinowski, 2006 Estimating sin: the example of an anadromous goby Leucopsarion petersii. relatedness and relationships using microsatellite loci with null Mol. Ecol. 22: 1341–1355. alleles. Heredity 97: 336–345. Li, C. C., D. E. Weeks, and A. Chakravarti, 1993 Similarity of DNA Wang, J. L., 2002 An estimator for pairwise relatedness using fingerprints due to chance and relatedness. Hum. Hered. 43: molecular markers. Genetics 160: 1203–1215. 45–52. Wright, S., 1921 Systems of mating. I. The biometric relations Loiselle, B. A., V. L. Sork, J. Nason, and C. Graham, 1995 Spatial between parent and offspring. Genetics 6: 111. genetic structure of a tropical understory shrub, Psychotria offi- cinalis (rubiaceae). Am. J. Bot. 82: 1420–1425. Communicating editor: B. A. Payseur

260 K. Huang et al. GENETICS

Supporting Information www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.163956/-/DC1

Estimating Relatedness in the Presence of Null Alleles

Kang Huang, Kermit Ritland, Derek W. Dunn, Xiaoguang Qi, Songtao Guo, and Baoguo Li

Copyright © 2016 by the Genetics Society of America DOI: 10.1534/genetics.114.163956 File S1. Supplementary materials.

Supplementary materials for ‘Estimating relatedness with the presence of null alleles’

The derivation of Expressions (3a) and (3b)

In outbred populations, the probability that the proband genotype 퐺푦 is observed is conditional on

the reference genotype 퐺푥 and can be expressed as

Pr(퐺푦 | 퐺푥) = Pr(퐺푦 | 퐺푥, Δ = 휑 = 0)(1 − Δ − 휑) + Pr(퐺푦 | 퐺푥, 휑 = 1)휑 + Pr(퐺푦 | 퐺푥, Δ = 1)Δ.

This expression is a weighed summation of three probabilities, where 1−Δ−휑, 휑 and Δ are corresponding

weights. The first probability 퐺Pr( 푦 | 퐺푥, Δ = 휑 = 0) assumes that 푥 and 푦 are nonrelatives, and this can be calculated by using the genotypic frequencies under Hardy-Weinberg equilibrium. The second

and third probabilities Pr(퐺푦 | 퐺푥, 휑 = 1) and Pr(퐺푦 | 퐺푥, Δ = 1) denote the probabilities of genotype

퐺푦 being observed is conditional on genotype 퐺푥, where the former is conditional on the two individuals sharing one identical-by-descent allele (e.g. parent-offspring) and the latter is conditional on two identical- by-descent alleles (e.g. identical-twins). If the reference individual 푥 is a homozygote, then

Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) = Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖, Δ = 1)Δ

2 = 푝푖 (1 − Δ − 휑) + 푝푖휑 + Δ,

namely 2 2 2 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) = 푝푖 + (1 − 푝푖 )Δ + (푝푖 − 푝푖 )휑; (i)

and

Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖) = Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, Δ = 1)Δ

= 2푝푖푝푥(1 − Δ − 휑) + 푝푥휑 + 0Δ,

namely

Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖) = 2푝푖푝푥 + (−2푝푖푝푥)Δ + (푝푥 − 2푝푖푝푥)휑. (ii)

Now, combining expressions (i) with (ii), we obtain Expression (3a), that is ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤ 2 2 2 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) 푝푖 1 − 푝푖 푝푖 − 푝푖 Δ ⎣ ⎦ = ⎣ ⎦ + ⎣ ⎦⎣ ⎦ . Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖) 2푝푖푝푥 −2푝푖푝푥 푝푥 − 2푝푖푝푥 휑

1 Next, if the reference individual 푥 is a heterozygote, then

Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗) = Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, Δ = 1)Δ 1 = 푝2(1 − Δ − 휑) + 푝 휑 + 0Δ 푖 2 푖 (︂1 )︂ = 푝2 + (︀ − 푝2)︀Δ + 푝 − 푝2 휑, 푖 푖 2 푖 푖

Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗) = Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, Δ = 1)Δ 1 = 푝2(1 − Δ − 휑) + 푝 휑 + 0Δ 푗 2 푗 (︂1 )︂ = 푝2 + (︀ − 푝2)︀Δ + 푝 − 푝2 휑, 푗 푗 2 푗 푗

Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) = Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, Δ = 1)Δ 1 = 2푝 푝 (1 − Δ − 휑) + (푝 + 푝 )휑 + Δ 푖 푗 2 푖 푗 (︂1 1 )︂ = 2푝 푝 + (1 − 2푝 푝 )Δ + 푝 + 푝 − 2푝 푝 휑, 푖 푗 푖 푗 2 푖 2 푗 푖 푗

Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗) = Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗, Δ = 1)Δ 1 = 2푝 푝 (1 − Δ − 휑) + 푝 휑 + 0Δ 푖 푥 2 푥 (︂1 )︂ = 2푝 푝 + (−2푝 푝 )Δ + 푝 − 2푝 푝 휑, 푖 푥 푖 푥 2 푥 푖 푥

Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗) = Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, Δ = 휑 = 0)(1 − Δ − 휑)

+ Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, 휑 = 1)휑

+ Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, Δ = 1)Δ 1 = 2푝 푝 (1 − Δ − 휑) + 푝 휑 + 0Δ 푗 푥 2 푥 (︂1 )︂ = 2푝 푝 + (−2푝 푝 )Δ + 푝 − 2푝 푝 휑. 푗 푥 푗 푥 2 푥 푗 푥

2 Now, it is clear that the following expression (i.e. Expression (3b)) holds: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 2 1 2 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗) 푝푖 −푝푖 2 푝푖 − 푝푖 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 2 ⎥ ⎢ 2 1 2 ⎥ ⎢ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 푝푗 ⎥ ⎢ −푝푗 2 푝푗 − 푝푗 ⎥⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ ⎥ = ⎢ ⎥ + ⎢ 1 1 ⎥ . ⎢ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 2푝푖푝푗 ⎥ ⎢ 1 − 2푝푖푝푗 2 푝푖 + 2 푝푗 − 2푝푖푝푗 ⎥⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 휑 ⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎢ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 2푝푖푝푥 ⎥ ⎢ −2푝푖푝푥 2 푝푥 − 2푝푖푝푥 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 1 Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗) 2푝푗푝푥 −2푝푗푝푥 2 푝푥 − 2푝푗푝푥

The variances of 푟^ and Δ^ of corrected Lynch and Rit- land’s (1999) estimator

There are three and six genotype patterns for homozygous and heterozygous reference individuals, respectively. The probability of each genotype pattern been observed on condition that the relationship between proband and reference individuals are nonrelatives are listed in Table 1. The푟 ^ and Δ^ for each genotypes pattern can also be calculated. Therefore, Var(^푟) and Var(Δ)^ can be obtained by the variance formula Var(푋) = E(푋2) − E2(푋).

Table 1: Genotype patterns and the probabilities being observed in nonrelatives Genotype pattern Probability Expression 2 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 푝푖 + 2푝푖푝푦 퐴푖퐴푖-퐴푖퐴푖 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖, 휑 = Δ = 0) 2 1 − 푝푦 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 2푝푖푝푥 퐴푖퐴푥-퐴푖퐴푖 Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, 휑 = Δ = 0) 2 1 − 푝푦 2 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 푝푥 + 2푝푥푝푦 퐴푥퐴푥-퐴푖퐴푖 Pr(퐺푦 = 퐴푥퐴푥 | 퐺푥 = 퐴푖퐴푖, 휑 = Δ = 0) 2 1 − 푝푦 2 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 푝푖 + 2푝푖푝푦 퐴푖퐴푖-퐴푖퐴푗 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦 푝2 + 2푝 푝 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 푗 푗 푦 퐴푗퐴푗-퐴푖퐴푗 Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 2푝푖푝푗 퐴푖퐴푗-퐴푖퐴푗 Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 2푝푖푝푥 퐴푖퐴푥-퐴푖퐴푗 Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 2푝푗푝푥 퐴푗퐴푥-퐴푖퐴푗 Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦 2 ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 푝푥 + 2푝푥푝푦 퐴푥퐴푥-퐴푖퐴푗 Pr(퐺푦 = 퐴푥퐴푥 | 퐺푥 = 퐴푖퐴푗, 휑 = Δ = 0) 2 1 − 푝푦

Using a mathematics computational software package (Mathematica V9), the symbolic expressions of Var(^푟) and Var(Δ)^ can be calculated, and the results are as follows: for the case of homozygous reference individuals, 푝 휆 + 푝3휆 + 푝2휆 푝 휆 + 푝3휆 Var(^푟) = 푖 11 푖 12 푖 13 and Var(Δ)^ = 푖 21 푖 22 , 휇1 휇2

3 where

4 2 2 2 2 2 휆11 = 푝푖 (2 − 푝푦) (1 − 2푝푦) + 4푝푦(1 + 푝푦) (4 − 8푝푦 + 5푝푦),

2 3 4 5 2 3 4 5 휆12 = 4 − 44푝푦 + 45푝푦 + 78푝푦 − 67푝푦 − 16푝푦 + 2푝푖(−4 + 16푝푦 + 3푝푦 − 27푝푦 + 9푝푦 + 푝푦),

2 3 4 5 휆13 = 4푝푦(4 − 14푝푦 − 7푝푦 + 18푝푦 + 5푝푦 − 2푝푦), 2 [︀ 2 2 ]︀2 휇1 = 2(−1 + 푝푖 + 푝푦)(−1 + 푝푦) 푝푖 (−2 + 푝푦) + 2푝푦(1 + 푝푦) − 푝푖(−2 + 5푝푦 + 푝푦) ;

3 2 4 2 2 2 2 3 휆21 = 8푝푦(1 + 푝푦) − 푝푖 (−2 + 푝푦) (1 + 푝푦) + 4푝푖푝푦(5 + 푝푦 − 푝푦 + 3푝푦),

2 3 4 2 3 4 5 휆22 = 2푝푦(8 − 8푝푦 + 15푝푦 − 10푝푦 − 5푝푦) + 푝푖(4 − 16푝푦 + 21푝푦 − 25푝푦 + 7푝푦 + 푝푦), 2 [︀ 2 2 ]︀2 휇2 = (−1 + 푝푖 + 푝푦)(−1 + 푝푦) 푝푖 (−2 + 푝푦) + 2푝푦(1 + 푝푦) − 푝푖(−2 + 5푝푦 + 푝푦) .

For the case of heterozygous reference individuals, 2 2푝푖푝푗(휆11 + 푝푖 휆12 + 푝푖휆13) ^ 2푝푖푝푗(휆21 + 푝푖휆22 + 푝푖푝푗휆23) Var(^푟) = 2 3 2 and Var(Δ) = 3 3 , (−1 + 푝푦)(휇11 + 푝푖 휇12 + 푝푖휇13 + 푝푖 휇14) 휇21 + 푝푖 휇22 + 푝푖휇23 + 푝푖 휇24 where

2 휆11 = 2푝푦(1 − 푝푗)(푝푗 + 2푝푦)(−1 + 푝푦),

2 2 2 2 3 휆12 = 2푝푦(1 − 푝푦) + 푝푗 (−2 + 4푝푦) + 푝푗(1 − 3푝푦 − 푝푦 + 7푝푦),

2 3 2 2 3 2 3 4 휆13 = 2푝푦(−1 + 2푝푦 + 푝푦 − 2푝푦) + 푝푗 (1 − 3푝푦 − 푝푦 + 7푝푦) + 푝푗(−1 + 4푝푦 − 3푝푦 − 4푝푦 + 12푝푦),

2 휇11 = 2푝푗푝푦(−1 + 푝푗)(푝푗 + 2푝푦)(−1 + 푝푦),

3 2 2 2 휇12 = −8푝푗 + 푝푗 (6 − 14푝푦) + 2푝푦(−1 + 푝푦) + 푝푗(−1 + 12푝푦 + 푝푦),

2 2 3 2 2 2 3 2 3 휇13 = 4푝푦(1 − 푝푦) + 푝푗 (−1 + 12푝푦 + 푝푦) + 푝푗 (1 − 15푝푦 + 23푝푦 − 푝푦) + 4푝푗푝푦(1 − 7푝푦 − 푝푦 − 푝푦),

3 2 2 2 3 2 3 휇14 = 푝푗 (6 − 14푝푦) + 푝푗 (−6 + 24푝푦 − 26푝푦) + 푝푗(1 − 15푝푦 + 23푝푦 − 푝푦) + 2푝푦(1 − 2푝푦 − 푝푦 + 2푝푦);

2 2 2 2 2 2 휆21 = 2푝푗푝푦(푝푗 + 2푝푦)(−1 + 푝푦) + 2푝푖 푝푦(−1 + 푝푦) + 4푝푖 푝푗 (1 + 푝푦),

2 3 2 2 휆22 = 푝푖푝푗(−1 + 9푝푦 + 푝푦 + 7푝푦) + 4푝푦(−1 + 푝푦),

2 3 2 3 휆23 = 4푝푦(−1 + 5푝푦 + 푝푦 + 3푝푦) + 푝푗(−1 + 9푝푦 + 푝푦 + 7푝푦),

2 2 휇21 = (−1 + 푝푦) + 2푝푗푝푦(−1 + 푝푗)(푝푗 + 2푝푦)(−1 + 푝푦),

3 2 2 2 휇22 = −8푝푗 + 푝푗 (6 − 14푝푦) + 2푝푦(−1 + 푝푦) + 푝푗(−1 + 12푝푦 + 푝푦),

2 2 3 2 2 2 3 2 3 휇23 = 4푝푦(1 − 푝푦) + 푝푗 (−1 + 12푝푦 + 푝푦) + 푝푗 (1 − 15푝푦 + 23푝푦 − 푝푦) + 4푝푗푝푦(1 − 7푝푦 − 푝푦 − 푝푦),

3 2 2 2 3 2 3 휇24 = 푝푗 (6 − 14푝푦) + 푝푗 (−6 + 24푝푦 − 26푝푦) + 푝푗(1 − 15푝푦 + 23푝푦 − 푝푦) + 2푝푦(1 − 2푝푦 − 푝푦 + 2푝푦).

Especially, if 푝푦 = 0, the expressions of Var(^푟) and Var(Δ)^ above can be simplified. In fact, for the case of homozygous reference individuals, we have 2 푝푖 ^ 푝푖 Var(^푟) = and Var(Δ) = 2 ; 2 − 2푝푖 (1 − 푝푖) for the case of heterozygous reference individuals, we have 2푝 푝 2푝 푝 Var(^푟) = 푖 푗 and Var(Δ)^ = 푖 푗 . 푝푖 + 푝푗 − 4푝푖푝푗 1 − 푝푖 − 푝푗 + 2푝푖푝푗

4 Using the Kronecker operator 퐾푎푏, both cases above can be unified to write as 2푝 푝 2푝 푝 Var(^푟) = 푎 푏 and Var(Δ)^ = 푎 푏 . (1 + 퐾푎푏)(푝푎 + 푝푏) − 4푝푎푝푏 (1 + 퐾푎푏)(1 − 푝푎 − 푝푏) + 2푝푎푝푏

The derivation of Expressions (9a) and (9b)

′ ′ ′ If the observed genotype of the reference individual 푥 is a homozygote (i.e. 퐺푥 = 퐴푖퐴푖), letting 휌1 be the probability of 푆 = 1 and 퐺푦 is visible, we have

′ ′ ′ 휌1 = Pr(퐺푥 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푖, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푦, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푦, 퐺푦 ̸= 퐴푦퐴푦) 2 푝푖 푝푖 (1 − 휑 − Δ) + 푝푖휑 + Δ 푝푖 2푝푖푝푦(1 − 휑 − Δ) + 푝푦휑 = · 2 + · 2 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 2푝 푝2(1 − 휑 − Δ) + 1 푝 휑 2푝 2푝 푝 (1 − 휑 − Δ) + 1 (푝 + 푝 )휑 + Δ + 푦 · 푖 2 푖 + 푦 · 푖 푦 2 푖 푦 . 2 1 2 1 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + (1 − 2 푝푦)휑 + Δ 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + (1 − 2 푝푦)휑 + Δ

As in Expression (8), an approximated expression of the probability 휌1 is given by the following equation: 3 2 2 푝푖 푝푖 푝푖 2푝푖 푝푦 휌1 ≈ 2 (1 − 휑 − Δ) + 휑 + Δ + 2 (1 − 휑 − Δ) (푝푖 + 2푝푦)(1 − 푝푦) 푝푖 + 2푝푦 푝푖 + 2푝푦 (푝푖 + 2푝푦)(1 − 푝푦) 2 푝푖푝푦 2푝푖 푝푦 2푝푖푝푦 + 휑 + 2 (1 − 휑 − Δ) + 휑 푝푖 + 2푝푦 (푝푖 + 2푝푦)(1 − 푝푦) (푝푖 + 2푝푦)(2 − 푝푦) 2 4푝푖푝푦 2푝푦(푝푖 + 푝푦) 2푝푦 + 2 (1 − 휑 − Δ) + 휑 + Δ (푝푖 + 2푝푦)(1 − 푝푦) (푝푖 + 2푝푦)(2 − 푝푦) 푝푖 + 2푝푦 2 푝푖 + 2푝푖푝푦 푝푖(푝푖 + 푝푦)(2 − 푝푦) + 2푝푦(푝푦 + 2푝푖) = 2 (1 − 휑 − Δ) + 휑 + Δ 1 − 푝푦 (푝푖 + 2푝푦)(2 − 푝푦) 2 2 2 푝푖 + 2푝푖푝푦 (︁ 푝푖 + 2푝푖푝푦 )︁ [︁푝푖(푝푖 + 푝푦)(2 − 푝푦) + 2푝푦(푝푦 + 2푝푖) 푝푖 + 2푝푖푝푦 ]︁ = 2 + 1 − 2 Δ + − 2 휑. 1 − 푝푦 1 − 푝푦 (푝푖 + 2푝푦)(2 − 푝푦) 1 − 푝푦 ′ ′ ′ ′ ′ ′ Likewise, letting 휌2 be the probability Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, 퐺푦 ̸= 퐴푦퐴푦), we have

′ ′ ′ 휌2 = Pr(퐺푥 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푖) Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푦, 퐺푦 ̸= 퐴푦퐴푦) 푝 2푝 푝 (1 − 휑 − Δ) + 푝 휑 2푝 2푝 푝 (1 − 휑 − Δ) + 1 푝 휑 = 푖 · 푖 푥 푥 + 푦 · 푖 푥 2 푥 , 2 2 1 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 푝푖 + 2푝푦 (1 − 푝푦)(1 − 휑 − Δ) + (1 − 2 푝푦)휑 + Δ and an approximated expression of 휌2 is given by 2 2푝푖 푝푥 푝푖푝푥 4푝푖푝푥푝푦 2푝푥푝푦 휌2 ≈ 2 (1 − 휑 − Δ) + 휑 + 2 (1 − 휑 − Δ) + 휑 (푝푖 + 2푝푦)(1 − 푝푦) 푝푖 + 2푝푦 (푝푖 + 2푝푦)(1 − 푝푦) (푝푖 + 2푝푦)(2 − 푝푦)

2푝푖푝푥 푝푖푝푥(2 − 푝푦) + 2푝푥푝푦 = 2 (1 − 휑 − Δ) + 휑 1 − 푝푦 (푝푖 + 2푝푦)(2 − 푝푦)

2푝푖푝푥 (︁ 2푝푖푝푥 )︁ [︁푝푖푝푥(2 − 푝푦) + 2푝푥푝푦 2푝푖푝푥 ]︁ = 2 + 0 − 2 Δ + − 2 휑. 1 − 푝푦 1 − 푝푦 (푝푖 + 2푝푦)(2 − 푝푦) 1 − 푝푦

5 Now, it is clear that the following expressions (i.e. the expressions in (9a)) hold: ⎡ 2 ⎤ ⎡ ⎤ 푝푖 +2푝푖푝푦 푝푖(푝푖+푝푦 )(2−푝푦 )+2푝푦 (푝푦 +2푝푖) 2 1 * 1−푝푦 * (푝푖+2푝푦 )(2−푝푦 ) [︀ * *]︀ E = ⎣ ⎦ and M = ⎣ ⎦ − E , E 2푝푖푝푥 푝푖푝푥(2−푝푦 )+2푝푥푝푦 2 0 1−푝푦 (푝푖+2푝푦 )(2−푝푦 ) ′ ′ ′ Next, if the observed genotype of the reference individual 푥 is a heterozygote (i.e. 퐺푥 = 퐴푖퐴푗), then

′ ′ ′ ′ ′ ′ Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푖퐴푦 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦) 2 1 1 푝푖 (1 − 휑 − Δ) + 2 푝푖휑 2푝푖푝푦(1 − 휑 − Δ) + 2 푝푦휑 = 1 · 2 + 1 · 2 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 2 푝푖 1 2푝푖푝푦 1 ≈ 2 (1 − 휑 − Δ) + 푝푖휑 + 2 (1 − 휑 − Δ) + 푝푦휑 1 − 푝푦 2 1 − 푝푦 2 2 푝푖 + 2푝푖푝푦 푝푖 + 푝푦 = 2 (1 − 휑 − Δ) + 휑 1 − 푝푦 2 2 2 2 푝푖 + 2푝푖푝푦 (︁ 푝푖 + 2푝푖푝푦 )︁ (︁푝푖 + 푝푦 푝푖 + 2푝푖푝푦 )︁ = 2 + 0 − 2 Δ + − 2 휑, 1 − 푝푦 1 − 푝푦 2 1 − 푝푦

′ ′ ′ ′ ′ ′ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ + Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푗퐴푦 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦) 2 1 1 푝푗 (1 − 휑 − Δ) + 2 푝푗휑 2푝푗푝푦(1 − 휑 − Δ) + 2 푝푦휑 = 1 · 2 + 1 · 2 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 2 푝푗 1 2푝푗푝푦 1 ≈ 2 (1 − 휑 − Δ) + 푝푗휑 + 2 (1 − 휑 − Δ) + 푝푦휑 1 − 푝푦 2 1 − 푝푦 2 2 푝푗 + 2푝푗푝푦 푝푗 + 푝푦 = 2 (1 − 휑 − Δ) + 휑 1 − 푝푦 2 2 2 2 푝푗 + 2푝푗푝푦 (︁ 푝푗 + 2푝푗푝푦 )︁ (︁푝푗 + 푝푦 푝푗 + 2푝푗푝푦 )︁ = 2 + 0 − 2 Δ + − 2 휑, 1 − 푝푦 1 − 푝푦 2 1 − 푝푦

′ ′ ′ ′ ′ ′ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦) 1 2푝푖푝푗(1 − 휑 − Δ) + 2 (푝푖 + 푝푗)휑 + Δ = 1 · 2 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 2푝푖푝푗 푝푖 + 푝푗 ≈ 2 (1 − 휑 − Δ) + 휑 + Δ 1 − 푝푦 2 2푝푖푝푗 (︁ 2푝푖푝푗 )︁ (︁푝푖 + 푝푗 2푝푖푝푗 )︁ = 2 + 1 − 2 Δ + − 2 휑, 1 − 푝푦 1 − 푝푦 2 1 − 푝푦

′ ′ ′ ′ ′ ′ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푖퐴푥|퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦) 1 2푝푖푝푥(1 − 휑 − Δ) + 2 푝푥휑 = 1 · 2 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ

6 2푝푖푝푥 푝푥 ≈ 2 (1 − 휑 − Δ) + 휑 1 − 푝푦 2 2푝푖푝푥 (︁ 2푝푖푝푥 )︁ (︁푝푥 2푝푖푝푥 )︁ = 2 + 0 − 2 Δ + − 2 휑, 1 − 푝푦 1 − 푝푦 2 1 − 푝푦

′ ′ ′ ′ ′ ′ Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦)

′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗, 퐺푦 ̸= 퐴푦퐴푦) 1 2푝푗푝푥(1 − 휑 − Δ) + 2 푝푥휑 = 1 · 2 (1 − 푝푦)(1 − 휑 − Δ) + 휑 + Δ 2푝푗푝푥 푝푥 ≈ 2 (1 − 휑 − Δ) + 휑 1 − 푝푦 2 2푝푗푝푥 (︁ 2푝푗푝푥 )︁ (︁푝푥 2푝푗푝푥 )︁ = 2 + 0 − 2 Δ + − 2 휑. 1 − 푝푦 1 − 푝푦 2 1 − 푝푦 Now, it is clear that the following expressions (i.e. the expressions of (9b)) are valid:

⎡ 2 ⎤ 푝 +2푝푖푝푦 ⎡ ⎤ 푖 푝푖+푝푦 2 1−푝푦 0 2 ⎢ 2 ⎥ 푝 +2푝푗 푝푦 ⎢ ⎥ ⎢ 푗 ⎥ 푝푗 +푝푦 ⎢ 2 ⎥ ⎢ 0 ⎥ 1−푝푦 ⎢ 2 ⎥ ⎢ ⎥ ⎢ ⎥ E* = ⎢ 2푝푖푝푗 ⎥ and M* = ⎢ 푝푖+푝푗 ⎥ − [︀E*, E*]︀. ⎢ 1−푝2 ⎥ ⎢ 1 2 ⎥ ⎢ 푦 ⎥ ⎢ ⎥ ⎢ 2푝푖푝푥 ⎥ ⎢ 푝푥 ⎥ ⎢ 1−푝2 ⎥ ⎢ 0 2 ⎥ ⎢ 푦 ⎥ ⎣ ⎦ ⎣ 2푝푗 푝푥 ⎦ 푝푥 2 0 2 1−푝푦 The derivation of Expression (12)

Wang’s (2002) estimator uses the similarity index as defined in Expression (2), but it does not distinguish the reference and proband individuals. The probability that 푆 is equal to a specific value is calculated by taken the sum of the probabilities of a series of genotype pairs. ∑︀ 푚 ∑︀ It is known that the symbol 푎푚 (푚 = 1, 2, 3, 4) is used to denote the sum 푖 푝푖 . Note that 푖 푝푖 = 1, we see that 푎1 = 1. Next, because

∑︁ 푚 푛 ∑︁ 푚(︁ ∑︁ 푛)︁ ∑︁ 푚 푛 ∑︁ 푚 ∑︁ 푚+푛 푝푖 푝푗 = 푝푖 푝푗 = 푝푖 (푎푛 − 푝푖 ) = 푎푛 푝푖 − 푝푖 , 푖̸=푗 푖 푗̸=푖 푖 푖 푖 ∑︀ 푚 푛 ∑︀ 푛 we obtain 푖̸=푗 푝푖 푝푗 = 푎푚푎푛 − 푎푚+푛. Specially, by 푎1 = 1, we have 푖̸=푗 푝푖푝푗 = 푎푛 − 푎푛+1. Moreover, since

∑︁ 푚 ∑︁ 푚[︁ ∑︁ (︁ ∑︁ )︁]︁ ∑︁ 푚[︁ ∑︁ ]︁ 푝푖 푝푗푝푘 = 푝푖 푝푗 푝푘 = 푝푖 푝푗(1 − 푝푖 − 푝푗) 푗̸=푖 푖 푗̸=푖 푘̸=푖 푖 푗̸=푖 푘̸=푖 푘̸=푗 푘̸=푗

∑︁ 푚[︁ ∑︁ ∑︁ 2]︁ ∑︁ 푚 2 2 = 푝푖 (1 − 푝푖) 푝푗 − 푝푗 = 푝푖 [(1 − 푝푖) − (푎2 − 푝푖 )] 푖 푗̸=푖 푗̸=푖 푖

∑︁ 푚 ∑︁ 푚+1 ∑︁ 푚+2 = (1 − 푎2) 푝푖 − 2 푝푖 + 2 푝푖 , 푖 푖 푖

7 ∑︀ 푚 we get 푗̸=푖 푝푖 푝푗푝푘 = (1 − 푎2)푎푚 − 2푎푚+1 + 2푎푚+2. Specially, we have 푘̸=푖 푘̸=푗

∑︁ 2 2 푝푖 푝푗푝푘 = (1 − 푎2)푎2 − 2푎3 + 2푎4 = 푎2 − 푎2 − 2푎3 + 2푎4, 푗̸=푖 푘̸=푖 푘̸=푗 ∑︁ 푝푖푝푗푝푘 = (1 − 푎2)푎1 − 2푎2 + 2푎3 = 1 − 3푎2 + 2푎3. 푗̸=푖 푘̸=푖 푘̸=푗 Now, using these facts, we can calculate the following probabilities: ∑︁ ∑︁ Pr(푆 = 1) = Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푗) 푖 푖̸=푗

∑︁ 4 ∑︁ 3 ∑︁ 2 ∑︁ 2 2 ∑︁ 2 ∑︁ = (1 − 휑 − Δ) 푝푖 + 휑 푝푖 + Δ 푝푖 + (1 − 휑 − Δ) 2푝푖 푝푗 + 휑 푝푖푝푗 + Δ 푝푖푝푗 푖 푖 푖 푖̸=푗 푖̸=푗 푖̸=푗

2 = (1 − 휑 − Δ)푎4 + 휑푎3 + Δ푎2 + 2(1 − 휑 − Δ)(푎2 − 푎4) + 휑(푎2 − 푎3) + Δ(1 − 푎2)

2 = (1 − 휑 − Δ)(2푎2 − 푎4) + 휑푎2 + Δ

2 2 2 = (2푎2 − 푎4) + [1 − (2푎2 − 푎4)]Δ + [푎2 − (2푎2 − 푎4)]휑,

3 ∑︁ ∑︁ Pr(푆 = 4 ) = Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푗) + Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푖) 푖 푖̸=푗 ∑︁ = 2 Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푗) 푖 ∑︁ 3 ∑︁ 2 = (1 − 휑 − Δ) 4푝푖 푝푗 + 휑 2푝푖 푝푗 푖̸=푗 푖̸=푗

= (1 − 휑 − Δ)[4(푎3 − 푎4)] + 휑[2(푎2 − 푎3)]

= (4푎3 − 4푎4) − (4푎3 − 4푎4)Δ + [(2푎2 − 2푎3) − (4푎3 − 4푎4)]휑,

1 ∑︁ Pr(푆 = 2 ) = Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푘) 푗̸=푖 푘̸=푖 푘̸=푗

∑︁ 2 ∑︁ = (1 − 휑 − Δ) 4푝푖 푝푗푝푘 + 휑 푝푖푝푗푝푘 푗̸=푖 푗̸=푖 푘̸=푖 푘̸=푖 푘̸=푗 푘̸=푗

2 = (1 − 휑 − Δ)[4(푎2 − 푎2 − 2푎3 + 2푎4)] + 휑(1 − 3푎2 + 2푎3)

2 = (1 − 휑 − Δ)(4푎2 − 4푎2 − 8푎3 + 8푎4) + 휑(1 − 3푎2 + 2푎3)

2 2 2 = (4푎2 − 4푎2 − 8푎3 + 8푎4) − (4푎2 − 4푎2 − 8푎3 + 8푎4)Δ + (1 − 7푎2 + 4푎2 + 10푎3 − 8푎4)휑.

Combining the results from the foregoing three probabilities, they can be written as the following expres- sion (i.e. Expression (12)): ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ Pr(푆 = 1) 휆1 1 − 휆1 푎2 − 휆1 ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ 3 ⎥ = ⎢ ⎥ + ⎢ ⎥ , ⎢ Pr(푆 = 4 ) ⎥ ⎢ 휆2 ⎥ ⎢ −휆2 2푎2 − 2푎3 − 휆2 ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 휑 1 Pr(푆 = 2 ) 휆3 −휆3 1 − 3푎2 + 2푎3 − 휆3

8 2 2 where 휆1 = 2푎2 − 푎4, 휆2 = 4푎3 − 4푎4 and 휆3 = 4푎2 − 4푎2 − 8푎3 + 8푎4.

The derivation of Expressions (14a) and (14b)

∑︀ 푚 It is known that 푏푚 denotes the sum 푖̸=푦 푝푖 , then

∑︁ 푚 푛 ∑︁ 푚(︁ ∑︁ 푛)︁ ∑︁ 푚 푛 푝푖 푝푗 = 푝푖 푝푗 = 푝푖 (푏푛 − 푝푖 ) = 푏푚푏푛 − 푏푚+푛, 푖̸=푦 푖̸=푦 푗̸=푦 푖̸=푦 푗̸=푦 푗̸=푖 푗̸=푖

∑︁ 푚 ∑︁ 푚[︁ ∑︁ (︁ ∑︁ )︁]︁ ∑︁ 푚[︁ ∑︁ ]︁ 푝푖 푝푗푝푘 = 푝푖 푝푗 푝푘 = 푝푖 푝푗(푏1 − 푝푖 − 푝푗) 푖̸=푦, 푘̸=푦 푖̸=푦 푗̸=푦 푘̸=푦 푖̸=푦 푗̸=푦 푗̸=푦, 푘̸=푖 푗̸=푖 푘̸=푖 푗̸=푖 푗̸=푖, 푘̸=푗 푘̸=푗

∑︁ 푚 2 2 ∑︁ 푚 2 2 = 푝푖 [(푏1 − 푝푖) − (푏2 − 푝푖 )] = 푝푖 (푏1 − 2푏1푝푖 − 푏2 + 2푝푖 ) 푖̸=푦 푖̸=푦

2 = 푏1푏푚 − 2푏1푏푚+1 − 푏2푏푚 + 2푏푚+2.

′ 3 1 With the presence of null alleles, the probabilities of 푆 = 1, 4 and 2 can be calculated as follows:

′ Pr(푆 = 1)[(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3)]

∑︁ ′ ′ ′ ′ ′ ′ ∑︁ ′ ′ ′ ′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푗) 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푖 ∑︁ ∑︁ [︁ ∑︁ = Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푦, 퐺푦 = 퐴푖퐴푦) + Pr(퐺푥 = 퐴푖퐴푦, 퐺푦 = 퐴푖퐴푖) 푖̸=푦 푖̸=푦 푖̸=푦 ∑︁ ]︁ ∑︁ + Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푦) + Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푗) 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푖 ∑︁ ∑︁ = Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푦, 퐺푦 = 퐴푖퐴푦) 푖̸=푦 푖̸=푦 ∑︁ ∑︁ + 2 Pr(퐺푥 = 퐴푖퐴푦, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푗) 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푖

∑︁ 4 ∑︁ 3 ∑︁ 2 ∑︁ 2 2 ∑︁ 2 2 ∑︁ = (1 − 휑 − Δ) 푝푖 + 휑 푝푖 + Δ 푝푖 + (1 − 휑 − Δ) 4푝푖 푝푦 + 휑 (푝푖 푝푦 + 푝푖푝푦) + Δ 2푝푖푝푦 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦

[︁ ∑︁ 3 ∑︁ 2 ]︁ ∑︁ 2 2 ∑︁ 2 ∑︁ + 2 (1 − 휑 − Δ) 2푝푖 푝푦 + 휑 푝푖 푝푦 + (1 − 휑 − Δ) 2푝푖 푝푗 + 휑 푝푖푝푗 + Δ 푝푖푝푗 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푦 푗̸=푦 푗̸=푖 푗̸=푖 푗̸=푖

2 2 = (1 − 휑 − Δ)푏4 + 휑푏3 + Δ푏2 + (1 − 휑 − Δ)(4푝푦푏2) + 휑(푝푦푏2 + 푝푦푏1) + Δ(2푝푦푏1)

2 2 + 2[(1 − 휑 − Δ)(2푝푦푏3) + 휑푝푦푏2] + (1 − 휑 − Δ)[2(푏2 − 푏4)] + 휑(푏1푏2 − 푏3) + Δ(푏1 − 푏2)

2 2 2 2 = (1 − 휑 − Δ)(2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3) + 휑(푏1푏2 + 푝푦푏1 + 3푝푦푏2) + Δ(푏1 + 2푝푦푏1)

2 2 2 2 = 휆1 + (푏1 + 2푝푦푏1 − 휆1)Δ + (푏1푏2 + 푝푦푏1 + 3푝푦푏2 − 휆1)휑 (where 휆1 = 2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3),

′ 3 Pr(푆 = 4 )[(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3)]

9 ∑︁ ′ ′ ′ ′ ′ ′ ∑︁ ′ ′ ′ ′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푖) + Pr(퐺푥 = 퐴푖퐴푖, 퐺푦 = 퐴푖퐴푗) 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푦 푗̸=푖 푗̸=푖

∑︁ ′ ′ ′ ′ ′ ′ = 2 Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푖) 푖̸=푦 ∑︁ ∑︁ = 2 Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푖) + 2 Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푦) 푖̸=푦 푖̸=푦

∑︁ 3 ∑︁ 2 ∑︁ 2 ∑︁ = 2(1 − 휑 − Δ) 2푝푖 푝푗 + 2휑 푝푖 푝푗 + 2(1 − 휑 − Δ) 4푝푖 푝푗푝푦 + 2휑 푝푖푝푗푝푦 푖̸=푦 푖̸=푦 푖̸=푦 푖̸=푦 푗̸=푦 푗̸=푦 푗̸=푦 푗̸=푦 푗̸=푖 푗̸=푖 푗̸=푖 푗̸=푖

2 = 2(1 − 휑 − Δ)[2(푏1푏3 − 푏4)] + 2휑(푏1푏2 − 푏3) + 2(1 − 휑 − Δ)[4푝푦(푏1푏2 − 푏3)] + 2휑[푝푦(푏1 − 푏2)]

2 = (1 − 휑 − Δ)(4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3) + 휑(2푏1푏2 − 2푏3 + 2푝푦푏1 − 2푝푦푏2)

2 = 휆2 + (0 − 휆2)Δ + (2푏1푏2 − 2푏3 + 2푝푦푏1 − 2푝푦푏2 − 휆2)휑 (where 휆2 = 4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3),

′ 1 Pr(푆 = 2 )[(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3)] ∑︁ ′ ′ ′ ′ ′ ′ = Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푘) 푖̸=푦, 푘̸=푦 푗̸=푦, 푘̸=푖 푗̸=푖, 푘̸=푗 ∑︁ = Pr(퐺푥 = 퐴푖퐴푗, 퐺푦 = 퐴푖퐴푘) 푖̸=푦, 푘̸=푦 푗̸=푦, 푘̸=푖 푗̸=푖, 푘̸=푗

∑︁ 2 ∑︁ = (1 − 휑 − Δ) 4푝푖 푝푗푝푘 + 휑 푝푖푝푗푝푘 푖̸=푦, 푘̸=푦 푖̸=푦, 푘̸=푦 푗̸=푦, 푘̸=푖 푗̸=푦, 푘̸=푖 푗̸=푖, 푘̸=푗 푗̸=푖, 푘̸=푗

2 2 3 = (1 − 휑 − Δ)[4(푏1푏2 − 2푏1푏3 − 푏2 + 2푏4)] + 휑(푏1 − 2푏1푏2 − 푏2푏1 + 2푏3)

3 2 2 = 휆3 + (0 − 휆3)Δ + (푏1 − 3푏1푏2 + 2푏3 − 휆3)휑 (where 휆3 = 4푏1푏2 − 8푏1푏3 − 4푏2 + 8푏4).

′ Note that P and 휆1, 휆2, 휆3 are respectively ⎡ ⎤ ⎧ ′ 2 2 Pr(푆 = 1) ⎪휆1 = 2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3, ⎢ ⎥ ⎨⎪ P′ = ⎢ ′ 3 ⎥ and ⎢ Pr(푆 = 4 ) ⎥ 휆2 = 4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3, ⎣ ⎦ ⎪ ′ 1 ⎪ 2 2 Pr(푆 = 2 ) ⎩휆3 = 4푏1푏2 − 8푏1푏3 − 4푏2 + 8푏4. It is clear from the calculated results above that the matrices E′ and M′ in Expression (13), i.e. the ′ −1 ′ ′ expression P = [(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3] (E + M Δ), are as follows: ⎡ ⎤ 2 2 2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3 ⎢ ⎥ ′ ⎢ ⎥ E = ⎢ 4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3 ⎥ , ⎣ ⎦ 2 2 4푏1푏2 − 8푏1푏3 − 4푏2 + 8푏4 ⎡ ⎤ 2 2 푏1 + 2푝푦푏1 푏1푏2 + 푝푦푏1 + 3푝푦푏2 ⎢ ⎥ M′ = ⎢ 2 ⎥ − [︀E′, E′]︀. ⎢ 0 2푏1푏2 − 2푏3 + 2푝푦푏1 − 2푝푦푏2 ⎥ ⎣ ⎦ 3 0 푏1 − 3푏1푏2 + 2푏3 The last two expressions are just Expressions (14a) and (14b).

10 The biases of various estimators before and after cor- rection

The bias curves (Figures 2 and 3 in the article) of these estimators can be divided into three categories: (i) L&R and NA; (ii) WA, TH and NB; (iii) A&W. The bias of the L&R and NA estimators are highly independent on the number of observed alleles, whilst those of the WA, TH and NB estimators are not. Because the NA estimator is a modification of the L&R estimator, and because the TH and NB estimators are modifications of the WA estimator, the curves within a category are similar. We derived thebias curve of method-of-moment estimators. For Anderson and Weir’s (2007) estimator,푟 ^ is obtained from a numerical algorithm, so the analytical solution can not be solved.

Lynch and Ritland’s (1999) estimator

We enumerate all possible genotype pairs of the two individuals 푥 and 푦, and weighted the푟 ^ by the probability of each genotype pair to obtain the symbolic solution of the bias of푟 ^. 푛 푛 푛 푛 푛 푖 {︁ ∑︁ ∑︁ [︁ ∑︁ ∑︁ ′ ′ ∑︁ ∑︁ ′ ′ Bias(^푟) = (1 − 휑 − Δ) 푝푖푝푗푝푘푝푙푉 (퐺푥, 퐺푦)^푟(퐺푥, 퐺푦) + 휑 푝푖푝푗푝푘푉 (퐺푥, 퐺푦)^푟(퐺푥, 퐺푦) 푖=1 푗=1 푘=1 푙=1 푘=1 푙=푖 푖 푗  ∑︁ ∑︁ ′ ′ ]︁}︁ +Δ 푝푖푝푗푉 (퐺푥, 퐺푦)^푟(퐺푥, 퐺푦) [(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3] − Δ − 휑/2. (iii) 푘=푖 푙=푗

Where (푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3 is the probability that the genotypes of both individuals are visible,

푉 (퐺푥, 퐺푦) is a binary function to test whether the both genotypes 퐺푥 = 퐴푖퐴푗 and 퐺푦 = 퐴푘퐴푙 are visible, which is defined by ⎧ ⎨ 1 if 퐺푥 ̸= 퐴푦퐴푦 and 퐺푦 ̸= 퐴푦퐴푦, 푉 (퐺푥, 퐺푦) = (iv) ⎩ 0 otherwise. ′ ′ ′ ′ In Expression (iii), 퐺푥 and 퐺푦 are the observed genotypes and푟 ^(퐺푥, 퐺푦) is the estimated relatedness of Lynch and Ritland’s (1999) estimator (Equation (5)). Note that before correction, the observed allele frequencies are also not the true values: the sum of visible allele frequencies is extended to one, say ∑︀ 푝′ = 1 and 푝′ = 푝푖 (푖 ̸= 푦). If the frequencies of visible alleles are drawn from a triangular 푖̸=푦 푖 푖 1−푝푦 2푖 ′ 2푖(1−푝푦 ) ′ distribution, then 푝푖 = 푛′(푛′+1) and 푝푖 = 푛′(푛′+1) (푖 ̸= 푦), and the Bias(^푟) can be solved, for 푛 ≥ 2:

(Δ + 휑/2)(1 + 푝푦) Bias(^푟) = 2 3 . 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦 From the expression above, we can find that Bias(^푟) is not a function of 푛′. Therefore, in Figure 2, the bias curve of L&R estimator is independent on 푛′.

Corrected Lynch and Ritland’s (1999) estimator

′ ′ After correction, Bias(^푟) can be solved by using the same way, with the푟 ^(퐺푥, 퐺푦) in Expression (iv) been replaced with the corrected L&R estimator. The Bias(^푟) is dependent on 푛′. Although we fail to

11 obtain the general form, the solution of Bias(^푟) for each 푛′ can be calculated. Because these expressions become more complex, we only show the bias of푟 ^ for 2 ≤ 푛′ ≤ 6, and the numerical solution is shown in 2 3 Table 2. In the following, we let 휇 = 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦, then ′ −1 (1) Bias(^푟, 푛 = 2) = −Δ − 휑/2 + (2휆21Δ + 휆22휑)(휆23휇) , where

2 3 4 5 6 7 휆21 = 4 + 2푝푦 + 48푝푦 + 51푝푦 + 171푝푦 + 261푝푦 + 101푝푦 + 10푝푦,

2 3 4 5 6 7 8 휆22 = 4 + 2푝푦 + 46푝푦 + 51푝푦 + 132푝푦 + 252푝푦 − 85푝푦 − 71푝푦 − 7푝푦,

2 3 2 3 휆23 = 2(2 + 푝푦 + 13푝푦 + 2푝푦)(2 − 2푝푦 + 13푝푦 + 5푝푦);

′ −1 (2) Bias(^푟, 푛 = 3) = −Δ − 휑/2 + (2휆31Δ + 휆32휑)(휆33휇) , where

2 3 4 5 6 7 휆31 = 40 + 180푝푦 + 998푝푦 + 2691푝푦 + 8226푝푦 + 13689푝푦 + 24068푝푦 + 23877푝푦

8 9 10 + 8098푝푦 + 1035푝푦 + 42푝푦 ,

2 3 4 5 6 7 휆32 = 40 + 180푝푦 + 978푝푦 + 2591푝푦 + 7593푝푦 + 11933푝푦 + 18176푝푦 + 15606푝푦

8 9 10 11 − 9711푝푦 − 5231푝푦 − 660푝푦 − 23푝푦 ,

2 3 2 3 2 3 휆33 = 2(2 + 푝푦 + 13푝푦 + 2푝푦)(2 − 푝푦 + 12푝푦 + 3푝푦)(10 + 35푝푦 + 92푝푦 + 7푝푦);

′ −1 (3) Bias(^푟, 푛 = 4) = −Δ − 휑/2 + (2휆41Δ + 휆42휑)(휆43휆44휇) , where

2 3 4 5 휆41 = 18144 + 214704푝푦 + 1310112푝푦 + 5939124푝푦 + 19865538푝푦 + 52830058푝푦

6 7 8 9 10 + 111221531푝푦 + 176504833푝푦 + 218922427푝푦 + 157800497푝푦 + 48247633푝푦

11 12 13 + 6701775푝푦 + 414615푝푦 + 9009푝푦 ,

2 3 4 5 휆42 = 18144 + 214704푝푦 + 1301040푝푦 + 5818164푝푦 + 19064214푝푦 + 48926482푝푦

6 7 8 9 10 + 97839029푝푦 + 139971751푝푦 + 144883154푝푦 + 47354737푝푦 − 70687906푝푦

11 12 13 14 − 30301523푝푦 − 4170916푝푦 − 227003푝푦 − 4071푝푦 ,

2 3 2 3 휆43 = 2(4 + 10푝푦 + 33푝푦 + 3푝푦)(6 + 37푝푦 + 7푝푦),

2 3 2 3 휆44 = (18 + 135푝푦 + 236푝푦 + 11푝푦)(42 + 35푝푦 + 284푝푦 + 39푝푦);

′ −1 (4) Bias(^푟, 푛 = 5) = −Δ − 휑/2 + (2휆51Δ + 휆52휑)(휆53휆54휇) , where

2 3 4 휆51 = 128128 + 2914912푝푦 + 27446800푝푦 + 171610800푝푦 + 779358084푝푦

5 6 7 8 + 2753453190푝푦 + 7656153022푝푦 + 17031462637푝푦 + 29663170036푝푦

9 10 11 12 + 39608425347푝푦 + 37781928630푝푦 + 21577639263푝푦 + 6040014268푝푦

13 14 15 16 + 864678919푝푦 + 64270024푝푦 + 2314932푝푦 + 31008푝푦 ,

2 3 4 휆52 = 128128 + 2914912푝푦 + 27382736푝푦 + 169993184푝푦 + 762706068푝푦

5 6 7 8 + 2642767874푝푦 + 7131386478푝푦 + 15135519959푝푦 + 24384594730푝푦

9 10 11 12 + 28131218262푝푦 + 19033914707푝푦 − 787700001푝푦 − 10294303102푝푦

13 14 15 16 17 − 3754393974푝푦 − 536626095푝푦 − 35896604푝푦 − 1095314푝푦 − 11948푝푦 ,

2 3 2 3 2 3 휆53 = 2(2 + 푝푦 + 13푝푦 + 2푝푦)(4 + 10푝푦 + 33푝푦 + 3푝푦)(14 + 175푝푦 + 253푝푦 + 8푝푦),

12 2 3 2 3 휆54 = (26 + 130푝푦 + 277푝푦 + 17푝푦)(44 + 55푝푦 + 313푝푦 + 38푝푦);

′ −1 (5) Bias(^푟, 푛 = 6) = −Δ − 휑/2 + (2휆61Δ + 휆62휑)(휆63휆64휆65휇) , where

2 3 휆61 = 248064000 + 9289996800푝푦 + 137054051200푝푦 + 1206977214560푝푦

4 5 6 + 7478912348960푝푦 + 34971140501296푝푦 + 128318338268268푝푦

7 8 9 + 376010959730232푝푦 + 885015211256668푝푦 + 1662157794116498푝푦

10 11 12 + 2441370328609961푝푦 + 2701406869519177푝푦 + 2087707108083687푝푦

13 14 15 + 998780278713489푝푦 + 258916743498697푝푦 + 37328014277531푝푦

16 17 18 19 + 3064633669073푝푦 + 140507178121푝푦 + 3282998190푝푦 + 29601000푝푦 ,

2 3 휆62 = 248064000 + 9289996800푝푦 + 136930019200푝푦 + 1201898104160푝푦

4 5 6 + 7397157964960푝푦 + 34202495888336푝푦 + 123325558158108푝푦

7 8 9 + 351945664942868푝푦 + 795449297019884푝푦 + 1400682977460974푝푦

10 11 12 + 1840758497928775푝푦 + 1631376876416498푝푦 + 656223687892950푝푦

13 14 15 − 345308229230269푝푦 − 499737648215188푝푦 − 160563926216347푝푦

16 17 18 − 23264813191808푝푦 − 1752919253641푝푦 − 69812259099푝푦

19 20 − 1360693027푝푦 − 9949430푝푦 ,

2 3 2 3 휆63 = 2(6 + 27푝푦 + 61푝푦 + 4푝푦)(10 + 10푝푦 + 69푝푦 + 9푝푦),

2 3 2 3 휆64 = (20 + 370푝푦 + 481푝푦 + 11푝푦)(38 + 304푝푦 + 517푝푦 + 23푝푦),

2 3 2 3 휆65 = (68 + 187푝푦 + 577푝푦 + 50푝푦)(80 + 136푝푦 + 601푝푦 + 65푝푦).

Noval estimator A

The bias curve of NA estimator can also be obtained by using the same methods as Lynch and ′ ′ Ritland’s (1999) estimator, with the푟 ^(퐺푥, 퐺푦) calculated using Equation (10). However, as shown in Figure 2, although the curves of the NA estimator are similar to those of the L&R estimator, the bias curves are dependent on 푛′. We fail to obtain the general form, but the solution of Bias(^푟) for each 푛′ can be calculated. The calculated results for 2 ≤ 푛′ ≤ 6 are as follows. ′ −1 (1) Bias(^푟, 푛 = 2) = −Δ − 휑/2 + (휆21 + 휆22Δ + 휆23휑)(휆24 + 휆25Δ + 휆26휑) , where

2 2 3 휆21 = (−1 + 푝푦) (7 + 11푝푦), 휆22 = 38 + 120푝푦 + 15푝푦 − 11푝푦, 2 3 2 휆23 = 20 + 39푝푦 + 33푝푦 − 11푝푦, 휆24 = (1 − 푝푦)(5 + 13푝푦) , 2 3 2 3 휆25 = 20 + 12푝푦 − 39푝푦 + 169푝푦, 휆26 = 2 + 12푝푦 − 102푝푦 + 169푝푦; ′ −1 (2) Bias(^푟, 푛 = 3) = −Δ − 휑/2 + (휆31 + 2휆32Δ + 휆33휑)휆34 ,

2 휆31 = 1718404(1 − 푝푦) 푝푦,

2 3 휆32 = 1005795 + 1864997푝푦 − 1718404푝푦 + 859202푝푦,

2 3 휆33 = 1005795 + 2286403푝푦 − 2999012푝푦 + 1718404푝푦,

13 Table 2: The bias of푟 ^ of Lynch and Ritland’s (1999) after correction Relationship Relationship ′ ′ 푝푦 푛 Parent 푛 Parent Unrelated Half-sibs Full-sibs -offspringUnrelated Half-sibs Full-sibs -offspring 2 0.000000 0.001124 0.002266 0.002019 9 0.000000 0.000744 0.001887 0.001259 3 0.000000 0.001043 0.002185 0.001857 10 0.000000 0.000713 0.001856 0.001196 4 0.000000 0.000973 0.002115 0.001717 11 0.000000 0.000685 0.001828 0.001140 0.1 5 0.000000 0.000914 0.002056 0.001598 12 0.000000 0.000659 0.001802 0.001089 6 0.000000 0.000863 0.002005 0.001496 13 0.000000 0.000636 0.001779 0.001042 7 0.000000 0.000818 0.001961 0.001407 14 0.000000 0.000614 0.001757 0.000999 8 0.000000 0.000779 0.001922 0.001328 15 0.000000 0.000595 0.001738 0.000960 2 0.000000 0.006906 0.015919 0.008079 9 0.000000 0.004589 0.013657 0.003496 3 0.000000 0.006392 0.015417 0.007062 10 0.000000 0.004449 0.013521 0.003219 4 0.000000 0.005896 0.014934 0.006082 11 0.000000 0.004329 0.013404 0.002983 0.3 5 0.000000 0.005506 0.014553 0.005311 12 0.000000 0.004226 0.013303 0.002779 6 0.000000 0.005200 0.014254 0.004705 13 0.000000 0.004136 0.013215 0.002601 7 0.000000 0.004955 0.014015 0.004220 14 0.000000 0.004057 0.013138 0.002444 8 0.000000 0.004755 0.013819 0.003825 15 0.000000 0.003986 0.013069 0.002305 2 0.000000 0.016953 0.040103 0.007212 9 0.000000 0.014792 0.038100 0.003105 3 0.000000 0.016599 0.039775 0.006538 10 0.000000 0.014654 0.037972 0.002843 4 0.000000 0.016115 0.039326 0.005618 11 0.000000 0.014538 0.037864 0.002622 0.5 5 0.000000 0.015718 0.038958 0.004864 12 0.000000 0.014438 0.037772 0.002432 6 0.000000 0.015406 0.038669 0.004270 13 0.000000 0.014352 0.037692 0.002268 7 0.000000 0.015157 0.038439 0.003799 14 0.000000 0.014276 0.037622 0.002124 8 0.000000 0.014957 0.038253 0.003418 15 0.000000 0.014209 0.037560 0.001997 2 0.000000 0.026800 0.059042 0.005383 9 0.000000 0.025171 0.057596 0.002408 3 0.000000 0.026588 0.058854 0.004996 10 0.000000 0.025060 0.057498 0.002206 4 0.000000 0.026222 0.058529 0.004328 11 0.000000 0.024966 0.057415 0.002034 0.6 5 0.000000 0.025911 0.058253 0.003760 12 0.000000 0.024885 0.057343 0.001887 6 0.000000 0.025663 0.058033 0.003307 13 0.000000 0.024815 0.057281 0.001759 7 0.000000 0.025464 0.057857 0.002944 14 0.000000 0.024754 0.057227 0.001648 8 0.000000 0.025303 0.057714 0.002650 15 0.000000 0.024700 0.057179 0.001549

14 2 3 휆34 = 2011590[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦];

′ −1 (3) Bias(^푟, 푛 = 4) = −Δ − 휑/2 + (휆41 + 2휆42Δ + 휆43휑)휆44 , where

2 휆41 = − 624091473188(−1 + 푝푦) 푝푦,

2 3 휆42 = 602638296435 + 914684033029푝푦 − 624091473188푝푦 + 312045736594푝푦,

2 3 휆43 = 602638296435 + 923267324083푝푦 − 944720500836푝푦 + 624091473188푝푦,

2 3 휆44 = 1205276592870[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦];

′ −1 (4) Bias(^푟, 푛 = 5) = −Δ − 휑/2 + (휆51 + 2휆52Δ + 휆53휑)휆54 , where

2 휆51 = 19685159108451104(1 − 푝푦) 푝푦,

2 휆52 = 26698375509931125 + 36540955064156677푝푦 − 19685159108451104푝푦

3 + 9842579554225552푝푦,

2 휆53 = 26698375509931125 + 32592836333532839푝푦 − 25579619932052818푝푦

3 + 19685159108451104푝푦,

2 3 휆54 = 53396751019862250[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦];

′ −1 (5) Bias(^푟, 푛 = 6) = −Δ − 휑/2 + (휆61 + 휆62 − 휆63 − 휆64)휆65 , where

2 3 휆61 = Δ(−1 + 푝푦) − 1/2휑(−1 + 푝푦) − 2Δ(−1 + 푝푦)푝푦, 17752956739302371838068178659863휑(−1 + 푝 )2푝 휆 = 푦 푦 , 62 14309643024953567059203104069400

4430928163818870667758088716781(−1 + Δ + 휑)(−1 + 푝 )3푝 휆 = 푦 푦 , 63 15467040622560105571344531604425

2 휆64 = 휑(−1 + 푝푦)푝푦,

2 3 4 휆65 = 1 + (−2 + Δ)푝푦 + 휑푝푦 − (−1 + Δ + 휑)푝푦.

Because the expressions for a large 푛′ are complex, the approximate numerical solution for 푛′ ≥ 6 is calculated by the following formula:

−1 Bias(^푟) ≈ −Δ − 휑/2 + (휆1 + 휆2휇1 − 휆3휇2 − 휆4)휆5 , where

2 3 휆1 = Δ(−1 + 푝푦) − 0.5휑(−1 + 푝푦) − 2Δ(−1 + 푝푦)푝푦,

2 휆2 = 휑(−1 + 푝푦) 푝푦,

3 휆3 = (−1 + Δ + 휑)(−1 + 푝푦) 푝푦,

2 휆4 = 휑(−1 + 푝푦)푝푦,

2 3 4 휆5 = 1 + (−2 + Δ)푝푦 + 휑푝푦 − (−1 + Δ + 휑)푝푦, and the values of 휇1 and 휇2 are listed in the following table:

15 ′ ′ 푛 휇1 휇2 푛 휇1 휇2 6 1.24063 0.286475 11 1.24316 0.136650 7 1.24085 0.234555 12 1.24364 0.123819 8 1.24142 0.198748 13 1.24406 0.113206 9 1.24204 0.172530 14 1.24443 0.104279 10 1.24263 0.152485 15 1.24477 0.0966651

Corrected Noval estimator A

The expressions of Bias(^푟) become complex after correction, and we can only shown those for 2 ≤ 푛′ ≤ 6, with the numerical solution shown in Table 3. At 푛′ = 2, some genotype pairs encounter the problem of singular matrices. In these cases,푟 ^ is unsolvable. Therefore these results are not incorporated for numerical solution, and the analytical solution at 푛′ = 2 is also not given. ′ −1 (1) Bias(^푟, 푛 = 3) = −Δ − 휑/2 + (2휆31Δ + 휆32휑)(휆33휆34) , where

2 3 4 5 6 7 휆31 = 40 + 180푝푦 + 998푝푦 + 2691푝푦 + 8226푝푦 + 13689푝푦 + 24068푝푦 + 23877푝푦

8 9 10 + 8098푝푦 + 1035푝푦 + 42푝푦 ,

2 3 4 5 6 7 휆32 = 40 + 180푝푦 + 978푝푦 + 2591푝푦 + 7593푝푦 + 11933푝푦 + 18176푝푦 + 15606푝푦

8 9 10 11 − 9711푝푦 − 5231푝푦 − 660푝푦 − 23푝푦 ,

2 3 2 3 2 3 휆33 = 2(2 + 푝푦 + 13푝푦 + 2푝푦)(2 − 푝푦 + 12푝푦 + 3푝푦)(10 + 35푝푦 + 92푝푦 + 7푝푦),

2 3 휆34 = 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦;

′ −1 (2) Bias(^푟, 푛 = 4) = −Δ − 휑/2 + (2휆41Δ + 휆42휑)(휆43휆44) , where

2 3 4 5 휆41 = 18144 + 214704푝푦 + 1310112푝푦 + 5939124푝푦 + 19865538푝푦 + 52830058푝푦

6 7 8 9 10 + 111221531푝푦 + 176504833푝푦 + 218922427푝푦 + 157800497푝푦 + 48247633푝푦

11 12 13 + 6701775푝푦 + 414615푝푦 + 9009푝푦 ,

2 3 4 5 휆42 = 18144 + 214704푝푦 + 1301040푝푦 + 5818164푝푦 + 19064214푝푦 + 48926482푝푦

6 7 8 9 10 + 97839029푝푦 + 139971751푝푦 + 144883154푝푦 + 47354737푝푦 − 70687906푝푦

11 12 13 14 − 30301523푝푦 − 4170916푝푦 − 227003푝푦 − 4071푝푦 ,

2 3 2 3 2 3 휆43 = 2(4 + 10푝푦 + 33푝푦 + 3푝푦)(6 + 37푝푦 + 7푝푦)(18 + 135푝푦 + 236푝푦 + 11푝푦),

2 3 2 3 휆44 = (42 + 35푝푦 + 284푝푦 + 39푝푦)[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦];

′ −1 (3) Bias(^푟, 푛 = 5) = −Δ − 휑/2 + (2휆51Δ + 휆52휑)(휆53휆54휆55) , where

2 3 4 휆51 = 128128 + 2914912푝푦 + 27446800푝푦 + 171610800푝푦 + 779358084푝푦

5 6 7 8 + 2753453190푝푦 + 7656153022푝푦 + 17031462637푝푦 + 29663170036푝푦

9 10 11 12 + 39608425347푝푦 + 37781928630푝푦 + 21577639263푝푦 + 6040014268푝푦

13 14 15 16 + 864678919푝푦 + 64270024푝푦 + 2314932푝푦 + 31008푝푦 ,

16 2 3 4 휆52 = 128128 + 2914912푝푦 + 27382736푝푦 + 169993184푝푦 + 762706068푝푦

5 6 7 8 + 2642767874푝푦 + 7131386478푝푦 + 15135519959푝푦 + 24384594730푝푦

9 10 11 12 + 28131218262푝푦 + 19033914707푝푦 − 787700001푝푦 − 10294303102푝푦

13 14 15 16 17 − 3754393974푝푦 − 536626095푝푦 − 35896604푝푦 − 1095314푝푦 − 11948푝푦 ,

2 3 2 3 휆53 = 2(2 + 푝푦 + 13푝푦 + 2푝푦)(4 + 10푝푦 + 33푝푦 + 3푝푦),

2 3 2 3 휆54 = (14 + 175푝푦 + 253푝푦 + 8푝푦)(26 + 130푝푦 + 277푝푦 + 17푝푦),

2 3 2 3 휆55 = (44 + 55푝푦 + 313푝푦 + 38푝푦)[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦];

′ −1 (4) Bias(^푟, 푛 = 6) = −Δ − 휑/2 + (2휆61Δ + 휆62휑)(휆63휆64휆65휆66) , where

2 3 휆61 = 248064000 + 9289996800푝푦 + 137054051200푝푦 + 1206977214560푝푦

4 5 6 + 7478912348960푝푦 + 34971140501296푝푦 + 128318338268268푝푦

7 8 9 + 376010959730232푝푦 + 885015211256668푝푦 + 1662157794116498푝푦

10 11 12 + 2441370328609961푝푦 + 2701406869519177푝푦 + 2087707108083687푝푦

13 14 15 + 998780278713489푝푦 + 258916743498697푝푦 + 37328014277531푝푦

16 17 18 19 + 3064633669073푝푦 + 140507178121푝푦 + 3282998190푝푦 + 29601000푝푦 ,

2 3 휆62 = 248064000 + 9289996800푝푦 + 136930019200푝푦 + 1201898104160푝푦

4 5 6 + 7397157964960푝푦 + 34202495888336푝푦 + 123325558158108푝푦

7 8 9 + 351945664942868푝푦 + 795449297019884푝푦 + 1400682977460974푝푦

10 11 12 + 1840758497928775푝푦 + 1631376876416498푝푦 + 656223687892950푝푦

13 14 15 − 345308229230269푝푦 − 499737648215188푝푦 − 160563926216347푝푦

16 17 18 − 23264813191808푝푦 − 1752919253641푝푦 − 69812259099푝푦

19 20 − 1360693027푝푦 − 9949430푝푦 ,

2 3 2 3 휆63 = 2(6 + 27푝푦 + 61푝푦 + 4푝푦)(10 + 10푝푦 + 69푝푦 + 9푝푦),

2 3 2 3 휆64 = (20 + 370푝푦 + 481푝푦 + 11푝푦)(38 + 304푝푦 + 517푝푦 + 23푝푦),

2 3 2 3 휆65 = (68 + 187푝푦 + 577푝푦 + 50푝푦)(80 + 136푝푦 + 601푝푦 + 65푝푦),

2 3 휆66 = 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦.

17 Table 3: The bias of푟 ^ of Noval estimator A after correction Relationship Relationship ′ ′ 푝푦 푛 Parent 푛 Parent Unrelated Half-sibs Full-sibs -offspringUnrelated Half-sibs Full-sibs -offspring 2 0.126026 0.221154 0.222184 0.294988 9 0.000000 0.000744 0.001887 0.001259 3 0.000000 0.001043 0.002185 0.001857 10 0.000000 0.000713 0.001856 0.001196 4 0.000000 0.000973 0.002115 0.001717 11 0.000000 0.000685 0.001828 0.001140 0.1 5 0.000000 0.000914 0.002056 0.001598 12 0.000000 0.000659 0.001802 0.001089 6 0.000000 0.000863 0.002005 0.001496 13 0.000000 0.000636 0.001779 0.001042 7 0.000000 0.000818 0.001961 0.001407 14 0.000000 0.000614 0.001757 0.000999 8 0.000000 0.000779 0.001922 0.001328 15 0.000000 0.000595 0.001738 0.000960 2 0.074074 0.136228 0.153817 0.192944 9 0.000000 0.004589 0.013657 0.003496 3 0.000000 0.006392 0.015417 0.007062 10 0.000000 0.004449 0.013521 0.003219 4 0.000000 0.005896 0.014934 0.006082 11 0.000000 0.004329 0.013404 0.002983 0.3 5 0.000000 0.005506 0.014553 0.005311 12 0.000000 0.004226 0.013303 0.002779 6 0.000000 0.005200 0.014254 0.004705 13 0.000000 0.004136 0.013215 0.002601 7 0.000000 0.004955 0.014015 0.004220 14 0.000000 0.004057 0.013138 0.002444 8 0.000000 0.004755 0.013819 0.003825 15 0.000000 0.003986 0.013069 0.002305 2 0.154900 0.206527 0.203012 0.242478 9 0.000000 0.014792 0.038100 0.003105 3 0.000000 0.016599 0.039775 0.006538 10 0.000000 0.014654 0.037972 0.002843 4 0.000000 0.016115 0.039326 0.005618 11 0.000000 0.014538 0.037864 0.002622 0.5 5 0.000000 0.015718 0.038958 0.004864 12 0.000000 0.014438 0.037772 0.002432 6 0.000000 0.015406 0.038669 0.004270 13 0.000000 0.014352 0.037692 0.002268 7 0.000000 0.015157 0.038439 0.003799 14 0.000000 0.014276 0.037622 0.002124 8 0.000000 0.014957 0.038253 0.003418 15 0.000000 0.014209 0.037560 0.001997 2 0.192033 0.255932 0.240904 0.281987 9 0.000000 0.025171 0.057596 0.002408 3 0.000000 0.026588 0.058854 0.004996 10 0.000000 0.025060 0.057498 0.002206 4 0.000000 0.026222 0.058529 0.004328 11 0.000000 0.024966 0.057415 0.002034 0.6 5 0.000000 0.025911 0.058253 0.003760 12 0.000000 0.024885 0.057343 0.001887 6 0.000000 0.025663 0.058033 0.003307 13 0.000000 0.024815 0.057281 0.001759 7 0.000000 0.025464 0.057857 0.002944 14 0.000000 0.024754 0.057227 0.001648 8 0.000000 0.025303 0.057714 0.002650 15 0.000000 0.024700 0.057179 0.001549

18 Wang’s (2002) estimator

∑︀ ′ ′ Before correction, the sum of visible allele frequencies is extended to one, i.e. 푖̸=푦 푝푖 = 1, where 푝푖 푗 푝푖 ′ ∑︀ (︀ ′ )︀ is the observed allele frequency which is equal to . In addition, 푎 is defined by 푝 and 푏푖 1−푝푦 푖 푗̸=푦 푖 ∑︀ 푗 is defined by 푗̸=푦 푝푖 . If the frequencies of visible alleles are drawn from a triangular distribution, then ′ 2푖 2푖(1−푝푦 ) ′ 푝푖 = 푛′(푛′+1) and 푝푖 = 푛′(푛′+1) , therefore the values of 푎푖 and 푏푖 (1 ≤ 푖 ≤ 4) are given by

′ 푎1 = 1, 푏1 = 1 − 푝푦,

2(1 + 2푛′) 2(1 + 2푛′)(1 − 푝 )2 푎′ = , 푏 = 푦 , 2 3푛′(1 + 푛′) 2 3푛′(1 + 푛′) 2 2(1 − 푝 )3 푎′ = , 푏 = 푦 , 3 푛′(1 + 푛′) 3 푛′(1 + 푛′) 8(1 + 2푛′)(−1 + 3푛′ + 3푛′2) 8(1 + 2푛′)(−1 + 3푛′ + 3푛′2)(1 − 푝 )4 푎′ = , 푏 = 푦 . 4 15푛′3(1 + 푛′)3 4 15푛′3(1 + 푛′)3

According to Expression (13), namely the expression

′ −1 ′ ′ P = [(푓1 − 푓3)Δ + (푓2 − 푓3)휑 + 푓3] (E + M Δ), and combining Expressions (14a) with (14b), P′ can be calculated. ^ ′ ′ ^ Before correction, the estimator misuses P and 푎푖 to substitute P and 푎푖 in Expressions (5) and (12). With a mathematics software, the symbolic solution of Bias(^푟) is given as follows (it is actually a ′ function with Δ, 휑, 푝푦 and 푛 as the independent variables):

−1 Bias(^푟) = −Δ − 휑/2 + (휆1 + 2휆2Δ + 휆3휑)(휆4휆5) , where

2 ′ 2 휆1 = 16푝푦[864(−1 − 7푝푦 + 8푝푦) + 48푛 (−163 − 201푝푦 + 364푝푦)

′2 2 ′3 2 − 16푛 (629 − 1404푝푦 + 775푝푦) − 8푛 (−2803 − 3078푝푦 + 5881푝푦)

′4 2 ′5 2 + 푛 (44930 − 45918푝푦 + 988푝푦) + 푛 (−677 − 32373푝푦 + 33050푝푦)

′6 2 ′7 2 + 90푛 (−444 + 323푝푦 + 121푝푦) + 45푛 (−501 + 239푝푦 + 262푝푦)

′8 2 + 1350푛 (−2 − 5푝푦 + 7푝푦)],

2 3 ′ 2 3 휆2 = − 576(31 + 19푝푦 − 84푝푦 + 96푝푦) − 96푛 (621 − 31푝푦 − 804푝푦 + 1456푝푦)

′2 2 3 + 16푛 (−287 + 4745푝푦 − 11232푝푦 + 6200푝푦)

′3 2 3 + 16푛 (8363 − 2849푝푦 − 12312푝푦 + 23524푝푦)

′4 2 3 − 4푛 (−15251 + 74609푝푦 − 91836푝푦 + 1976푝푦)

′5 2 3 − 2푛 (51733 + 49025푝푦 − 129492푝푦 + 132200푝푦)

′6 2 3 − 15푛 (4451 − 16861푝푦 + 15504푝푦 + 5808푝푦)

′7 2 3 − 45푛 (−785 − 4793푝푦 + 1912푝푦 + 2096푝푦)

′8 2 3 ′9 − 135푛 (−379 − 539푝푦 − 400푝푦 + 560푝푦) + 18225푛 (1 + 푝푦),

2 3 ′ 2 3 휆3 = − 576(31 + 60푝푦 − 221푝푦 + 192푝푦) − 96푛 (621 + 224푝푦 − 2515푝푦 + 2912푝푦)

19 ′2 2 3 + 16푛 (−287 + 12958푝푦 − 25645푝푦 + 12400푝푦)

′3 2 3 + 16푛 (8363 + 4106푝푦 − 42791푝푦 + 47048푝푦)

′4 2 3 + 푛 (61004 − 506512푝푦 + 583324푝푦 − 15808푝푦)

′5 2 3 − 2푛 (51733 + 94960푝푦 − 307627푝푦 + 264400푝푦)

′6 2 3 − 15푛 (4451 − 23566푝푦 + 16401푝푦 + 11616푝푦)

′7 2 3 − 45푛 (−785 − 3162푝푦 − 1815푝푦 + 4192푝푦)

′8 2 3 ′9 2 − 135푛 (−379 − 2푝푦 − 1497푝푦 + 1120푝푦) − 2025푛 (−9 − 14푝푦 + 5푝푦),

′ ′2 ′3 ′4 ′5 휆4 = 2(−17856 − 59616푛 − 4592푛 + 133808푛 + 61004푛 − 103466푛

− 66765푛′6 + 35325푛′7 + 51165푛′8 + 18225푛′9),

2 3 휆5 = 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦.

′ The graph of Bias(^푟) as a function of 푛 as the explanatory variable and with 푝푦 = 0.6 is shown below, where the green (yellow, pink or blue) curve is the bias of푟 ^ for parent-offspring (full-sibs, half-sibs or nonrelatives):

0.4

0.2

2 4 6 8 10 12 14

0.2

0.4

Corrected Wang’s (2002) estimator

After correction, the approximate form P′ ≈ E* + M*Δ is used for estimation, and Δ^ is

Δ^ = [(M*)푇 (V*)−1M*]−1(M*)푇 (V*)−1(P^ ′ − E*).

Substituting P^ ′ by P′ (where P′ can be obtained by Expression (13)), the bias of푟 ^ can be solved, which is independent on 푛′: 2 Δ(1 + 푝푦) + 휑(1 + 푝푦 − 푝푦)/2 Bias(^푟) = −Δ − 휑/2 + 2 3 . 1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦

20 Thomas’s (2010) estimator

The bias of the TH estimator can be calculated using the same method as the WA estimator, and the expression of Bias(^푟) is as follows:

−1 Bias(^푟) = −Δ − 휑/2 + (휆1 + 2휆2Δ + 휆3휑)(휆4휆5) , where

′ ′2 ′3 ′4 휆1 = 48푝푦(−1 + 푝푦)(−2 + 3푛 + 4푝푦)(4 − 11푛 + 5푛 + 10푛 ),

2 3 ′ 2 휆2 = − 48(3 + 7푝푦 − 12푝푦 + 8푝푦) − 8푛 (37 + 푝푦 + 36푝푦)

′2 2 3 ′3 2 3 + 24푛 (9 + 31푝푦 − 66푝푦 + 44푝푦) + 푛 (494 − 538푝푦 + 1512푝푦 − 480푝푦)

′4 2 3 ′5 2 ′6 − 15푛 (9 + 17푝푦 − 72푝푦 + 64푝푦) − 90푛 (1 − 7푝푦 + 8푝푦) + 135푛 (1 + 푝푦),

2 3 ′ 2 휆3 = − 48(3 + 10푝푦 − 23푝푦 + 16푝푦) − 8푛 (37 − 54푝푦 + 91푝푦)

′2 2 3 ′3 2 3 + 24푛 (9 + 55푝푦 − 134푝푦 + 88푝푦) + 푛 (494 − 1788푝푦 + 3242푝푦 − 960푝푦)

′4 2 3 ′5 2 ′6 2 − 15푛 (9 + 44푝푦 − 163푝푦 + 128푝푦) − 90푛 (1 − 12푝푦 + 13푝푦) + 135푛 (1 + 푝푦),

2 3 ′ 휆4 = 2[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦](2 + 3푛 ),

′ ′2 ′3 ′4 ′5 휆5 = (−72 − 40푛 + 168푛 − 5푛 − 60푛 + 45푛 ).

′ The graph of Bias(^푟) as a function with 푛 as the explanatory variable and with 푝푦 = 0.6 is shown below, where the green (yellow, pink or blue) curve is the bias of푟 ^ for parent-offspring (full-sibs, half-sibs or nonrelatives):

0.2

2 4 6 8 10 12 14

0.2

0.4

Corrected Thomas’s (2010) estimator

The expression of Bias(^푟) of the corrected Thomas’s (2010) estimator is identical to Wang’s (2002) estimator.

21 Noval estimator B

The bias of the TH estimator can be calculated using the same method as the WA estimator, and the expression of Bias(^푟) is simpler than the WA estimator:

−1 Bias(^푟) = −Δ − 휑/2 + (휆1 + 2휆2Δ + 휆3휑)휆4 , where

2 휆1 = 16푝푦(−1 + 푝푦),

′ 2 휆2 = [3푛 − 2(1 − 2푝푦) ](1 + 푝푦),

′ 2 3 휆3 = 3푛 (1 + 푝푦) + 2(−1 + 푝푦 + 6푝푦 − 8푝푦),

′ 2 3 휆4 = 2(−2 + 3푛 )[1 + 푝푦 + (−1 + Δ)푝푦 + (−1 + Δ + 휑)푝푦].

′ The graph of Bias(^푟) as a function with 푛 as the explanatory variable and with 푝푦 = 0.6 is shown below, where the green (yellow, pink or blue) curve is the bias of푟 ^ for parent-offspring (full-sibs, half-sibs or nonrelatives):

0.4

0.2

2 4 6 8 10 12 14

0.2

0.4

0.6

Corrected Noval estimator B

The expression of Bias(^푟) of the corrected Noval estimator B is identical to Wang’s (2002) estimator.

22 A summary of expressions of various estimators

Lynch and Ritland’s (1999) estimator

Homozygote: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤ 2 2 2 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) 푝푖 1 − 푝푖 푝푖 − 푝푖 Δ ⎣ ⎦ = ⎣ ⎦ + ⎣ ⎦⎣ ⎦ ; Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푖) 2푝푖푝푥 −2푝푖푝푥 푝푥 − 2푝푖푝푥 휑 Heterozygote: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 2 1 2 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗) 푝푖 −푝푖 2 푝푖 − 푝푖 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 2 ⎥ ⎢ 2 1 2 ⎥ ⎢ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 푝푗 ⎥ ⎢ −푝푗 2 푝푗 − 푝푗 ⎥⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ ⎥ = ⎢ ⎥ + ⎢ 1 1 ⎥ . ⎢ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 2푝푖푝푗 ⎥ ⎢ 1 − 2푝푖푝푗 2 푝푖 + 2 푝푗 − 2푝푖푝푗 ⎥⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 휑 ⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎢ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 2푝푖푝푥 ⎥ ⎢ −2푝푖푝푥 2 푝푥 − 2푝푖푝푥 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 1 Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗) 2푝푗푝푥 −2푝푗푝푥 2 푝푥 − 2푝푗푝푥

Corrected Lynch and Ritland’s (1999) estimator

Homozygote: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤ ′ ′ ′ ′ ′ ′ 푝푖(푝푖+푝푦 )(2−푝푦 )+2푝푦 (푝푦 +2푝푖) Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푖) 휆1 1 − 휆1 (푝 +2푝 )(2−푝 ) − 휆1 Δ ⎣ ⎦ ≈ ⎣ ⎦ + ⎣ 푖 푦 푦 ⎦⎣ ⎦ , ′ ′ ′ ′ ′ ′ 푝푖푝푥(2−푝푦 )+2푝푥푝푦 Pr(퐺 = 퐴 퐴 | 퐺 = 퐴 퐴 ) 휆2 −휆2 − 휆2 휑 푦 푖 푥 푥 푖 푖 (푝푖+2푝푦 )(2−푝푦 ) 2 푝푖 + 2푝푖푝푦 2푝푖푝푥 where 휆1 = 2 and 휆2 = 2 ; 1 − 푝푦 1 − 푝푦 Heterozygote: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ′ ′ ′ ′ ′ ′ 푝푖+푝푦 Pr(퐺푦 = 퐴푖퐴푖 | 퐺푥 = 퐴푖퐴푗) 휆1 −휆1 2 − 휆1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ′ ′ ′ ′ ′ ′ ⎥ ⎢ ⎥ ⎢ 푝푗 +푝푦 ⎥ ⎢ Pr(퐺푦 = 퐴푗퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 휆2 ⎥ ⎢ −휆2 2 − 휆2 ⎥⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ ′ ′ ′ ′ ′ ′ ⎥ ≈ ⎢ ⎥ + ⎢ 푝푖+푝푗 ⎥ , ⎢ Pr(퐺푦 = 퐴푖퐴푗 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 휆3 ⎥ ⎢ 1 − 휆3 2 − 휆3 ⎥⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 휑 ⎢ ′ ′ ′ ′ ′ ′ ⎥ ⎢ ⎥ ⎢ 푝푥 ⎥ ⎢ Pr(퐺푦 = 퐴푖퐴푥 | 퐺푥 = 퐴푖퐴푗) ⎥ ⎢ 휆4 ⎥ ⎢ −휆4 2 − 휆4 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ′ ′ ′ ′ ′ ′ 푝푥 Pr(퐺푦 = 퐴푗퐴푥 | 퐺푥 = 퐴푖퐴푗) 휆5 −휆5 2 − 휆5 2 2 푝푖 + 2푝푖푝푦 푝푗 + 2푝푗푝푦 2푝푖푝푗 2푝푖푝푥 2푝푗푝푥 where 휆1 = 2 , 휆2 = 2 , 휆3 = 2 , 휆4 = 2 , 휆5 = 2 . 1 − 푝푦 1 − 푝푦 1 − 푝푦 1 − 푝푦 1 − 푝푦

Noval estimator A

Homozygote ⎡ ⎤ ⎡ ⎤ 2 2 2 ⎡ ⎤ ⎡ ⎤ 푝푖 ⎡ ⎤ 1 − 푝푖 푝푖 − 푝푖 ⎡ ⎤ E(푆) 1 3 1 ⎢ ⎥ 1 3 1 ⎢ ⎥ Δ 4 2 ⎢ ⎥ 4 2 ⎢ ⎥ ⎣ ⎦ = ⎣ ⎦⎢ 2푝푖푝푥 ⎥ + ⎣ ⎦⎢ −2푝푖푝푥 푝푥 − 2푝푖푝푥 ⎥⎣ ⎦ ; E(푆2) 1 9 1 ⎣ ⎦ 1 9 1 ⎣ ⎦ 휑 16 4 0 16 4 0 0 Heterozygote: ⎡ ⎤ ⎡ ⎤ 1 ⎡ ⎤ ⎡ ⎤ 2푝푖푝푗 ⎡ ⎤ 1 − 2푝푖푝푗 2 (푝푖 + 푝푗) − 2푝푖푝푗 ⎡ ⎤ E(푆) 1 3 1 ⎢ ⎥ 1 3 1 ⎢ ⎥ Δ 4 2 2 2 4 2 2 2 1 2 2 ⎣ ⎦ = ⎣ ⎦⎢ 푝 + 푝 ⎥+⎣ ⎦⎢ −푝 − 푝 (푝푖 + 푝푗) − 푝 − 푝 ⎥⎣ ⎦ . 2 9 1 ⎢ 푖 푗 ⎥ 9 1 ⎢ 푖 푗 2 푖 푗 ⎥ E(푆 ) 1 16 4 ⎣ ⎦ 1 16 4 ⎣ ⎦ 휑 2푝푥(푝푖 + 푝푗) −2푝푥(푝푖 + 푝푗) 푝푥 − 2푝푥(푝푖 + 푝푗)

23 Corrected Noval estimator A

Homozygote: ⎡ ⎤ ⎡ ⎤ 푝푖(푝푖+푝푦 )(2−푝푦 )+2푝푦 (푝푦 +2푝푖) ⎡ ⎤ ⎡ ⎤ 휆1 ⎡ ⎤ 1 − 휆1 (푝 +2푝 )(2−푝 ) − 휆1 ⎡ ⎤ E(푆) 1 3 1 ⎢ ⎥ 1 3 1 ⎢ 푖 푦 푦 ⎥ Δ ≈ 4 2 ⎢ ⎥ + 4 2 ⎢ 푝푖푝푥(2−푝푦 )+2푝푥푝푦 ⎥ , ⎣ ⎦ ⎣ ⎦⎢ 휆2 ⎥ ⎣ ⎦⎢ −휆2 (푝 +2푝 )(2−푝 ) − 휆2 ⎥⎣ ⎦ E(푆2) 1 9 1 ⎣ ⎦ 1 9 1 ⎣ 푖 푦 푦 ⎦ 휑 16 4 0 16 4 0 0

2 푝푖 + 2푝푖푝푦 2푝푖푝푥 where 휆1 = 2 and 휆2 = 2 . 1 − 푝푦 1 − 푝푦 Heterozygote: ⎡ ⎤ ⎡ ⎤ 푝푖+푝푗 ⎡ ⎤ ⎡ ⎤ 휆1 ⎡ ⎤ 1 − 휆1 2 − 휆1 ⎡ ⎤ E(푆) 1 3 1 ⎢ ⎥ 1 3 1 ⎢ ⎥ Δ 4 2 4 2 푝푖+푝푗 ⎣ ⎦ ≈ ⎣ ⎦⎢ 휆2 ⎥ + ⎣ ⎦⎢ −휆2 + 푝푦 − 휆2 ⎥⎣ ⎦ , 2 9 1 ⎢ ⎥ 9 1 ⎢ 2 ⎥ E(푆 ) 1 16 4 ⎣ ⎦ 1 16 4 ⎣ ⎦ 휑 휆3 −휆3 푝푥 − 휆3 2 2 2푝푖푝푗 푝푗 + 2푝푗푝푦 + 푝푖 + 2푝푖푝푦 2푝푥(푝푖 + 푝푗) where 휆1 = 2 , 휆2 = 2 , 휆3 = 2 . 1 − 푝푦 1 − 푝푦 1 − 푝푦

Wang’s (2002) estimator ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ Pr(푆 = 1) 휆1 1 − 휆1 푎2 − 휆1 ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ 3 ⎥ = ⎢ ⎥ + ⎢ ⎥ , ⎢ Pr(푆 = 4 ) ⎥ ⎢ 휆2 ⎥ ⎢ −휆2 2푎2 − 2푎3 − 휆2 ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 휑 1 Pr(푆 = 2 ) 휆3 −휆3 1 − 3푎2 + 2푎3 − 휆3

2 2 where 휆1 = 2푎2 − 푎4, 휆2 = 4푎3 − 4푎4 and 휆3 = 4푎2 − 4푎2 − 8푎3 + 8푎4.

Corrected Wang’s (2002) estimator ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ′ −1 2 −1 2 Pr(푆 = 1) 휆1 푓1 (푏1 + 2푝푦푏1) − 휆1 푓2 (푏1푏2 + 푝푦푏1 + 3푝푦푏2) − 휆1 ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ Δ ⎢ ′ 3 ⎥ ≈ ⎢ ⎥ + ⎢ −1 2 ⎥ , ⎢ Pr(푆 = 4 ) ⎥ ⎢ 휆2 ⎥ ⎢ −휆2 푓2 (2푏1푏2 − 2푏3 + 2푝푦푏1 − 2푝푦푏2) − 휆2 ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 휑 ′ 1 −1 3 Pr(푆 = 2 ) 휆3 −휆3 푓2 (푏1 − 3푏1푏2 + 2푏3) − 휆3 where

−1 2 2 휆1 = 푓3 (2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3),

−1 휆2 = 푓3 (4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3),

−1 2 2 휆3 = 푓3 (4푏1푏2 − 8푏1푏3 − 4푏2 + 8푏4).

Thomas’s (2010) estimator ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤ Pr(푆 = 1) 휆1 1 − 휆1 푎2 − 휆1 Δ ⎣ ⎦ = ⎣ ⎦ + ⎣ ⎦⎣ ⎦ , (︀ 3 1 )︀ Pr 푆 = 4 or 2 휆2 −휆2 1 − 푎2 − 휆2 휑

2 2 where 휆1 = 2푎2 − 푎4 and 휆2 = 4푎2 − 4푎2 − 4푎3 + 4푎4.

24 Corrected Thomas’s (2010) estimator ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ ⎤ ′ −1 2 −1 2 Pr(푆 = 1) 휆1 푓1 (푏1 + 2푝푦푏1) − 휆1 푓2 (푏1푏2 + 푝푦푏1 + 3푝푦푏2) − 휆1 Δ ⎣ ⎦ ≈ ⎣ ⎦ + ⎣ ⎦⎣ ⎦ , (︀ ′ 3 1 )︀ −1 2 3 Pr 푆 = 4 or 2 휆2 −휆2 푓2 (2푝푦푏1 − 2푝푦푏2 − 푏1푏2 + 푏1) − 휆2 휑 −1 2 2 −1 2 2 where 휆1 = 푓3 (2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3) and 휆2 = 푓3 (8푝푦푏1푏2 − 8푝푦푏3 + 4푏1푏2 − 4푏1푏3 − 4푏2 + 4푏4).

Noval estimator B ⎛⎡ ⎤ ⎡ ⎤ ⎞ ⎡ ⎤ ⎡ ⎤ 휆1 1 − 휆1 푎2 − 휆1 ⎡ ⎤ 3 1 E(푆) 1 4 2 ⎜⎢ ⎥ ⎢ ⎥ Δ ⎟ ⎣ ⎦ = ⎣ ⎦⎜⎢ 휆2 ⎥ + ⎢ −휆2 2푎2 − 푎3 − 휆2 ⎥⎣ ⎦⎟ , 2 9 1 ⎜⎢ ⎥ ⎢ ⎥ ⎟ E(푆 ) 1 16 4 ⎝⎣ ⎦ ⎣ ⎦ 휑 ⎠ 휆3 −휆3 1 − 3푎2 + 2푎3 − 휆3

2 2 where 휆1 = 2푎2 − 푎4, 휆2 = 4푎3 − 4푎4 and 휆3 = 4푎2 − 4푎2 − 8푎3 + 8푎4.

Corrected Noval estimator B ⎛⎡ ⎤ ⎡ ⎤ ⎞ −1 2 ⎡ ⎤ ⎡ ⎤ 휆1 푓 (푏 + 2푝푦푏1) − 휆1 휇1 − 휆1 ⎡ ⎤ ′ 3 1 1 1 E(푆 ) 1 4 2 ⎜⎢ ⎥ ⎢ ⎥ Δ ⎟ ⎣ ⎦ ≈ ⎣ ⎦⎜⎢ 휆2 ⎥ + ⎢ −휆2 휇2 − 휆2 ⎥⎣ ⎦⎟ , ′2 9 1 ⎜⎢ ⎥ ⎢ ⎥ ⎟ E(푆 ) 1 16 4 ⎝⎣ ⎦ ⎣ ⎦ 휑 ⎠ 휆3 −휆3 휇3 − 휆3 where

−1 2 2 휆1 = 푓3 (2푏2 − 푏4 + 4푝푦푏2 + 4푝푦푏3),

−1 휆2 = 푓3 (4푏1푏3 − 4푏4 + 8푝푦푏1푏2 − 8푝푦푏3),

−1 2 2 휆3 = 푓3 (4푏1푏2 − 8푏1푏3 − 4푏2 + 8푏4),

−1 2 휇1 = 푓2 (푏1푏2 + 푝푦푏1 + 3푝푦푏2),

−1 2 휇2 = 푓2 (2푏1푏2 − 2푏3 + 2푝푦푏1 − 2푝푦푏2),

−1 3 휇3 = 푓2 (푏1 − 3푏1푏2 + 2푏3).

Literature Cited

Anderson, A. D., and B. S. Weir, 2007 A maximum-likelihood method for the estimation of pairwise relatedness in structured populations. Genetics 176: 421–440.

Lynch, M., and K. Ritland, 1999 Estimation of pairwise relatedness with molecular markers. Genetics 152: 1753–1766.

Thomas, S. C., 2010 A simplified estimator of two and four gene relationship coefficients. Molecular Ecology Resources 10: 986–994.

Wang, J. L., 2002 An estimator for pairwise relatedness using molecular markers. Genetics 160: 1203– 1215.

25