Definition and Estimation of Higher-Order Gene Fixation Indices
Copyright © 1987 by the Genetics Society of America
Genetics 117: 783-793 (December, 1987)

Kermit Ritland
Department of Botany, University of Toronto, Toronto, M5S 1A1 Canada

Manuscript received March 26, 1987
Revised copy accepted August 15, 1987

ABSTRACT

Fixation indices summarize the associations between genes that arise from the joint effects of inbreeding and selection. In this paper, fixation indices are derived for pairs, triplets and quadruplets of genes at a single multiallelic locus. The fixation indices are obtained by dividing cumulants by constants; the cumulants describe the statistical distribution of alleles and the constants are functions of gene frequency. The use of cumulants instead of moments is necessary only for four-gene indices, when the fourth cumulant is used. A second type of four-gene index is also required, and this index is based upon the covariation of second-order cumulants. At multiallelic loci, a large number of indices is possible. If alleles are selectively neutral, the number of indices is reduced and the relationship between gene identity and gene cumulants is shown.

Two-gene indices can always be estimated from genotypic frequency data at a single polymorphic locus. Three-gene indices are also estimable except when allele frequency equals one-half. Four-gene indices are not estimable unless selection is assumed to have an equal effect upon each allele (such as under selective neutrality) and the locus contains at least three alleles of unequal frequency. For diallelic or selected loci, an alternative four-gene fixation index is proposed. This index incorporates both types of four-gene associations but cannot be related to gene identity.

THE association of two genes at a locus can be measured either by the probability of allelic identity-by-descent (HARRIS 1964; COCKERHAM 1971) or by fixation indices based upon the covariance of allelic values (WRIGHT 1922, 1969; COCKERHAM 1969; WEIR 1970). These two genes are usually considered as residing in a single diploid individual, or as having been chosen randomly from each of two diploid individuals. If one needs to consider more than two genes, computations of gene identity are relatively easily extended to an arbitrary number of genes (CANNINGS and THOMPSON 1981). A three-gene fixation index, based upon the third moment of three genes at a diallelic locus, was derived for characterizing mating systems under nonrandom outcrossing and selection (RITLAND 1985). The extension of fixation indices to four genes, and the general multiallelic case for either three or four genes, has remained undescribed.

Measures of higher-order gene associations are useful in several ways. They can be used to find covariances of inbred relatives (GILLOIS 1965, 1966). Models based upon higher-order indices of association are useful in the analyses of gene frequencies (COCKERHAM 1971). In natural populations of plants, spatial associations of genotypes often occur because of genetic drift, selection and restricted gene flow. If selfing or biparental inbreeding also occurs, associations may develop among pairs of inbred individuals, and the proper characterization of these associations at one locus requires measures based upon four genes. Finally, higher-order fixation indices may be useful in selection models that incorporate such population structure.

The concept of gene identity, as conceived by MALECOT (1948), has become extremely useful for any problem that requires a measure of genetic relatedness. GILLOIS (1965, 1966), COCKERHAM (1971), JACQUARD (1974) and CANNINGS and THOMPSON (1981), and others, have used gene identity in many applications in populations and quantitative genetics. Genes are identical by descent if they are all copies of an ancestral allele. However, when selection is present, genotypic associations cannot be strictly interpreted with gene identity coefficients. In addition, estimation of gene identity requires inferences about the ancestral population to which gene identity is relative.

Alternatively, without making prior assumptions about selection and without the need to determine the relativity of the measure, we can measure the contemporary associations of alleles in terms of covariances, moments or cumulants of a distribution of allelic values. These statistical measures incorporate both the effects of inbreeding and natural selection, and are relative to the easily measured, contemporary gene frequencies.

This paper derives, in terms of gene cumulants, all two-, three-, and four-gene fixation indices for multiallelic loci. The attainable space of fixation indices and the estimation variance of higher-order gene fixation indices are also briefly examined. If alleles are selectively neutral, most fixation indices derived here can be related to gene identity coefficients, thus giving alternative gene cumulant definitions to higher-order gene identity coefficients.

DEFINITION OF GENE FIXATION INDICES

The statistical approach for describing associations of genotypes specifies the distribution of genotypes in terms of the cumulants of a multivariate, multinomial distribution. A sufficient number of parameters are introduced such that expected frequencies of all possible genotypes are specified. This method "saturates" or even "oversaturates" (in the case of four genes) the degrees of freedom available in the data.

In the following, we will sequentially consider the cases of two, three and four genes at a single locus. Each gene has $n$ alleles with respective frequencies $c_1, c_2, \ldots, c_n$ ($c$ as in cumulant). Since we are considering one locus, these alleles are shared among genes and are of equal frequency among genes. At this single locus, the four genes are denoted as $a$, $b$, $c$ and $d$. For notational convenience, we always observe allele $i$ at gene $a$, allele $j$ at gene $b$, allele $k$ at gene $c$, and allele $l$ at gene $d$, even though the same allele may be observed at different genes (in which case $i = j$ for example). Thus the assignment of alleles to genes at one locus is as follows:

Gene a - allele i (i = 1, ..., n)
Gene b - allele j (j = 1, ..., n)
Gene c - allele k (k = 1, ..., n)
Gene d - allele l (l = 1, ..., n)

There are several specific situations to which our treatment applies. For example, if we consider only two genes, genes $a$ and $b$ can be the homologous genes of one diploid individual. For three genes, gene $c$ could additionally be the gamete allele contributed by the mate of the first individual. For four genes, genes $c$ and $d$ could additionally be the homologous genes of a second diploid individual. Other situations exist as well.

To describe the presence of alleles in genes mathematically, first consider gene $a$. For gene $a$, define a vector of Bernoulli random variables $A_1, A_2, \ldots, A_n$ such that if allele $i$ is present, $A_i = 1$ and all other $A$ are zero. In other words, if allele $i$ is present, the random vector equals $(0, 0, \ldots, 0, 1, 0, \ldots, 0)$, wherein the $i$th term equals one.

Likewise, for gene $b$, introduce a second vector of random variables, $B_1, B_2, \ldots, B_j, \ldots, B_n$, defined such that if allele $j$ is present for gene $b$, $B_j = 1$ and all other $B$ are zero. For genes $c$ and $d$, the corresponding vectors $C_1, \ldots, C_k, \ldots, C_n$ and $D_1, \ldots, D_l, \ldots, D_n$ are defined in the same way.

The matrix of random variables consisting of the four vectors is a single observation that follows a multivariate multinomial distribution. It is emphasized that the distribution function may differ between observations. This distinction is necessary only for four genes, but for consistency, is kept for lower-order cases as well.

Two genes: First, consider genes $a$ and $b$. An observation consists of the two alleles of both genes $a$ and $b$. Observations are distributed as bivariate-multinomial with first-order cumulants denoted as $\boldsymbol{\kappa}_i$ and $\boldsymbol{\kappa}_j$ and second-order cumulants denoted as $\boldsymbol{\kappa}_{ij}$ ($i, j = 1, \ldots, n$). Cumulants are a set of descriptive constants of a distribution which are useful for measuring its properties, and in our circumstance, for specifying it. These cumulants are written in boldface to emphasize that their values vary among observations, and as such are random variables.

For one observation, the probability of observing allele $i$ for gene $a$ is $E[\boldsymbol{\kappa}_i]$, the probability of observing allele $j$ for gene $b$ is $E[\boldsymbol{\kappa}_j]$, and the probability of jointly observing alleles $i$ and $j$ is $E[\boldsymbol{\kappa}_i \boldsymbol{\kappa}_j] + E[\boldsymbol{\kappa}_{ij}]$, where $E[\,]$ is the "expectation" operator. Cumulants about the mean of order one, two and three equal the corresponding moments about the mean. However, cumulants of order four (used in the four-gene case below) do not equal the corresponding moments of order four.

Since these cumulants may vary among observations, the population frequency $f_{ij}$ of allele $i$ for gene $a$ and allele $j$ for gene $b$ is the double expectation, taken first of single observations then taken among observations,

$$f_{ij} = c_i c_j + c_{ij}$$

where $c_i$ is the mean of $\boldsymbol{\kappa}_i$ (and is the frequency of allele $i$), $c_j$ the mean of $\boldsymbol{\kappa}_j$, and $c_{ij}$ the mean of $\boldsymbol{\kappa}_{ij}$ (and is the expected covariance between alleles $i$ and $j$). The covariance between $\boldsymbol{\kappa}_i$ and $\boldsymbol{\kappa}_j$ is assumed to be zero.

To characterize the deviation of genotypic associations from Hardy-Weinberg proportions, define the following gene fixation indices (WRIGHT 1922, 1969; WEIR 1970)

$$F_{ij} = \frac{c_{ij}}{d_{ij}}$$

where the denominator term $d_{ij}$ is

$$d_{ij} = \delta_{ij} c_i - c_i c_j \qquad (2)$$

with $\delta_{ij}$ as the Kronecker operator ($\delta_{ij} = 1$ if $i = j$, $\delta_{ij} = 0$ otherwise). The denominator term $d_{ij}$ is the maximum value that the covariance $c_{ij}$ can take. For example, if $i = j$ then $d_{ii} = c_i(1 - c_i)$, or if $i \neq j$ then $d_{ij} = -c_i c_j$.

Defining the fixation indices in this way enables us to specify the ordered genotypic frequencies for genes $a$ and $b$ as

$$f_{ij} = c_i c_j + d_{ij} F_{ij}.$$

For this three-gene index, the denominator term $d_{ijk}$ is