
Phylogenetic Inference under the Pure Drift Model

Shizhong Xu,* William R. Atchley,* and Walter M. Fitch†

*Center for Quantitative Genetics, Department of Genetics, North Carolina State University; and †Department of Ecology and Evolutionary Biology, University of California, Irvine

When pairwise genetic distances are used for phylogenetic reconstruction, it is usually assumed that the genetic distance between two taxa contains information about the time after the two taxa diverged. As a result, upon an appropriate transformation if necessary, the distance usually can be fitted to a linear model such that it is expressed as the sum of lengths of all branches that connect the two taxa in a given phylogeny. This kind of distance is referred to as "additive distance." For a phylogenetic tree exclusively driven by random genetic drift, genetic distances related to coancestry coefficients ($\theta_{XY}$) between any two taxa are more suitable. However, these distances are fundamentally different from the additive distance in that coancestry does not contain any information about the time after two taxa split from a common ancestral population; instead, it reflects the time before the two taxa diverged. In other words, the magnitude of $\theta_{XY}$ provides information about how long the two taxa shared the same evolutionary pathway. The fundamental difference between the two kinds of distances has led to a different algorithm for evaluating phylogenetic trees when $\theta_{XY}$ and related distance measures are used. Here we present the new algorithm, which uses the ordinary-least-squares approach but fits a different linear model. This treatment allows genetic variation within a taxon to be included in the model. Monte Carlo simulation for a rooted phylogeny of four taxa has verified the efficacy and consistency of the new method. Application of the method to human populations is demonstrated.
Introduction

Random genetic drift is an important evolutionary force. It has been argued that, in natural populations, population size is sufficiently large that drift could be ignored compared with other evolutionary forces such as selection and mutation (Fisher 1958, pp. 22-51). In inbred strains of mice, rats, guinea pigs, and some plants, for example, the population size is so small and the evolutionary history so short that variation in allelic frequencies among inbred strains must have been predominantly driven by random drift or allelic fixation (Atchley and Fitch 1991, 1993). Thus, patterns of genetic divergence observed among inbred strains result from random segregation of the original heterozygosity of the founding stocks.

For captive populations, genetic drift is the overriding factor controlling the loss of heterozygosity. Mutation has no noticeable effect on populations of size typically managed in zoos and nature preserves. Unless selection is stronger than commonly observed in natural populations, it is inefficient in countering drift when population sizes are on the order of 100 or fewer (Lacy 1987). Random genetic drift is also considered to be important in determining the variation in gene frequencies in man (Cavalli-Sforza et al. 1964; Edwards and Cavalli-Sforza 1964; Cavalli-Sforza 1966; Cavalli-Sforza and Edwards 1967).

A commonly used measurement of divergence in gene frequencies caused by random genetic drift is Wright's $F_{ST}$ (Wright 1943, 1951, 1965). F-statistics were originally derived from a population-genetics perspective, and it was assumed that an infinite number of populations diverged at the same time from a common ancestral population. From a phylogenetic perspective, the coancestry coefficient ($\theta_{XY}$), which is another $F_{ST}$-related measurement of population divergence, seems more appropriate.
Within a population, it is defined as the probability that a random gene from one individual is identical by descent to a random gene from another individual (Kempthorne 1969, pp. 72-80; Falconer 1980, pp. 80-83). Between two populations, $\theta_{XY}$ is defined as the probability that a random pair of genes, one from each population, are identical by descent. Appropriate transformations of the coancestry coefficients can be treated as genetic distances for use in phylogenetic inference. However, this measure of genetic distance may not be additive, which is assumed by the Fitch-Margoliash method (Fitch and Margoliash 1967) and other phylogeny-inferring algorithms. Further, there are no phylogeny-inferring algorithms available that incorporate inbreeding coefficients like $\theta_{XY}$. To circumvent this problem, a phylogeny-inferring algorithm using character data, such as the parsimony method, may be used. Recently, Atchley and Fitch (1993) introduced the concept of loss parsimony to describe the segregation and random fixation of alleles under systematic brother-sister mating. These authors used an inverted Camin-Sokal algorithm to find trees that minimize allele loss. However, irreversibility of allele loss is only a qualitative prediction of random genetic drift, and the loss parsimony model fails to incorporate appropriate quantitative predictions from population-genetics theory.

A maximum-likelihood (ML) method under the pure drift model could, in principle, incorporate all the pertinent quantitative predictions inherent to genetic drift (Felsenstein 1973, 1981). The drawback of an ML method is that it involves extensive computing if explicit solutions are not possible. In addition, lack of knowledge of the exact joint probability distribution of the data will decrease the credibility of ML.

In this research, we first introduce a class of $\theta_{XY}$-related measurements of genetic distances. These genetic distances, after appropriate transformation, are then used to infer phylogenetic relationships among taxa.

Key words: phylogeny, genetic drift, coancestry coefficient, genetic distance, reduction of heterozygosity.

Address for correspondence and reprints: Shizhong Xu, Center for Quantitative Genetics, Department of Genetics, North Carolina State University, Raleigh, North Carolina 27695-7614.

Mol. Biol. Evol. 11(6):949-960. 1994. © 1994 by The University of Chicago. All rights reserved. 0737-4038/94/1106-0013$02.00

The Pure Drift Model

Consider a finite population with effective population size $N_e$ isolated from an infinite, random-mating population in Hardy-Weinberg equilibrium (denoted by A as in fig. 1). Assume effective population size did not change through time and at generation $t_{AB}$ the population (denoted by B) was split into two lineages, X and Y, each of which had the same effective population size $N_e$. Populations X and Y have independent histories of genetic drift for $t_{BX}$ and $t_{BY}$ generations, respectively.

FIG. 1. The rooted tree for two taxa used as an example in the text.

Suppose that the heterozygosity of a neutral locus in the infinite ancestral population was $H_A$. When the finite population (population B) was split, the expected heterozygosity was expressed by $H_B$. If both $H_A$ and $H_B$ are known, one can infer the time $t_{AB}$, in generations, by the following formula:

$$H_B = H_A\,(1 - F_{t_{AB}}) = H_A\,[1 - 1/(2N_e)]^{t_{AB}}, \tag{1}$$

where

$$F_{t_{AB}} = 1 - [1 - 1/(2N_e)]^{t_{AB}}$$

is the mean inbreeding coefficient of population B. We use $H$ throughout to represent the expected heterozygosity. Estimated heterozygosity will be discussed later. Equation (1) allows the time before divergence to be inferred from the existing heterozygosity of node B as

$$t_{AB} = [\log(H_B) - \log(H_A)] / \log[1 - 1/(2N_e)]. \tag{2}$$

Let $H_X$ and $H_Y$ be the expected heterozygosities of the terminal populations X and Y, respectively, at the time when data are sampled. Because of the relationships

$$H_X = H_B\,[1 - 1/(2N_e)]^{t_{BX}}$$

and

$$H_Y = H_B\,[1 - 1/(2N_e)]^{t_{BY}},$$

$t_{BX}$ and $t_{BY}$ can be inferred by

$$t_{BX} = [\log(H_X) - \log(H_B)] / \log[1 - 1/(2N_e)] \tag{3}$$

and

$$t_{BY} = [\log(H_Y) - \log(H_B)] / \log[1 - 1/(2N_e)], \tag{4}$$

respectively. If generation intervals for the two lineages were the same and data were sampled at the same time, $t_{BX}$ should equal $t_{BY}$. However, this is not a requirement of the phylogenetic methods being described here.

The number of heterozygotes in populations X and Y can be obtained by counting individual genotypes. Observed frequencies of heterozygotes are then used to estimate $H_X$ and $H_Y$, denoted by $\hat{H}_X$ and $\hat{H}_Y$, respectively. Unfortunately, $H_B$ is an unobservable historical event that cannot be simply counted. What we want is to obtain an estimate of $H_B$ from the observed data sampled from X and Y.

Estimation of Heterozygosity of an Internal Node

The number of heterozygotes in populations X and Y (the terminal nodes) can be obtained from the actual counts by examining individual genotypes. However, individuals were assumed to mate randomly within each population so that the heterozygosity of each terminal node can be estimated by the so-called gene diversity (Nei 1976, pp. 723-765). Let $x_i$ and $y_i$ denote the $i$th allele frequencies (observed) for taxa X and Y, respectively, where $i = 1, 2, \ldots, n$ for a locus with $n$ allelic states. The heterozygosities of X and Y are then estimated by

$$\hat{H}_X = 1 - \sum_{i=1}^{n} x_i^2$$

and

$$\hat{H}_Y = 1 - \sum_{i=1}^{n} y_i^2.$$

[...] inbreeding coefficient, as $H = H_0(1 - F)$, where $H_0$ is the heterozygosity of the panmictic base population. Therefore, instead of estimating $F$ or $\theta_{XY}$, we may estimate the heterozygosity.

We now propose an unbiased estimator of $H_B$, the heterozygosity of the internal node (see fig. 1), using observed allele frequencies for a locus of interest from taxa X and Y. Under the drift model, their expected values are the same as that in the initial base population. An unbiased estimate of the heterozygosity two generations after node B is proposed as

$$D_{XY} = 1 - \sum_{i=1}^{n} x_i y_i. \tag{5}$$

This heterozygosity estimator, denoted by $D_{XY}$, is a genetic distance. The unbiased property of equation (5) is proved easily by showing

$$E[D_{XY}] = H_A\,(1 - \theta_{XY}). \tag{6a}$$

As indicated before, $\theta_{XY} = F_{t_{AB}}$; hence, $E[D_{XY}] = H_A\,(1 - F_{t_{AB}})$.
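The relationships in equations (1)-(5) can be checked numerically with a minimal Python sketch. The function names, the parameter values ($H_A = 0.5$, $N_e = 50$, $t_{AB} = 20$), and the allele frequencies below are illustrative assumptions, not values or code from the paper. The sketch only evaluates the expected-value formulas; it does not simulate drift or estimate anything from genotype data.

```python
import math


def drift_factor(ne):
    """Per-generation retention of heterozygosity, 1 - 1/(2*Ne)."""
    return 1.0 - 1.0 / (2.0 * ne)


def expected_heterozygosity(h_start, ne, t):
    """Form of equation (1): H_t = H_start * [1 - 1/(2Ne)]^t."""
    return h_start * drift_factor(ne) ** t


def generations_between(h_desc, h_anc, ne):
    """Form of equations (2)-(4): t = [log(H_desc) - log(H_anc)] / log[1 - 1/(2Ne)]."""
    return (math.log(h_desc) - math.log(h_anc)) / math.log(drift_factor(ne))


def gene_diversity(freqs):
    """Gene diversity of one taxon: 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in freqs)


def d_xy(x, y):
    """Equation (5): D_XY = 1 - sum_i x_i * y_i for two taxa at one locus."""
    if len(x) != len(y):
        raise ValueError("x and y must list the same allelic states")
    return 1.0 - sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical values chosen only for illustration.
H_A, NE, T_AB = 0.5, 50, 20

# Expected heterozygosity at node B after T_AB generations of drift.
H_B = expected_heterozygosity(H_A, NE, T_AB)

# Inverting equation (1) via equation (2) recovers the number of generations.
T_RECOVERED = generations_between(H_B, H_A, NE)

# Hypothetical observed allele frequencies at a three-allele locus.
X_FREQS = [0.7, 0.2, 0.1]
Y_FREQS = [0.4, 0.4, 0.2]

print("H_B:", H_B)
print("recovered t_AB:", T_RECOVERED)
print("gene diversity of X:", gene_diversity(X_FREQS))
print("gene diversity of Y:", gene_diversity(Y_FREQS))
print("D_XY:", d_xy(X_FREQS, Y_FREQS))
```

When both arguments of `d_xy` are the same frequency vector, the result equals the gene diversity of that taxon, which shows that the between-taxon measure in equation (5) has the same algebraic form as the within-taxon heterozygosity estimates.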