Fractional Coalescent

Somayeh Mashayekhi (a,1) and Peter Beerli (a)
(a) Department of Scientific Computing, Florida State University, Tallahassee, FL 32306
Edited by Scott V. Edwards, Harvard University, Cambridge, MA, and approved February 15, 2019 (received for review June 21, 2018)

An approach to the coalescent, the fractional coalescent (f-coalescent), is introduced. The derivation is based on the discrete-time Cannings population model in which the variance of the number of offspring depends on the parameter α. This additional parameter α affects the variability of the patterns of the waiting times; values of α < 1 lead to an increase of short time intervals, but occasionally allow for very long time intervals. When α = 1, the f-coalescent and Kingman's n-coalescent are equivalent. The distribution of the time to the most recent common ancestor and the probability that n genes descend from m ancestral genes in a time interval of length T for the f-coalescent are derived. The f-coalescent has been implemented in the population genetic model inference software MIGRATE. Simulation studies suggest that it is possible to accurately estimate α values from data that were generated with known α values and that the f-coalescent can detect potential environmental heterogeneity within a population. Bayes factor comparisons of simulated data with α < 1 and real data (H1N1 influenza and malaria parasites) showed an improved model fit of the f-coalescent over the n-coalescent. The development of the f-coalescent and its inclusion into the inference program MIGRATE facilitates testing for deviations from the n-coalescent.

coalescent | fractional calculus | population genetics | Bayesian inference | environmental heterogeneity

Significance
The fractional coalescent is a generalization of Kingman's n-coalescent. It facilitates the development of the theory of population genetic processes that deviate from Poisson-distributed waiting times. It also marks the use of methods developed in fractional calculus in population genetics. The fractional coalescent is an extension of Cannings's model, where the variance of the number of offspring per parent is a random variable. The distribution of the number of offspring depends on a parameter α, which is a potential measure of the environmental heterogeneity that is commonly ignored in current inferences.

Author contributions: S.M. and P.B. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
Data deposition: MIGRATE output files are available in the GitHub repository, https://github.com/pbeerli/fractional-coalescent-material.
1 To whom correspondence should be addressed. Email: [email protected]
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1810239116/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1810239116 (PNAS Latest Articles)

In 1982, Kingman (1, 2) introduced the n-coalescent. The n-coalescent describes the probability density function of a genealogy of samples embedded in a population with fixed size. Extensions to this probabilistic description of the genealogical process include changing population size (3, 4), immigration (5, 6), population divergence (7), selection (8), and recombination (9). These theoretical advances resulted in several widely used computer packages that estimate various population parameters (for example, refs. 10–12).

While the waiting times for events in the n-coalescent are exponentially distributed, a more general framework of these waiting times is offered by the field of fractional calculus (13–18). Fractional calculus has attracted considerable interest because of the ability to model complex phenomena, such as continuum and statistical mechanics (19) and viscoelastic materials (20). We introduce fractional calculus into population genetics. Our work concentrates on the use of the fractional Poisson process (21) in the context of the coalescent, and we introduce a model of coalescence, the fractional coalescent, or f-coalescent. We derive the f-coalescent based on the discrete-time Cannings model and present the properties of the f-coalescent. This f-coalescent is then implemented in a Bayesian estimator of effective population size; we discuss the implementation and runtime characteristics. We explore the quality of the inference for simulated datasets and also apply the method to three real datasets: mitochondrial sequence data of humpback whales (22), mitochondrial data of the malaria parasite Plasmodium falciparum (23), and complete genome data of the H1N1 influenza virus strain collected in Mexico City in 2014. The biological motivation of this model is discussed by using a simulator that assigns an environmental quality affecting the chance of having offspring to each individual of a population. The dataset which is derived based on this simulator shows the potential heterogeneity within a population. It is shown that the f-coalescent is a better model than the n-coalescent to describe the variability of this dataset.

Motivation
It is common to assume that, within a population, all individuals are affected in the same way by the environment (3–7, 9–12). Neglecting this heterogeneity may lead to biased parameter estimates. Development of multiple-merger coalescence focused on either strong selection (24) or large offspring variance (25); both could be induced by environmental heterogeneity. But these approaches do not allow estimating a parameter that reflects this heterogeneity. The f-coalescent allows nonexponential waiting times; therefore, it should be able not only to handle datasets generated under such conditions, but also give estimates about the magnitude of this heterogeneity.

Model
We derive the f-coalescent based on the nest-site model which was introduced by Wakeley (26). We included the derivation of the f-coalescent from the discrete Cannings model (SI Appendix, section B) and an alternative derivation of the f-coalescent as a semi-Markov process, in an equivalent way as the n-coalescent emerges as a continuous-time Markov process (SI Appendix, section C). Since we compare the f-coalescent with Kingman's n-coalescent, we have included a derivation of Kingman's n-coalescent for the Wright–Fisher and the Cannings model in SI Appendix, section A.

The f-Coalescent Based on the Nest-Site Model. The nest-site model allows for different qualities of nest sites, therefore leading to differences among offspring numbers, leading to the Cannings model. The habitat structure determines the distribution of offspring numbers. Consider a haploid population model with a fixed population size $N$. Individuals can occupy places with reproduction conditions $1, \ldots, L$. Consider $N$ individuals per generation, where fixed proportions $\beta_1, \ldots, \beta_L \ge 0$ of them have condition $i$ ($\sum_i \beta_i = 1$) and the total number of offspring of all individuals in condition $i$ is $N\chi_i$, where $\chi_i \in [0, 1]$ is fixed with $\sum_i \chi_i = 1$. Assume the $N\chi_i$ offspring are produced by their $N\beta_i$ parents via Wright–Fisher sampling. In this model, $\beta_i$ and $\chi_i$ are fixed and constant across generations; therefore, Kingman's coalescent, by changing the time scale to $N_e = N/\sigma^2$, is an appropriate model, where $\sigma^2 = \sum_{i=1}^{L} \chi_i^2/\beta_i$ (details are in SI Appendix, section D).

If in this model the quality of nest sites is a random variable, then the probability of coalescence becomes a random variable and Kingman's coalescent cannot be an appropriate model to describe this probability. Suppose $\chi_i$ is a discrete random variable, which is drawn once and is identical for each generation, whose possible values are $\chi_i^j$, $j = 1, 2, \ldots$, where $\sum_i \chi_i^j = 1$, $j = 1, 2, \ldots$. For each case, for example $\chi_i^j$, $N\chi_i^j$ offspring are produced by their $N\beta_i$ parents via Wright–Fisher sampling. Similar to ref. 26, the probability that two individuals come from the same parent in the immediately previous generation is

$$P\{\mathrm{coal} \mid \chi_1 = \chi_1^j, \ldots, \chi_L = \chi_L^j\} = \sum_{i=1}^{L} \chi_i^j \, \frac{N\chi_i^j - 1}{N} \, \frac{1}{N\beta_i}. \qquad [1]$$

As N increases, the probability of coalescence, which is a random variable, becomes

By using SI Appendix, Eq. S62, the average of Eq. 6 over the distribution of $\sigma^2 \in (0, 1)$ shows the probability that the two lineages remain distinct for N units of scaled time as

$$E_\omega\!\left[\left(1 - \frac{\sigma_j^2}{N}\right)^{N\tau}\right] = \sum_j \omega(\sigma_j^2, \alpha)\left(1 - \sigma_j^2 \frac{1}{N}\right)^{N\tau} \to E_\alpha(-\tau^\alpha), \qquad [7]$$

as $N$ goes to infinity; $E_\alpha(-\tau^\alpha)$ is the Mittag–Leffler function (SI Appendix, section N) (27). We choose the time scale as $\tau = t/N^{1/\alpha}$; thus, in the limit, the coalescence time for a pair of lineages is distributed as the fractional generalization of the exponential distribution (28). We can generalize the f-coalescent from two lineages to $k$ lineages by changing $\tau \to \tau \binom{k}{2}^{1/\alpha}$. The probability that the two lineages among $k$ lineages remain distinct for N units of scaled time is

$$\sum_j \omega(\sigma_j^2, \alpha)\left(1 - \sigma_j^2 \frac{\binom{k}{2}}{N}\right)^{N\tau} \to E_\alpha\!\left(-\binom{k}{2}\tau^\alpha\right). \qquad [8]$$

Choosing the time scale as $\tau = t/N^{1/\alpha}$ keeps the parameter (population size) the same as the n-coalescent (SI Appendix, section B).

Based on Eq. 6, each value of the random variable $\sigma_j^2$ leads to Kingman's n-coalescent genealogy on a suitable timescale which is a bifurcating genealogy (SI Appendix, Eq. S12). Eq. 7 shows that the average of these bifurcating genealogies leads to the f-coalescent on a suitable timescale, which still is a bifurcating genealogy (SI Appendix, Eq.
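
The limiting probabilities in Eqs. 7 and 8 can be explored numerically. The following is a minimal Python sketch, not the MIGRATE implementation: it evaluates the one-parameter Mittag–Leffler function by its truncated power series, $E_\alpha(z) = \sum_{k \ge 0} z^k/\Gamma(\alpha k + 1)$, which is only reliable for moderate $|z|$. The function names `mittag_leffler` and `prob_distinct` and the truncation at 200 terms are assumptions made for illustration.

```python
import math


def mittag_leffler(alpha, z, terms=200):
    """Truncated power series for E_alpha(z) = sum_k z**k / Gamma(alpha*k + 1).

    Illustrative only: the alternating terms cancel badly for large |z|,
    so keep |z| moderate (roughly below 10).
    """
    if z == 0.0:
        return 1.0
    log_abs_z = math.log(abs(z))
    total = 0.0
    for k in range(terms):
        # Work in log space so Gamma(alpha*k + 1) cannot overflow.
        log_term = k * log_abs_z - math.lgamma(alpha * k + 1.0)
        sign = -1.0 if (z < 0.0 and k % 2 == 1) else 1.0
        total += sign * math.exp(log_term)
    return total


def prob_distinct(tau, alpha, k=2):
    """Limit in Eq. 8: E_alpha(-C(k,2) * tau**alpha).

    With k = 2 this reduces to the pairwise limit in Eq. 7.
    """
    pairs = k * (k - 1) / 2.0
    return mittag_leffler(alpha, -pairs * tau ** alpha)


if __name__ == "__main__":
    for tau in (0.1, 1.0, 3.0):
        print(tau, prob_distinct(tau, 1.0), prob_distinct(tau, 0.5))
```

With α = 1 the function reduces to the exponential survival function of the n-coalescent. With α = 0.5 the probability of remaining distinct is lower at short scaled times and higher at long ones, which matches the statement that α < 1 produces more short intervals and occasional very long ones.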
