<<

Downloaded by guest on September 30, 2021 www.pnas.org/cgi/doi/10.1073/pnas.1810239116 estimate suggest accurately studies to possible Simulation is it MIGRATE. that software inference model length of interval The ancestor a common in recent that most probability the the to and time the of When distribution intervals. The time long very the for allow occasionally but of the values of times; variance the parameter (f the parameter which on depends in offspring of model fractional number discrete- population the the on Cannings based coalescent, is time 2018) derivation 21, The June the introduced. review is for to (received coalescent), 2019 15, approach February approved and An MA, Cambridge, University, Harvard Edwards, V. Scott by Edited a Mashayekhi Somayeh coalescent Fractional ouain ti hw htthe that shown is derived It is within which heterogeneity population. potential dataset to the envi- a shows The offspring simulator an population. this having on assigns a of based that of chance simulator individual the a each this affecting using of quality motivation by col- ronmental biological discussed strain The 2014. is virus in influenza model City H1N1 Mexico the in mitochondrial of lected mitochon- data (22), parasite datasets: genome datasets whales malaria complete real simulated the humpback three of of for to data data method inference sequence the the drial apply of size; also population quality We and characteristics. effective the runtime of and explore implementation estimator the the Bayesian discuss present we a and in model mented Cannings the the discrete-time of of properties the frac- context the on the coalescence, based of in or model coalescent, (21) a tional introduce process use we Poisson the and on coalescent, fractional concentrates calcu- the work Our fractional of genetics. introduce (19) population We into mechanics (20). complex lus statistical materials model and viscoelastic to continuum field and ability as the the such by attracted of phenomena, offered has because calculus is interest Fractional times considerable (13–18). waiting calculus fractional events these of for of times framework waiting eral the While the 10–12). in refs. used parameters example, widely population (for several various estimate in that resulted packages recombination computer advances and theoretical (8), These selection (5, (9). (7), immigration divergence 4), genealogical (3, population the size 6), population of changing description size. include fixed probabilistic process with this population to a in Extensions embedded samples of genealogy I heterogeneity environmental coalescent n-coalescent. MIGRATE program ence the an of showed development parasites) the of malaria with fit and model data influenza improved (H1N1 simulated data of real comparisons pop- and a factor within Bayes heterogeneity ulation. environmental potential detect can known with generated were eateto cetfi optn,FoiaSaeUiest,Tlaase L32306 FL Tallahassee, University, State Florida Computing, Scientific of Department 92 iga 1 )itoue the introduced 2) (1, n Kingman 1982, n caecn ecie h rbblt est ucino a of function density probability the describes -coalescent f f caecn n h Kingman’s the and -coalescent caecn a enipeetdi h ouaingenetic population the in implemented been has -coalescent n caecn r xoetal itiue,amr gen- more a distributed, exponentially are -coalescent | rcinlcalculus fractional α fet h aiblt ftepten ftewaiting the of patterns the of variability the affects edt nices fsottm intervals, time short of increase an to lead α<1 f caecn.This -coalescent. f caecn n t nlso noteinfer- the into inclusion its and -coalescent f aiiae etn o eitosfo the from deviations for testing facilitates caecn.W eiethe derive We -coalescent. a,1 n f ee ecn from descend genes | caecn vrthe over -coalescent n ee Beerli Peter and ouaingenetics population T α lsoimfalciparum Plasmodium o the for ausadta the that and values caecn r equivalent. are n-coalescent f caecn ste imple- then is -coalescent f f caecn sabetter a is -coalescent α caecn r derived. are -coalescent ausfo aathat data from values a | n m aeininference Bayesian hsadditional This α. caecn.The -coalescent. caecn.The n-coalescent. neta genes ancestral f f -coalescent -coalescent 2) and (23), α α<1 = 1, | - 1073/pnas.1810239116/-/DCSupplemental. y at online information supporting contains article This 1 github.com/pbeerli/fractional-coalescent-material M deposition: Data under distributed BY).y is (CC article access open This Submission.y Direct PNAS a is article This interest.y of conflict no declare paper.y authors the The wrote and contributed data, research, analyzed performed tools, research, reagents/analytic designed new P.B. and S.M. contributions: Author oe hnthe than model n odfeecsaogofpignmes edn othe model population to haploid distribu- a leading the Consider numbers. determines numbers, offspring structure of tion offspring habitat The among model. lead- therefore Canning differences sites, nest to of qualities ing different for allows model The a nrdcdb aee 2) eicue h eiainof derivation the included We (26). the Wakeley by introduced was the derive We Model about datasets estimates handle heterogeneity. give to this also of only magnitude but not the conditions, able such be under should generated it therefore, times; these this reflects But, that The parameter heterogeneity. a heterogeneity. estimating environmental allow not by do approaches both induced (25); variance be offspring large could or (24) selection on strong focused either coalescence esti- multiple-merger 9–12). parameter of biased (3–7, Development individu- to environment mates. lead the all may by population, heterogeneity way this a same Neglecting the within in that, affected assume are als to common is It Motivation dataset. olsetfrteWih–ihradteCnig oe in model Cannings A the section and Appendix, Wright–Fisher the for coalescent the compare n we sec- Since Appendix, C). (SI the process tion Markov as continuous-time way a as equivalent emerges an the in of process, derivation alternative semi-Markov an and B) section owo orsodnesol eadesd mi:[email protected] Email: addressed. be should correspondence whom To caecn,w aeicue eiaino Kingman’s of derivation a included have we -coalescent, h niomna eeoeet hti omnyinrdin ignored inferences. commonly current is that heterogeneity environmental offspringthe parameter of a number model, a on the is of depends Canning’s parent distribution per offspring The of of variable. number random extension the of an The variance the genetics. where is methods population of coalescent in use calculus fractional the fractional marks Poisson- theory in also from developed It the deviate times. that of waiting Kingman’s processes distributed development genetic of the population generalization of facilitates a It is n-coalescent. coalescent fractional The Significance f f IAppendix, (SI model Cannings discrete the from -coalescent Caecn ae nteNs-ieModel. Nest-Site the on Based -Coalescent IGRATE f caecn ae ntens-iemdlwhich model nest-site the on based -coalescent n caecn odsrb h aiblt fthis of variability the describe to -coalescent . uptfie r vial nteGtu repository, GitHub the in available are files output f caecn losnnxoeta waiting nonexponential allows -coalescent hc saptnilmaueof measure potential a is which α, f caecn ihteKingman’s the with -coalescent raieCmosAtiuinLcne4.0 License Attribution Commons Creative . y www.pnas.org/lookup/suppl/doi:10. NSLts Articles Latest PNAS f caecn sa as -coalescent h nest-site The n -coalescent | https:// f6 of 1 n SI -

EVOLUTION with a fixed population size N . Individuals can occupy places By using SI Appendix, Eq. S62, the average of Eq. 6 over the with reproduction conditions 1, ... , L. Consider N individuals distribution of σ2 ∈ (0, ∞) shows the probability that the two per generation, where fixed proportions β1, ... , βL≥0 of them lineages remain distinct for N units of scaled time as P have condition i ( βi = 1) and the total number of offspring i  2     of all individuals in condition i is N χi , where χi ∈ [0, 1] fixed σj N τ X 2 2 1 N τ α P Eω 1 − = ω(σj , α) 1 − σj → Eα(−τ ), with χi = 1. Assume the N χi offspring are produced by their N N i j N βi parents via Wright–Fisher sampling. In this model, βi and [7] χi are fixed and constant across generations; therefore, King- α as N goes to infinity, Eα(−τ ) is the Mittag–Leffler function N = N /σ2 man’s coalescent, by changing the time scale to e , is (SI Appendix, section N) (27). We choose the time scale as 2 PL 2 σ = χ /β SI 1/α an appropriate model, where i=1 i i (details are in τ = t/(N ); thus, in the limit, the coalescence time for a Appendix, section D). pair of lineages is distributed as the fractional generalization If in this model the quality of nest sites is a random vari- of the exponential distribution (28). We can generalize the f - able, then the probability of coalescence becomes a random coalescent from two lineages to k lineages by changing τ → variable and Kingman’s coalescent cannot be an appropriate k1/α model to describe this probability. Suppose χi is a discrete ran- τ 2 . The probability that the two lineages among k lineages dom variable, which is drawn once and is identical for each remain distinct for N units of scaled time is j  1  generation, whose possible values are χi , j = 1, 2, ..., where k ! j j j  α P X 2 2 2 N τ k α i χi = 1, j = 1, 2, .... For each case, for example χi , N χi ω(σ , α) 1 − σ → E (− τ ). [8] j  j N  α 2 offspring are produced by their N βi parents via Wright–Fisher j sampling. Similar to ref. 26, the probability that two individ- uals come from the same parent in the immediately previous Choosing the time scale as τ = t/(N 1/α) keeps the parameter generation is (population size) the same as the n-coalescent (SI Appendix, section B). L  j   2 j j X j N χi − 1 1 Based on Eq. 6, each value of the random variable σj leads to P{coal|χ1 = χ1, ... , χL = χL} = χi . N N βi Kingman’s n-coalescent genealogy on a suitable timescale which i=1 [1] is a bifurcating genealogy (SI Appendix, Eq. S12). Eq. 7 shows As N increases, the probability of coalescence, which is a random that the average of these bifurcating genealogies leads to the f variable, becomes -coalescent on a suitable timescale, which still is a bifurcat- ing genealogy (SI Appendix, Eq. S15). An alternative derivation L which characterizes the f -coalescent as a semi-Markov process (χj )2 j j 1 X i (SI Appendix, section C) shows that the f -coalescent does not P{coal|χ1 = χ1, ... , χL = χL} ≈ . [2] N βi i=1 require multiple mergers, similar to the n-coalescent. These dif- ferent derivations of the f -coalescent suggest that we have a This argument shows that, in each case j = 1, 2, ..., with proba- versatile framework that maintains the strictly bifurcating prop- bility βi , the individual will have a Poisson number of offspring erty of the n-coalescent, but permits more variability in waiting j with mean and variance equal to χi /βi . Then, the expected num- times between coalescent events. Thus, this versatility may allow ber of its offspring is equal to 1. By conditioning on the type of us to infer processes that change the waiting times—for exam- nest site, the individual ends up occupying and the variance of the ple, selection—better. Currently, coalescent models that allow number of offspring then σ2 becomes a random variable whose multiple mergers, such as the BS-coalescent (cf. 24), are used to 2 possible values are σj with j = 1, 2, ..., where discuss such forces. The f -coalescent may be a viable alternative. Properties of the f -Coalescent. The n-coalescent has two steps: L j  j 2! L j 2 2 X χi χi X (χi ) First, choose a pair to coalesce by using the concept of equiva- σj = βi + − 1 = . [3] βi βi βi i=1 i=1 lence classes; second, pick a waiting time in which two lineages need to coalesce. For the f -coalescent, we changed the second Using Eqs. 2 and 3, we have step, resulting in a different time to the most recent common ancestor (TMRCA) compared with the n-coalescent. We derive N this new distribution of the T of the f -coalescent and com- N j = , [4] MRCA e 2 T n σj pare it with the MRCA of the -coalescent. We also present the probability that n genes are descendants from m ancestral and genes using the f -coalescent and compare these results with the 2 j j 1 σj n-coalescent. To do this, we extend the work of ref. 29 to the f - P{coal|χ1 = χ1, ... , χL = χL} ≈ j = . [5] 7 N N coalescent. In the following theorems, we use Eq. . These lead e to the probability density of waiting times of the f -coalescent Assume that the variance of the number of offspring is a random 2 α−1 α variable [σ ∈ (0, ∞)] which has the probability mass function f (t) = t λEα,α(−λt ). [9] ω(σ2, α) where 0<α≤1. Suppose this probability mass function For α = 1, this is equivalent to an exponential distribution which has a closed form which has been introduced in SI Appendix, Eq. is used for the n-coalescent. S60; since 0<α≤1 is a parameter, ω(σ2, α) can have different forms depending on the α. The relation between the probabil- f (t) = t αi −1λ E (−λ t αi ) 2 Theorem 1. Suppose Ti i αi ,αi i is the distri- ity mass function of χi and ω(σ , α) is presented in SI Appendix, bution of a waiting time in the f -coalescent, where Ti , i = 2, ... , n section O. i  are the coalescent times and λi = 2 if α1 = α2 = ... = αn , then SI Appendix S5 n By this assumption, similar to , Eq. , the proba- the distribution of T = P T is bility that the two lineages remain distinct for N units of scaled MRCA i=2 i   time is n n X Y λk 2 f (t) =  f (t), [10]  σ  TMRCA Ti 2 2 j N τ 2  λk − λi  P{not coal |σ = σj } = 1 − . σj ∈ (0, ∞). [6] i=2 k=2 N k=6 n

2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1810239116 Mashayekhi and Beerli Downloaded by guest on September 30, 2021 Downloaded by guest on September 30, 2021 m coalescent, 2. Theorem from descendants Proof: 1). lto aemr netr naui iecmae ihthe with compared time unit a in ancestors n more have ulation increases. size f sample the as distributions model, Wright–Fisher 2, where ahykiadBeerli and Mashayekhi coalesce, can for generation assumption per same get: lineages the we two use then only we If that one. coalescent particular one pick configura- possible of u time number any the the account at into tions on take times to interval Based need the we by Genealogy defined topologies a possible of Function Coalescent. Density Probability com- to (BS-coalescent). recent empirically, most Results the the and to pare Methods time in the ancestor derive also common We E. section Appendix, of values ancestor, of common bution recent most the for the While to which, time times of waiting tribution the of patterns the Corollary. Proof: or, ouainsz dtisaein are (details size population where the m -coalescent, k -coalescent.                        ntenx hoe,w eietepoaiiythat probability the derive we theorem, next the In oedtisrltdt hs w hoesaepeetdin presented are theorems two these to related details More hc orsod oapro of period a to corresponds which genes erae ut ail as rapidly quite decreases and equivalently, f -coalescent i E i f P =m =2 n α (G n K o h ro,see proof, the For o h ro,see proof, the For P k n (−λ [i

T +1 ] f )= |Θ) iegs hr are there lineages, α stenme fsmlsand samples of number the is + = f f h parameter The caecn ihteBolthausen–Sznitman-coalescent the with -coalescent k T T k nt ftm g is ago time of units h probability the Q

oetatapriua genealogy particular a extract To in =n 6 λ n ihtesm supini Theorem in assumption same the With =2 n MRCA MRCA n i = T n egv oenmrclvleo Eq. of value numerical some give We ti oelkl that likely more is it −λ (n k λ IAppendix, SI =m -coalescent, i λ α k k k Y Y hscnb rsne as presented be can this Q IAppendix, (SI k K K 6 =n m =2 =2 ) n (t (t − λ −λ +1 k E m u = ) ) 1)  u α k i ovre nadsrbto ihma equal mean with distribution a on converges k α−1 yuigKingman’s using by ; (−λ ! neta ee nthe in genes ancestral u λ . . . o the for X k k i α λ (−E =2 −λ n −1 k (n  m α eto E. section Appendix, SI eto E. section Appendix, SI i

P k  P T stesml ieincreases, size sample the as ! α nTheorem in eto F section − T (k nm (2i Θ (−λ nm 2 f α  k 2 Θ i eto F section -coalescent, eto H ). section Appendix, SI )  increases,  − (T λ  (T 1), + E − oeta ofiuain,adwe and configurations, potential m i 1) α λ T −λ ) 1)(−1) = ) i ,α n n 2N  α that (−λ i ( . E + ) n ee ape rmapop- a from sampled genes i E nthe In ) α ( α ). saresult, a as i eeain ntehaploid the in generations ,α 1 Θ ) (−λ n u hsi o h aein case the not is this but E k = (−λ i yuigEq. using By fet h aiblt in variability the affects u α n ehv heavy-tailed a have we stemutation-scaled the is ee ecne from descended genes k α f (0)) n n [i n i -coalescent. ) ] caecn o any for -coalescent (n T k  G -coalescent ! u α , f 1) + k α u ftemany the of out T ) u ) i 12 fet h dis- the affects (t 2 m m 1 1 k , 2 1, ), . . .  n u o different for < 1 = = 3 , f 12, ee are genes T h distri- the , m nthe in (n n . . . MRCA (α . < nthe in + , 1 = [12] [13] [11] [14] n u (t i K SI f f − f ). ) - - - , ;dt iuae ihaparticular a recover with to simulated difficult data be 2); will Fig. it Appendix, general, parameter In the (35–38). J) section at the (available from genealogies package simulator com/pbeerli/fractional-coalescent-material our updated Simulation. investigation. further need the will of BS-coalescent mapping the but others, parameter the with compared similar rather the some in of distributions are that f (details recognize are We parametrization J). BS-coalescent the section the Appendix, of with because Comparisons difficult 34). more (cf. tail heavy the expected The n with respectively; 0.00576 shrinkage, of with strong 0.00288 medians and had BS-coalescent growth, the strong growth, no α the for 0.00379 the were with the and than for intervals tails values heavier time median to coalescent; short leading intervals, more time with large rare peaked more becomes tion simulated 100,000 on for values ferent the for the of individuals distributions empirical of ples in are the details the for for individuals five Ancestor the of of Common distributions Recent empirical compared Most We the to Time Results and Methods Mittag–Leffler the in of shown implementation are the function to related details The Eq. for details where hnigagrtmaedsrbdb e.3.T rwanwtime new tree- a the draw of To 30. Details ref. events. (t by for described times are algorithm new changing draw we MCMC, the the cal- density, is posterior culating the it approximate chain to Markov (MCMC) uses model—here, Carlo software Monte The population function. Mittag–Leffler the particular for a size population for -scaled density set posterior eter Bayesian the f approximate M we software the of framework Implementation. ubr between numbers where itiuin a eue osml h ie sarsl,tetime is the function result, a Mittag–Leffler As the time. from the stable derived sample geometric to used of (28). be simulation mixture can fast al. a distributions the as et functions, expressed MacNamara be exponential can of distri- by function Mittag–Leffler introduced Mittag–Leffler the the been Since of has method which sampling bution the use we fore, -coalescent (ρ)f caecn o v ape iuae iha with simulated samples five for -coalescent 0 0 = P sn Eq. Using t ,w solve we ), 0 (t = .06,0033 n .03 o the for 0.00031 and 0.00343, 0.00667, .8; (X > r − k 1 f |ρ, t  0 (ρ) and and 1 = ) T λ eeautdteagrtm sn iuain.We simulations. using algorithms the evaluated We α)/f 1 T c k 15 T and MRCA and  3) i.1sosexam- shows 1 Fig. (33). J ) section Appendix, SI T 16 r C α 2 oda ie ietyi iecnuig There- time-consuming. is directly times draw to − MRCA α 1 eipeetdormdli h existing the in model our implemented We (X α 0 = aebe rsne yMcaaae l (28). al. et MacNamara by presented been have r w needn admnmes More numbers. random independent two are α r xdnmes n ecos random choose we and numbers, fixed are ( f 0 1) (0, Z Θ tan (X n 0 where |α), htwsue osmlt h aa(SI data the simulate to used was that n h Scaecn.Ec uv sbased is curve Each BS-coalescent. the and n oeo h BS-coalescent T the of some and h xetto fthe of expectation The .005 t n oprsno the of comparison and caecn,the -coalescent, f 0 o the for eto I section Appendix, SI caecn with -coalescent | f (πα u T ρ, sin caecn dtisaein are (details -coalescent for α MRCA ocos regnaoyduring genealogy tree a choose To α). −1 (πα f (1 P -, λ − k (t f n IGRATE ) E aus With values. X caecn sifiiebcueof because infinite is -coalescent ,adB-olset(2 (more (32) BS-coalescent and -, r > α 1 Θ—and ,α stedt and data the is )) T t 0 (−λ MRCA ). − T f cos MRCA caecn ihtodif- two with -coalescent α k 1) nti framework, this In (10). NSLts Articles Latest PNAS u 0 = ftedfeetmodels different the of α (πα α oalwgenerating allow to ) f )du safie parameter fixed a is T (31). < α caecn ihthe with -coalescent .9 o apeo five of sample a for )) MRCA n 0.01 = Θ n .02 with 0.00026 and = caecn with -coalescent α 1 h distribu- the 1, ρ T log T E https://github. steparam- the is C α IAppendix, SI MRCA o sample a for f f (−λ (r -Coalescent. 0 = (ρ|X MRCA 2 ), s0.008. is .01 α k o the for | t , 0 α = α) have f6 of 3 look [16] [15] ), and n SI -

EVOLUTION A which the standard coalescent is based. We compared the f- coalescent with our heterogeneity simulations, and Fig. 2B shows 0.15 f-coalescent =0.9 that the f-coalescent with α < 1 may give good descriptions of the f-coalescent =0.8 fluctuating environment in the past when we use Bayesian-model n-coalescent selection criteria. 0.10 n-coalescent g=-500 n-coalescent g=500 Comparing the f-Coalescent with Structured Coalescence and Popu- lation Growth. The f -coalescent has an additional parameter α

Probability which could reflect hidden structure. To explore whether the 0.05 parameter α responds to population structure, we simulated data using the structured coalescent for two populations exchanging migrants using three different magnitudes of gene flow (details 0.00 are in SI Appendix, section K). These datasets were evaluated 0.005 0.010 0.015 0.020 0.025 0.030 under two different scenarios: For scenario A, we sampled data from both subpopulations, and for scenario B, we collected data B Time to most recent common ancestor only from subpopulation 1. For scenario A, we ran models that assumed that the population is (i) not structured with α < 1, (ii) the standard single-population n-coalescent, and (iii) a struc- 0.15 f-coalescent =0.9 tured n-coalescent model. A Bayesian-model selection approach f-coalescent =0.8 excludes all models with α < 1. The nonstructured standard n-coalescent n-coalescent model is rejected for the low and medium gene- BS-coalescent Tc=0.005 0.10 flow scenarios, but the single-population n-coalescent model is BS-coalescent Tc=0.01 the best model for data simulated with high immigration rates (details are in SI Appendix, section K). For scenario B, we ran Probability 0.05 models that assume that the population is (i) not structured with α < 1, (ii) the standard single-population n-coalescent, and (iii)a model that assumes a ghost population (39). Models that include

0.00 0.005 0.010 0.015 0.020 0.025 0.030 Time to most recent common ancestor A

Fig. 1. Empirical distribution of the time of the most recent common ancestor for various coalescents: strictly bifurcating (A) and f-coalescent vs. n-coalescent and multifurcating BS-coalescent (B). The x axis is trun- cated at 0.03. Each curve represent a histogram of 100,000 draws of the TMRCA.

a considerable range when estimated. A problem with these simulations is that the number of variable sites is dependent on the mutation-scaled population size Θ and the simulated branch length. Despite having the same Θ = 0.01 among all simulations, the number of variable sites varies considerably over the range 78 of α: With an α = 0.4, ∼ 10,000 sites (mean) are variable, com- 253 pared with α = 1.0 with 10,000 sites. With α ≤ 0.4, it is common to see no variable sites in a locus; median among 100 loci is zero. Low-variability datasets are difficult to use in any coalescent B framework.

Simulated Heterogeneity and the f-Coalescent. Environmental het- erogeneity can have an effect on the capacity to produce offspring. We have developed a simple forward population simulator (open source software on https://github.com/pbeerli/ fractional-coalescent-material) that assigns an environmental quality affecting the chance of each individual of having off- spring; otherwise, the simulator assumes a Wright–Fisher pop- ulation with constant population size and every generation the parental population dies. If the simulator runs long enough, then one can generate a population genealogy out of which we can draw a coalescent sample of n individuals. Using these genealo- gies, we then generated DNA-sequence datasets that come from different regimes: (i) the environmental quality is the same for all (“equal”); (ii) the environmental quality is different, with three categories with relative rates of 1, 2, and 4 (“skewed”); and Fig. 2. Comparison of the effect of variable environment on genealogy (iii) with relatives rates of 1, 2, and 8 (“more skewed”). Fig. 2A and estimation. (A) Effect on the time of the most recent common ancestor; shows the distribution of the TMRCA for the three scenarios. The histogram of 1,000 independent genealogies of 10 individuals. (B) Ln mL of results suggest that variable environments change the TMRCA to a single-locus dataset for different values of α as a measure of the effect of become more recent than those in invariant environments on the environment on the data.

4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1810239116 Mashayekhi and Beerli Downloaded by guest on September 30, 2021 Downloaded by guest on September 30, 2021 ugsigta h rwhmdli neirt the ( to inferior was is model M model best growth the the that with suggesting comparison lmL relative the constant-size the than with model best the the for mL the (growth used parameters two that estimating model growth, a nential ran also We data. Kingman’s 0.09504. to At for range. at this and in considerably varied of ities range the within models of in at range mL Mexico the maximum from well-defined influenza a of had strain 2014 H1N1 the of dataset segment for mode n a had model a best The 0.00693. At was range. this size, in population mutation-scaled mated probabilities l etdmdl ihnterneof range the within models mod- tested data, all of malaria-parasite range the the within For els dataset. whale humpback Kingman’s 0.03480. to 0.0000 ahykiadBeerli and Mashayekhi of mode a had Θ size, at population was mL maximum an with models the within lmL) models that suggests of mL range using selection model data, pattern time S3 and waiting single exponential a the of disturb at will migrants (mLs) rare likelihoods marginal the rates, higher immigration low the With the model. picked rejecting best, immigration the high as very to high fNrhAlni upakwae Fg ;dtie description dataset detailed a 3; in and (Fig. falciparum), is whales (P. humpback Atlantic parasite North malaria of a 2014, in break n the whether Results. Data Real different with models of mLs ln marks the connects line solid The 3. Fig. ABC https://github.com/pbeerli/fractional-coalescent-material). caecn a orfi h aai-aaiedt.Teeight- The data. malaria-parasite the fit poor a was -coalescent caecn:a 11iflez aae fteMxc iyout- City Mexico the of dataset influenza H1N1 an -coalescent: rdblt nevlfo .24 o0072 Kingman’s 0.03762. to 0.02340 from interval 95%−credibility IGRATE a .33,adat and 0.03030, was 0 = Θ > 4–2.Frtehmbc whale humpback the For (40–42). J) section Appendix, SI > Θ ). ,a (A), eight-locus influenza H1N1 an of mLs relative using selection Model α ftebs oe o ahdataset. each for model best the of 0.Teetmtdmtto-cldpplto size, population mutation-scaled estimated The .01. − 0. 0 = .05790 hc rnltst oe rbblte of probabilities model to translates which 2.07, 8<α uptfie r vial ntegtu repository github the in available are files output θ n 0.60≤α80 .80, f .1 h aiu Lwsat was mL maximum The >0.01. + alsS2 Tables Appendix, (SI population -coalescent caecn ol eabte tta Kingman’s than fit better a be could -coalescent ≤1.0 0.01470 = Θ G aidcnieal nti ag.At range. this in considerably varied Θ, 0.6 α< n a and Θ α eue he ilgcldtst oexplore to datasets biological three used We oe lL= (lmL model α 0 = a .31.Tebs oe a mode a had model best The 0.03210. was r rfre ihrltv o L(relative mL log relative with preferred are 0 = α n 0.55<α n α .55, a oe rbblt f000.The 0.0000. of probability model a had caecn oe lL= (lmL model -coalescent 0 = rdblt nevlfo 0.02640 from interval 95%−credibility caecn a orfi h influenza the fit poor a was -coalescent 1 = .80 f caecn n h ghost-population the and -coalescent a eaielmL relative had . Θ 0. h siae mutation-scaled estimated The 95. . lL= (lmL n a and 0, 0.85 55<α< n ≤0.85 a .10,adat and 0.11007, was Θ caecn sago tfrthe for fit good a is -coalescent a .17.Tebs model best The 0.01170. was 9452)wslwrthan lower was −19,455.27) rdblt for 95%−credibility 9382)adas lower also and −19,338.28) a eaielmL relative had 0. α α f 0.85 55<α< n aidconsiderably varied Θ, 0 = α caecn oe had model -coalescent 0 = caecn n expo- and -coalescent a oe probabil- model had 0 = n . > ugsigthat suggesting .9, 6, caecn model -coalescent α . − 0.03051 = Θ oeswithin Models 7. Θ 0 = l tested all 3.20; α g f a 0.10530, was −19,342.24); aus h ahdln ak h uainsae fetv ouainsize population effective mutation-scaled the marks line dashed the values; h esti- The .7 and α -coalescent. a model had 0 = > −118.14, α .The Θ). Θ − >0.04; .85, 0 = 3.49; from and Θ, .8, Θ sltdpnitcpplto ilw e etrmdlfit model better a a an analyze is get and this we data that will a assume with simulated population that the panmictic models isolated of with half sim- subpopulation were single remove that intervals we data time structured of when a and Analyses using shortened, lengthened. by are ulated are more root tips have the the to near near seem the intervals but than time tails, inter- branches heavy time have the longer also genealogy, populations the are ing of the date shortened; root sampling are the the vals near vari- near and the intervals enlarged, the match time of not growth, intervals do population time structure of population ability or the growth lation of Extensions intervals. ieo nunai osdrbyudrsiae hnusing when underestimated considerably on is dependent influenza highly of is The diver- species size decisions. the parameter different thus heterogeneity informed the and to the make of sizes need population to estimates may effective sity want the heterogeneity we of that malaria-parasite estimates if the suggest considered whereas data be lit- whales, have influenza dataset. humpback resulting may and on the heterogeneity in effect environmental sys- variability tle that high immune show to vertebrates; individual lead Results in of encounter may live bloodstream may that to the tems viruses able only and be influenza have mosquitoes to and and of needs far, parasite very saliva move malaria the can the time, offspring; long Hump- few a strategies: live the life-history whales different by back very explained represent The be datasets three selection. these could in under population datasets are a that within organ- heterogeneity organisms short-lived environmental vs. fast-evolving long-lived or the of isms that evolution of the indicate understanding rejected may data influenza This the The and whales. data Kingman’s humpback the different reject long-lived not of the comparison for mL sug- not examples and data real pathogens the three extend The that models. to gest of need types will these we study but population, single-population hetero- single our environmental a mimic within geneity may times waiting that long compartmentalization and short the of mix of unique The generations. 10 whereas branch possible, short are lengths very branch of large Mixtures the very intervals. with time lengths variable the very of feature A Discussion n .falciparum P. caecn ocsamr vndsrbto fteetime these of distribution even more a forces -coalescent f f caecn oe hnteimgainrt s1per 1 is rate immigration the when model -coalescent caecn hsmyalwifrne ihunknown with inferences allow may thus -coalescent f caecn sabte tt h aafrthe for data the to fit better a is -coalescent ,adahmbc hl tN (C mtDNA whale humpback a and (B), mtDNA f caecn steaiiyt accommodate to ability the is -coalescent f caecn osrcue ouain to populations structured to -coalescent n caecnswt xoetal shrink- exponentially with -coalescents n caecn,bttemalaria-parasite the but -coalescent, f n caecn.I the In -coalescent. n α caecn oe hwta only that show model -coalescent freape h population the example, α—for f caecn oalwfrpopu- for allow to -coalescent f caecn a mrv our improve may -coalescent o h upakwae did whales humpback the for caecn.Wt exponential With -coalescent. f NSLts Articles Latest PNAS caecn.Tethree The -coalescent. n caecn clearly. -coalescent n h square the and Θ; f -coalescent, dataset. ) | f6 of 5

EVOLUTION Kingman’s n-coalescent. Our introduction of the f -coalescent ACKNOWLEDGMENTS We thank John Wakeley and three anonymous refer- opens a window into further research that allows handling of ees for giving us many suggestions that helped us to improve our manuscript considerably. This work has been supported by National Science Foundation heterogeneity that cannot be explained by population growth Award DBI 1564822 and the IT resources at the research computing center or population structure. of Florida State University.

1. Kingman JFC (1982) The coalescent. Stoch Process Their Appl 13:235–248. 22. Roman J, Palumbi S (2003) Whales before whaling in the North Atlantic. Science 2. Kingman JF (1982) On the genealogy of large populations. J Appl Probab 19:27–43. 301:508–510. 3. Griffiths RC, Tavare´ S (1994) Sampling theory for neutral alleles in a varying 23. Joy DA, et al. (2003) Early origin and recent expansion of Plasmodium falciparum. environment. Philos Trans R Soc Lond B Biol Sci 344:403–410. Science 300:318–321. 4. Kuhner MK, Yamato J, Felsenstein J (1998) Maximum likelihood estimation of 24. Neher RA, Hallatschek O (2012) Genealogies of rapidly adapting populations. Proc population growth rates based on the coalescent. Genetics 149:429–434. Natl Acad Sci USA 110:437–442. 5. Notohara M (1990) The coalescent and the genealogical process in geographically 25. Eldon B, Wakeley J (2006) Coalescent processes when the distribution of offspring structured population. J Math Biol 29:59–75. number among individuals is highly skewed. Genetics 172:2621–2633. 6. Wilkinson-Herbots HM (1998) Genealogy and subpopulation differentiation under 26. Wakeley J (2009) : An Introduction (Roberts & Company, Green- various models of population structure. J Math Biol 37:535–585. wood Village, CO),p 326. 7. Takahata N (1988) The coalescent in two partially isolated diffusion populations. 27. Haubold HJ, Mathai AM, Saxena RK (2011) Mittag-Leffler functions and their Genet Res 52:213–222. applications. J Appl Math 2011:298628. 8. Neuhauser C, Krone SM (1997) The genealogy of samples in models with selection. 28. MacNamara S, Henry B, McLean W (2017) Fractional Euler limits and their Genetics 145:519–534. applications. SIAM J Appl Math 77:447–469. 9. Griffiths R, Marjoram P (1996) Ancestral inference from samples of DNA sequences 29. Takahata N, Nei M (1985) Gene genealogy and variance of interpopulational with recombination. J Comput Biol 3:479–502. nucleotide differences. Genetics 110:325–344. 10. Beerli P (2006) Comparison of Bayesian and maximum-likelihood inference of 30. Beerli P, Felsenstein J (1999) Maximum-likelihood estimation of migration rates population genetic parameters. Bioinformatics 22:341–345. and effective population numbers in two populations using a coalescent approach. 11. Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migra- Genetics 152:763–773. tion rates and divergence time, with applications to the divergence of Drosophila 31. Gorenflo R, Loutchko J, Luchko Y (2002) Computation of the Mittag-Leffler function pseudoobscura and D. persimilis. Genetics 167:747–760. eα,β (z) and its derivative. Fractional Calculus Appl Anal 5:491–518. 12. Kuhner M (2006) LAMARC 2.0: Maximum likelihood and Bayesian estimation of 32. Bolthausen E, Sznitman AS (1998) On Ruelle’s probability cascades and abstract cavity population parameters. Bioinformatics 22:768–770. method. Commun Math Phys 197:247–276. 13. Orsingher E, Polito F (2012) The -fractional Poisson process. Stat Probab Lett 33. Hein J, Schierup MH, Wiuf C (2005) Gene Genealogies, Variation and Evolution: A 82:852–858. Primer in Coalescent Theory (Oxford Univ Press, Oxford). 14. Beghin L, Orsingher E (2009) Fractional Poisson processes and related planar random 34. Repin ON, Saichev AI (2000) Fractional Poisson law. Radiophysics Quan Electron motions. Electron J Probab 14:1790–1826. 43:738–741. 15. Meerschaert MM, Nane E, Vellaisamy P (2011) The fractional Poisson process and the 35. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795. inverse stable subordinator. Electron J Probab 16:1600–1620. 36. Beerli P, Palczewski M (2010) Unified framework to evaluate panmixia and migration 16. Politi M, Kaizoji T, Scalas E (2011) Full characterization of the fractional Poisson direction among multiple sampling locations. Genetics 185:313–326. process. Europhys Lett 96:20004. 37. Palczewski M, Beerli P (2014) Population model comparison using multi-locus 17. Podlubny I (1998) An Introduction to Fractional Derivatives, Fractional Differential datasets. Bayesian : Methods, Algorithms, and Applications, eds Equations, to Methods of Their Solution and Some of Their Applications, Fractional Chen MH, Kuo L, Lewis PO (CRC, Boca Raton, FL), pp 187–200. Differential Equations (Academic, New York), Vol 198. 38. Burnham K, Anderson D (2002) Model Selection and Multimodel Inference: A Practical 18. Mainardi F (2010) Fractional Calculus and Waves in Linear Viscoelasticity: An Information-Theoretic Approach (Springer, New York). Introduction to Mathematical Models (Imperial College Press, London). 39. Beerli P (2004) Effect of unsampled populations on the estimation of population sizes 19. Carpinteri A, Mainardi F (2014) Fractals and Fractional Calculus in Continuum and migration rates between sampled populations. Mol Ecol 13:827–836. Mechanics, CISM Courses and Lectures (Springer, New York), Vol 378. 40. Felsenstein J, Churchill GA (1996) A hidden Markov model approach to variation 20. Mashayekhi S, Miles P, Hussaini MY, Oates WS (2018) Fractional viscoelasticity in frac- among sites in rate of evolution. Mol Biol Evol 13:93–104. tal and non-fractal media: Theory, experimental validation, and uncertainty analysis. 41. Yang Z (1993) Maximum-likelihood estimation of phylogeny from DNA sequences J Mech Phys Sol 111:134–156. when substitution rates differ over sites. Mol Biol Evol 10:1396–1401. 21. Laskin N (2003) Fractional Poisson process. Commun Nonlinear Sci Numer Simul 8: 42. Swofford D (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other 201–213. Methods) (Sinauer Associates, Sunderland, MA), Version 4.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1810239116 Mashayekhi and Beerli Downloaded by guest on September 30, 2021