<<

Bayesian inference of reassortment networks reveals fitness benefits of reassortment in human influenza Nicola F. Muller¨ a,b,c,1 , Ugne˙ Stolza,b , Gytis Dudasd , Tanja Stadlera,b, and Timothy G. Vaughana,b,1

aDepartment of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; bSwiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; cVaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109; and dDepartment of Biological and Environmental Sciences, University of Gothenburg, SE-40530 Gothenburg, Sweden

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved May 11, 2020 (received for review October 22, 2019)

Reassortment is an important source of genetic diversity in seg- ding of segment trees within these, allowing us to infer these mented viruses and is the main source of novel pathogenic parameters from available sequence data. influenza viruses. Despite this, studying the reassortment process The reassortment process modeled in this way differs from has been constrained by the lack of a coherent, model-based infer- other recombination processes in that it is known where on the ence framework. Here, we introduce a coalescent-based model recombination of genetic material occurs and in that that allows us to explicitly model the joint coalescent and reas- there is no ordering of the segments. The lack of linkage between sortment process. In order to perform inference under this model, segments means that at a reassortment event, any subset of we present an efficient Markov chain Monte Carlo algorithm to segments can originate from either parent. sample rooted networks and the embedding of phylogenetic trees In order to perform inference under such a model, the reas- within networks. This algorithm provides the means to jointly sortment network and the embedding of each segment tree infer coalescent and reassortment rates with the reassortment within that network must be jointly inferred. This is similar to network and the embedding of segments in that network from the well-known and challenging problem of inferring ancestral full-genome sequence data. Studying reassortment patterns of recombination graphs (ARGs). While many approaches to infer- different human influenza datasets, we find large differences ring ARGs exist, some are restricted to tree-based networks (12, in reassortment rates across different human influenza viruses. 13), meaning that the networks consist of a base tree where Additionally, we find that reassortment events predominantly recombination edges always attach to edges on the base tree. occur on selectively fitter parts of reassortment networks show- Other approaches (e.g., ref. 14) rely on approximations (15) ing that on a population level, reassortment positively contributes and are not applicable to the reassortment model due to its to the fitness of human influenza viruses. aforementioned lack of segment ordering. Completely general inference methods exist (16), but these are again not directly phylogenetics | phylodynamics | infectious diseases | BEAST | MCMC

Significance hrough rapid evolution, human influenza viruses are able to evade host immunity in populations around the globe. In T processes, such as reassortment, make addition to mutation, reassortment of the different physically it complex or impossible to use standard phylogenetic and unlinked segments of influenza provides an impor- phylodynamic methods. This is due to the fact that the shared tant source of viral diversity (1). If a cell is infected by more evolutionary history of individuals has to be represented by a than one , progenitor viruses can carry segments from more phylogenetic network instead of a tree. We therefore require than one parent (2). With the exception of accidental release novel approaches that allow us to coherently model these pro- of antigenically lagged human influenza viruses (3), reassort- cesses and that allow us to perform inference in the presence ment remains the sole documented mechanism for generating of such processes. Here, we introduce an approach to infer influenza strains (e.g., refs. 4–6). reassortment networks of segmented viruses using a Markov To characterize reassortment events, tanglegrams, comparison chain Monte Carlo approach. Our approach allows us to study between tree heights (7, 8), or ancestral state reconstructions (9) different aspects of the reassortment process and allows us are typically deployed. These approaches identify discordance to show fitness benefits of reassortment events in seasonal between different segment tree topologies or differences in human influenza viruses. pairwise distances between isolates across segment trees. Tangle- grams in particular require a substantial amount of subjectivity Author contributions: N.F.M., T.S., and T.G.V. designed research; N.F.M., U.S., and T.G.V. and have been described as potentially misleading (10). performed research; N.F.M., G.D., and T.G.V. analyzed data; N.F.M., U.S., G.D., T.S., and While the reassortment process has been intensively studied T.G.V. wrote the paper; and N.F.M., U.S., and T.G.V. implemented software.y (e.g., refs. 7–9 and 11), there is currently no explicit model-based The authors declare no competing interest.y inference approach available. We address this by introducing a This article is a PNAS Direct Submission.y coalescent-based model for the reassortment of viral lineages. This open access article is distributed under Creative Commons Attribution License 4.0 In this phylogenetic network model, ancestral lineages carry (CC BY).y genome segments, of which only a subset may be ancestral to Data deposition: All other data, such as log files of BEAST2 runs, as well as scripts to ana- sampled viral genomes. As in a normal coalescent process, net- lyze and plot results, are available on GitHub at https://github.com/nicfel/Reassortment- work lineages coalesce (merge) with each other backward in Material. The source code for the BEAST2 package CoalRe itself is available on GitHub at https://github.com/nicfel/CoalRe. The python scripts to plot phylogenetic networks is time at a rate inversely proportional to the effective popula- available on GitHub at https://github.com/evogytis/baltic.y tion size. We model reassortment (splitting) events as a result 1 To whom correspondence may be addressed. Email: [email protected] or of a constant-rate Poisson process on network lineages. At such [email protected] a splitting event, the ancestry of segments on the original lin- This article contains supporting information online at https://www.pnas.org/lookup/suppl/ eage diverges, with a random subset following each new lineage. doi:10.1073/pnas.1918304117/-/DCSupplemental.y We thus explicitly model reassortment networks and the embed- First published July 6, 2020.

17104–17111 | PNAS | July 21, 2020 | vol. 117 | no. 29 www.pnas.org/cgi/doi/10.1073/pnas.1918304117 Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 rmlf orgt h esoteteet r o ewrswt w,tre n orsegments. four M and three, two, with networks for are events reassortment the 5 right, of to rate left evolutionary From a with simulated sequences the I (y CIs 1. Fig. S3 and S2 Figs. Appendix, Hasegawa– complex more + the (HKY) using Kishino–Yano when hold Additionally, population also S1 ). effective Fig. results infer , Appendix these (SI to rates rates ability reassortment and our evolutionary sizes decrease Lower greatly events. in not events reassortment do coalescent than more con- network many expected typically a is are which there that rates, simu- sidering reassortment from than estimated precisely are rates sizes more population reassortment Effective sequences. and genetic lated sizes population effective these from rates reassortment embedding, and sequences. tree genetic sizes, segment then population network, we effective approach, reassortment MCMC the our inferred Using simula- (Simulations). well-calibrated study a rates performed tion reassortment we and sequences, sizes genetic population from effective infer to Data. Are ability Sequence Rates Genetic Reassortment from and Inferred Sizes Reliably Population Effective of these Inference of fitness low Results and high with edges networks. on reassortment differ as rates subtypes, ment influenza human in seasonal listed five as reas- the quantify well to across as approach sortment this sizes, apply population then We effective rates. evolutionary of can inference reassortment with the coalescent influence the of using lack how show a parameters. we reassortment how these Thirdly, of discuss inference and the we influences information Secondly, sizes, genetic data. population simulated from effective events rates, sortment rates. the evolutionary as segment, well each as rates, of coalescent trees and reassortment phylogenetic those additional reas- the the any infer within network, jointly without to sortment trees us model, allows segment approach coalescent This approximations. of the reassortment embedding under sample networks the jointly and to networks designed specifically approach be to demanding. tend computationally furthermore highly and reassortment modeling to applicable le tal. et uller ¨ i.1 Fig. reas- retrieve to able is approach this that show first We (MCMC) Carlo Monte chain Markov a introduce we Here, x xs s iuae fetv ouainszso the on sizes population effective simulated vs. axis) xs (C axis. siae fetv ouainszsad95% and sizes population effective Estimated (A) sequences. genetic simulated from rates reassortment and sizes population effective of Estimates ial,w td o reassort- how study we Finally, S1. Table Appendix, SI A otro upr o rerasrmn vns(y events reassortment true for support Posterior ) and B hw htw r bet orcl retrieve correctly to able are we that shows B A estimated reassortment rate estimated effective population size Γ 10 0.05 0.10 0.30 0.50 ). 3 5 4 iemdlt iuaesqecs(SI sequences simulate to model site .501 .00.50 0.30 0.10 0.05 true effectivetrue populationsize 51035 true reassortment rate reassortment true × nodrt etour test to order In 10 −3 x uain e ieadya tprw n 5 and row) (top year and site per mutations

posterior support C siae esotetrtsad9%C ( y CI 95% and rates reassortment Estimated (B) axis. 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 . . . . 00002550751. . . . . 10.0 7.5 5.0 2.5 0.0 10.0 7.5 5.0 2.5 0.0 10.0 7.5 5.0 2.5 0.0 xs ie h esotetdsac (x distance reassortment the given axis) 234 hn05 ete omlzdtedfeec ewe h lower the sup- between was difference more the that of normalized probability clade then posterior We each a 0.5. with for than approaches heights both by node ported of the interval computed indepen- (HPD) evolved first segment We each nor- that dently. a assuming across using size but population segments, once effective all shared and with reassortment prior coalescent with mal coalescent the of using sample results. to random the order biasing a in from generated sequences of were individual datasets prevent consisting different 10 datasets The sequences. 10 generated the dif- influenza we as in of well datasets (details as subtypes, compiled B A first influenza we human seasonal this, ferent do standard To the under model. independently coalescent Node evolved assump- segments the of all under that inferred tion Precision ages to Increases reassortment with coalescent Genomes Full Ages. from Inference Joint the segments. simulate the to of used sequences are correctly rates number evolutionary that lower the support when posterior methods decreases as the for uncertainty, phylogenetic satisfy expected account into to As take increases. harder trees: segments segment becomes the of that below clade requirement the same at the a direction exactly relative with same the time in is same reassort the decrease segments only This are all events segments. if reassortment same four two that or definition three our and by at segments driven look of we pairs when between we events drops when reassortment reassortment true at the particularly look of is only viruses This parent Distance ). (Reassortment two event the evo- on independent happened much lution how denotes distance reassortment distance. The reassortment increasing with particularly support, good relative the the if exactly is Summary). and event (Network same reassortment same the the at is segment each node of that subtree direction the below if segment same We each the reas- network. be of (simulated) to same events true reassortment the the two in considered exactly present observing as events of sortment probability the computed reassortment distanceinyears reassortment ete nlzdec fteesbape aaesonce datasets subsampled these of each analyzed then We with recovered are events reassortment 1C, Fig. in shown As we recovered, are events reassortment true well how test To ecmae h nenlnd gsifre sn the using inferred ages node internal the compared We .Fo aho these, of each From Methods). and Materials PNAS × 10 xs.Ifrneo esotetntok from networks reassortment of Inference axis). −4 | uy2,2020 21, July uain e ieadya bto row). (bottom year and site per mutations xs s iuae esotetrtson rates reassortment simulated vs. axis)

95% ihlow high density posterior highest | o.117 vol. | o 29 no. | 17105

EVOLUTION AB

10.0 1.00

0.55 0.65 0.57 0.49 rt 0.75 2.0

1.0 0.50

0.5 rior node suppo 0.25 poste

0.1 0.00

ratio of relative node height uncertainty of relative ratio p1918−like H1N1 p2009−like H1N1 H3N2 B p1918−like H1N1 p2009−like H1N1 H3N2 Influenza B CD 0.0035 ear independent segments y with reassortment 0.0030 10

0.0025 rate per site and rate

y 0.0020 5 effective population size effective 0.0015 evolutionar

p1918−like H1N1 p2009−like H1N1 H3N2 Influenza B p1918−like H1N1 p2009−like H1N1 H3N2 Influenza B

Fig. 2. Comparison of estimates between the coalescent with reassortment and assuming that each segment codes for an independent realization of the same coalescent process. (A) Comparison of the relative width of the 95% HPD interval of segment tree node heights using 10 random subsets. The vertical axis shows the distribution of ratios of the relative width of the 95% HPD intervals of the coalescent with reassortment over the coalescent assuming independent segment evolution. These values demonstrate a strong reduction in node height uncertainty when using the coalescent with reassortment over the coalescent with independent segments. (B) Comparison between the distribution of posterior clade support of segment trees found the MCC segment trees using 10 random subsets. (C) Comparison between the inferred effective population sizes. When assuming each segment is an independent realization of the same coalescent process, the effective population sizes are inferred to be much smaller and much more certain. (D) Comparison between the inferred clock rates. The coalescent with reassortment infers lower clock rates.

and upper bound of the 95% HPD interval, by the median node more representative of those between geographically more sepa- height estimate to get the relative width of the HPD interval rated lineages. These events therefore provide information about for each clade. As shown in Fig. 2A, using the coalescent with larger effective population sizes. Coalescent events across differ- reassortment reduces the uncertainty of node height estimates ent segments that occur close to the tips are less likely to have of segment tree nodes by 35% for p2009-like H1N1 up to over encountered reassortment events. In the coalescent with reas- 50% for influenza B. sortment, they are therefore interpreted as one event, whereas Next, we computed the distribution of clade supports for in the coalescent with independent segments, they are inter- clades represented in the maximum clade credibility (MCC) preted as eight. Coalescent events deeper in the tree are more trees (as described by ref. 17) inferred using the two approaches. likely between lineages that encountered reassortment events As shown in Fig. 2B, segment tree clades are far better resolved and are therefore more likely to provide independent informa- when using the coalescent with reassortment for all datasets. tion about the population process. The coalescent with indepen- We then compared the effective population sizes and evolu- dent segments assumes that all coalescent events provide the tionary rates inferred using the two approaches. The coalescent same amount of information about the population process and with reassortment infers higher effective population sizes for will consequently favor information about the population pro- all datasets (Fig. 2C). This also influences the inferred clock cess closer to the tips. This leads to differences in the estimated rates, since lower effective population sizes put stronger weight effective population sizes, which then leads to differences in the on shorter branches and therefore larger clock rates (Fig. 2C). estimated clock rates. We explain this discrepancy as follows. While we currently do We also compared the performance of the two approaches not account for population structure, the datasets we analyze by inferring tip dates (18). The tips (leaf nodes) are the only are in part shaped by population structure, such as geographic nodes in the trees or network for which we can actually pre- structure. Ignoring this structure likely affects the coalescent sume to know the true age, which is set by the sample collection with reassortment differently compared with assuming indepen- time. To compare the two approaches, we compiled 1,000 smaller dent segments. Coalescent events closer to the tips are more influenza A/H3N2 datasets, each composed of 20 genomes. Of likely between lineages that are, for example, geographically those datasets, 500 were randomly sampled from an random more closely related and can be assumed to occur rapidly and interval of 2 y between 1995 and 2019. The remaining 500 provide information about low effective population size values. datasets were assembled using a random sampling interval of Coalescent events deeper in the tree on the other hand are 10 y between 1995 and 2019. From each of these datasets, we

17106 | www.pnas.org/cgi/doi/10.1073/pnas.1918304117 Muller¨ et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 M in shown are the subsets on random viruses all influenza of networks MCC The ,iflez /22(C A/H2N2 influenza (B), A/H1N1 influenza like 3. Fig. interval time sampling true the in contains reassortment with cent the of times sampling genomes. the remaining on conditional time approaches, sampling both its inferred using and genome single a selected randomly atclryitrsigsneteesrissaemn common many share strains these since interesting particularly quite is estimates those reas- on of uncertainty large. rates the intermediate although show sortment, 2009 H1N1 and show H2N2 (p2009)-like 3). B (Fig. pandemic reassortment influenza of and rates inferred H1N1 lowest the (p1918)-like 1918 pandemic while different between greatly viruses. vary that influenza find rates We reassortment viruses. estimated these of the rates popu- reassortment effective and rates, sizes, evolutionary lation 1957 trees, the network, segment between reassortment of sampled the embedding inferred Ages, dataset jointly then A/H2N2 Node We influenza 1970. in of and an Precision described as Increases datasets well Genomes as same Full in the from listed include Inference These types S1. influenza Table human seasonal Influenza ferent Human Different across Viruses. Rates Reassortment Contrasting cases. of sam- interval) only true sampling in the time contains model pling coalescent segment hand, independent other the On S4). Fig. the Appendix, (SI interval sampling 10-y DE ABC le tal. et uller ¨ The ifrne ewe 11-ieHN,HN,adHN are H3N2 and H2N2, H1N1, p1918-like between Differences reassortment, of rates highest the shows A/H3N2 Influenza 91% 95% 95% ,p2009- (A), A/H1N1 influenza p1918-like of networks MCC viruses. influenza human different of rates reassortment and networks MCC of Estimates fcssfrte2ysmln nevladin and interval sampling 2-y the for cases of p1918-like Influenza A/H1N1 ecmae h esotetrtso h v dif- five the of rates reassortment the compared We P ftesml iepseir eeae ythe by generated posteriors time sample the of HPD P ftesml iepseir ne h coales- the under posteriors time sample the of HPD Influenza A/H2N2 Influenza B Influenza Influenza A/H2N2 68% x xs h esotetrtsaeprlnaeadyear. and lineage per are rates reassortment The axis. 2ysmln nevl and interval) sampling (2-y ,adiflez (E B influenza and (D), A/H3N2 influenza ), IAppendix SI IAppendix, SI p2009-like Influenza A/H1N1 89% 77% o the for ( F S5–S8. Figs. , (10-y Joint oee,silmasta h iue r ieyfi nuht be to enough fit having likely are while edge viruses future, here, any Unfit the the unfit. that be be into means to to y defined still is 2 however, network edge the least in fit edge at other a either persist every as define still that networks We edge descendants inferred “unfit.” network of or every distribution classify “fit” posterior we the so, Net- do from Reassortment To of events. Parts reassortment Fitter on works. Occur Events Reassortment transmitted. be and to fitness likely different less or a more average, be on therefore human have, seasonal could different that viruses the in influenza in rate viruses reassortment reassortants B observed Additionally, influenza higher case. the or to viruses and contribute events A A/H3N2 may coinfection influenza influenza other of with number of compared higher incidence likely and correspondingly higher time the appear. same the reassortants the which particular, at at rate prob- host In the different in same be difference the to a may to in lead therefore being (which size) viruses rate population of effective coinfection abilities the in to Differences linked factors. of ber H1N1 p1918-like the with compared strain. levels elevated H3N2 to highly rates a reassortment but similar from shows descended It are (5). that strain and viruses p1918-like NP, swine that classic (HA, from (PB1) segments derived three segment NS) and one H3N2 has the p2009- human hand, after from PB1. other years originates and the the HA on in seasonal pandemic, in became 2009 differ which only H1N1, H3N2 human like strain and H1N1 H2N2 p1918-like PB1 and the and (19), from NA, originate HA, A/H3N2 of influenza exception of the with segments All segments. uhvrain nrasrmn ae a edie yanum- a by driven be may rates reassortment in variations Such ee eso h nerdrasrmn ae ( rates reassortment inferred the show we Here, ) et ets fteei tesefc soitdwith associated effect fitness a is there if test we Next, .TeeMCntok r hw o fte1 admsubsets. random 10 the of 1 for shown are networks MCC These ). PNAS F reassortment rate 0.1 0.2 0.3 0.4 0.5 11−ieHN 20−ieHN 32HN InfluenzaB H2N2 H3N2 p2009−like H1N1 p1918−like H1N1 | Inferred reassortmentrates uy2,2020 21, July Influenza A/H3N2 | o.117 vol. y | xs o different for axis) o 29 no. | 17107

EVOLUTION A 0.8

fit 0.6 unfit

0.4

0.2 reassortment rate per lineage and year

0.0 p1918−like H1N1 p2009−like H1N1 H3N2 H2N2 Influenza B B

erence 0.50

0.25

0.00 per lineage and year −0.25 reassortment diff rate p1918−like H1N1 p2009−like H1N1 H3N2 H2N2 Influenza B

Fig. 4. Estimates of reassortment rates on fit and unfit edges. (A) Here, we show the number of reassortment events on fit and unfit edges of the networks divided by the total length of fit and unfit edges. Fit edges are defined as having a sampled descendant at least 2 y into the future. Every other edge is considered unfit. These rates are shown for different human influenza viruses on the x axis. The violin plots denote the distribution of theses ratios over the posterior distribution of networks. (B) Here, we show the difference between fit and unfit reassortment rates. Values above 0 indicate that reassortment events are more likely to occur on fitter, while values below 0 indicate that reassortment events are more likely to occur less fit edges.

transmitted. If reassortment events are beneficial, lineages that all nine datasets, we find the short-term reassortment rate to are the result of reassortment events should have a higher sur- be approximately 0.2 reassortment events per lineage per year, vival probability and are therefore more likely to persist further which is consistent with the reassortment rate estimates for unfit into the future. edges (SI Appendix, Fig. S12). To test if reassortment is beneficial, we calculated the num- Finally, we sought to rule out the possibility that these pat- ber of reassortment events on fit edges and on unfit edges for all terns are simply a property of our reassortment model. To do networks in the posterior distribution of the MCMC. We then this, we simulated networks under the coalescent with reassort- divided this number by the total length of fit respectively unfit ment with the reassortment rates and effective population sizes edges. As shown in Fig. 4, reassortment events occur at a higher fixed to the mean values estimated from the empirical data and rate on fit edges of the H3N2 and influenza B networks than they the network tip times fixed to those from the same data. In do on unfit edges. This suggests that reassortment is beneficial to these simulated networks, each edge has the same fitness. We the fitness of influenza A/H3N2 and influenza B viruses. then recomputed the same fit/unfit reassortment rate statistics For the other human influenza viruses, fitness benefits of reas- from these simulated networks (SI Appendix, Fig. S13) and found sortment are less pronounced. For p2009-like H1N1 and H2N2, that the patterns we observed in the empirical data completely the sampling time windows (both) or number of samples (H2N2) disappeared. This strongly suggests that the elevated rate of reas- was however rather small, and the results are likely driven by a sortment on fit lineages is not due to the particulars of our model lack of data. p1918-like H1N1 on the other hand had relatively but is instead a real effect. few reassortment events, overall driven by a low reassortment rate, and the results are likely driven by a lack of reassortment Conclusion events. These same general patterns hold when the definition We here present a Bayesian approach to jointly infer the reas- of what is a fit edge is changed to having descendants at least sortment network, the embedding of segment trees, and the cor- 4 or 6 y into the future (SI Appendix, Figs. S9 and S10). For responding evolutionary parameters. We show that this approach p2009-like H1N1, where the overall sampling interval is only 10 y, is able to retrieve reassortment rates, effective population size, the evidence for fitness benefits of reassortment events, however, and reassortment events from simulated data. decreases when using 4 and 6 y instead of 2 y. We have used this approach to show that there are large dif- We conducted further analysis to probe the robustness of our ferences in the rates of reassortment across different influenza result for H3N2, for which densely sampled sequences of long viruses and that reassortment events occur predominantly in time intervals are available. We analyzed two more influenza fitter parts of the corresponding reassortment networks. We pro- A/H3N2 datasets, one sampled between 1980 and 2005 and one pose that this is due to selection favoring lineages that have sampled from 2005 until the present. For both datasets, we find reassorted. Although we have deployed a relatively simple way higher rates of reassortment on fit edges (SI Appendix, Fig. S11). of defining which edges of a network are fit and which are not, Over shorter time frames, the effect of fitness is less visible since future approaches could more directly incorporate fitness mod- selection has not had enough time to filter out less fit strains. In els into these network-type approaches (20, 21). Additionally, turn, this means that if we look at datasets sampled over short our ability to infer which segments coreassort will allow us to times (for example, over 2 y), we should estimate reassortment further investigate if coreassortment between specific segments rates consistent with the estimated rates on unfit edges. To test drive these fitness benefits. this, we compiled 9 datasets, each with 100 to 200 sequences Even if one is not directly interested in reassortment patterns, sampled from 2 seasons between 2000 and 2018. Averaged over our approach allows phylogenetic and phylodynamic inferences

17108 | www.pnas.org/cgi/doi/10.1073/pnas.1918304117 Muller¨ et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 steuino hs are by carried common those recent of event, union lineage coalescent a the parent a is share Upon the coalesce. that to to segments are are ancestral they the lineages pop- likely two effective more the the likely i.e., smaller ancestor, more The the 1. by size, lineages 1. ulation network active by size of population lineages number effective the the network lineages to proportional extant network inversely two is of that between number occur the events coalescent increase predefined Coalescent for at simply happen case These and the events. times usually sampling on is condition As we sampling, approaches, events. events: possible reassortment three and involves It coalescent, time. in backward proceeds that M time in backward present the from past. denote coales- the network in lines to the done track Dashed As we descendants. network. approaches, sampled have the cent not through do differen- that colors segments lineages different segment different the three by track tiated we where network reassortment 5. Fig. of number total the define further We load.” “carrying segments the as subset this to define we so, do To reassortment. with process coalescent a Reassortment. with Coalescent The Methods and the of Materials variety pursue a will for we account processes. to future, recombination approaches other the related recombination In of of kind development reassortment. special a i.e., for process, accounting directly by ence the in account to size done 25). model (24, population the be structure extending population for with for to together models 23) remains skyline (22, dynamics development incorporating of of for direction lot with, independently. compared a completely precision evolve However, increase This segments and viruses. assuming bias reassorting instance, avoid for to sequences helps full-genome exploit to subset a and time sample at l recent extant most lineages the network before of past) the into (increasing ∈ le tal. et uller ¨ h olsetwt esoteti otnostm akvprocess Markov time continuous a is reassortment with coalescent The nsmay hsapoc losu opromntokinfer- network perform to us allows approach this summary, In L t are h ulsto eoesegments, genome of set full the carries xml esotetntok ee egv neapeo a of example an give we Here, network. reassortment Example C |S (l ) | ⊆ n h ubro neta segments ancestral of number the and S has sampleddescendents no sampleddescendents fteeaedrcl neta osmldvrss erefer We viruses. sampled to ancestral directly are these of C (p) l and = ee eitoueamdlt describe to model a introduce we Here, C t (l l Fg ) ahetn ewr lineage network extant Each 5). (Fig. 0 ) i.e.: , C ∪ present (l 0 past ). S p ngnrl oee,only however, general, In . flineages of |C Reassortment Event Coalescent Event Reassortment Event Coalescent Event Coalescent Event Coalescent Event (l )|. l and t N l ob h time the be to and e L t l n reduce and 0 steset the as tarate a at l 0 carries emnsis segments otetocr. ic ahsgetcossisprn deuiomyat uniformly either edge of parent probability its the chooses segment random, each Since occurs.) sortment that in events on lineage the events ancestral model, “reassortment” our In by are omission. omitted done this reassort- are for the is account modifying exactly this and to model. rate process models, our ment the recombination in from ancestry events with nonancestral this coalescent omitting over standard integrate explicitly with we As sample, our to tral follow to segment given a of lineage by lineages carried parent segments The 1. by lineage on event sortment flnae xatimdaeypirt hseet Ta is, [That event. this to prior [t immediately extant lineages of the factorize to us likelihoods: tree segment allows individual the analyses of terms phylogenetic in likelihood in network made Likelihood. assumption Network here.] us The concern not does and model the of likelihood marginal the both in present is segment corresponding the event when coalescent and a tree to segment corresponds a only network in the in event coalescent This h vn otiuini h rbblt est fti atclrpi of pair particular this of density time probability at the coalescing lineages is contribution event the the the If if event. on depending different is Contribution. Event define we where sampling and coalescent, reassortment, events): (i.e., events between times waiting with coalescent the under analyses. trees size rate segment sortment population effective of with embedding model, reassortment the pruning and standard work the Prior. Network using The computed be can (26). likelihoods algorithm tree These P hood joint the characterize to algorithm MCMC the a of parameters density. use posterior the we with models, together associated networks reassortment of Probability. inference Posterior the Calculating lineage on reassortments ρ(1 “observable” of rate effective The fetv ouainsz n e-ieg esotetrate. reassortment per-lineage parameters and The size population parameters. effective and molec- models associated vectors substitution their the ular and of alignments sequence elements multiple the segment-specific trees); segment the of Here, ( i −1 ~µ, ecncalculate can We sw r o neetdi h itr fsget htaentances- not are that segments of history the in interested not are we As esoteteet apna rate a at happen events Reassortment h em ntergthn ieo Eq. of side right-hand the on terms The − C θ , t , (l N f P i .Ec fteetrsi icse eo.[h denominator [The below. discussed is terms these of Each ρ). (l ).] 0 ( ~ D|N ). ersnstefl esotetntok(nldn h embedding the (including network reassortment full the represents )). P (N , ,tentokprior network ~µ the ), |θ i tpasterl ftete ro nsadr phylodynamic standard in prior tree the of role the plays It ρ. P heeti olseteetbtenlineage between event coalescent a is event th P , (C p ρ) (N 1 (p t = , i and h term The h vn otiuino the of contribution event The 1 P ~µ, C ob h ieo the of time the be to ) #events (N (l = θ Y sasge otesm aet Tu otu reas- true no (Thus parent. same the to assigned is ) i PNAS =1 |θ , p P C ∨ ∅ ρ| 2 ( , ~ D|N hsmasta h rbblt fteancestry the of probability the that means This . h sa odtoa needneo sites of independence conditional usual The yepesn ta h rdc fexponential of product the as it expressing by ρ) ~ D) P t l P i (event o osatszdcaecn oe,ti is this model, coalescent constant-sized a For . ilices h ubro ewr lineages network of number the increase will (event p = (p | P , 1 ~µ) (N 2 o xml,i 0.5. is example, for , uy2,2020 21, July P ) |θ ( i = = heeti olseto reassortment or coalescent a is event th ~ D|N p i |L P , i 1 Y s∈S ∅) |L eoe h rbblt ftenet- the of probability the denotes ρ) (N i or , i , nodrt efr on Bayesian joint perform to order In = θ , |ρ, ~µ)P l P θ , p r admyasge otetwo the to assigned randomly are (D 2 ρ ρ) , 2 θ ρ) × (N e ieg e nttm.Areas- A time. unit per lineage per s ,adtejitprmtrprior parameter joint the and ), = P l en hsna neta oall to ancestral as chosen being |T × i ( nwihteacsr fevery of ancestry the which in  |θ th ~ D) s 1 | θ 1 , P , 1 2 µ . vn and event nld h ewr likeli- network the include ρ)P (interval o.117 vol.  s ). |C i ( heeti h network the in event th (l ~µ, θ ~ D )| n e-ieg reas- per-lineage and θ = and , i ρ) f |L | (l i . L ). θ , ~µ o 29 no. i θ l and L , ersn the represent ob h set the be to ste simply then is i ρ), = l L ρ 1 t | r the are and for P 17109 ( ~ is D) C t [1] l (l ∈ 2 ) ,

EVOLUTION On the other hand, if the ith event is a reassortment event on lineage l, the we define when two coalescent or reassortment nodes are the same. We event contribution is the probability density of an (observable) reassortment define two coalescent nodes to be the same if 1) the parent edges of those event to occur on that lineage, i.e.: nodes carry the same segments and 2) if the subtree below each segment includes exactly the same clades between the two coalescent nodes. We " #  1 |C(l)| define two reassortment nodes to be the same if 1) both parent edges carry P(event |L , θ, ρ) = ρ 1 − 2 × . i i 2 the same segments in the same relative orientation and 2) if the subtree below each segment includes exactly the same clades between the two reas- As we condition on sampling events, their event contribution is always sortment events. A side effect of this definition is that the more segments simply 1. we include in the summary, the more likely two nodes will be considered different nodes. While the number of coalescent and reassortment nodes in the network Interval Contribution. The interval contribution P(intervali|Li, θ, ρ) is the probability of not observing any event in a given time interval. Three dif- changes over the course of the MCMC, the number of coalescent nodes on ferent types of events can happen in the coalescent with reassortment: the segment trees is constant. In order to avoid dimensionality issues when sampling, coalescent, and reassortment events. Since we condition on the summarizing, we first compute the frequency of observing each coalescent times of the sampling events, only coalescent and reassortment events node over the course of the MCMC. We then weight this frequency by the number of coalescent events on segment trees this coalescent node cor- are produced. Given the total rate Λi (probability per unit time) with which these occur in the interval immediately prior to event i, the interval responds to. We next choose the network that maximizes those weighted contribution can be written as: clade credibilities as the MCC network. In order to compute the posterior support of each reassortment event in the MCC network, we next compute the frequency of observing each P(intervali|Li, θ, ρ) = exp [−Λi(ti − ti−1)]. reassortment event in the MCC network during the MCMC. The total rate is the sum of the coalescence rate λ(c) and the reassortment Since we require the network to be rooted, we track segments even after i the root of a segment tree is reached. These patterns are however not sup- rate λ(r). The coalescence rate depends on the number of lineages extant at i ported by any genetic information and follow the prior distribution only. For a particular time and the effective population size in the usual way: the summary of networks, we therefore remove segments from edges if the root of a segment tree has been reached. Additionally, we remove reassort- (c) |Li| 1 λ = . ment loops, i.e., events that start on one edge and then directly reattach to i 2 θ the same edge. Since the support for individual events can greatly depend The rate of observable reassortment events is: on how many segments are analyzed, we have also implemented an option to only summarize over a subset of the segments, while ignoring others.    |C(l)|−1 (r) X 1 λ = ρ |Li| − . i 2 Reassortment Distance. For any reassortment event where segments a and b l∈Li take different paths, follow segment a until it reaches a network edge that carries segment b. We repeat this for any pair of segments that took differ- (r) Note that λi is generally less than the total rate of reassortment events in ent paths to get the common ancestor height between any two segments. (r) We then define the reassortment distance and a reassortment event as the this interval, which would be simply ρ|Li|, as, i.e., λi excludes reassortment events that produce lineages carrying no ancestral segments. minimal difference between the height of the reassortment event and the common ancestor height of any pair of segments. This seeks to denote for The Parameter Priors. The term P(~µ, θ, ρ) denotes the joint prior distribution how long segments in the two parent viruses at the reassortment event of all model parameters. We factorize the prior distribution by writing it evolved independently. as the product of the individual parameter priors P(~µ), P(θ), and P(ρ). This asserts that our prior information on any one of these model parameters is Implementation. We implemented the MCMC framework for the coalescent independent of the prior information we have for the others. with reassortment as a BEAST2 package called CoalRe. This package includes the classes to do simulation and inference under the coalescent with reas- An MCMC Algorithm for Reassortment Networks. In order to perform MCMC sortment. The implementation is such that the tree likelihood calculations sampling of network and the embedding of segment trees within these net- are separate from the network framework, which allows using the various works, we introduce several MCMC operators (i.e., proposal distributions). different site and clock models implemented in BEAST2. Additionally, it can These operators often have analogs in operators used to explore different be used with other Bayesian approaches such as nested sampling or parallel phylogenetic trees. Here, we briefly summarize each of these operators, tempering. Further, model comparison as well as integration over evolution- providing the complete details in the supplement. ary models can be performed. The package can be downloaded by using Add/remove operator. The add/remove operator adds and removes reas- the package manager in BEAUti. The source code for the software pack- sortment events. An extension of the subtree prune and regraft move for age can be found here: https://github.com/nicfel/CoalRe. A tutorial on how networks (27) to jointly operate on segment trees as well. to set up an analysis using the coalescent with reassortment is available at Segment diversion operator. The segment diversion operator changes the https://taming-the-beast.org/tutorials/Reassortment-Tutorial (28). Networks path segments take at reassortment events. are logged in the extended Newick format (29) and can be visualized using, Exchange operator. The exchange operator changes the attachment of for example, https://icytree.org/ (30). Additionally, we provide python scripts edges in the network while keeping the network length constant. to plot networks based on https://github.com/evogytis/baltic. Subnetwork slide operator. The subnetwork slide operator changes the height of nodes in the network while allowing to change the topology. Simulations. In the simulation study, we simulated reassortment networks, Scale operator. The scale operator scales the heights of individual nodes or using the structured coalescent with randomly drawn effective population the whole network without changing the network topology. sizes and reassortment rates. To do so, we first sampled random effective Gibbs operator. The Gibbs operator efficiently samples any part of the net- population sizes from a log normal distribution (mean: 5; SD: 0.5) and work that is older than the segment tree roots and is thus not informed by reassortment rates from another log normal distribution (mean: 0.2; SD: any genetic data. 0.5). An average effective population size of 5 is similar to the estimated Empty segment preoperator. The empty segment preoperator augments effective population sizes of seasonal human influenza viruses (Fig. 2C), the network with edges that do not carry any segments for the duration of and an average reassortment rate of 0.2 per lineage per year is similar a move, to allow larger jumps in network space. to the later estimated reassortment rates for seasonal human influenza We validate the implementation of the coalescent with reassortment viruses (Fig. 3). We then randomly sampled the sampling times of 100 network prior as well as all operators in the supplement. taxa, each with 4 segments, from a uniform distribution between 0 and 20. We simulated reassortment networks alongside the embedding of the Summarizing Reassortment Networks. To summarize a distribution of net- segment trees using these parameters. For each segment tree, we next works, we use a similar strategy to the MCC strategy (as described by ref. simulated genetic sequences by using the Jukes–Cantor substitution model 17) used by phylogenetic inference software such as BEAST to summarize (31) with an evolutionary rate of 5 × 10−3 per site and year. We used the distributions of trees. We first compute all unique coalescent and reassort- Jukes–Cantor model in order to limit the uncertainty coming from the ment nodes that were encountered during the MCMC. To do so requires that substitution model, allowing us to test the performance of the coalescent

17110 | www.pnas.org/cgi/doi/10.1073/pnas.1918304117 Muller¨ et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 0 .M eVen,Tnlgasaemsedn o iuleauto fte congru- tree of evaluation visual for misleading are Tanglegrams Vienne, De M. D. 10. 2 .Ddlt .Lwo,A arig .Fls,Ifrneo oooosrecombination homologous of Inference Falush, D. Daarling, A. Lawson, D. Didelot, X. 12. Nelson I. M. 11. 6 .W loqit .A uhr,Uiyn etcladnnetcleouin A evolution: nonvertical and vertical Unifying Suchard, recombination. with A. M. coalescent of Bloomquist, the inference W. Approximating E. Genome-wide Cardin, 16. Siepel, J. N. A. McVean, Gronau, T. A. I. G. Hubisz, 15. J. M. Rasmussen, D. M. 14. Vaughan G. T. 13. 7 .Hld .R ocar,Loigfrtesi h oet umr refo posterior from tree Summary forest: the in trees for Looking Bouckaert, R. R. Heled, J. 17. 1 .A ee,C .Rsel .I hamn rdcigeouinfo h hp of shape the from evolution Predicting Shraiman, I. B. Russell, A. C. Neher, A. R. 21. and time resolve to genomes full vs single of ability The Bedford, T. Dudas, G. 18. 2 .J rmod .Rmat .Saio .G yu,Bysa olsetinference coalescent Bayesian Pybus, G. O. Shapiro, B. Rambaut, A. Drummond, J. A. 22. human the of origin L the On M. Rott, Łuksza, R. M. Hoyningen, 20. Von V. Rohde, W. Scholtissek, C. 19. h rttocdnpstosadtetidhvn ifrn ae 3) This (39). rates different having third the and positions + codon HKY two assumed an We first under 36). the evolved (35, have tempering to parallel sequences using the (34) all 2.5.2 aligned BEAST in We sortment distribution. containing temporal set even final (33). an 3.8.31 a Muscle with using produce samples segments to 500 subsampling least used at we dataset, A/H2N2 5 of rate evolutionary slower the of had effect segments the all the considered where study additionally scenario we to information, genetic of order of amount consisted In the reducing segment nucleotides. evolving Each independently conditions. 1,000 idealized under reassortment with h aesqecsa nrf 2 eotie hs eune from sequences used these we obtained We dataset, oridata/gisaid at 32. found A/H2N2 be ref. influenza can github.com/nicfel/Reassortment-Material/blob/master/Applications/H2N2/ table in the acknowledgments as (https://gisaid.org; For GISAID sequences Influenza B). same the influenza the H1N1, from and seasonal downloaded and H3N2, (pandemic data (https://www.fludb.org) sequence Database Research using viruses influenza Analysis. and Availability Data Sequence (5 higher are 2C (Fig. rates evolutionary two (5 These lower year. and site per M .L u .J yet .J .Bon esotetpten fainiflez virus influenza avian of patterns Reassortment Brown, L. J. A. Lycett, J. S. Lu, lin- L. B 9. influenza between Reassortment Rambaut, A. Lycett, S. Bedford, T. Dudas, G. 8. Westgeest B. K. 7. Guan Y. 6. Smith J. G. 5. Smith are viruses J. (H1N1) G. a influenza segmented 4. human in Recent Reassortment Palese, P. Patton, Desselberger, T. U. J. Nakajima, Turner, K. E. P. 3. Nelson, I. M. McDonald, M. S. reassortment. virus 2. A Influenza Lowen, C. A. Steel, J. 1. le tal. et uller ¨ ete nlzdeeyiflez iu ne h olsetwt reas- with coalescent the under virus influenza every analyzed then We ence. subtypes. different among segments internal complex. (2014). PB1–PB2–HA 162–172 coadapted 32, a of emergence the and eages 2011. and 1968 (2014). between circulating viruses (H3N2) a influenza (2010). epidemic. a influenza U.S.A. Sci. Acad. 1950. in isolated strains to genetically related closely outcomes. and Mechanisms viruses: RNA (2014). 377–401 385, data. sequences. whole-genome using bacteria in 1918. since virus a influenza tcatcAGbsdframework. ARG-based stochastic Sci. Biol. B Lond. Soc. R. Trans. Philos. graphs. recombination ancestral samples. eelgcltrees. genealogical (2014). analysis. outbreak in space (2005). sequences. molecular from dynamics population past of H3N2. and H2N2 subtypes virus influenza ). o.Bo.Evol. Biol. Mol. × h mrec fpnei nunaviruses. influenza pandemic of emergence The al., et M vl Biol. Evol. BMC 10 rgn n vltoaygnmc fte20 wn-rgnH1N1 swine-origin 2009 the of genomics evolutionary and Origins al., et aigteeegneo admciflez viruses. influenza pandemic of emergence the Dating al., et −4 utperasrmn vnsi h vltoayhsoyo H1N1 of history evolutionary the in events reassortment Multiple al., et acknowledge nern neta eobnto rpsfo atra genomic bacterial from graphs recombination ancestral Inferring al., et 5–7 (2017). 857–870 205, hntoeetmtdfrsaoa ua nunaviruses influenza human seasonal for estimated those than ) eoeieaayi frasrmn n vlto fhuman of evolution and reassortment of analysis Genomewide al., et si,Apeitv tesmdlfrinfluenza. for model fitness predictive A assig, 10–11 (2009). 11709–11712 106, ¨ Elife 7–7 (2018). 174–176 36, Nature 058(2014). e03568 3, 2 (2013). 221 13, M vl Biol. Evol. BMC LSPathog. PLoS 1212 (2009). 1122–1125 459, .Fraldtst u h influenza the but datasets all For table.xls). LSGenet. PLoS yt Biol. Syst. 3719 (2005). 1387–1393 360, a.Rv Microbiol. Rev. Nat. 1002(2008). e1000012 4, 3 (2019). 232 19, ecmie aaesfo several from datasets compiled We 74 (2010). 27–41 59, Genetics 1032(2014). e1004342 10, M vl Biol. Evol. BMC 32 (1978). 13–20 87, Γ Nature 4514 (2010). 1435–1449 186, ur o.Mcoil Immunol. Microbiol. Top. Curr. 4 o.Bo.Evol. Biol. Mol. oe 3,3) allowing 38), (37, model 3–3 (1978). 334–339 274, 6(2014). 16 14, 4–6 (2016). 448–460 14, .Virol. J. rti Cell Protein Nature × o.Bo.Evol. Biol. Mol. 2844–2857 88, 1185–1192 22, 10 57–61 507, rc Natl. Proc. −3 × https:// 9–13 1, and ) 10 −4 0 .Bahl J. 40. models substitution appropriate Choosing Drummond, J. A. Rambaut, when A. sequences Shapiro, DNA B. from 39. phylogeny of estimation molecular Maximum-likelihood a by Yang, splitting Z. human-ape the 38. of (9 Dating Yano, bioRxiv:10.1101/603514 T. Kishino, 2. H. BEAST Hasegawa, in M. MCMC 37. Coupled Bouckaert, R. Mueller, F. N. 36. coupled Metropolis Parallel Ronquist, F. Huelsenbeck, P. J. Dwarkadas, high S. Altekar, and G. accuracy 35. high with Bouckaert alignment R. sequence 34. Multiple Muscle: Edgar, C. R. 33. in molecules” protein Joseph of U. “Evolution 32. Cantor, R. C. Jukes, H. and T. trees phylogenetic 31. for visualization browser-based Rapid IcyTree: Vaughan, G. T. 30. 3 .Pumr .Bs,K ols .Vns OA ovrec igoi n output and diagnosis Convergence CODA: Vines, K. iterative Cowles, K. of Best, convergence N. monitoring Plummer, M. for methods 43. General Gelman, A. Brooks, P. S. 42. Bedford T. 41. Rossell F. Cardona, and G. prune subtree 29. Generalising space? in likelihood Barido-Sottani Lost J. maximum Semple, 28. C. A Linz, sequences: S. DNA Bordewich, M. from 27. trees Evolutionary Felsenstein, J. 26. M F. N. 25. K D. Vaughan, G. T. 24. 3 .N ii,E .Bomus,M .Scad mohsyietruharuhsky- rough a through skyride Smooth Suchard, A. M. Bloomquist, W. E. Minin, N. V. 23. ailHsnfrueu icsin nhwt umrz ewrs We the Foundation on Science networks. National feedback Swiss CR32I3-166258. summarize helpful by Grant their funded to are for T.S. how reviewers and N.F.M. anonymous manuscript. on two discussions the thank useful also for Huson Daniel at available ACKNOWLEDGMENTS. are results, plot license files and log analyze relevant to as scripts such allowing the https://github.com/nicfel/Reassortment-Material. as data, well other still with as All runs, thus data. comply BEAST2 these and of on to intact, based results order numbers of accession in reproduction the files leaving XML regulations, char- the sequence the from removed we A/H2N2 acters GISAID, influenza language from the obtained For were markup online. that available are sequences extensible datasets the run full to files (XML) the (https://www.fludb.org), Database conver- S14 ). Fig. assessed only , Appendix We (SI (42) which known. (43) factors coda scale-reduction was for potential using taken and computed sequences sizes was sample all sample effective using the for gence which times in addi- year we sampling dataset, the the A/H2N2 rates influenza estimated reassortment the the For tionally as sizes. well population as effective trees, and segments of embedding and works, 41). and 40 refs. example, for seasonal (see, of analyses viruses phylogenetic influenza previous human in used regularly been has model o iu ye ihsqecsdwlae rmteIflez Research Influenza the from downloaded sequences with types virus For net- reassortment the rates, evolutionary all estimated jointly then We o h hlgntcaayi fpoencdn sequences. protein-coding of (2006). analysis phylogenetic the for sites. over differ rates substitution DNA. mitochondrial of clock 2019). April inference. phylogenetic Bayesian (2004). for 407–415 Carlo Monte chain Markov analysis. evolutionary throughput. (2015). 21–132. 2442–2447 pp. 89, 1969), York, New Press, (Academic Ed. Munro, E. R. Metabolism, networks. nlssfrMCMC. for analysis simulations. drift. antigenic humans. in virus (2011). H3N2 A influenza networks. phylogenetic of representation 2. BEAST for networks. phylogenetic of spaces to regraft approach. tions. coalescent. structured the (2014). under inference Bayesian (2008). 1459–1471 ie aeincaecn-ae neec fpplto dynamics. population of inference coalescent-based Bayesian line: o.Bo.Evol. Biol. Mol. le,D .Rsusn .Salr h tutrdcaecn n t approxima- its and coalescent structured The Stadler, T. Rasmussen, A. D. uller, ¨ eprlysrcue eaouaindnmc n essec of persistence and dynamics metapopulation structured Temporally al., et dpaino admcHN nunaavrssi humans. in viruses a influenza H2N2 pandemic of Adaptation al., et Bioinformatics .Ml Evol. Mol. J. lblcruainpten fsaoa nunavrssvr with vary viruses influenza seasonal of patterns circulation Global al., et .Cmu.GahStat. Graph Comput. J. uli cd Res. Acids Nucleic yt Biol. Syst. ES .:A dacdsfwr ltomfrBayesian for platform software advanced An 2.5: BEAST al., et Nature aigteBAT omnt ecigmtra resource material teaching community A BEAST? the Taming al., et .News R. 9028 (2017). 2970–2981 34, LSCmu.Biol. Comput. PLoS het .Ppna .Wlh .J rmod Efficient Drummond, J. A. Welch, D. Popinga, A. uhnert, ¨ ,G aine xeddNwc:I stm o standard a for time is It Newick: Extended Valiente, G. o, 7–7 (2017). 170–174 67, 1–2 (2015). 217–220 523, ´ 6–7 (1981). 368–376 17, PNAS etakAee .Dumn,Smn iz and Linz, Simone Drummond, J. Alexei thank We 3229 (2017). 2392–2394 33, –1(2006). 7–11 6, .Ml Evol. Mol. J. 7219 (2004). 1792–1797 32, | uy2,2020 21, July o.Bo.Evol. Biol. Mol. 3–5 (1998). 434–455 7, rc al cd c.U.S.A. Sci. Acad. Natl. Proc. 6–7 (1985). 160–174 22, 1060(2019). e1006650 15, M Bioinf. BMC .Ter Biol. Theor. J. 3610 (1993). 1396–1401 10, | o.117 vol. Bioinformatics 3 (2008). 532 9, –2(2017). 1–12 423, o.Bo.Evol. Biol. Mol. | amla Protein Mammalian o.Bo.Evol. Biol. Mol. Bioinformatics o 29 no. 19359–19364 108, 2272–2279 30, | 7–9 23, .Virol. J. 17111 20, 25,

EVOLUTION