
Journal of General Virology (2014), 95, 1969–1982 DOI 10.1099/vir.0.065227-0

Dating the origin of the genus Flavivirus in the light of Beringian biogeography

John H.-O. Pettersson and Omar Fiz-Palacios

Department of Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

Correspondence: John H.-O. Pettersson, [email protected] or [email protected]

Received 28 February 2014
Accepted 4 June 2014

The genus Flavivirus includes some of the most important human viral , and its members are found in all parts of the populated world. The temporal origin of diversification of the genus has long been debated due to the inherent problems with dating deep RNA evolution. A generally accepted hypothesis suggests that Flavivirus emerged within the last 10 000 years. However, it has been argued that the tick-borne Powassan flavivirus was introduced into North America some time between the opening and closing of the Beringian land bridge that connected Asia and North America 15 000–11 000 years ago, indicating an even older origin for Flavivirus. To determine the temporal origin of Flavivirus, we performed Bayesian relaxed molecular clock dating on a dataset with high coverage of the presently available Flavivirus diversity by combining tip date calibrations and internal node calibration, based on the and Beringian land bridge biogeographical event. Our analysis suggested that Flavivirus originated ~85 000 (64 000–110 000) or 120 000 (87 000–159 000) years ago, depending on the circumscription of the genus. This is significantly older than estimated previously. In light of our results, we propose that it is likely that modern humans came in contact with several members of the genus Flavivirus much earlier than suggested previously, and that it is possible that the spread of several coincided with, and was facilitated by, the migration and population expansion of modern humans out of Africa.

INTRODUCTION

The genus Flavivirus, within the family , currently consists of >70 virus species distributed all over the globe (Gould et al., 2003; Gubler et al., 2007; Lindenbach et al., 2007). It includes numerous of major human health concern, such as (DENV), Japanese encephalitis virus, and the type species for flaviviruses, virus (flavus: 'yellow'), giving the family and the genus its name (Gould & Solomon, 2008; Mackenzie et al., 2004). The of flaviviruses consists of a positive-sense ssRNA molecule of ~11 kbp. One single ORF encodes three structural proteins (, pre-membrane and envelope) and seven non-structural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) flanked by untranslated regions (Chambers et al., 1990; Lindenbach & Rice, 2003).

The genus Flavivirus has been divided into four main groups based on ecological characteristics, molecular phylogenetic analyses, vector specificity and virus performance in potential host cells (Gaunt et al., 2001; Gould et al., 2001; Kuno, 2007). Most of the recognized flaviviruses belong to the mosquito-borne flaviviruses (MBFs) that are commonly, but not exclusively, vectored by either or mosquitoes. The second group, tick-borne flaviviruses (TBFs), the only monophyletic group of Flavivirus, are vectored by ixodid , and are further subdivided into a and a seabird group based on their specificity (Grard et al., 2007; Gritsun et al., 2003). The no-known vector flaviviruses (NKVFs) are associated with either or and infect without an apparent vector transmitting the viruses (Porterfield, 1980). The majority of the known flaviviruses are zoonotic, i.e. pathogenic viruses that can be transmitted between humans and other animals. However, the fourth group, the insect-specific flaviviruses (ISFs; flaviviruses that are only capable of replicating in insect cells) (Kuno, 2007), is most likely an undersampled and very diverse group (Cook et al., 2012).

The idea of a molecular clock has been used to address many hypotheses in the study of emerging viral diseases, especially for diseases caused by RNA viruses (Bromham & Penny, 2003), e.g. to reject the hypothesis of a spread of human immunodeficiency virus (HIV) in the 1950s through a contaminated polio vaccine (Korber et al., 2000). However, dating the origin of viruses is a complex and challenging task. For viruses with high rates of evolution, the original phylogenetic signal is difficult to deduce even with complex evolutionary models because the signal diminishes with time due to repeated substitutions at the same site (Holmes, 2003a). The extremely high substitution rate observed for RNA viruses is mostly due to their error-prone replication and repair machinery (Bromham & Penny, 2003; Duffy et al., 2008). Unequal substitution rates among lineages of the same genus further adds to the complexity of accurately estimating the time to the most recent common ancestor (tMRCA) (Duffy et al., 2008; Holmes, 2003a; Sanjuán, 2012).

Reconstruction of divergence times requires a temporal reference to convert branch lengths of a into time. This temporal reference is usually in the form of a fossil or biogeographical event (i.e. internal node calibration), as in most eukaryote studies, or isolation dates (i.e. tip date calibration), as with bacteria and virus studies. From the biogeographical events, fossils or isolation dates, the ages of internal nodes can be estimated. The match between virus and host phylogenies has led to the suggestion that some virus lineages originated millions of years ago. However, this has been strongly rejected by molecular clock studies indicating a virus origin of only a few thousand years ago (Holmes, 2003a; Worobey et al., 2010). This controversy has sparked a debate regarding the validity and utility of molecular clock reconstructions using tip dates versus internal (deep) calibration to study and infer the temporal scale of virus evolution (Sharp & Simmonds, 2011).

Previous studies based on molecular clocks have suggested that the Flavivirus clade originated ~10 000 years ago in Africa (Gould et al., 2001) from a non-vectored mammalian virus ancestor (Gould et al., 2003), followed by the radiation of TBFs and MBFs during the last 5000 and 3000 years, respectively (Zanotto et al., 1996), and ISFs ~3000 years ago (Crochu et al., 2004). Other studies have also dated small clades of individual flaviviruses using tip dates as calibration points. These estimates are broadly congruent with the notion of a Flavivirus origin within the last 10 000 years (e.g. Dunham & Holmes 2007; May et al., 2011; Pan et al., 2011; Twiddy et al., 2003). However, recent analyses on TBFs, based on complete genomes analysed with a relaxed molecular clock and Bayesian methods, have estimated the age for the TBF clade to be at least 16 000 years (Heinze et al., 2012), indicating that the age for the Flavivirus genus as a whole should be significantly older than suggested previously.

Within the TBFs, a close relationship between the biogeography of the Beringian land bridge, the historical landmass that connected north-east Siberia with Alaska and north-west Canada (Hultén, 1937), and the evolution of the Powassan virus (POWV) clade, the only TBFs present in North America, has been pointed out recently (Ebel, 2010; Heinze et al., 2012). Geological studies estimate that the land bridge was open for mammal land migration before the end of the last glaciation between 15 000 and 11 000 years ago (Elias, 2001; Elias et al., 1996; Kelly, 2003; Mandryk et al., 2001). The absence of tick-borne virus (TBEV) and Russian POWV lineages in North America is consistent with the idea that the colonization of POWV into North America happened under a single event by a land route, also supported by the lack of evidence for continuous movement of POWV and TBEV between Russia and North America by either seabirds or mosquitoes (Heinze et al., 2012). Therefore, it is unlikely that POWV emerged in North America before the land bridge became accessible again after the last glacial maximum. Consequentially, it is improbable that POWV emerged after the Bering Strait was formed. The most probable explanation, given the current knowledge and present distribution of the POWV lineages, is that during the course of TBF evolution POWV diverged from other TBF lineages in a single introduction event by the land route during the presence of the Beringian land bridge 15 000–11 000 years ago (Heinze et al., 2012).

POWV could have been introduced into North America by either humans or other that colonized the Americas during the existence of the Beringian land bridge via one of two routes: the interior route or coastal route. (i) The interior route became accessible when the process of deglaciation created a corridor between the Laurentide and Cordilleran glaciers. The corridor did not exist during the last glacial maximum. The corridor started to open ~15 000 years ago (Dixon, 2013; Dyke, 2004), and became habitable for humans and other mammals ~13 500 years ago (Dixon, 2013). (ii) Towards the end of the last glacial maximum, the glaciers bordering to the south-west coast of Alaska through western Canada started to melt. Around 16 000 years ago, the glaciers had receded to the extent that the coastal habitats could support human populations (Dixon, 2013 and references therein). The coastal route is also likely to be the route by which humans first entered the Americas (Achilli et al., 2013; Bodner et al., 2012; Fagundes et al., 2008; Goebel et al., 2008; Schurr, 2004).

Presently, there are coding nucleotide available from >70 unique strains of the genus Flavivirus, including viruses recognized by the International Committee on Taxonomy of Viruses (http://www.ictvonline.org/virusTaxonomy.asp?version=2013). Here, we combine this information together with relaxed molecular clock methods implementing a Bayesian approach, again to estimate and shed some light on divergence times of the genus Flavivirus and groups within. Aided by biogeographical calibration, we report the first study, to the best of our knowledge, estimating the age and rates of substitution of the genus Flavivirus as a whole, including its major groups, using complete coding nucleotide genomes. We explore multiple internal calibration points using Bayesian methods in combination with a relaxed molecular clock to reconstruct divergence times. We compare our results with previous studies reconstructing divergence times for different clades within the genus Flavivirus. We also discuss the incongruence between virus-dating studies using molecular clocks and RNA virus evolution. Finally, we propose a temporal and biogeographical scenario for the evolution of the genus Flavivirus.

Three supplementary figures and three supplementary tables are available with the online version of this paper.

065227 © 2014 The Authors. Printed in Great Britain
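The conversion of branch lengths into time described in the Introduction can be illustrated with a minimal sketch. It assumes a strict clock (a simplification; the analyses in this paper use a relaxed clock in BEAST), and the function names and all numeric values are hypothetical, chosen only to show the arithmetic.

```python
def rate_from_calibration(branch_length: float, calibrated_age_years: float) -> float:
    """Infer a substitution rate (substitutions/site/year) from a node of known age.

    branch_length is the genetic distance from the calibrated node to the
    present, in substitutions per site (hypothetical input).
    """
    if calibrated_age_years <= 0:
        raise ValueError("calibrated age must be positive")
    return branch_length / calibrated_age_years


def branch_length_to_years(branch_length: float, rate: float) -> float:
    """Convert a genetic distance into elapsed time under a strict clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return branch_length / rate


# Hypothetical values: a calibrated node 12 500 years old that sits
# 0.5 substitutions/site from the present implies the rate below.
rate = rate_from_calibration(0.5, 12_500)

# A deeper, uncalibrated node 2.0 substitutions/site from the present
# is then dated by dividing its distance by that rate.
deep_node_age = branch_length_to_years(2.0, rate)
print(f"rate = {rate:.1e} substitutions/site/year")
print(f"deep node age = {deep_node_age:,.0f} years")
```

Under a relaxed clock the rate is allowed to vary among branches, so a single division like this no longer applies; the sketch only shows why a temporal reference is needed to turn distances into ages.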


RESULTS AND DISCUSSION

Phylogenetic reconstruction of the genus Flavivirus

The phylogeny of Flavivirus and its major groups was inferred using a genus-wide sampling approach including 86 complete coding flavivirus genomes with known isolation dates (Table 1) analysed by both Bayesian and maximum-likelihood methodologies. The overall tree topology from all of the different analyses was in agreement with previous published phylogenies based on the NS3 gene (Billoir et al., 2000; Cook & Holmes, 2006; Grard et al., 2007), multiple genes (Medeiros et al., 2007) and complete genomes (Cook & Holmes, 2006; Cook et al., 2012; Grard et al., 2007, 2010; Kuno & Chang, 2006; Lobo et al., 2009). Our results were consistent with what is commonly referred to as an NS3-like topology (Fig. 1).

Two datasets were analysed, a nucleotide and an amino acid sequence alignment, using Bayesian and maximum-likelihood approaches. In our Bayesian phylogenetic analysis, the nucleotide and amino acid 50 % majority rule consensus trees from MrBayes (Figs S1 and S2, available in the online Supplementary Material) and the maximum clade credibility tree from BEAST (Figs 1 and S3), all rooted with virus 1 (BVDV-1), produced topologies with strong support for the focal nodes (A–O). The Bayesian trees were also supported by the topology of the maximum-likelihood trees, although the sister relationships between Tamana virus (TABV) and all other groups were unresolved in the maximum-likelihood trees (data not shown, available upon request).

There was maximum posterior probability (1.0 pp) for the four major clades roughly corresponding to the conventional grouping of Flavivirus (Gould et al., 2001, 2003); the TBF clade (node M), the NKVF clade (NKVFa, node N), a MBF-dominated group (MBFdom, node F) and the ISF clade (ISFa, node E), sister to the other three groups (Fig. 1). The MBFdom group (node F) includes (i) the Aedes MBF clade (MBFAedes, node H), (ii) a NKVF clade (NKVFb), sister to the MBFAedes clade, (iii) a grade of likely insect-specific flaviviruses consisting of Chaoyang virus, Donggang virus, Lammi virus and Nounane virus (nodes I and J) (Huhtamo et al., 2009; Kolodziejek et al., 2013; Lee et al., 2013) and (iv) the Culex MBF clade (MBFCulex, node K). The present study supports the fact that the majority of the members of the genus Flavivirus show a very strong pattern of host–vector association (Gould et al., 2001, 2003), as all groups showed distinct clustering depending on vector association. It is possible that the positioning of the NKVFb clade is a consequence of secondary loss of vector capability (Gould et al., 2001, 2003). Likewise, it is possible that the grade of ISFs (nodes I and J) within the MBFdom group is also due to secondary loss of vector capability.

The main disagreement between our findings and other studies relates to the position of TABV. TABV has previously been suggested to be the sister group to the ISFa clade (node E) based on complete coding amino acid sequence analysis (Cook et al., 2012). However, in the present study, TABV appears to have diverged before the ISFa clade (node B, 1.0 pp, based on Bayesian nucleotide and amino acid phylogenetic analyses), which instead is sister to the TBFs, NKVFs and MBFs (Figs 1, S1 and S2). Our results are consistent with the topology of previous complete genome and gene-based phylogenetic studies (de Lamballerie et al., 2002; Hoshino et al., 2009; Lobo et al., 2009).

Divergence times of the genus Flavivirus

Viruses, and RNA viruses in particular, provide an excellent opportunity to study evolutionary change because of their relatively high rate of substitution that allows for evolution to be observed within human timescales (Bromham & Penny, 2003; Drummond et al., 2003; Duffy et al., 2008; Holmes, 2003a). Many studies have used molecular clocks to date recent divergences (hundred-year scale; see: Kumar et al., 2010; Mehla et al., 2009; Mohammed et al., 2011; Patil et al., 2011; Ramírez et al., 2010; Weidmann et al., 2013), but here we show the utility of using internal calibration that allows for the reconstruction of deeper divergence times.

The divergence times under different calibration schemes are summarized in Tables 2 and S1. All BEAST runs were performed with an uncorrelated log-normal relaxed molecular clock, as the null hypothesis of a strict global substitution rate was rejected by the maximum-likelihood molecular clock test (data not shown).

Divergence times of the genus Flavivirus: calibration schemes compared

To examine the temporal origin of Flavivirus, we performed several analyses in BEAST as outlined in Methods. Using only tip date calibration resulted in 12–14 % older mean divergence times for nodes A–O than using the internal Beringian calibration (node O calibrated to 15 000–11 000 years ago) together with tip dates. For example, the age of the root (node A) varied between 265 000 [95 % highest posterior density (HPD): 25 600–2 768 000] and 230 500 (156 100–322 700) years ago, respectively (Fig. 1, Table 2). When using TBF internal calibration (node M, calibrated to 16 100–44 929 years ago) based on the results from Heinze et al. (2012) using only tip dates as calibration, 17–18 % younger mean divergence times were recovered for nodes A–O compared with using tip dates only (Table 2). Both Beringian and TBF calibration recovered similar mean divergence times, although TBF calibration recovered 4–5 % younger mean estimates for nodes A–O than Beringian calibration (Table 2). Also, allowing for a wider range of the Beringian calibration, i.e. 16 000–10 000 years, following Dixon (2013), recovered similar ages for the split of POWV and its TBF sister (node O) at 12 500 (10 000–15 600) years ago and for the root (node A) at 228 500 (142 900–332 500) years ago as

http://vir.sgmjournals.org

Table 1. Flaviviruses used in this study

Vector groups include insect-specific (IS), mosquito-borne (MB), no-known vector (NKV) and tick-borne (TB) flaviviruses.

| Virus | Virus abbreviation | Strain | Source of isolate | Vector group | Year | GenBank accession no. |
|---|---|---|---|---|---|---|
| Aedes virus | AEFV | AEFV_SPFLD_MO_2011_MP6 | Aedes albopictus | IS | 2011 | KC181923.1 |
| Alfuy virus | ALFV | MRM3929 | Centropus phasianius | MB | 1966 | AY898809.1 |
| | AHFV | 1176 | Homo sapiens | TB | 1995 | AF331718.1 |
| Apoi virus | APOIV | ApMAR-Kitaoka | Apodemus spp. | NKV | 1954 | AF160193.1 |
| Bagaza virus | BAGV | Spain H/2010 | Alectoris rufa | MB | 2010 | HQ644143.1 |
| Bagaza virus | BAGV | DakAr-B209 | Mixed Culex | MB | 1966 | AY632545.2 |
| Baiyangdian virus | BYDV | BYD-1 | Duck | MB | 2010 | JF312912.1 |
| Banzi virus | BANV | SAH-336 | Homo sapiens, Culex, Mansonia africana | MB | 1956 | DQ859056.1 |
| Bouboui virus | BOUV | DAK-AR-B490 | Anopheles paludis | MB | 1967 | DQ859057.1 |
| Bovine viral diarrhoea virus 1 | BVDV-1 | NADL | | | 1963 | NC_001461.1 |
| Bussuquara virus | BSQV | BeAn-4073 | Alouatta beelzebul | MB | 1956 | NC_009026.2 |
| Cell fusing agent virus | CFAV | | Aedes spp., Aedes egypti | IS | 1975 | NC_001564.1 |
| Chaoyang virus | CHAOV | ROK144 | Aedes vexans nipponii | IS | 2003 | JQ068102.1 |
| Culex flavivirus | CxFV | Tokyo | Culex pipiens | IS | 2003 | AB262759.2 |
| | DTV | CTB30-CT95 | Ixodes dammini | TB | 1995 | NC_003218.1 |
| Dengue virus type 1 | DENV-1 | 45AZ5 | Homo sapiens | MB | 1974 | U88536.1 |
| Dengue virus type 2 | DENV-2 | New-Guinea-C | Homo sapiens | MB | 1944 | AF038403.1 |
| Dengue virus type 3 | DENV-3 | H87 | Homo sapiens | MB | 1956 | M93130.1 |
| Dengue virus type 4 | DENV-4 | H241 | Homo sapiens | MB | 1956 | M14931.2 |
| Donggang virus | DGV | DG0909 | Aedes spp. | IS | 2009 | NC_016997.1 |
| Edge Hill virus | EHV | YMP-48 | Aedes vigilax | MB | 1969 | DQ859060.1 |
| Entebbe bat virus | ENTV | UgIL-30 | Tadarida limbata | NKV | 1957 | DQ837641.1 |
| Gadgets Gully virus | GGYV | CSIRO122 | Ixodes uriae | TB | 1975 | DQ235145.1 |
| Greek goat encephalitis virus | GGEV | Vergina | Capra aegagrus hircus | TB | 1969 | DQ235153.1 |
| Hanko virus | HANKV | | Mosquitoes | IS | 2005 | JQ268258.1 |
| Iguape virus | IGUV | SPAn-71686 | Sentinel mouse | MB | 1979 | AY632538.4 |
| Ilheus virus | ILHV | Original | Aedes spp. and Psorophora spp. | MB | 1944 | AY632539.4 |
| virus | JEV | JaOArS982 | Mosquito? | MB | 1982 | M18370.1 |
| Jugra virus | JUGV | P9-314 | Aedes spp. | MB | 1968 | DQ859066.1 |
| | KADV | AMP6640 | Rhipicephalus pravus | TB | 1967 | DQ235146.1 |
| Kamiti River virus | KRV | SR-82 | Aedes macintoshi | IS | 1999 | NC_005064.1 |
| Karshi virus | KSIV | LEIV_2247 | Ornithodoros papillipes | TB | 1972 | NC_006947.1 |
| Kedougou virus | KEDV | DakAar-D1470 | Aedes minutus | MB | 1972 | AY632540.2 |
| Kokobera virus | KOKV | AusMRM-32 | Culex annulirostris | MB | 1960 | NC_009029.2 |
| Koutango virus | KOUV | DakAr-D-5443 | Tatera kempi | MB | 1968 | EU082200.1 |
| | KUNV | MRM61C | Culex annulirostris | MB | 1960 | D00246.1 |
| virus | KFDV | G11338 | Haemaphysalis spinigera | TB | 1957 | JF416959.1 |
| Lammi virus | LAMV | | Mosquitoes | IS | 2004 | FJ606789.1 |
| Langat virus | LGTV | TP21 | Ixodes granulatus Supino | TB | 1956 | AF253419.1 |
| virus | LIV | 369/T2 | Ixodes ricinus | TB | 1963 | NC_001809.1 |
| Meaban virus | MEAV | T70 | Ornithodoros maritimus | TB | 1981 | DQ235144.1 |
| | MODV | M544 | Peromyscus maniculatus | NKV | 1958 | AJ242984.1 |
| Montana myotis leukoencephalitis virus | MMLV | ATCC-VR-537 | Myotis lucifugus | NKV | 1958 | AJ299445.1 |
| Murray Valley encephalitis | MVEV | MVE-1-51 | Homo sapiens | MB | 1956 | AF161266.1 |
| Nakiwogo virus | NAKV | Uganda08 | Mansonia africana nigerrima H4A1 | IS | 2008 | GQ165809.2 |
| New Mapoon Virus | NMV | CY1014 | Culex annulirostris | MB | 1998 | KC788512.1 |
| Nounane virus | NOUV | | Uranotaenia mashonaensis | IS | 2004 | FJ711167.1 |
| Ntaya virus | NTAV | IPDIA | Mosquitoes | MB | 1966 | JX236040.1 |
| virus | OHFV | Kubrin | Homo sapiens | TB | 1947 | AY438626.1 |
| Omsk hemorrhagic fever virus | OHFV | Guriev | Homo sapiens | TB | 1947 | AB507800.1 |
| | PCV | 56 | Coquillettidia xanthogaster | IS | 2010 | KC505248.1 |
| Potiskum virus | POTV | IBAN-10069 | Cricetomys gambianus | MB | 1989 | DQ859067.1 |
| Powassan virus | POWV | LB | Homo sapiens | TB | 1958 | L06436.1 |
| Quang Binh virus | QBV | VN180 | Culex tritaeniorhynchus | IS | 2002 | NC_012671.1 |
| Rio Bravo virus | RBV | RiMAR | Tadarida brasiliensis mexicana | NKV | 1954 | AF144692.1 |
| Rocio virus | ROCV | SPH-34675 | Homo sapiens | MB | 1975 | AY632542.4 |
| | RFV | Eg-Art-371 | Argas hermanni | TB | 1968 | DQ235149.1 |
| Saboya virus | SABV | DakAR-D4600 | Tatera kempi | MB | 1968 | DQ859062.1 |
| Saumarez Reef virus | SREV | CSIRO-4 | Ornithodoros capensis | TB | 1974 | DQ235150.1 |
| | SEPV | MK7148 | Mansonia septempunctata | MB | 1966 | DQ859063.1 |
| Sitiawan virus | STWV | | Broiler chicken | MB | 2000 | JX477686.1 |
| Spanish sheep encephalitis virus | SSEV | 87/2617 | | TBF | 1987 | DQ235152.1 |
| | SPOV | SM-6V-1 | Mansonia spp.? | MB | 1955 | DQ859064.1 |
| St. Louis encephalitis virus | SLEV | MSI-7 | | MB | 1975 | DQ359217.1 |
| Tamana bat virus | TABV | Tr127154 | Pteronotus parnellii | NKV | 1973 | NC_003996.1 |
| Tick-borne encephalitis virus | TBEV-Sib | Vasilchenko | Homo sapiens | TB | 1969 | AF069066.1 |
| Tick-borne encephalitis virus | TBEV-FE | Sofjin-HO | Homo sapiens | TB | 1937 | AB062064.1 |
| Tick-borne encephalitis virus | TBEV-Eu | Salem | Macaca sylvanus | TB | 2006 | FJ572210.1 |
| Tick-borne encephalitis virus | TBEV-FE | Oshima-5-10 | Dog | TB | 1995 | AB062063.2 |
| Tick-borne encephalitis virus | TBEV-Eu | Neudoerfl | Ixodes ricinus | TB | 1971 | TEU27495 |
| Tick-borne encephalitis virus | TBEV-Sib | Est54 | Ixodes persulcatus | TB | 2000 | GU183384.1 |
| Tick-borne encephalitis virus | TBEV | 886-84 | Clethrionomys rufocanus | TB | 1984 | EF469662.1 |
| Tick-borne encephalitis virus | TBEV | 178-79 | Ixodes persulcatus | TB | 1979 | EF469661.1 |
| Tembusu virus | TMUV | ZJ GH-2 | Goose | MB | 2010 | JQ314465.1 |
| Turkish sheep encephalitis virus | TSEV | TTE80 | | TB | 1969 | DQ235151.1 |
| Tyuleniy virus | TYUV | 6017 | | TB | 1971 | DQ235148.1 |
| Uganda S virus | UGSV | ORIGINAL | Aedes spp. | MB | 1947 | DQ859065.1 |
| | USUV | SAAR-1776 | Culex neavei | MB | 1958 | AY453412.1 |

Table 1. cont.

| Virus | Virus abbreviation | Strain | Source of isolate | Vector group | Year | GenBank accession no. |
|---|---|---|---|---|---|---|
| Wesselsbron virus | WESSV | SAH-177-99871-2 | Merino sheep | MB | 1955 | DQ859058.1 |
| West Nile virus | WNV | NY99-flamingo382-99 | Phoenicopterus chilensis | MB | 1999 | AF196835.2 |
| West Nile virus | WNV | B956 | Homo sapiens | MB | 1937 | NC_001563.2 |
| West Nile virus | WNV | Rabensburg-97-103 | Culex pipiens | MB | 1997 | AY765264.1 |
| Yaounde virus | YAOV | DakAr-Y276 | Culex nebulosis | MB | 1968 | EU082199.1 |
| Yellow fever virus | YFV | Trinidad-79A-788379 | Haemagogus spegazzini | MB | 1979 | AF094612.1 |
| Yokose virus | YOKV | Oita-36 | Miniopterus fuliginosus | NKV | 1971 | AB114858.1 |
| Zika virus | ZIKV | MR-766 | Rhesus monkey | MB | 1947 | DQ859059.1 |

compared with the more narrow Beringian calibration (Table S1). The use of the Yang96 model with tip dates as calibration recovered older ages and broader intervals (95 % HPD) compared with using the SRD06 model (Table S1). However, both SRD06 and Yang96 model analyses under Beringian calibration recovered similar ranges for the 95 % HPD intervals (Table S1). The difference in estimates seen between SRD06, allowing for the third codon position to vary independently of the linked first and second codon positions, and Yang96, allowing for all three codon positions to vary, is an indication that divergence time estimates become biased towards the present, with higher substitution rates per nucleotide (Worobey et al., 2010).

The differences seen between the calibration schemes are explained by the fact that estimating the tMRCA for entire virus families or genera requires a broad sampling scheme in order to cover as much of the variation as possible from the group in question. However, the wider the sampling, the higher the variation in substitution rates among lineages. This will consequently lead to an increasing departure from a constant molecular clock (Bromham & Penny, 2003; Duffy et al., 2008; Holmes, 2003a). A high variation in substitution rates will lead to uncertainty in the reconstructed divergence times unless calibration points restrict the node ages. Therefore, using tip dates alone results in a high level of uncertainty for the deeper nodes and thus internal calibration becomes crucial (Ho & Phillips, 2009). As the Beringian calibration constraint (15 000–11 000 years) is narrower than the TBF calibration constraint (16 100–42 300 years; Heinze et al., 2012), the resulting 95 % HPD intervals from the Beringian analysis are also narrower. Our study shows how incorporation of internal calibration in large-scale virus phylogenies can help reconstruct more precise divergences times, independent of the substitution model, given a robust and reliable calibration by narrowing down the 95 % HPD intervals. Furthermore, as the mean tMRCA estimates are generally congruent between the calibration schemes applied (Table S1), we will hereafter only report and discuss the results from the combined tip dates and the internal node calibration based on the Beringian-POWV biogeographical event, i.e. the Beringian calibration, with the most narrow 95 % HPD intervals, unless otherwise stated. This is because the incorporation of biogeographical information is essential to date virus origins, especially for ancient events (Katzourakis et al., 2009; Wertheim & Kosakovsky Pond, 2011), as is the case with Flavivirus.

Divergence times of the genus Flavivirus: congruences and incongruences

The genus Flavivirus is defined as sensu lato (node B) or sensu stricto (node C). Our analysis, inferred from complete coding nucleotide genomes, indicated that Flavivirus sensu lato originated ~119 800 (87 100–158 900) years ago if TABV is to be considered a part of the genus, or ~84 700


(63 700–109 600) years ago if TABV is excluded, i.e. Flavivirus sensu stricto. Either way, this first molecular dating for the whole genus indicates a significantly older age than the 10 000 years that was previously suggested for Flavivirus sensu stricto (Gould et al., 2001).

Our results also contrast with the 3000-year age of the MBFdom group (node F), including MBFs, ISFs and NKVFs, estimated in Zanotto et al. (1996). In the present analysis, the MBFdom group (node F) was instead suggested to have emerged >41 500 (32 600–51 600) years ago, with a subsequent diversification of the MBFAedes clade (node H) and the MBFCulex clade (node K) ~25 900 (19 800–32 000) and ~26 900 (21 000–33 400) years ago, respectively. Previously, the ISFa clade (node E) was indicated to have emerged between 3500 and 350 000 years ago (Crochu et al., 2004). The combination of tip date calibration and internal node calibration allows us to more precisely pinpoint the origin of the ISFa clade to have occurred ~40 700 (30 800–52 300) years ago, i.e. ~44 000 years after the split from the last common ancestor of Flavivirus sensu stricto (node C) that emerged ~84 700 (63 700–110 000) years ago. We also showed the first proposed dating of the NKVFa clade (node N), here estimated to have emerged ~24 100 (18 000–30 900) years ago, ~15 000 years after the split from the last common ancestor of TBFs and NKVFs (node L).

The sharp contrast between the estimates in the present study and previous studies can possibly be explained because evolutionary rates in our study are inferred from a genus-wide sampled dataset using a codon-based substitution model as compared with a nucleotide-based substitution model used in many other studies. It is likely that nucleotide-based substitution models will not be able to account for variation in selective pressure throughout evolutionary history, where the effect of purifying selection is likely to cause underestimation of the actual ages (Wertheim & Kosakovsky Pond, 2011). Using the SLAC, FEL, IFEL, FUBAR and MEME methods available within the HyPhy package (Pond et al., 2005) and at the Datamonkey webserver (Delport et al., 2010), there were none-to-weak signs of positive selection, but strong signs of purifying selection in the alignment used in the present study (results available upon request). Signs of strong purifying selection within Flavivirus are in concordance with what has been found previously for members of the genus Flavivirus and for vector-borne RNA viruses in general (Holmes, 2003b; Pybus et al., 2007; Woelk & Holmes, 2002). A codon-based substitution model can to some extent compensate for purifying selection, producing older tMRCA estimates than nucleotide-based substitution models (Wertheim & Kosakovsky Pond, 2011). However, and more importantly, to accurately date deep RNA virus origins, the use of other sources of evidence (such as biogeography) will eventually become necessary as even codon-based models cannot account for the complex interactions of events and factors that have occurred throughout evolution (Katzourakis & Gifford, 2010; Katzourakis et al., 2009).

Our study is supported by the fact that both Beringian calibration and only tip date calibration give similar mean tMRCA estimates, although only tip date calibration gives broader 95 % HPD intervals. In contrast to several studies (Sharp & Simmonds, 2011), we have not found any conflict between internal node and tip date calibrations. Our results are also congruent with several other studies (Table S3). The estimated tMRCA for the TBFs clade [node M, at 27 500 (21 700–34 200) years ago] is supported by previous estimates for the whole clade (Heinze et al., 2012) (28 600 years ago) and for the lineages within (Bertrand et al., 2012; Uzcátegui et al., 2012) (Table S3). For the split of the POWV clade and its sister (node O), our tMRCA estimates of 14 800 years ago from tip date analysis also match broadly with previous molecular clock studies (12 300 years ago; Heinze et al., 2012) and support the use of the Beringian biogeographical calibration.

Evolutionary rates of the genus Flavivirus and molecular clocks in RNA viruses

Rates of nucleotide substitution were estimated from the Beringian calibrated BEAST analyses for the entire ORF of the genome (Table 3). Rates for the Beringian, TBF and tip date calibration are summarized in Table S2.

The genomic substitution rate of positive-sense ssRNA viruses varies between 10⁻² and 10⁻⁵ substitutions site⁻¹ year⁻¹ (Hanada et al., 2004; Jenkins et al., 2002; Sanjuán, 2012). Even though the estimated mean rates in the present study for Flavivirus sensu lato (including TABV; node B), 4×10⁻⁵ (2×10⁻⁵ to 6×10⁻⁵) substitutions site⁻¹ year⁻¹, are within range of those estimated for positive-sense ssRNA viruses, they are at least an order of magnitude slower than previous estimates for different lineages within Flavivirus, i.e. 1×10⁻³ and 2×10⁻⁴ substitutions site⁻¹ year⁻¹ as found for DENV-4 and St. Louis encephalitis virus, respectively (Sanjuán, 2012 and references therein). Our results are not very surprising given that rate differences of up to 125 times have been shown for simian immunodeficiency virus as compared with its sister, HIV (Worobey et al., 2010).

Furthermore, we did not find that the MBFdom group (node F) evolved at a significantly faster rate than the TBF clade (node M), as reported previously (Zanotto et al., 1995, 1996). Instead, our study showed that the MBFdom group (node F) evolved at a mean rate of 3.2×10⁻⁵ (1.7×10⁻⁵ to 5.0×10⁻⁵) substitutions site⁻¹ year⁻¹ and that the TBF clade evolved at a mean rate of 3.9×10⁻⁵ (2.3×10⁻⁵ to 5.7×10⁻⁵) substitutions site⁻¹ year⁻¹ (Table 3). Therefore, the implications for Flavivirus evolution derived from Zanotto et al. (1995, 1996) need to be revised and reconsidered. The fastest rate of nucleotide substitution was found for the ISFa clade (node E), with a mean rate of 6.2×10⁻⁵ (4.0×10⁻⁵ to 8.6×10⁻⁵) substitutions site⁻¹ year⁻¹. Their fast rate of evolution could perhaps be explained by their vertical mode of cycling transmission in combination with their specificity to insects with relatively

short generation times (Bolling et al., 2012; Lutomiah et al., 2007).

[Fig. 1 image: chronogram of the genus Flavivirus with labelled nodes A–O and the clades TBF, NKVF, MBF and ISF; marked calibrations: Beringian calibration (11 000–15 000 years) and TBF calibration (16 100–44 929 years); x-axis: Time (×10³ years), from 322 to 0. Lower panels: land bridge, glaciers and POWV migration path at 25 000–18 000 years, 16 000–14 000 years, 13 000–11 000 years and <10 500 years.]

Fig. 1. Chronogram based on tip date calibrated terminals and internal nodes using Beringian calibration (see text for details). All named nodes have maximum posterior probability support (1.0 pp). Terminal tips are named by virus name (abbreviations, see Table 1), year of isolation and GenBank accession number. Development of the Beringia region from the last glacial maximum until the Bering Strait was formed (25 000–10 500 years ago). Beringia development after Elias et al. (1996) and Dixon (2013).

Estimates of substitution rates are often found to be much higher for tips than for deep nodes (Wertheim & Kosakovsky Pond, 2011). From our perspective, a highly variable non-constant molecular clock with pulses of very high substitution rates for short periods of time may reflect the difference of orders of magnitude seen when comparing evolutionary rates in short and recent times (e.g. 1×10⁻³ for DENV-4 in Klungthong et al., 2004) with long and deep times (e.g. 4×10⁻⁵ for Flaviviruses sensu lato in this study).

Beringia and the emergence of POWV in North America

The Beringian land bridge was open for land migration between 15 000 and 11 000 years ago until it was flooded and the Bering Strait was formed (Dixon, 2013; Elias, 2001; Elias et al., 1996; Kelly, 2003), effectively blocking migration between Asia and North America for animals other than humans (Fig. 1). Our molecular reconstructions using only tip dates are congruent with POWV being introduced into North America when the Beringian land bridge was accessible and support the use of tip dates in combination with an internal Beringian calibration (Fig. 1, Table 2).

How then did POWV enter North America? Although humans might have been responsible for carrying POWV, as humans do get infected with POWV, perhaps it is more likely that ticks and associated tick hosts jointly brought POWV into North America. Human infections with POWV in present times are infrequent (Ebel, 2010), although numbers appear to be on the increase (Hinten et al., 2008). Between 15 000 and 11 000 years ago, the migrating human population who settled the Americas had an effective population size of perhaps <80 individuals (Hey, 2005). This, in combination with the fact that POWV is considered to be maintained solely in an enzootic cycle between ixodid ticks and their hosts (Ebel, 2010; Ebel et al., 2000), is a strong argument against humans being the likely vectors by which POWV first entered North America. Nor is it likely that POWV entered the Americas by the coastal route. Even though the coastal areas did support marine mammals and had terrestrial mammals living in refugia (Heaton & Grady, 2003), the deglaciated coastal areas had sparse vegetation that did not support large populations of terrestrial mammals (Dixon, 2013). Also, animals were isolated and movement was inhibited due to the glaciers, and the coastal route was, in general, not considered to be a migratory route for mammals other than humans (Dixon, 2013).

In view of the results from the present study, the most likely time period of introduction was after the deglaciated corridor became habitable ~13 500 years ago (Catto, 1996; Dixon, 2013) and before sea levels rose by 40 m and the Bering Strait reached its present width before 10 500 years ago (Elias et al., 1996). During this period, populations of

Table 2. Mean estimated tMRCA (×10³ years) and 95 % HPD interval: results for the Beringian, TBF and tip date calibration analyses

                              Beringian               TBF                     Tip dates only
                              (11 000–15 000 years)   (16 100–42 300 years)
Node                          tMRCA   95 % HPD        tMRCA   95 % HPD        tMRCA   95 % HPD
A: Root                       230.5   156.1–322.7     219.4   110.6–387.6     265.0   25.6–2768.1
B: TABV/ISF–MBF–TBF–NKVF      119.8   87.1–158.9      114.0   62.2–194.6      138.7   12.9–1456.1
C: ISFa/MBFdom–TBF–NKVFa      84.7    63.7–109.6      80.6    45.0–135.1      98.4    9.8–1024.7
D: MBFdom/TBF–NKVFa           47.2    37.3–58.7       44.9    26.4–73.9       54.0    5.5–560.2
E: ISFa                       40.7    30.8–52.3       38.5    21.7–64.6       46.8    4.6–486.7
F: MBFdom                     41.5    32.6–51.6       39.4    23.1–64.9       47.4    4.9–492.7
G: NKVFb/MBFAedes             33.2    25.7–42.0       31.5    18.2–52.3       37.9    4.0–390.7
H: MBFAedes                   25.9    19.8–33.0       24.6    14.0–41.0       29.5    2.9–304.1
I: ISFb/NOUV–MBFCulex         33.8    26.6–42.1       32.1    18.7–52.9       38.7    3.8–401.9
J: NOUV/MBFCulex              31.0    24.3–38.6       29.5    17.2–48.7       35.5    3.8–369.4
K: MBFCulex                   26.9    21.0–3343       25.5    14.8–42.1       30.7    3.3–320.0
L: TBF/NKVFa                  39.3    30.7–49.2       37.5    21.9–61.4       44.9    4.2–468.5
M: TBF                        27.5    21.7–34.2       26.2    16.1–42.3       31.5    3.1–329.4
N: NKVFa                      24.1    18.0–30.9       22.9    12.8–38.4       27.5    2.9–286.6
O: POWV/sister clade of TBF   12.8    11.0–14.8       12.2    7.0–20.0        14.8    1.5–152.3
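
The contrast between the three calibration schemes in Table 2 can be made concrete by expressing each 95 % HPD interval as a fraction of its mean tMRCA. The Python sketch below does this for three nodes (D, M and O). It is an illustration only and was not part of the original analysis: the numbers are transcribed from Table 2, and the names `TABLE2_SUBSET`, `relative_hpd_width` and `summarize` are hypothetical.

```python
"""Compare 95 % HPD widths across the three calibration schemes of Table 2.

Illustrative sketch only: values are transcribed from Table 2 (x10^3 years);
all names below are hypothetical and not part of the original analysis.
"""

# node -> scheme -> (mean tMRCA, HPD lower, HPD upper), in thousands of years
TABLE2_SUBSET = {
    "D: MBFdom/TBF-NKVFa": {
        "beringian": (47.2, 37.3, 58.7),
        "tbf": (44.9, 26.4, 73.9),
        "tip_dates_only": (54.0, 5.5, 560.2),
    },
    "M: TBF": {
        "beringian": (27.5, 21.7, 34.2),
        "tbf": (26.2, 16.1, 42.3),
        "tip_dates_only": (31.5, 3.1, 329.4),
    },
    "O: POWV/sister clade of TBF": {
        "beringian": (12.8, 11.0, 14.8),
        "tbf": (12.2, 7.0, 20.0),
        "tip_dates_only": (14.8, 1.5, 152.3),
    },
}


def relative_hpd_width(mean, lower, upper):
    """Return the HPD interval width divided by the mean estimate."""
    return (upper - lower) / mean


def summarize(table):
    """Map each node to its relative HPD width under every scheme."""
    return {
        node: {
            scheme: relative_hpd_width(*values)
            for scheme, values in schemes.items()
        }
        for node, schemes in table.items()
    }


if __name__ == "__main__":
    for node, widths in summarize(TABLE2_SUBSET).items():
        row = ", ".join(f"{scheme}: {w:.2f}" for scheme, w in widths.items())
        print(f"{node} -> {row}")
```

For these three nodes the relative width stays below 0.5 with the Beringian calibration, is close to 1 with the TBF calibration and exceeds 10 with tip dates only, which is one simple way to see how much an internal node calibration narrows the estimates.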


ixodid ticks and land mammals (including humans) could have migrated from eastern Beringia through the then habitable deglaciated corridor to the south of the glaciers (Dixon, 2013; Goebel & Buvit, 2011; Shapiro et al., 2004). Ticks and tick hosts maintaining POWV might initially have been restricted to eastern Beringia, north-west of the glaciers. As the deglaciated corridor became habitable and animals could migrate southwards, and after the Bering Strait was formed preventing land migration back to Asia, POWV became established in North America. Using an internal calibration of 15 000–11 000 or 16 000–10 000 years for the split of POWV from closely related TBFs gives similar tMRCA dates for POWV crossing into the Americas (12 800 and 12 500 years ago, respectively), which is in agreement with dates of the opening and closing of the Beringian land bridge.

Table 3. Mean estimated evolutionary rates (×10⁻⁵ substitutions site⁻¹ year⁻¹) and 95 % HPD intervals for the Beringian calibration (11 000–15 000 years)

Node                          Mean rate   95 % HPD
B: TABV/ISF–MBF–TBF–NKVF      3.96        2.04–6.14
C: ISFa/MBFdom–TBF–NKVFa      3.60        1.81–5.57
D: MBFdom/TBF–NKVFa           2.56        1.57–3.70
E: ISFa                       6.24        4.00–8.65
F: MBFdom                     3.24        1.73–5.00
G: NKVFb/MBFAedes             2.92        1.63–4.43
H: MBFAedes                   3.11        1.70–4.72
I: ISFb/NOUV–MBFCulex         4.02        2.28–6.03
J: NOUV/MBFCulex              3.95        2.12–6.06
K: MBFCulex                   4.19        2.30–6.37
L: TBF/NKVFa                  4.27        2.39–6.38
M: TBF                        3.89        2.32–5.73
N: NKVFa                      5.18        2.32–6.15
O: POWV/sister clade of TBF   4.30        3.31–7.23

Did flaviviruses and humans spread throughout the globe together?

Viruses, and RNA viruses in particular, are among the most successful evolving entities on the planet, having colonized and adapted to an array of different environments and modes of transmission within all domains of life (Wasik & Turner, 2012). The most significant event that might have helped shape the distribution of flaviviruses is perhaps the spread of modern humans out of Africa. Modern humans arose in Africa (Cavalli-Sforza & Feldman, 2003; McDougall et al., 2005; Stringer & Andrews, 1988), and began migrating to other continents between 80 000 and 40 000 years ago, having populated all continents except Antarctica by 10 000 years ago (Blome et al., 2012; Henn et al., 2012; Macaulay et al., 2005; Rasmussen et al., 2011). The migration of humans out of Africa has been accompanied by several other pathogens. The pathogenic bacterium Helicobacter pylori appears to have accompanied human migration out of Africa and is estimated to have spread from east Africa ~58 000 years ago (Falush et al., 2003; Linz et al., 2007; Moodley et al., 2012). Likewise, the Mycobacterium tuberculosis complex is estimated to have started to diverge ~70 000 years ago, consistent with an African origin linked to human expansion and migration (Comas et al., 2013; Wirth et al., 2008). Similarly, several human pathogenic viruses also show patterns of co-divergence associated with human diversification (Holmes, 2004). Therefore, it is perhaps not unlikely that the spread and evolution of several Flavivirus strains, especially if much of the Flavivirus diversity originated in Africa (Gould et al., 2001, 2003), was influenced by human migration and expansion across the globe.

METHODS

Virus genomes, sequence alignments and models of molecular evolution. Complete coding genomes of the genus Flavivirus, excluding the untranslated 5′ and 3′ flanking regions, were retrieved from GenBank (Table 1). Genomes were selected based on available background information on the year of virus strain isolation (isolated between 1937 and 2011). BVDV-1 of the genus Pestivirus within the family Flaviviridae was included as an outgroup based on a previously published Flavivirus phylogeny (Cook et al., 2012). To construct a robust alignment without frameshifts, all sequences were translated to amino acids in SeaView 4.4.2 (Gouy et al., 2010), aligned with MAFFT 7.037b using the default G-INS-i algorithm and parameters (Katoh & Standley, 2013), and then back-translated into nucleotides. The resulting alignments were inspected visually and edited with AliView 0.8 (http://ormbunkar.se/aliview). To estimate the best-fitting model of evolution for the sequence alignments, model tests were performed for the nucleotide alignment using jModelTest 2.1.3 (Darriba et al., 2012) and for the amino acid alignment using ProtTest 3.2.1 (Darriba et al., 2011). To test the null hypothesis of a strict molecular clock, a maximum-likelihood clock test was performed using MEGA 5.22 (Tamura et al., 2011). The best-fitting models were then specified in the respective analyses.

Phylogenetic analysis: maximum-likelihood and Bayesian analysis. Maximum-likelihood trees were estimated using RAxML 7.6.3 (Stamatakis, 2006) with 1000 rapid bootstraps under the GTR+I+G model for the nucleotide alignment and the WAG+I+G model for the amino acid alignment, respectively. Bayesian phylogenetic trees were inferred using MrBayes 3.2.1 (Ronquist et al., 2012) by executing two parallel runs with four Metropolis-coupled chains for 20 million and 9 million Markov chain Monte Carlo (MCMC) generations, using GTR+I+G for nucleotides and WAG+I+G for amino acids as model of evolution, respectively, sampling every 1000 generations and run with default priors, discarding the first 25 % as burn-in.

Estimating the tMRCA: Bayesian analysis with BEAST. To explore the temporal scale of the entire genus Flavivirus, evolutionary rates and the tMRCA were estimated using a Bayesian MCMC method as implemented in BEAST 1.7.5 (Drummond et al., 2012). All BEAST runs were performed with calibrated tip dates where the sequence with the most recent sampling date (2011) was set to represent the present. The SRD06 codon-based partition model (Shapiro et al., 2006) and HKY85+G nucleotide substitution model were used together with a non-parametric Bayesian skyline population coalescent tree prior with a piecewise-constant skyline demographic model (Drummond et al., 2005), along with a log-normal uncorrelated relaxed molecular clock. The effect of using the Yang96 codon model (Yang, 1996) was also explored in a separate analysis. All analyses were run for 100 million generations in triplicate, sampling every 1000 generations to ensure mixing of chains and that a sufficiently effective sample size
(>200) was reached. Convergence of chains and effective sample size statistics were analysed with Tracer 1.5 (http://beast.bio.ed.ac.uk/Tracer). Log and tree files were combined in LogCombiner (BEAST package) to produce consensus files of the different runs. Finally, maximum clade credibility trees of the three different analyses were produced using TreeAnnotator (BEAST package). Chronograms were viewed and annotated in FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree). Computations were performed at the Bioportal webportal at the University of Oslo (www.bioportal.uio.no), the CIPRES webportal (Miller et al., 2010) and at the Uppsala Multidisciplinary Center for Advanced Computational Science (www.uppmax.uu.se).

Calibration schemes. In BEAST, three calibration schemes were applied. (i) The first analysis was run with default uniform priors using tip dates only (i.e. uncalibrated internal nodes) allowing the MCMC chains to freely explore the treespace and the node height. (ii) The second analysis was run using tip dates together with secondary calibration data for the TBF node (split of mammalian TBFs from seabird TBFs), previously estimated to have originated between 44 929 and 16 100 years ago (Heinze et al., 2012), hereafter referred to as the TBF calibration. These dates were incorporated as upper and lower bounds using a uniform distribution. (iii) The third analysis was run using tip dates together with an internal node calibration based on the biogeographical event for the split of the POWV and closely related TBFs during the existence of the Beringian land bridge, estimated to have been open for mammal land migration between 15 000 and 11 000 years ago (see Introduction). Here, the ages of the opening and closing times of the Beringian land bridge were used to specify a maximum bound (15 000 years) and minimum bound (11 000 years) for the age of this split, given that the POWV clade is the only North American TBF and that all other TBFs are either African, Eurasian or Oceanian (Heinze et al., 2012), hereafter referred to as the Beringian calibration. In addition to the Beringian calibration, we also tested the effect of allowing for a wider range of the opening and closing of the Beringia land bridge (Dixon, 2013). Here, the age for the split between POWV and closely related TBFs was set with an internal calibration to an upper bound of 16 000 years ago and a lower bound of 10 000 years ago.

ACKNOWLEDGEMENTS

We are grateful to Allison Perrigo, Martin Ryberg and Petra Korall for their constructive comments and feedback on the manuscript. We acknowledge Allison Perrigo and Stina Weststrand for their help with the design of the figures. We are also grateful to James E. Dixon for valuable information and discussions on Beringian biogeography. Finally, we wish to acknowledge all the hard work by people all over the world making the sequences used in the present study available to everyone. J. H.-O. P. was supported by Stiftelsen Olle Engkvist Byggmästare.

REFERENCES

Achilli, A., Perego, U. A., Lancioni, H., Olivieri, A., Gandini, F., Hooshiar Kashani, B., Battaglia, V., Grugni, V., Angerhofer, N. & other authors (2013). Reconciling migration models to the Americas with the variation of North American native mitogenomes. Proc Natl Acad Sci U S A 110, 14308–14313.
Bertrand, Y., Töpel, M., Elväng, A., Melik, W. & Johansson, M. (2012). First dating of a recombination event in mammalian tick-borne flaviviruses. PLoS ONE 7, e31981.
Billoir, F., de Chesse, R., Tolou, H., de Micco, P., Gould, E. A. & de Lamballerie, X. (2000). Phylogeny of the genus flavivirus using complete coding sequences of arthropod-borne viruses and viruses with no known vector. J Gen Virol 81, 781–790.
Blome, M. W., Cohen, A. S., Tryon, C. A., Brooks, A. S. & Russell, J. (2012). The environmental context for the origins of modern human diversity: a synthesis of regional variability in African climate 150,000–30,000 years ago. J Hum Evol 62, 563–592.
Bodner, M., Perego, U. A., Huber, G., Fendt, L., Röck, A. W., Zimmermann, B., Olivieri, A., Gómez-Carballa, A., Lancioni, H. & other authors (2012). Rapid coastal spread of First Americans: novel insights from South America’s Southern Cone mitochondrial genomes. Genome Res 22, 811–820.
Bolling, B. G., Olea-Popelka, F. J., Eisen, L., Moore, C. G. & Blair, C. D. (2012). Transmission dynamics of an insect-specific flavivirus in a naturally infected Culex pipiens laboratory colony and effects of co-infection on vector competence for West Nile virus. Virology 427, 90–97.
Bromham, L. & Penny, D. (2003). The modern molecular clock. Nat Rev Genet 4, 216–224.
Catto, N. R. (1996). Richardson Mountains, Yukon-Northwest Territories: the northern portal of the postulated ‘Ice-Free Corridor’. Quatern Int 32, 3–19.
Cavalli-Sforza, L. L. & Feldman, M. W. (2003). The application of molecular genetic approaches to the study of human evolution. Nat Genet 33 (Suppl), 266–275.
Chambers, T. J., Hahn, C. S., Galler, R. & Rice, C. M. (1990). Flavivirus genome organization, expression, and replication. Annu Rev Microbiol 44, 649–688.
Comas, I., Coscolla, M., Luo, T., Borrell, S., Holt, K. E., Kato-Maeda, M., Parkhill, J., Malla, B., Berg, S. & other authors (2013). Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat Genet 45, 1176–1182.
Cook, S. & Holmes, E. C. (2006). A multigene analysis of the phylogenetic relationships among the flaviviruses (Family: Flaviviridae) and the evolution of vector transmission. Arch Virol 151, 309–325.
Cook, S., Moureau, G., Kitchen, A., Gould, E. A., de Lamballerie, X., Holmes, E. C. & Harbach, R. E. (2012). Molecular evolution of the insect-specific flaviviruses. J Gen Virol 93, 223–234.
Crochu, S., Cook, S., Attoui, H., Charrel, R. N., De Chesse, R., Belhouchet, M., Lemasson, J. J., de Micco, P. & de Lamballerie, X. (2004). Sequences of flavivirus-related RNA viruses persist in DNA form integrated in the genome of Aedes spp. mosquitoes. J Gen Virol 85, 1971–1980.
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165.
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9, 772.
de Lamballerie, X., Crochu, S., Billoir, F., Neyts, J., de Micco, P., Holmes, E. C. & Gould, E. A. (2002). Genome sequence analysis of Tamana bat virus and its relationship with the genus Flavivirus. J Gen Virol 83, 2443–2454.
Delport, W., Poon, A. F. Y., Frost, S. D. W. & Kosakovsky Pond, S. L. (2010). Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26, 2455–2457.
Dixon, E. J. (2013). Late Pleistocene colonization of North America from Northeast Asia: new insights from large-scale paleogeographic reconstructions. Quatern Int 285, 57–67.
Drummond, A. J., Pybus, O. G., Rambaut, A., Forsberg, R. & Rodrigo, A. G. (2003). Measurably evolving populations. Trends Ecol Evol 18, 481–488.

Drummond, A. J., Rambaut, A., Shapiro, B. & Pybus, O. G. (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22, 1185–1192.
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29, 1969–1973.
Duffy, S., Shackelton, L. A. & Holmes, E. C. (2008). Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet 9, 267–276.
Dunham, E. J. & Holmes, E. C. (2007). Inferring the timescale of dengue virus evolution under realistic models of DNA substitution. J Mol Evol 64, 656–661.
Dyke, A. S. (2004). An outline of the deglaciation of North America with emphasis on central and northern Canada. In Quaternary Glaciations, Extent and Chronology Part II: North America, pp. 373–424. Edited by J. Ehlers & P. L. Gibbard. Amsterdam: Elsevier.
Ebel, G. D. (2010). Update on Powassan virus: emergence of a North American tick-borne flavivirus. Annu Rev Entomol 55, 95–110.
Ebel, G. D., Campbell, E. N., Goethert, H. K., Spielman, A. & Telford, S. R., III (2000). Enzootic transmission of deer tick virus in New England and Wisconsin sites. Am J Trop Med Hyg 63, 36–42.
Elias, S. A. (2001). Beringian paleoecology: results from the 1997 workshop. Quatern Sci Rev 20, 7–13.
Elias, S. A., Short, S. K., Nelson, C. H. & Birks, H. H. (1996). Life and times of the Bering land bridge. Nature 382, 60–63.
Fagundes, N. J. R., Kanitz, R., Eckert, R., Valls, A. C. S., Bogo, M. R., Salzano, F. M., Smith, D. G., Silva, W. A., Jr, Zago, M. A. & other authors (2008). Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet 82, 583–592.
Falush, D., Wirth, T., Linz, B., Pritchard, J. K., Stephens, M., Kidd, M., Blaser, M. J., Graham, D. Y., Vacher, S. & other authors (2003). Traces of human migrations in Helicobacter pylori populations. Science 299, 1582–1585.
Gaunt, M. W., Sall, A. A., de Lamballerie, X., Falconar, A. K., Dzhivanian, T. I. & Gould, E. A. (2001). Phylogenetic relationships of flaviviruses correlate with their epidemiology, disease association and biogeography. J Gen Virol 82, 1867–1876.
Goebel, T. & Buvit, I. (editors) (2011). From the Yenisei to the Yukon: Interpreting Lithic Assemblage Variability in Late Pleistocene/Early Holocene Beringia. College Station, TX: Texas A&M University Press.
Goebel, T., Waters, M. R. & O’Rourke, D. H. (2008). The late Pleistocene dispersal of modern humans in the Americas. Science 319, 1497–1502.
Gould, E. A. & Solomon, T. (2008). Pathogenic flaviviruses. Lancet 371, 500–509.
Gould, E. A., de Lamballerie, X., Zanotto, P. M. & Holmes, E. C. (2001). Evolution, epidemiology, and dispersal of flaviviruses revealed by molecular phylogenies. Adv Virus Res 57, 71–103.
Gould, E. A., de Lamballerie, X., Zanotto, P. M. & Holmes, E. C. (2003). Origins, evolution, and vector/host coadaptations within the genus Flavivirus. Adv Virus Res 59, 277–314.
Gouy, M., Guindon, S. & Gascuel, O. (2010). SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 27, 221–224.
Grard, G., Moureau, G., Charrel, R. N., Lemasson, J.-J., Gonzalez, J.-P., Gallian, P., Gritsun, T. S., Holmes, E. C., Gould, E. A. & de Lamballerie, X. (2007). Genetic characterization of tick-borne flaviviruses: new insights into evolution, pathogenetic determinants and taxonomy. Virology 361, 80–92.
Grard, G., Moureau, G., Charrel, R. N., Holmes, E. C., Gould, E. A. & de Lamballerie, X. (2010). Genomics and evolution of Aedes-borne flaviviruses. J Gen Virol 91, 87–94.
Gritsun, T. S., Nuttall, P. A. & Gould, E. A. (2003). Tick-borne flaviviruses. Adv Virus Res 61, 317–371.
Gubler, D., Kuno, G. & Markoff, L. (2007). Flaviviruses. In Fields Virology, 5th edn, vol. 1, pp. 1153–1253. Edited by D. Knipe, P. Howley, D. Griffin, R. Lamb, M. Martin, B. Roizman & S. Straus. Philadelphia, PA: Lippincott-Raven.
Hanada, K., Suzuki, Y. & Gojobori, T. (2004). A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Mol Biol Evol 21, 1074–1080.
Heaton, T. H. & Grady, F. (2003). The late Wisconsin vertebrate history of Prince of Wales Island, Southeast Alaska. In Ice Age Cave Faunas of North America, pp. 17–53. Edited by B. W. Schubert, J. I. Mead & R. W. Graham. Bloomington, IN: Indiana University Press.
Heinze, D. M., Gould, E. A. & Forrester, N. L. (2012). Revisiting the clinal concept of evolution and dispersal for the tick-borne flaviviruses by using phylogenetic and biogeographic analyses. J Virol 86, 8663–8671.
Henn, B. M., Cavalli-Sforza, L. L. & Feldman, M. W. (2012). The great human expansion. Proc Natl Acad Sci U S A 109, 17758–17764.
Hey, J. (2005). On the number of New World founders: a population genetic portrait of the peopling of the Americas. PLoS Biol 3, e193.
Hinten, S. R., Beckett, G. A., Gensheimer, K. F., Pritchard, E., Courtney, T. M., Sears, S. D., Woytowicz, J. M., Preston, D. G., Smith, R. P., Jr & other authors (2008). Increased recognition of Powassan encephalitis in the United States, 1999–2005. Vector Borne Zoonotic Dis 8, 733–740.
Ho, S. Y. W. & Phillips, M. J. (2009). Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Syst Biol 58, 367–380.
Holmes, E. C. (2003a). Molecular clocks and the puzzle of RNA virus origins. J Virol 77, 3893–3897.
Holmes, E. C. (2003b). Patterns of intra- and interhost nonsynonymous variation reveal strong purifying selection in dengue virus. J Virol 77, 11296–11298.
Holmes, E. C. (2004). The phylogeography of human viruses. Mol Ecol 13, 745–756.
Hoshino, K., Isawa, H., Tsuda, Y., Sawabe, K. & Kobayashi, M. (2009). Isolation and characterization of a new insect flavivirus from Aedes albopictus and Aedes flavopictus mosquitoes in Japan. Virology 391, 119–129.
Huhtamo, E., Putkuri, N., Kurkela, S., Manni, T., Vaheri, A., Vapalahti, O. & Uzcátegui, N. Y. (2009). Characterization of a novel flavivirus from mosquitoes in northern Europe that is related to mosquito-borne flaviviruses of the tropics. J Virol 83, 9532–9540.
Hultén, E. (1937). Outline of the History of Arctic and Boreal Biota During the Quaternary Period. Stockholm: Bokforlagsaktiebolaget Thule.
Jenkins, G. M., Rambaut, A., Pybus, O. G. & Holmes, E. C. (2002). Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J Mol Evol 54, 156–165.
Katoh, K. & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772–780.
Katzourakis, A. & Gifford, R. J. (2010). Endogenous viral elements in animal genomes. PLoS Genet 6, e1001191.
Katzourakis, A., Gifford, R. J., Tristem, M., Gilbert, M. T. P. & Pybus, O. G. (2009). Macroevolution of complex retroviruses. Science 325, 1512.


Kelly, R. L. (2003). Maybe we do know when people first came to North America; and what does it mean if we do? Quatern Int 109–110, 133–145.
Klungthong, C., Zhang, C., Mammen, M. P., Jr, Ubol, S. & Holmes, E. C. (2004). The molecular epidemiology of dengue virus serotype 4 in Bangkok, Thailand. Virology 329, 168–179.
Kolodziejek, J., Pachler, K., Bin, H., Mendelson, E., Shulman, L., Orshan, L. & Nowotny, N. (2013). Barkedji virus, a novel mosquito-borne flavivirus identified in Culex perexiguus mosquitoes, Israel, 2011. J Gen Virol 94, 2449–2457.
Korber, B., Muldoon, M., Theiler, J., Gao, F., Gupta, R., Lapedes, A., Hahn, B. H., Wolinsky, S. & Bhattacharya, T. (2000). Timing the ancestor of the HIV-1 pandemic strains. Science 288, 1789–1796.
Kumar, S. R. P., Patil, J. A., Cecilia, D., Cherian, S. S., Barde, P. V., Walimbe, A. M., Yadav, P. D., Yergolkar, P. N., Shah, P. S. & other authors (2010). Evolution, dispersal and replacement of American genotype dengue type 2 viruses in India (1956–2005): selection pressure and molecular clock analyses. J Gen Virol 91, 707–720.
Kuno, G. (2007). Host range specificity of flaviviruses: correlation with in vitro replication. J Med Entomol 44, 93–101.
Kuno, G. & Chang, G.-J. J. (2006). Characterization of Sepik and Entebbe bat viruses closely related to yellow fever virus. Am J Trop Med Hyg 75, 1165–1170.
Lee, J. S., Grubaugh, N. D., Kondig, J. P., Turell, M. J., Kim, H.-C., Klein, T. A. & O’Guinn, M. L. (2013). Isolation and genomic characterization of Chaoyang virus strain ROK144 from Aedes vexans nipponii from the Republic of Korea. Virology 435, 220–224.
Lindenbach, B. D. & Rice, C. M. (2003). Molecular biology of flaviviruses. Adv Virus Res 59, 23–61.
Lindenbach, B. D., Thiel, H.-J. & Rice, C. M. (2007). Flaviviridae: the viruses and their replication. In Fields Virology, 5th edn, vol. 1, pp. 1101–1152. Edited by D. Knipe, P. Howley, D. Griffin, R. Lamb, M. Martin, B. Roizman & S. Straus. Philadelphia, PA: Lippincott-Raven.
Linz, B., Balloux, F., Moodley, Y., Manica, A., Liu, H., Roumagnac, P., Falush, D., Stamer, C., Prugnolle, F. & other authors (2007). An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918.
Lobo, F. P., Mota, B. E. F., Pena, S. D. J., Azevedo, V., Macedo, A. M., Tauch, A., Machado, C. R. & Franco, G. R. (2009). Virus–host coevolution: common patterns of nucleotide motif usage in Flaviviridae and their hosts. PLoS ONE 4, e6282.
Lutomiah, J. J. L., Mwandawiro, C., Magambo, J. & Sang, R. C. (2007). Infection and vertical transmission of Kamiti river virus in laboratory bred mosquitoes. J Insect Sci 7, 1–7.
Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D., Meehan, W., Blackburn, J., Semino, O., Scozzari, R. & other authors (2005). Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308, 1034–1036.
Mackenzie, J. S., Gubler, D. J. & Petersen, L. R. (2004). Emerging flaviviruses: the spread and resurgence of Japanese encephalitis, West Nile and dengue viruses. Nat Med 10 (Suppl), S98–S109.
Mandryk, C. A., Josenhans, H., Fedje, D. W. & Mathewes, R. W. (2001). Late Quaternary paleoenvironments of Northwestern North America: implications for inland versus coastal migration routes. Quatern Sci Rev 20, 301–314.
May, F. J., Davis, C. T., Tesh, R. B. & Barrett, A. D. T. (2011). Phylogeography of West Nile virus: from the cradle of evolution in Africa to Eurasia, Australia, and the Americas. J Virol 85, 2964–2974.
McDougall, I., Brown, F. H. & Fleagle, J. G. (2005). Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433, 733–736.
Medeiros, D. B. A., Nunes, M. R. T., Vasconcelos, P. F. C., Chang, G.-J. J. & Kuno, G. (2007). Complete genome characterization of Rocio virus (Flavivirus: Flaviviridae), a Brazilian flavivirus isolated from a fatal case of encephalitis during an epidemic in Sao Paulo state. J Gen Virol 88, 2237–2246.
Mehla, R., Kumar, S. R. P., Yadav, P., Barde, P. V., Yergolkar, P. N., Erickson, B. R., Carroll, S. A., Mishra, A. C., Nichol, S. T. & Mourya, D. T. (2009). Recent ancestry of Kyasanur Forest disease virus. Emerg Infect Dis 15, 1431–1437.
Miller, M. A., Pfeiffer, W. & Schwartz, T. (2010). Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Presented at the Gateway Computing Environments Workshop, 2010.
Mohammed, M. A. F., Galbraith, S. E., Radford, A. D., Dove, W., Takasaki, T., Kurane, I. & Solomon, T. (2011). Molecular phylogenetic and evolutionary analyses of Muar strain of Japanese encephalitis virus reveal it is the missing fifth genotype. Infect Genet Evol 11, 855–862.
Moodley, Y., Linz, B., Bond, R. P., Nieuwoudt, M., Soodyall, H., Schlebusch, C. M., Bernhöft, S., Hale, J., Suerbaum, S. & other authors (2012). Age of the association between Helicobacter pylori and man. PLoS Pathog 8, e1002693.
Pan, X.-L., Liu, H., Wang, H.-Y., Fu, S.-H., Liu, H.-Z., Zhang, H.-L., Li, M.-H., Gao, X.-Y., Wang, J.-L. & other authors (2011). Emergence of genotype I of Japanese encephalitis virus as the dominant genotype in Asia. J Virol 85, 9847–9853.
Patil, J. A., Cherian, S., Walimbe, A. M., Patil, B. R., Sathe, P. S., Shah, P. S. & Cecilia, D. (2011). Evolutionary dynamics of the American African genotype of dengue type 1 virus in India (1962–2005). Infect Genet Evol 11, 1443–1448.
Pond, S. L. K., Frost, S. D. W. & Muse, S. V. (2005). HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679.
Porterfield, J. S. (1980). Antigenic characteristics and classification of Togaviridae. In The Togaviruses: Biology, Structure, Replication, pp. 13–46. Edited by R. W. Schlesinger. New York: Academic Press.
Pybus, O. G., Rambaut, A., Belshaw, R., Freckleton, R. P., Drummond, A. J. & Holmes, E. C. (2007). Phylogenetic evidence for deleterious mutation load in RNA viruses and its contribution to viral evolution. Mol Biol Evol 24, 845–852.
Ramírez, A., Fajardo, A., Moros, Z., Gerder, M., Caraballo, G., Camacho, D., Comach, G., Alarcón, V., Zambrano, J. & other authors (2010). Evolution of dengue virus type 3 genotype III in Venezuela: diversification, rates and population dynamics. Virol J 7, 329.
Rasmussen, M., Guo, X., Wang, Y., Lohmueller, K. E., Rasmussen, S., Albrechtsen, A., Skotte, L., Lindgreen, S., Metspalu, M. & other authors (2011). An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A. & Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61, 539–542.
Sanjuán, R. (2012). From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses. PLoS Pathog 8, e1002685.
Schurr, T. G. (2004). The peopling of the new world: perspectives from molecular anthropology. Annu Rev Anthropol 33, 551–583.
Shapiro, B., Drummond, A. J., Rambaut, A., Wilson, M. C., Matheus, P. E., Sher, A. V., Pybus, O. G., Gilbert, M. T. P., Barnes, I. & other authors (2004). Rise and fall of the Beringian steppe bison. Science 306, 1561–1565.
Shapiro, B., Rambaut, A. & Drummond, A. J. (2006). Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol 23, 7–9.

Sharp, P. M. & Simmonds, P. (2011). Evaluating the evidence for virus/host co-evolution. Curr Opin Virol 1, 436–441.
Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.
Stringer, C. B. & Andrews, P. (1988). Genetic and fossil evidence for the origin of modern humans. Science 239, 1263–1268.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. & Kumar, S. (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28, 2731–2739.
Twiddy, S. S., Holmes, E. C. & Rambaut, A. (2003). Inferring the rate and time-scale of dengue virus evolution. Mol Biol Evol 20, 122–129.
Uzcátegui, N. Y., Sironen, T., Golovljova, I., Jääskeläinen, A. E., Välimaa, H., Lundkvist, A., Plyusnin, A., Vaheri, A. & Vapalahti, O. (2012). Rate of evolution and molecular epidemiology of tick-borne encephalitis virus in Europe, including two isolations from the same focus 44 years apart. J Gen Virol 93, 786–796.
Wasik, B. R. & Turner, P. E. (2013). On the biological success of viruses. Annu Rev Microbiol 67, 519–541.
Weidmann, M., Frey, S., Freire, C. C. M., Essbauer, S., Růžek, D., Klempa, B., Zubrikova, D., Vögerl, M., Pfeffer, M. & other authors (2013). Molecular phylogeography of tick-borne encephalitis virus in central Europe. J Gen Virol 94, 2129–2139.
Wertheim, J. O. & Kosakovsky Pond, S. L. (2011). Purifying selection can obscure the ancient age of viral lineages. Mol Biol Evol 28, 3355–3365.
Wirth, T., Hildebrand, F., Allix-Béguec, C., Wölbeling, F., Kubica, T., Kremer, K., van Soolingen, D., Rüsch-Gerdes, S., Locht, C. & other authors (2008). Origin, spread and demography of the Mycobacterium tuberculosis complex. PLoS Pathog 4, e1000160.
Woelk, C. H. & Holmes, E. C. (2002). Reduced positive selection in vector-borne RNA viruses. Mol Biol Evol 19, 2333–2336.
Worobey, M., Telfer, P., Souquière, S., Hunter, M., Coleman, C. A., Metzger, M. J., Reed, P., Makuwa, M., Hearn, G. & other authors (2010). Island biogeography reveals the deep history of SIV. Science 329, 1487.
Yang, Z. (1996). Maximum-likelihood models for combined analyses of multiple sequence data. J Mol Evol 42, 587–596.
Zanotto, P. M., Gao, G. F., Gritsun, T., Marin, M. S., Jiang, W. R., Venugopal, K., Reid, H. W. & Gould, E. A. (1995). An cline across the northern hemisphere. Virology 210, 152–159.
Zanotto, P. M., Gould, E. A., Gao, G. F., Harvey, P. H. & Holmes, E. C. (1996). Population dynamics of flaviviruses revealed by molecular phylogenies. Proc Natl Acad Sci U S A 93, 548–553.
