DISCLAIMER

This paper was submitted to the Bulletin of the World Health Organization and was posted to the Zika open site, according to the protocol for public health emergencies for international concern as described in Christopher Dye et al. ( http://dx.doi.org/10.2471/BLT.16.170860 ).

The information herein is available for unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited as indicated by the Creative Commons Attribution 3.0 Intergovernmental Organizations licence (CC BY IGO 3.0).

RECOMMENDED CITATION

Shrinet J, Agrawal A, Bhatnagar RK, Sujatha Sunil S. Analysis of the genetic divergence in Asian strains of ZIKA with reference to 2015-2016 outbreaks. [Submitted]. Bull World Health Organ. E-pub: 22 Apr 2016. doi: http://dx.doi.org/10.2471/BLT.16.176065

Analysis of the genetic divergence in Asian strains of ZIKA with reference to 2015-2016 outbreaks

Jatin Shrinet,a Aditi Agrawal,a Raj K Bhatnagar a & Sujatha Sunil a

a International Centre for Genetic Engineering and Biotechnology, New Delhi-110067, India

Correspondence to: Sujatha Sunil (e-mail: [email protected])

(Submitted: 20 April 2016 – Published online: 22 April 2016)

Abstract

Objective: To compare Zika virus (ZIKV) genomes from the 2015-2016 outbreaks with older strains and to evaluate the evolution of ZIKV.

Method: We performed several genetic analyses on the 50 ZIKV genomes currently available in the public domain. Phylogenetic and mutation analysis, recombination analysis, and molecular evolution and selection analysis were used to identify amino acid variations unique to the 2015-2016 outbreak strains and to assess recombination and evolution amongst these sequences.

Findings: We report distinct amino acid variations in the structural and non-structural proteins that are conserved across all 2015-2016 outbreak strains. Our results also reveal unique motifs in the UTRs of the new ZIKV strains. We identified recombination events in the African strains but not in the recent isolates of the Asian lineage. Population-level analysis suggested overdominant selection of alleles in the genome.

Conclusion: The 2015-2016 strains of ZIKV show distinct molecular signatures in their genomes that are conserved across strains isolated from different parts of the globe during the outbreak period. Our population-level analysis points to the possibility of balancing selection of the alleles.

Introduction

Arboviruses are an important group of viruses of medical relevance because of the wide range of illnesses they cause. In the last two decades, infections caused by these viruses have been major public health concerns, resulting in pandemics and epidemics (1, 2). The latest addition to this list is Zika virus (ZIKV), with the World Health Organization declaring Zika fever (ZF) a Public Health Emergency of International Concern because of its possible association with neurological and birth conditions (3).

Zika virus is a member of the genus Flavivirus, family Flaviviridae (4), which includes other medically important viruses such as dengue and West Nile viruses. Originally maintained in a sylvatic cycle (5), the virus was first isolated from a Macaca monkey in 1947 in the Zika forest region of Uganda (6). In this cycle humans are considered incidental hosts; however, in the absence of non-human primates, humans probably serve as the primary amplification hosts (7). The first human case was reported in 1954 in Nigeria (8), and sporadic cases have been reported from different regions around the globe over the years (9-12). In addition to clinical cases, isolation of ZIKV from vectors has also been reported (13-15). The ZIKV genome consists of a 10,794-base, single-stranded, positive-sense RNA encoding a single open reading frame (ORF). Flanked by two non-coding regions (the 5’ and 3’ untranslated regions), the ORF encodes a polyprotein, C-prM-E-NS1-NS2A-NS2B-NS3-NS4A-NS4B-NS5, which is cleaved into three structural proteins, namely capsid (C), premembrane/membrane (prM) and envelope (E), and seven non-structural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) (16, 17). Based on serologic and genetic properties, three lineages, namely East African, West African and Asian, have been identified (18).

In 2015, the Americas witnessed a large outbreak of ZF, with neurological complications and symptoms of Guillain-Barré syndrome in affected individuals (19). Epidemiological studies indicate that transmission originated on Yap in Micronesia in 2007 (18), spread to other Pacific islands (20) and then reached South and Central America (21). With the rapid spread of this virus to several parts of the globe, it is imperative to understand the cause of the spread. Until 2012, eight genomes were available; since 2012, a further 42 genomes have been reported in the public domain (as of 20 March 2016), of which 25 were reported after January 2016. Analyzing these genomes at the molecular level may reveal the genetic divergence of the newer viruses and thereby provide insights into the evolution of the virus. The present report is a bioinformatics characterization of the ZIKV genomes isolated in 2015-2016 and a comparison with the older strains of ZIKV.

Materials and Methods

Genome sequences and Phylogenetic analysis

A total of 50 ZIKV genome sequences were retrieved from the NCBI database. The sequences were aligned in a multiple sequence alignment and manually edited to discard any aberrations. Twelve gene sequences of ZIKV, namely Capsid, pr, M, Envelope (E), NS1, NS2A, NS2B, NS3, NS4A, 2K, NS4B and NS5, were extracted from the aligned genomes and used to analyze variations in the proteins. Phylogenetic analysis of the trimmed genome sequences was performed using the MEGA6 tool (22). The phylogenetic trees were constructed using the Neighbor-joining method, the Minimum Evolution method with Gamma parameter 1 and 100 bootstrap replications, the Maximum Likelihood method, the UPGMA method and the Maximum Parsimony method. The phylogeny was tested using the bootstrap method with 1000 replications.
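
The distance and bootstrap steps described above can be illustrated with a minimal Python sketch. It is not the MEGA6 implementation: it only computes uncorrected pairwise p-distances and draws one bootstrap replicate by resampling alignment columns. The function names and the toy sequences are hypothetical, and the substitution models and tree-building methods used in the study are not reproduced.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites; columns with a gap or N in either sequence are skipped."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

def distance_matrix(seqs):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(seqs)
    return {(a, b): p_distance(seqs[a], seqs[b]) for a in names for b in names}

def bootstrap_replicate(seqs, rng):
    """One bootstrap replicate: alignment columns resampled with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in seqs.items()}

# Toy alignment (hypothetical sequences, not ZIKV data).
toy = {"ref": "ACGTACGTAC", "iso1": "ACGTACGTAT", "iso2": "ACGAACGTTT"}
dist = distance_matrix(toy)
replicate = bootstrap_replicate(toy, random.Random(0))
```

In a full analysis, a tree would be built from each replicate and the bootstrap value of a branch would be the fraction of replicate trees that contain it.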

UTR analysis

The 5’ UTR and 3’ UTR sequences were extracted from the genomes and aligned using MEGA6. UTR sequences were not available for all the genomes, and some genomes had only short UTR sequences. The aligned sequences were analyzed to study the conservation of residues. The alignments were also submitted to the RNAalifold web server to predict consensus secondary structures of both UTRs (23).
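
As a rough illustration of the residue-conservation check, the sketch below scores each alignment column while ignoring sequences in which that part of the UTR is missing. It is an assumption about how such a check could be done, not the procedure used in MEGA6; the function name and the toy alignment are hypothetical.

```python
from collections import Counter

def column_conservation(aligned):
    """Per column: (majority base, fraction of covered sequences sharing it, coverage).

    Gaps and N are treated as "not sequenced" and excluded from the count.
    """
    profile = []
    for column in zip(*aligned):
        bases = [b for b in column if b not in "-N"]
        if not bases:
            profile.append((None, 0.0, 0))
            continue
        base, count = Counter(bases).most_common(1)[0]
        profile.append((base, count / len(bases), len(bases)))
    return profile

# Toy UTR alignment (hypothetical); the third sequence lacks its first two bases.
utrs = ["AGTTG", "AGTAG", "--TTG"]
profile = column_conservation(utrs)
```

Reporting the coverage next to the conservation score keeps short or missing UTRs from being mistaken for variation.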

Recombination analysis

The aligned ZIKV genome sequences were subjected to recombination analysis using the RDP tool (24). RDP analyzes the sequences using seven methods, namely RDP (R), GENECONV (G), MaxChi (M), Chimaera (C), Bootscan (B), 3Seq (T) and SiScan (S). Events that were predicted by more than five methods, had no unknown parent and had a p-value < 0.05 were considered recombination events.
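
The acceptance criteria above can be written as a small filter. The sketch below is hypothetical: the `Event` record, its field names and the "Unknown" label are assumptions made for illustration and do not correspond to the RDP output format.

```python
from dataclasses import dataclass

METHODS = ("RDP", "GENECONV", "MaxChi", "Chimaera", "Bootscan", "3Seq", "SiScan")

@dataclass
class Event:
    recombinant: str
    major_parent: str
    minor_parent: str
    p_values: dict  # method name -> p-value, only for methods that detected the event

def is_accepted(event, min_methods=6, alpha=0.05, unknown="Unknown"):
    """Keep events supported by more than five methods (p < alpha) with both parents known."""
    supporting = [m for m, p in event.p_values.items() if m in METHODS and p < alpha]
    parents_known = unknown not in (event.major_parent, event.minor_parent)
    return parents_known and len(supporting) >= min_methods

# Hypothetical events with made-up p-values.
six = {m: 0.001 for m in METHODS[:6]}
events = [
    Event("seqA", "seqB", "seqC", six),               # accepted
    Event("seqD", "Unknown", "seqC", six),            # rejected: unknown parent
    Event("seqE", "seqB", "seqC", {"RDP": 0.001}),    # rejected: too few methods
]
accepted = [e.recombinant for e in events if is_accepted(e)]
```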

Molecular evolution and selection analysis

Transition/transversion bias, the substitution matrix and overall mean distances were calculated using MEGA6. Tajima’s test of neutrality was also performed using MEGA6.
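
For readers unfamiliar with Tajima’s test, the following sketch computes the D statistic from an alignment using the standard formulas of Tajima (37). It is an illustrative re-implementation under simple assumptions (complete deletion of columns with gaps or N, at least four sequences), not the MEGA6 code; the function name and the toy alignments are hypothetical. Note that it returns the mean number of pairwise differences per sequence, whereas nucleotide diversity is usually reported per site.

```python
from collections import Counter
from math import sqrt

def tajimas_d(seqs):
    """Return (S, k, D): segregating sites, mean pairwise differences, Tajima's D."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    pairs = n * (n - 1) / 2
    seg_sites = 0
    total_diffs = 0.0
    for column in zip(*seqs):
        if any(c in "-N" for c in column):
            continue  # complete deletion of incomplete columns
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            same = sum(c * (c - 1) / 2 for c in counts.values())
            total_diffs += pairs - same  # pairs that differ at this site
    k = total_diffs / pairs
    if seg_sites == 0:
        return 0, 0.0, float("nan")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (k - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return seg_sites, k, d

# Toy alignments (hypothetical): intermediate-frequency variants vs. singletons only.
balanced = ["AAAA", "AAAT", "AATT", "ATTT"]
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]
s_bal, k_bal, d_bal = tajimas_d(balanced)
s_sin, k_sin, d_sin = tajimas_d(singletons)
```

Variants at intermediate frequency push D above zero, while an excess of rare variants pushes it below zero, which is the contrast the two toy alignments are meant to show.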

Results and Discussion

Details of the fifty ZIKV genomes used in the study are listed in Supplementary Table 1. Of these sequences, 15 were from 2015 and ten were from 2016 (as of March 2016). Amongst the remaining sequences, two each were isolated in 2014, 2013, 2001, 1968 and 1974, and one each was reported from 2012, 2010, 2007, 2000, 1997, 1984, 1976 and 1966. The isolation date was not available for seven sequences. The geographical distribution showed that nine sequences were from Brazil, all isolated in 2015. Two sequences from 2015 were isolated in Guatemala, and one 2015 sequence each came from Suriname, Puerto Rico, Martinique and Colombia. Several of these sequences have previously been used to study the molecular evolution of ZIKV (25, 26). The phylogenetic tree of the 50 ZIKV sequences was constructed using the Neighbor-joining method (Figure 1). Trees were also constructed using other methods, namely the Minimum Evolution method with Gamma parameter 1 and 100 bootstrap replications, the Maximum Likelihood method, the UPGMA method and the Maximum Parsimony method with 1000 bootstrap replications (Supplementary Figure 1). The sequences from 2015-2016 showed similarity to the Asian lineage and grouped in the same clade. These results indicate that the Asian strain caused the recent outbreak in the western part of the world, as reported by others (27).

To study the molecular variations specific to Asian strains, the Malaysian isolate (HQ234499.1; 1966) (13) was used as the reference for all further analyses. Sequence comparison of the structural and non-structural ZIKV proteins revealed several variations in the 2015-2016 genomes, which are discussed in detail below.

Sequence analysis of the 2015-2016 isolates with Asian genotype

Structural region

The 2015 and 2016 outbreak samples (n=25) were compared against the 1966 sequence from Malaysia. Nucleotide variations were too numerous to discuss here. At the amino acid level, the structural proteins showed several variations, suggesting a high mutation rate in the new ZIKV strains (28). The variations were classified into two categories: those seen in all 2015-2016 samples and those that were strain-specific. Both categories are discussed in the following sections. To make the common variations easier to analyze, consensus sequences were derived for each region and year and compared with the reference sequence (Table 1a). Capsid showed variations at five amino acid (aa) positions, namely N25S, L27F, R101K, I110V and I113V, in all the sequences. Amino acid variations in individual samples are listed in Table 1b. In Capsid, apart from the above-mentioned variations, sequence KU729218.1 (from Brazil) showed the variation G105S. Five samples, KU647676.1 (from Martinique), KU820897.1 (from Colombia), KU820898.1 (from China), and KU922960.1 and KU922923.1 (from Mexico), showed the variation D107E. Sequences KU866423.1 and KU820899.2 (from China) showed the variation S109N. The variation E76D was seen in KU744693.1 (from China).
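
The two categories of variation (shared by all samples versus strain-specific) can be derived mechanically from aligned protein sequences. The sketch below is a minimal illustration, not the authors’ pipeline; the function names and the toy sequences are hypothetical, and substitutions are written in the same notation as in the text (for example N25S).

```python
def variants(reference, sample):
    """Substitutions relative to the reference, written like 'N25S' (1-based positions)."""
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt and "-" not in (ref, alt)
    ]

def shared_and_specific(reference, samples):
    """Split variants into those found in every sample and those found only in some."""
    per_sample = {name: set(variants(reference, seq)) for name, seq in samples.items()}
    shared = set.intersection(*per_sample.values())

    def by_position(v):
        return int(v[1:-1])

    specific = {name: sorted(v - shared, key=by_position) for name, v in per_sample.items()}
    return sorted(shared, key=by_position), specific

# Toy protein alignment (hypothetical sequences, not ZIKV data).
reference = "MKNPLG"
samples = {"isolate_a": "MKSPLG", "isolate_b": "MKSPFG"}
shared, specific = shared_and_specific(reference, samples)
```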

Sequence comparison of the pr protein showed three aa variations, namely V1A, S17N and V31M, in all the 2015 and 2016 sequences (Table 1a). Sequence KU312312.1 (from Suriname) showed an additional change, M44T. The M protein showed a single amino acid variation, P72L, in a sequence from Mexico (KU922923.1) (Table 1b).

The envelope protein of the 2015-2016 ZIKV isolates, when compared with the reference Malaysian strain, showed changes at three positions, D393E, V473M and T487M, in all sequences (Table 1a). Amino acid variations T47S, S64T, M68I and V255A were observed in KU729217.2 (from Brazil). Sequences KU729218.1 and KU497555.1 (from Brazil) showed the M349T and S260T changes, respectively. Sequences KU501216.1 and KU501217.1 (from Guatemala) showed the V56I variation, isolate KU312312.1 (from Suriname) showed T479A, and sequences KU866423.1 and KU820899.2 (from China) displayed the K419R variation. Of special mention is one isolate from China (KU744693.1) that displayed a total of 12 aa variations, including the 3 conserved changes (Table 1b).

Non-structural region

Comparison of the non-structural protein sequences of the 25 isolates from 2016 (n=10) and 2015 (n=15) with the reference sequence from Malaysia (HQ234499.1) indicates that the non-structural proteins of ZIKV are more conserved than the structural proteins. The non-structural proteins NS1, NS2A, NS2B and NS3 showed very few conserved changes compared with NS4B and NS5, which showed 7 and 15 aa variations, respectively.

NS1 showed two changes, namely A188V and V264M, that were present in all 25 sequences (Table 2a). Sequences KU729217.2 and KU321639.1 (from Brazil) had additional aa variations, G190E and Y122H, respectively (Table 2b). Two different aa variations were observed at position R324: sequences KU647676.1 (from Martinique), KU922923.1 and KU922960.1 (from Mexico) and KU820897.1 (from Colombia) showed R324W, whereas KU866423.1 and KU820899.2 (from China) had R324Q, indicating the evolving nature of the site. KU853013.1 (from Italy) had an additional variation, M349V. Sequences KU501216.1 and KU501217.1 (from Guatemala) showed an additional mutation, G100A. One isolate from China (KU744693.1) showed a total of seven variations, including the two conserved variations (Table 2b).

NS2A had only one conserved change, A143V, which was present in all the sequences. Analyses of individual protein sequences showed some additional variations: L113F in KU497555.1 (from Brazil), and I80T in KU647676.1 (from Martinique) and in KU922923.1 and KU922960.1 (from Mexico). The variation I139V was present in three sequences from China (KU820898.1, KU740184.2 and KU761564.1). The NS2B protein was conserved when the 2015-2016 sequences were compared against the reference sequence, with the one exception of KU729217.2 (from Brazil), which had the variation M32I.

NS3 had two conserved variations, N400H and M472L, seen in all sequences (Table 2a). Apart from these positions, isolates KU729218.1 and KU321639.1 (from Brazil) showed the M334T and H355Y variations, respectively. Likewise, both KU501216.1 and KU501217.1 (from Guatemala) had the variation M572L. Isolates KU922960.1 and KU922923.1 (from Mexico) showed the A106E variation (Table 2b).

The NS4B protein had changes at seven positions, namely G14S, M26I, L49F, M98I, I180V, V184I and L186S, in all sequences analyzed. Apart from these changes, only KU321639.1 (from Brazil), at one position (I176M), and KU744693.1 (from China), at four additional sites (A44P, T48S, D150E and I176M), showed variations. The NS5 protein showed 15 evolved sites, which could be due to its large size of 902 amino acid residues. Details of all the aa variations in the non-structural proteins are listed in Tables 2a and 2b.

5’ and 3’ UTRs

The untranslated regions (5’ and 3’ UTRs) are known to play important roles in flavivirus replication and virulence (29). The UTR sequences of ZIKV from the recent outbreak were aligned and checked for variations. The Malaysian strain did not have 5’ and 3’ UTR sequences available for analysis, and UTR information was also absent for two sequences each from the 2015 and 2016 isolates. The analysis revealed that both UTR sequences (5’ UTR and 3’ UTR) were mostly conserved. The sequences were also submitted to the UTRscan web server to predict any conserved UTR motifs. UTRscan detected two motifs in the 3’ UTR sequence, namely uORF (upstream open reading frame) and MBE (Musashi binding element), and no motifs in the 5’ UTR. Analysis of these motifs revealed nomenclature differences between the prediction software and the literature: the motif labelled uORF corresponds to the dORF (downstream open reading frame) observed in the 3’ UTR (30). This study reports this motif in flaviviruses for the first time, although it has been shown to be present and conserved in mammalian UTRs, which highlights its importance (30). The relevance of dORF in ZIKV warrants in-depth functional studies. MBE, earlier referred to as the polyadenylation response element, is also known to play a part in temporal regulation in Xenopus (31, 32). Studies have shown the importance of this conserved domain in promoting RNA genome cyclization (29, 33). In addition, the secondary structures of both UTR sequences were predicted using RNAalifold; the structures are detailed in Figure 2. The results revealed that the structures were conserved, as previously shown in a recent study (34).

Analysis of Recombination events

Recombination analyses were performed using the RDP4 tool on all 50 ZIKV genomes. A total of 11 events were predicted by more than five algorithms (p-value < 0.05). Of these, six events had an unknown parent and were not considered for further analysis. The remaining five events involved eight sequences, namely KF383115.1, KF383116.1, KF383117.1, KF383118.1, KF383120.1, KF383121.1, HQ234501.1 and HQ234498.1 (Table 3a). Further analysis showed that these sequences belong to African strains (East African and West African). Recombination analyses were also performed on the individual genes (Table 3b). This analysis, done using all algorithms and the above-mentioned criteria, showed one recombination event each for the Envelope and NS1 genes and two for NS3. In the case of Envelope, isolate KF383118.1 was a recombinant with sites 1-459 and 1041-1512 from the major parent (KF383117.1) and sites 460-1040 from the minor parent (LC002520.1). For NS1, the recombinant was KF383117.1 (sites 2-641), and the minor and major parents were KF383119.1 and KF383116.1, respectively. NS3 showed two events: in the first, KF383117.1 was the recombinant (sites 610-1070), with HQ23450.1 as major parent and KF383119.1 as minor parent; in the second, KF383116.1 was the recombinant (sites 732-1035), with HQ234501.1 as major parent and KF383119.1 as minor parent. The p-value was significant (p-value <= 0.05) for all the events. This result indicates that recombination events are present only in African isolates and are absent from the Asian lineage at present. While studies have highlighted that recombination events in flaviviruses are infrequent in the field (35), one study has provided evidence of such events in ZIKV (25).

Molecular Evolution - Selection test

The estimated transition/transversion bias was 6.00. Substitution pattern and rates were estimated under the Kimura 2-parameter model (+G+I). Selection analysis of the genome sequences was performed using Tajima’s neutrality test on the 50 nucleotide sequences. The test gave a nucleotide diversity of 0.064089 and a D value of 0.124450. A positive value of Tajima’s D suggests overdominant selection of these alleles in the population, resulting in negative selection (36, 37). Several studies have emphasized that infection and transmission modes influence the accumulation of negatively selected sites (25, 38).
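
To make the transition/transversion terminology concrete, the sketch below counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions between two aligned sequences. This is a plain count on a hypothetical pair of sequences; it is not the model-based bias estimate that MEGA6 reports, and the function name is an assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(a, b):
    """Count transitions and transversions between two aligned DNA sequences."""
    ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical sites, gaps and ambiguity codes are skipped
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

# Toy pair (hypothetical): A>G and C>T are transitions, T>A is a transversion.
ts, tv = count_ts_tv("AAGCTT", "GAGTTA")
ratio = ts / tv
```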

Conclusion

In conclusion, our study is a comprehensive analysis of the ZIKV genomes available to date. With ZIKV infection spreading across the globe at an alarming rate, it is important to understand the underlying molecular mechanisms that could aid the spread. Our analysis suggests balancing selection of the identified amino acid variations, thereby favoring the fitness of the strains.

Acknowledgements

We thank ICGEB for the support. This work was supported by ICGEB core funds.

References

1. Weaver SC, Forrester NL. Chikungunya: Evolutionary history and recent epidemic spread. Antiviral research. 2015;120:32-9.
2. Wilson ME, Chen LH. Dengue: update on epidemiology. Current infectious disease reports. 2015;17(1):1-8.
3. Gulland A. Zika virus is a global public health emergency, declares WHO. BMJ. 2016;352:i657.
4. Thiel H, Collett M, Gould E, Heinz F, Houghton M, Meyers G, et al. Family flaviviridae. Virus taxonomy, VIII report of the International Committee on Taxonomy of Viruses, Academic Press, San Diego. 2005:981-98.
5. Haddow A, Williams M, Woodall J, Simpson D, Goma L. Twelve isolations of Zika virus from Aedes (Stegomyia) africanus (Theobald) taken in and above a Uganda forest. Bulletin of the World Health Organization. 1964;31(1):57.
6. Dick G, Kitchen S, Haddow A. Zika virus (I). Isolations and serological specificity. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1952;46(5):509-20.
7. Duffy MR, Chen T-H, Hancock WT, Powers AM, Kool JL, Lanciotti RS, et al. Zika virus outbreak on Yap Island, federated states of Micronesia. New England Journal of Medicine. 2009;360(24):2536-43.
8. Macnamara F. Zika virus: a report on three cases of human infection during an epidemic of jaundice in Nigeria. Transactions of the Royal Society of Tropical Medicine and Hygiene. 1954;48(2):139-45.
9. Fagbami A. Zika virus infections in Nigeria: virological and seroepidemiological investigations in Oyo State. Journal of Hygiene. 1979;83(02):213-9.
10. Foy BD, Kobylinski KC, Chilson Foy JL, Blitvich BJ, Travassos da Rosa A, Haddow AD, et al. Probable non-vector-borne transmission of Zika virus, Colorado, USA. Emerg Infect Dis. 2011;17(5):880-2.
11. Heang V, Yasuda CY, Sovann L, Haddow AD, Travassos da Rosa AP, Tesh RB, et al. Zika virus infection, Cambodia, 2010. Emerg Infect Dis. 2012;18(2):349-51.
12. Smithburn K. Neutralizing antibodies against arthropod-borne viruses in the sera of long-time residents of Malaya and Borneo. American journal of hygiene. 1954;59(2):157-63.
13. Marchette N, Garcia R, Rudnick A. Isolation of Zika virus from Aedes aegypti mosquitoes in Malaysia. American Journal of Tropical Medicine and Hygiene. 1969;18(3):411-5.
14. Dakar IPd. WHO collaborating center for reference and research on arboviruses and hemorrhagic fever viruses: Annual report. Dakar, Senegal. 1999:143.
15. Monlun E, Zeller H, Le Guenno B, Traore-Lamizana M, Hervy J, Adam F, et al. [Surveillance of the circulation of arbovirus of medical interest in the region of eastern Senegal]. Bulletin de la Societe de pathologie exotique (1990). 1992;86(1):21-8.
16. Chambers TJ, Halevy M, Nestorowicz A, Rice CM, Lustig S. West Nile virus envelope proteins: nucleotide sequence analysis of strains differing in mouse neuroinvasiveness. Journal of General Virology. 1998;79(10):2375-80.
17. Kuno G, Chang G-J. Full-length sequencing and genomic characterization of Bagaza, Kedougou, and Zika viruses. Archives of virology. 2007;152(4):687-96.
18. Lanciotti RS, Kosoy OL, Laven JJ, Velez JO, Lambert AJ, Johnson AJ, et al. Genetic and serologic properties of Zika virus associated with an epidemic, Yap State, Micronesia, 2007. Emerg Infect Dis. 2008;14(8):1232-9.
19. European Centre for Disease Prevention and Control. Rapid risk assessment: Zika virus epidemic in the Americas: potential association with microcephaly and Guillain-Barré syndrome. ECDC. 2015. From: http://ecdc.europa.eu/en/publications/_layouts/forms/Publication_DispForm.aspx?List=4f55ad51-4aed-4d32-b960-af70113dbb90&ID=1413
20. Roth A, Mercier A, Lepers C, Hoy D, Duituturaga S, Benyon E, et al. Concurrent outbreaks of dengue, chikungunya and Zika virus infections-an unprecedented epidemic wave of mosquito-borne viruses in the Pacific 2012-2014. Euro Surveill. 2014;19(41):20929.
21. Zanluca C, Melo VCAd, Mosimann ALP, Santos GIVd, Santos CNDd, Luz K. First report of autochthonous transmission of Zika virus in Brazil. Memórias do Instituto Oswaldo Cruz. 2015;110(4):569-72.
22. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Molecular biology and evolution. 2013:mst197.
23. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF. RNAalifold: improved consensus structure prediction for RNA alignments. BMC bioinformatics. 2008;9(1):1.
24. Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution. 2015;1(1):vev003.
25. Faye O, Freire CC, Iamarino A, Faye O, de Oliveira JVC, Diallo M, et al. Molecular Evolution of Zika Virus during Its Emergence in the 20th Century. PLoS Negl Trop Dis. 2014;8(1):e2636.
26. Haddow AD, Schuh AJ, Yasuda CY, Kasper MR, Heang V, Huy R, et al. Genetic characterization of Zika virus strains: geographic expansion of the Asian lineage. PLoS Negl Trop Dis. 2012;6(2):e1477.
27. Lazear HM, Stringer EM, de Silva AM. The Emerging Zika Virus Epidemic in the Americas: Research Priorities. JAMA. 2016.
28. Logan IS. ZIKA-How Fast Does This Virus Mutate? bioRxiv. 2016:040303.
29. Villordo SM, Gamarnik AV. Genome cyclization as strategy for flavivirus RNA replication. Virus research. 2009;139(2):230-9.
30. Crowe ML, Wang X-Q, Rothnagel JA. Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics. 2006;7(1):1.
31. Charlesworth A, Ridge JA, King LA, MacNicol MC, MacNicol AM. A novel regulatory element determines the timing of Mos mRNA translation during Xenopus oocyte maturation. The EMBO journal. 2002;21(11):2798-806.
32. Charlesworth A, Wilczynska A, Thampi P, Cox LL, MacNicol AM. Musashi regulates the temporal order of mRNA translation during Xenopus oocyte maturation. The EMBO journal. 2006;25(12):2792-801.
33. Polacek C, Foley JE, Harris E. Conformational changes in the solution structure of the dengue virus 5′ end in the presence and absence of the 3′ untranslated region. Journal of virology. 2009;83(2):1161-6.
34. Zhu Z, Chan JF-W, Tee K-M, Choi GK-Y, Lau SK-P, Woo PC-Y, et al. Comparative genomic analysis of pre-epidemic and epidemic Zika virus strains for virological factors potentially associated with the rapidly expanding epidemic. Emerging Microbes & Infections. 2016;5(3):e22.
35. Cook S, Moureau G, Kitchen A, Gould EA, de Lamballerie X, Holmes EC, et al. Molecular evolution of the insect-specific flaviviruses. Journal of General Virology. 2012;93(2):223-34.
36. Holsinger K. Tajima's D, Fu's FS, Fay and Wu's H, and Zeng et al.'s E. 2013.
37. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123(3):585-95.
38. Hanada K, Suzuki Y, Gojobori T. A large variation in the rates of synonymous substitution for RNA viruses and its relationship to a diversity of viral infection and transmission modes. Molecular biology and evolution. 2004;21(6):1074-80.

Figure Legends

Figure 1. Phylogenetic tree constructed using the Neighbor-Joining method. Bootstrap values are shown next to the branches. Evolutionary distances were computed using the Maximum Likelihood method. Asian strains and African strains formed two distinct clusters, and the tree is rooted using Spondweni virus as outgroup.

Figure 2. Consensus secondary structures of the UTRs generated using the RNAalifold tool. Bases in black font are conserved; bases in grey are absent or not sequenced in some of the isolates. a) Consensus secondary structure of the 5’ UTR. b) Consensus secondary structure of the 3’ UTR.

Table 1a. Mutations identified in the consensus sequences of the structural proteins of the 2015-2016 isolates. The number of sequences used in the consensus for each region is also shown. The mutations were identified by comparing the sequences with the Malaysian isolate.

Columns: MY = Malaysia (reference), FP = French Polynesia, PR = Puerto Rico, BR = Brazil, MQ = Martinique, CO = Colombia, GT = Guatemala, SR = Suriname, MX = Mexico, CN = China, IT = Italy. A dot denotes the same residue as the Malaysian reference.

Protein   Polyp. Prot. MY   FP   PR   BR   MQ   CO   GT   SR   MX   CN   IT
          pos.   pos.  1966 2013 2015 2015 2015 2015 2015 2015 2016 2016 2016
                       n=1  n=1  n=1  n=9  n=1  n=1  n=2  n=1  n=2  n=6  n=2
Capsid    25     25    N    .    S    S    S    S    S    S    S    S    S
          27     27    L    .    F    F    F    F    F    F    F    F    F
          76     76    E    .    .    .    .    .    .    .    .    E/D  .
          101    101   R    .    K    K    K    K    K    K    K    K    K
          105    105   G    .    .    G/S  .    .    .    .    .    .    .
          107    107   D    .    .    .    E    E    .    .    E    D/E  .
          109    109   S    .    .    .    .    .    .    .    .    S/N  .
          110    110   I    .    V    V    V    V    V    V    V    V    V
          113    113   I    .    V    V    V    V    V    V    V    V    V
pr        123    1     V    A    A    A    A    A    A    A    A    A    A
          139    17    S    N    N    N    N    N    N    N    N    N    N
          153    31    V    M    M    M    M    M    M    M    M    M    M
          166    44    M    .    .    .    .    .    .    T    .    .    .
M         287    72    P    .    .    .    .    .    .    .    P/L  .    .
Envelope  323    33    V    .    .    .    .    .    .    .    .    V/A  .
          337    47    T    .    .    T/S  .    .    .    .    .    .    .
          346    56    V    .    .    .    .    .    I    .    .    .    .
          354    64    S    .    .    T/S  .    .    .    .    .    .    .
          358    68    M    .    .    I/M  .    .    .    .    .    .    .
          442    152   I    .    .    .    .    .    .    .    .    I/L  .
          503    213   V    .    .    .    .    .    .    .    .    V/A  .
          520    230   D    .    .    .    .    .    .    .    .    D/A  .
          545    255   V    .    .    V/A  .    .    .    .    .    .    .
          550    260   S    .    .    S/T  .    .    .    .    .    .    .
          612    322   L    .    .    .    .    .    .    .    .    L/V  .
          613    323   H    .    .    .    .    .    .    .    .    H/D  .
          620    330   V    .    .    .    .    .    .    .    .    V/G  .
          623    333   A    .    .    .    .    .    .    .    .    A/G  .
          639    349   M    .    .    M/T  .    .    .    .    .    .    .
          683    393   D    E    E    E    E    E    E    E    E    E    E
          709    419   K    .    .    .    .    .    .    .    .    K/R  .
          739    449   F    .    .    .    .    .    .    .    .    F/I  .
          763    473   V    M    M    M    M    M    M    M    M    M    M
          769    479   T    .    .    .    .    .    .    A    .    .    .
          777    487   T    M    M    M    M    M    M    M    M    M    M
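
Table 1a gives each site both as a polyprotein position and as a position within the mature protein. The relation between the two can be sketched as a simple offset lookup. The start positions below are inferred from the paired positions in Table 1a (for example pr 123 and 1, M 287 and 72, Envelope 323 and 33) and are assumptions rather than annotated boundaries; the function name is hypothetical.

```python
from bisect import bisect_right

# First polyprotein residue of each structural protein, inferred from Table 1a.
# The end of Envelope is not given there, so positions beyond it would be
# wrongly assigned to Envelope by this lookup.
STARTS = [(1, "Capsid"), (123, "pr"), (216, "M"), (291, "Envelope")]

def to_protein_position(polyprotein_pos):
    """Map a polyprotein position to (protein name, position within that protein)."""
    idx = bisect_right([start for start, _ in STARTS], polyprotein_pos) - 1
    if idx < 0:
        raise ValueError("positions are 1-based")
    start, name = STARTS[idx]
    return name, polyprotein_pos - start + 1

example = to_protein_position(683)
```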

Table 1b. Mutations identified in the structural protein sequences of the individual isolates of 2015-2016. The mutations were identified by comparing the sequences with the Malaysian isolate.

KU501215.1_Puertrico_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU729217.2_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: T47S, S64T, M68I, V255A, D393E, V473M, T487M
KU707826.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU729218.1_Brazil_2015  Capsid: N25S, L27F, R101K, G105S, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: M349T, D393E, V473M, T487M
KU321639.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU497555.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: S260T, D393E, V473M, T487M
KU365780.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU365779.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU365778.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU365777.1_Brazil_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU647676.1_Martinque_2015  Capsid: N25S, L27F, R101K, D107E, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU820897.1_Colambia_2015  Capsid: N25S, L27F, R101K, D107E, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU501217.1_Guatemala_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: V56I, D393E, V473M, T487M
KU501216.1_Guatemala_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: V56I, D393E, V473M, T487M
KU312312.1_Suriname_2015  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M, M44T; M: -; E: D393E, V473M, T479A, T487M
KU922960.1_Mexico_2016  Capsid: N25S, L27F, R101K, D107E, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU922923.1_Mexico_2016  Capsid: N25S, L27F, R101K, D107E, I110V, I113V; pr: V1A, S17N, V31M; M: P72L; E: D393E, V473M, T487M
KU866423.1_China_2016  Capsid: N25S, L27F, R101K, S109N, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, K419R, V473M, T487M
KU820898.1_China_2016  Capsid: N25S, L27F, R101K, D107E, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU740184.2_China_2016  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU820899.2_China_2016  Capsid: N25S, L27F, R101K, S109N, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, K419R, V473M, T487M
KU761564.1_China_2016  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU744693.1_China_2016  Capsid: N25S, L27F, E76D, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: V33A, I152L, V213A, D230A, L322V, H323D, V330G, A333G, D393E, F449I, V473M, T487M
KU853013.1_Italy_2016  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M
KU853012.1_Italy_2016  Capsid: N25S, L27F, R101K, I110V, I113V; pr: V1A, S17N, V31M; M: -; E: D393E, V473M, T487M

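Table 2a. Mutations identified in the consensus sequences of the non-structural proteins of the 2015-2016 isolates. The number of sequences (n) used in the consensus for each region is also shown. The mutations were identified by comparing the sequences with the Malaysian isolate. A dot (.) marks a residue identical to the Malaysian isolate; two residues separated by a slash indicate that both occur among the sequences of that region.

| Protein | Polypeptide position | Protein position | Malaysia 1966 (n=1) | French Polynesia 2013 (n=1) | Puerto Rico 2015 (n=1) | Brazil 2015 (n=9) | Martinique 2015 (n=1) | Colombia 2015 (n=1) | Guatemala 2015 (n=2) | Suriname 2015 (n=1) | Mexico 2016 (n=2) | China 2016 (n=6) | Italy 2016 (n=2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS1 | 795 | 1 | D | . | . | . | . | . | . | . | . | D/G | . |
| NS1 | 894 | 100 | G | . | . | . | . | . | A | . | . | . | . |
| NS1 | 916 | 122 | Y | . | . | Y/H | . | . | . | . | . | . | . |
| NS1 | 970 | 176 | S | . | . | . | . | . | . | . | . | S/W | . |
| NS1 | 982 | 188 | A | V | V | V | V | V | V | V | V | V | V |
| NS1 | 984 | 190 | G | . | . | G/E | . | . | . | . | . | . | . |
| NS1 | 1005 | 211 | R | . | . | . | . | . | . | . | . | R/W | . |
| NS1 | 1050 | 256 | T | . | . | . | . | . | . | . | . | T/A | . |
| NS1 | 1058 | 264 | V | M | M | M | M | M | M | M | M | M | M |
| NS1 | 1107 | 313 | C | . | . | . | . | . | . | . | . | C/S | . |
| NS1 | 1118 | 324 | R | . | . | . | W | W | . | . | W | R/Q | . |
| NS1 | 1143 | 349 | M | . | . | M/V | . | . | . | . | . | . | V |
| NS2A | 1226 | 80 | I | . | . | . | T | . | . | . | T | . | . |
| NS2A | 1259 | 113 | L | . | . | L/F | . | . | . | . | . | . | . |
| NS2A | 1285 | 139 | I | . | . | . | . | . | . | . | . | I/V | . |
| NS2A | 1289 | 143 | A | V | V | V | V | V | V | V | V | V | V |
| NS2B | 1404 | 32 | M | . | . | M/I | . | . | . | . | . | . | . |
| NS3 | 1608 | 106 | A | . | . | . | . | . | . | . | E | . | . |
| NS3 | 1836 | 334 | M | . | . | M/T | . | . | . | . | . | . | . |
| NS3 | 1856 | 354 | D | . | . | . | . | . | . | . | . | D/E | . |
| NS3 | 1857 | 355 | H | . | . | H/Y | . | . | . | . | . | H/Y | . |
| NS3 | 1867 | 365 | S | . | . | . | . | . | . | . | . | S/R | . |
| NS3 | 1902 | 400 | N | H | H | H | H | H | H | H | H | H | H |
| NS3 | 1938 | 436 | D | . | . | . | . | . | . | . | . | D/G | . |
| NS3 | 1974 | 472 | M | L | L | L | L | L | L | L | L | L | L |
| NS3 | 2027 | 525 | R | . | . | . | . | . | . | . | . | R/K | . |
| NS3 | 2074 | 572 | M | . | . | . | . | . | L | . | . | . | . |
| NS4B | 2283 | 14 | G | S | S | S | S | S | S | S | S | S | S |
| NS4B | 2295 | 26 | M | I | I | I | I | I | I | I | I | I/M | I |
| NS4B | 2313 | 44 | A | . | . | . | . | . | . | . | . | A/P | . |
| NS4B | 2317 | 48 | T | . | . | . | . | . | . | . | . | T/S | . |
| NS4B | 2318 | 49 | L | F | F | F | F | F | F | F | F | F | F |
| NS4B | 2367 | 98 | M | I | I | I | I | I | I | I | I | I | I |
| NS4B | 2419 | 150 | D | . | . | . | . | . | . | . | . | D/E | . |
| NS4B | 2445 | 176 | I | . | . | I/M | . | . | . | . | . | I/M | . |
| NS4B | 2449 | 180 | I | V | V | V | V | V | V | V | V | V | V |
| NS4B | 2453 | 184 | V | I | I | I | I | I | I | I | I | I | I |
| NS4B | 2455 | 186 | L | S | S | S | S | S | S | S | S | S | S |
| NS5 | 2611 | 91 | A | . | V | . | . | . | . | . | . | . | . |
| NS5 | 2634 | 114 | T | M | V | V | V | V | V | V | V | M/V | V |
| NS5 | 2644 | 124 | V | . | . | . | . | . | . | . | . | V/I | . |
| NS5 | 2659 | 139 | S | P | P | P | P | P | P | P | P | P | P |
| NS5 | 2694 | 174 | K | . | . | . | . | . | R | . | . | . | . |
| NS5 | 2749 | 229 | I | T | T | T | T | T | T | T | T | T/I | T |
| NS5 | 2778 | 258 | N | . | . | N/D | . | . | . | . | . | . | . |
| NS5 | 2787 | 267 | A | V | V | V | V | V | V | V | V | V/A | V |
| NS5 | 2795 | 275 | L | M | M | M | M | M | M | M | M | M | M |
| NS5 | 2800 | 280 | N | . | . | N/D | . | . | . | . | . | . | . |
| NS5 | 2802 | 282 | V | I | I | I | I | I | I | I | I | I | I |
| NS5 | 2807 | 287 | S | . | . | . | . | . | . | . | . | S/A | . |
| NS5 | 2809 | 289 | H | . | . | . | . | . | . | . | Q | H/K | . |
| NS5 | 2831 | 311 | E | . | . | E/V | . | . | . | . | . | E/D | . |
| NS5 | 2833 | 313 | P | . | . | . | . | . | . | . | . | P/A | . |
| NS5 | 2842 | 322 | I | . | . | . | . | . | . | . | . | . | V |
| NS5 | 2896 | 376 | N | S | S | S | S | S | S | S | S | S | S |
| NS5 | 2974 | 454 | N | . | . | . | . | . | . | . | . | N/I | . |
| NS5 | 2975 | 455 | M | . | . | . | . | . | . | . | . | M/T | . |
| NS5 | 3030 | 510 | G | . | . | . | . | . | . | . | V | . | . |
| NS5 | 3045 | 525 | R | . | . | . | . | . | C | . | . | . | . |
| NS5 | 3046 | 526 | T | I | I | I | I | I | I | I | I | I | I |
| NS5 | 3050 | 530 | K | R | R | R | R | R | R | R | R | R | R |
| NS5 | 3107 | 587 | R | K | K | K | K | K | K | K | K | K | K |
| NS5 | 3144 | 624 | N | . | . | . | . | . | . | . | . | S/N | . |
| NS5 | 3162 | 642 | P | S | S | S | S | S | S | S | S | S | S |
| NS5 | 3167 | 647 | S | N | N | N | N | N | N | N | N | N | N |
| NS5 | 3190 | 670 | K | . | . | . | . | . | . | . | . | R/K | . |
| NS5 | 3223 | 703 | S | D | D | D | D | D | D | D | D | D | D |
| NS5 | 3239 | 719 | Y | H | H | H | H | H | H | H | H | H | H |
| NS5 | 3334 | 814 | V | . | . | . | . | . | . | . | . | V/A | . |
| NS5 | 3353 | 833 | T | . | . | . | A | A | . | . | A | . | . |
| NS5 | 3387 | 867 | D | N | N | N | N | N | N | N | N | N | N |
| NS5 | 3392 | 872 | V | . | . | . | . | . | . | . | . | V/M | . |
| NS5 | 3398 | 878 | D | . | . | . | . | . | . | . | . | . | E |
| NS5 | 3403 | 883 | M | . | . | . | . | . | . | . | . | M/V | . |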
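
One way to derive a per-region entry of Table 2a from the individual sequences of that region is sketched below in Python. The encoding rules are assumptions inferred from the table itself: a dot when every sequence matches the Malaysian residue, a single letter when all sequences share the same substitution, and slash-separated residues when more than one residue is observed. The function name `summarise_site` and the ordering of residues around the slash (most frequent first) are hypothetical; the source does not state how the order was chosen.

```python
from collections import Counter


def summarise_site(reference_residue: str, residues: list[str]) -> str:
    """Encode one alignment column for one region in the style of Table 2a."""
    if not residues:
        raise ValueError("at least one sequence is required")

    observed = set(residues)
    if observed == {reference_residue}:
        return "."  # every sequence matches the reference
    if len(observed) == 1:
        return residues[0]  # one substitution shared by all sequences

    # Mixed site: list residues from most to least frequent (assumed order;
    # ties keep the order in which residues were first seen).
    ordered = [residue for residue, _ in Counter(residues).most_common()]
    return "/".join(ordered)


# Toy columns (hypothetical residues, for illustration only).
print(summarise_site("D", ["D", "D", "D", "D", "D", "G"]))
print(summarise_site("A", ["V", "V"]))
print(summarise_site("G", ["G"]))
```

Under these assumptions, a region with a single sequence can only yield a dot or a single letter, which matches the pattern that slash entries appear only in columns with n greater than 1.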

Table 2b. Mutations identified in the non-structural protein sequences of the individual isolates of 2015-2016. The mutations were identified by comparing the sequences with the Malaysian isolate.

| Sequence | NS1 | NS2A | NS2B | NS3 | NS4B | NS5 |
|---|---|---|---|---|---|---|
| KU501215.1_Puerto_Rico_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | A91V, T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU729217.2_Brazil_2015 | A188V, G190E, V264M, M349V | A143V | M32I | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, N280D, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU707826.1_Brazil_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU729218.1_Brazil_2015 | A188V, V264M | A143V | - | M334T, N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU321639.1_Brazil_2015 | Y122H, A188V, V264M | A143V | - | H355Y, N400H, M472L | G14S, L49F, M98I, I176M, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU497555.1_Brazil_2015 | A188V, V264M | L113F, A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, E311V, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU365780.1_Brazil_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, N258D, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU365779.1_Brazil_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU365778.1_Brazil_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU365777.1_Brazil_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, N258D, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU647676.1_Martinique_2015 | A188V, V264M, R324W | I80T, A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, T833A, D867N |
| KU820897.1_Colombia_2015 | A188V, V264M, R324W | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, T833A, D867N |
| KU501217.1_Guatemala_2015 | G100A, A188V, V264M | A143V | - | N400H, M472L, M572L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, K174R, I229T, A267V, L275M, V282I, N376S, R525C, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU501216.1_Guatemala_2015 | G100A, A188V, V264M | A143V | - | N400H, M472L, M572L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU312312.1_Suriname_2015 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU922960.1_Mexico_2016 | A188V, V264M, R324W | I80T, A143V | - | A106E, N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, H289Q, N376S, G510V, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, T833A, D867N |
| KU922923.1_Mexico_2016 | A188V, V264M, R324W | I80T, A143V | - | A106E, N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, H289Q, N376S, G510V, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, T833A, D867N |
| KU866423.1_China_2016 | A188V, V264M, R324Q | A143V | - | N400H, M472L, R525K | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114M, V124I, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, N624S, P642S, S647N, K670R, S703D, Y719H, D867N, V872M, M883V |
| KU820898.1_China_2016 | A188V, V264M | I139V, A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU740184.2_China_2016 | A188V, V264M | I139V, A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU820899.2_China_2016 | A188V, V264M, R324Q | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114M, S139P, I229T, A267V, L275M, V282I, N376S, T526I, K530R, R587K, N624S, P642S, S647N, K670R, S703D, Y719H, D867N |
| KU761564.1_China_2016 | A188V, V264M | I139V, A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, L275M, V282I, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N |
| KU744693.1_China_2016 | D1G, S176W, A188V, R211W, T256A, V264M, C313S | A143V | - | D354E, H355Y, S365R, N400H, D436G, M472L | G14S, A44P, T48S, L49F, M98I, D150E, I176M, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, S287A, H289K, E311D, P313A, N376S, N454I, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, V814A, D867N |
| KU853013.1_Italy_2016 | A188V, V264M, M349V | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, I322V, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N, D878E |
| KU853012.1_Italy_2016 | A188V, V264M | A143V | - | N400H, M472L | G14S, M26I, L49F, M98I, I180V, V184I, L186S | T114V, S139P, I229T, A267V, L275M, V282I, I322V, N376S, T526I, K530R, R587K, P642S, S647N, S703D, Y719H, D867N, D878E |

Table 3a. Recombination analysis of whole genome sequences of Zika virus. A '+' sign indicates that the respective method predicted the event; a '-' sign indicates that the method predicted no result.

| Recombinant | Major parent | Minor parent | RDP (P-Val) | GENECONV (P-Val) | BootScan (P-Val) | MaxChi (P-Val) | Chimaera (P-Val) | SiScan (P-Val) | 3Seq (P-Val) |
|---|---|---|---|---|---|---|---|---|---|
| KF383117.1 | KF383116.1 | KF383115.1 | + (6.975E-32) | + (2.081E-32) | - (NA) | + (8.794E-7) | + (9.072E-12) | + (7.309E-11) | + (1.917E-9) |
| KF383118.1 | KF383121.1 | KF383117.1 | + (4.050E-31) | + (1.319E-26) | - (NA) | + (4.147E-8) | + (7.136E-8) | + (7.859E-8) | + (4.460E-4) |
| KF383117.1 | HQ234501.1 | KF383121.1 | + (7.007E-19) | + (1.688E-18) | - (NA) | + (1.461E-5) | + (1.020E-5) | + (7.178E-7) | + (2.394E-8) |
| KF383117.1 | KF383116.1 | KF383118.1 | + (5.698E-22) | + (5.905E-20) | - (NA) | + (7.158E-7) | + (5.999E-7) | + (2.570E-7) | + (2.754E-3) |
| KF383118.1 | HQ234498.1 | KF383120.1 | + (3.315E-19) | + (2.896E-9) | - (NA) | + (3.221E-7) | - (NA) | + (5.905E-3) | + (1.334E-2) |

Table 3b. Recombination analysis of individual genes of Zika virus. A '+' sign indicates that the respective method predicted the event; a '-' sign indicates that the method predicted no result.

| Gene | Recombinant | Major parent | Minor parent | RDP (P-Val) | GENECONV (P-Val) | BootScan (P-Val) | MaxChi (P-Val) | Chimaera (P-Val) | SiScan (P-Val) | 3Seq (P-Val) |
|---|---|---|---|---|---|---|---|---|---|---|
| Envelope | KF383118.1 | LC002520.1 | KF383117.1 | + (1.065E-10) | + (6.088E-10) | + (7.162E-10) | + (6.415E-10) | + (4.014E-10) | + (7.242E-10) | + (5.422E-12) |
| NS1 | KF383117.1 | KF383116.1 | KF383119.1 | + (1.553E-07) | + (6.592E-06) | + (1.587E-05) | + (1.418E-03) | + (4.026E-07) | + (2.529E-07) | + (1.290E-12) |
| NS3 | KF383117.1 | HQ234501.1 | KF383119.1 | + (2.504E-13) | + (6.474E-11) | + (5.521E-09) | + (7.745E-10) | + (6.190E-10) | + (2.222E-11) | + (5.188E-18) |
| NS3 | KF383116.1 | HQ234501.1 | KF383119.1 | + (3.031E-08) | + (2.575E-09) | + (5.414E-11) | + (9.034E-04) | + (4.071E-05) | + (5.747E-07) | + (2.674E-09) |

Supplementary Material

Supplementary Table 1: Sequences used in the study.

| S. No | Accession Number | Country | Year | References (PMID/DOI) |
|---|---|---|---|---|
| 1 | KU501217.1 | Guatemala | 2015 | 10.3201/eid2205.160065 |
| 2 | KU501216.1 | Guatemala | 2015 | 10.3201/eid2205.160065 |
| 3 | KU501215.1 | Puerto Rico | 2015 | 10.3201/eid2205.160065 |
| 4 | KU647676.1 | Martinique | 2015 | 10.1016/j.nmni.2016.02.013 |
| 5 | KU729217.2 | Brazil | 2015 | 27013429 |
| 6 | KU729218.1 | Brazil | 2015 | 27013429 |
| 7 | KU853013.1 | Italy | 2016 | 26987769 |
| 8 | KU853012.1 | Italy | 2016 | 26987769 |
| 9 | KU321639.1 | Brazil | 2015 | 26941134 |
| 10 | KU497555.1 | Brazil | 2015 | 26897108 |
| 11 | KU312312.1 | Suriname | 2015 | 26775124 |
| 12 | KU707826.1 | Brazil | 2015 | 26401719 |
| 13 | KF268950.1 | Central African Republic | - | 25514122 |
| 14 | KF268949.1 | Central African Republic | - | 25514122 |
| 15 | KF268948.1 | Central African Republic | 1976 | 25514122 |
| 16 | KF993678.1 | Canada | 2013 | 25294619 |
| 17 | KJ776791.1 | French Polynesia | 2013 | 24903869 |
| 18 | KF383121.1 | East African | - | 24421913 |
| 19 | KF383119.1 | Senegal | 2001 | 24421913 |
| 20 | KF383118.1 | Senegal | 2001 | 24421913 |
| 21 | KF383115.1 | Central African Republic | 1968 | 24421913 |
| 22 | KF383120.1 | Senegal | 2000 | 24421913 |
| 23 | KF383117.1 | Senegal | 1997 | 24421913 |
| 24 | KF383116.1 | Senegal | - | 24421913 |
| 25 | JN860885.1 | Cambodia | 2010 | 22389730 |
| 26 | HQ234499.1 | Malaysia | 1966 | 22389730 |
| 27 | HQ234498.1 | Uganda | 1947 | 22389730 |
| 28 | HQ234501.1 | Senegal | 1984 | 22389730 |
| 29 | HQ234500.1 | Nigeria | 1968 | 22389730 |
| 30 | DQ859059.1 | Uganda | - | 19741066 |
| 31 | EU545988.1 | Micronesia | 2007 | 18680646 |
| 32 | AY632535.2 | Uganda | - | 16223950 |
| 33 | KU922960.1 | Mexico | 2016 | NA |
| 34 | KU922923.1 | Mexico | 2016 | NA |
| 35 | KU866423.1 | China | 2016 | NA |
| 36 | KU820898.1 | China | 2016 | NA |
| 37 | KU740184.2 | China | 2016 | NA |
| 38 | KU820899.2 | China | 2016 | NA |
| 39 | KU820897.1 | Colombia | 2015 | NA |
| 40 | KU761564.1 | China | 2016 | NA |
| 41 | KU681082.3 | Philippines | 2012 | NA |
| 42 | KU681081.3 | Thailand | 2014 | NA |
| 43 | KU744693.1 | China | 2016 | NA |
| 44 | KU509998.1 | Haiti | 2014 | NA |
| 45 | KU365780.1 | Brazil | 2015 | NA |
| 46 | KU365779.1 | Brazil | 2015 | NA |
| 47 | KU365778.1 | Brazil | 2015 | NA |
| 48 | KU365777.1 | Brazil | 2015 | NA |
| 49 | KU720415.1 | Uganda | 1947 | NA |
| 50 | LC002520.1 | Uganda | - | NA |

Supplementary Figure 1: Phylogenetic trees of Zika virus predicted using other methods. a) Maximum Likelihood tree. b) Minimum-Evolution tree. c) Maximum Parsimony tree. d) UPGMA tree.