Analysis of the Genetic Divergence in Asian Strains of ZIKA Virus with Reference to 2015-2016 Outbreaks
DISCLAIMER

This paper was submitted to the Bulletin of the World Health Organization and was posted to the Zika open site, according to the protocol for public health emergencies for international concern as described in Christopher Dye et al. (http://dx.doi.org/10.2471/BLT.16.170860). The information herein is available for unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited as indicated by the Creative Commons Attribution 3.0 Intergovernmental Organizations licence (CC BY IGO 3.0).

RECOMMENDED CITATION

Shrinet J, Agrawal A, Bhatnagar RK, Sujatha Sunil S. Analysis of the genetic divergence in Asian strains of ZIKA virus with reference to 2015-2016 outbreaks. [Submitted]. Bull World Health Organ. E-pub: 22 Apr 2016. doi: http://dx.doi.org/10.2471/BLT.16.176065

Analysis of the genetic divergence in Asian strains of ZIKA virus with reference to 2015-2016 outbreaks

Jatin Shrinet (a), Aditi Agrawal (a), Raj K Bhatnagar (a) & Sujatha Sunil (a)

(a) International Centre for Genetic Engineering and Biotechnology, New Delhi-110067, India

Correspondence to: Sujatha Sunil (e-mail: [email protected])

(Submitted: 20 April 2016 – Published online: 22 April 2016)

Abstract

Objective: To compare Zika virus (ZIKV) genomes of the 2015-2016 outbreaks with the older strains and evaluate the evolution of ZIKV.

Method: We performed several genetic analyses on 50 ZIKV genomes currently available in the public domain. Phylogenetic, mutation, recombination, molecular evolution and selection analyses identified amino acid variations that were unique to the 2015-2016 outbreak strains and established the status of recombination and evolution amongst these sequences.

Findings: We report distinct amino acid variations in the structural and non-structural proteins of all 2015-2016 outbreak strains that are conserved amongst these strains. Our results also reveal unique motifs in the UTRs of the new ZIKV strains.
We identified recombination events in the African strains but not in the recent isolates of the Asian lineage. Population-level analysis revealed overdominant selection of alleles in the genome.

Conclusion: The 2015-2016 strains of ZIKV show distinct molecular signatures in their genomes that are conserved across strains isolated from different parts of the globe during the outbreak period. Our analysis at the population level points to possible balancing selection of the alleles.

Introduction

Arboviruses are an important group of viruses of medical relevance due to the wide range of illnesses they cause. In the last two decades, infections caused by these viruses have been major public health concerns, resulting in pandemics and epidemics (1, 2). The latest addition to this list is Zika virus (ZIKV), with the World Health Organization declaring Zika fever (ZF) a Public Health Emergency of International Concern due to its possible association with neurological and birth conditions (3). Zika virus is a member of the genus Flavivirus, family Flaviviridae (4), which includes other medically important flaviviruses such as dengue, yellow fever, West Nile and Japanese encephalitis viruses. Originally maintained in a sylvatic cycle (5), the virus was first isolated from a Macaca monkey in 1947 in the Zika forest region of Uganda (6). Under these conditions humans are considered to be incidental hosts; however, in the absence of non-human primates, humans probably serve as the primary amplification hosts (7). The first human case was reported in 1954 in Nigeria (8), and sporadic cases have been reported from different regions around the globe over the years (9-12). In addition to clinical cases, isolation of ZIKV from vectors has also been reported (13-15).

The ZIKV genome consists of a 10794 bp long single-stranded, positive-sense RNA encoding a single open reading frame (ORF).
Flanked by two non-coding regions (5' and 3' untranslated regions), the ORF encodes a polyprotein, C-prM-E-NS1-NS2A-NS2B-NS3-NS4A-NS4B-NS5, which is cleaved into three structural proteins, namely capsid (C), premembrane/membrane (prM) and envelope (E), and seven non-structural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5) (16, 17). Based on serologic and genetic properties, three lineages, namely East African, West African and Asian, have been identified (18).

In 2015, the Americas witnessed a huge outbreak of ZF with neurological implications and symptoms of Guillain-Barré syndrome in affected individuals (19). Epidemiological studies indicate that the transmission originated on Yap in Micronesia in 2007 (18) and spread to other Pacific islands (20) and to South and Central America (21). With the rapid spread of this virus to several parts of the globe, it is imperative to understand the cause of its spread. Until 2012, eight genomes were available; since 2012, 42 further genomes have been reported in the public domain (as of 20 March 2016), of which 25 were reported after January 2016. Analyzing these genomes at the molecular level may reveal the genetic divergence of the newer viruses and thereby provide insight into the evolution of the virus. The present report is a bioinformatics characterization of the genomes of ZIKV isolated post 2015 and a comparison with the older strains of ZIKV.

Materials and Methods

Genome sequences and phylogenetic analysis

A total of 50 genome sequences of ZIKV were retrieved from the NCBI database. The sequences were aligned in a multiple sequence alignment and manually edited to remove any aberrations. Twelve gene sequences of ZIKV, namely capsid, pr, M, envelope (E), NS1, NS2A, NS2B, NS3, NS4A, 2K, NS4B and NS5, were extracted from the aligned genome sequences and used for analysis of variations in the proteins.
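The extraction of gene regions from the aligned genomes can be sketched as follows. This is only an illustrative sketch, not the procedure the authors used: the function names `ungapped_to_column` and `extract_region`, the toy alignment and the coordinates are all hypothetical. It assumes that a gene's boundaries are known as 1-based positions in one ungapped reference genome and maps them to alignment columns so that the same columns can be cut from every sequence.

```python
def ungapped_to_column(aligned_ref: str, pos: int) -> int:
    """Return the 0-based alignment column holding the pos-th
    (1-based) non-gap residue of the aligned reference sequence."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == pos:
                return col
    raise ValueError(f"position {pos} is beyond the reference length")


def extract_region(alignment: dict, ref_id: str, start: int, end: int) -> dict:
    """Cut the columns spanning reference positions start..end
    (1-based, inclusive) out of every aligned sequence."""
    ref = alignment[ref_id]
    first = ungapped_to_column(ref, start)
    last = ungapped_to_column(ref, end)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Toy alignment (hypothetical data, not ZIKV sequence).
toy = {"ref": "AT-GCCA", "s1": "ATTGC-A"}
region = extract_region(toy, "ref", 3, 5)
print(region)  # {'ref': 'GCC', 's1': 'GC-'}
```

Working in alignment columns rather than per-genome coordinates keeps homologous positions together even when individual genomes carry insertions or deletions.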
The phylogenetic analysis of the trimmed genome sequences was performed using MEGA6 (22). The Neighbor-joining method, the Minimum Evolution method with Gamma parameter 1 and 100 bootstrap replications, the Maximum likelihood method, the UPGMA method and the Maximum Parsimony method were used to construct the phylogenetic tree. The phylogeny test was performed using the bootstrap method with 1000 bootstrap replications.

UTR analysis

5' UTR and 3' UTR sequences were extracted from the genomes and aligned using MEGA6. UTR sequences were not available for all genomes, and some genomes had only short UTR sequences. The aligned sequences were then analyzed to study the conservation of residues. The aligned sequences were also submitted to the RNAalifold web server to predict consensus secondary structures of both UTRs (23).

Recombination analysis

The aligned ZIKV genome sequences were subjected to recombination analysis using the RDP tool (24). RDP analyzes the sequences using seven methods, namely RDP (R), GENECONV (G), MaxChi (M), Chimaera (C), Bootscan (B), 3Seq (T) and SiScan (S). Events predicted by more than five methods, with no unknown parent and with a p-value < 0.05, were considered recombination events.

Molecular evolution and selection analysis

Transition/transversion bias, the substitution matrix and overall mean distances were calculated using MEGA6. Tajima's test of neutrality was also performed using MEGA6.

Results and Discussion

Details of the 50 ZIKV genomes used in the study are listed in Supplementary Table 1. Of these sequences, 15 were from 2015 and ten were from 2016 (as of March 2016). Amongst the remaining sequences, two sequences each were isolated in the years 2014, 2013, 2001, 1968 and 1974. One sequence each was reported from the years 2012, 2010, 2007, 2000, 1997, 1984, 1976 and 1966. Information about the isolation date was not available for seven sequences.
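As a quick consistency check, the year-wise breakdown given above can be tallied to confirm that it accounts for all 50 genomes and for the 25 outbreak samples from 2015 and 2016. The snippet only restates the counts reported in the text; the variable names are illustrative.

```python
# Counts per isolation year, as reported in the text.
year_counts = {2015: 15, 2016: 10}
for year in (2014, 2013, 2001, 1968, 1974):
    year_counts[year] = 2  # two sequences each
for year in (2012, 2010, 2007, 2000, 1997, 1984, 1976, 1966):
    year_counts[year] = 1  # one sequence each
undated = 7  # sequences without an isolation date

total = sum(year_counts.values()) + undated
outbreak = year_counts[2015] + year_counts[2016]
print(total, outbreak)  # 50 25
```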
The geographical distribution of these sequences showed that nine sequences were from Brazil, all isolated in 2015. Two sequences from 2015 were isolated in Guatemala, and one 2015 sequence each came from Suriname, Puerto Rico, Martinique and Colombia. Several of these sequences have previously been used to study the molecular evolution of ZIKV in earlier years (25, 26).

The phylogenetic tree of the 50 ZIKV sequences was constructed using the Neighbor-joining method (Figure 1). The tree was also constructed using other methods, namely the Minimum Evolution method with Gamma parameter 1 and 100 bootstrap replications, the Maximum likelihood method, the UPGMA method and the Maximum Parsimony method with 1000 bootstrap replications (Supplementary Figure 1). The sequences from 2015-2016 showed similarity to the Asian lineage and grouped in the same clade. These results indicate that the Asian strain caused the recent outbreak in the western part of the world, as reported by others (27). To study the molecular variations specific to Asian strains, the Malaysian isolate (HQ234499.1; 1966) (13) was used as the reference for all further analyses. Sequence comparison of the structural and non-structural ZIKV proteins revealed several variations in the 2015-2016 genomes, which are discussed in detail below.

Sequence analysis of the 2015-2016 isolates with Asian genotype

Structural region

The 2015 and 2016 outbreak samples (n=25) were compared against the 1966 sequence from Malaysia. Nucleotide variations were too numerous to discuss here. With respect to amino acid variations, the structural proteins showed several variations in their sequences, suggesting a high mutation rate in the new ZIKV strains (28). The variations observed were classified into two categories: those seen in all 2015-2016 samples and those that were strain-specific.
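The two-way classification described above can be illustrated with a minimal sketch. The exact criteria used in the study are not spelled out here, so the rule below is an assumption: a position counts as shared when every outbreak sequence carries the same residue and that residue differs from the reference, and as strain-specific when only some sequences differ (or they differ in different ways). The function name `classify_variations` and the toy protein sequences are hypothetical, and positions are alignment columns rather than official protein numbering.

```python
def classify_variations(reference: str, strains: dict):
    """Compare aligned protein sequences against a reference.

    Returns two dicts keyed by 1-based alignment position:
      shared          -> (reference residue, residue found in all strains)
      strain_specific -> (reference residue, {strain name: residue})
    All sequences are assumed to be aligned and of equal length.
    """
    shared, strain_specific = {}, {}
    for i, ref_aa in enumerate(reference):
        # Strains whose residue differs from the reference at this column.
        variants = {name: seq[i] for name, seq in strains.items()
                    if seq[i] != ref_aa}
        if not variants:
            continue
        residues = set(variants.values())
        if len(variants) == len(strains) and len(residues) == 1:
            shared[i + 1] = (ref_aa, residues.pop())
        else:
            strain_specific[i + 1] = (ref_aa, variants)
    return shared, strain_specific


# Toy data (hypothetical, not real ZIKV protein sequence).
reference = "MKNPK"
outbreak = {"a": "MRNPK", "b": "MRNSK", "c": "MRNPK"}
shared, specific = classify_variations(reference, outbreak)
print(shared)    # {2: ('K', 'R')}
print(specific)  # {4: ('P', {'b': 'S'})}
```

A stricter analysis would also need to decide how to treat gaps and ambiguous residues, which this sketch simply counts as differences.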