Sequence Analysis of the L Protein of the Ebola 2014 Outbreak: Insight Into Conserved Regions and Mutations
Total Page:16
File Type:pdf, Size:1020Kb
MOLECULAR MEDICINE REPORTS 13: 4821-4826, 2016 Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations GOHAR AYUB1 and YASIR WAHEED1,2 1Department of Health Biotechnology, Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology, Islamabad 44000; 2Multidisciplinary Labs, Foundation University Medical College, Foundation University Islamabad, Defence Housing Authority Phase I, Islamabad 46000, Pakistan Received February 17, 2015; Accepted December 11, 2015 DOI: 10.3892/mmr.2016.5145 Abstract. The 2014 Ebola outbreak was one of the largest number of mutations, which may explain the reasons behind that have occurred; it started in Guinea and spread to Nigeria, this unprecedented outbreak. Certain residues critical to the Liberia and Sierra Leone. Phylogenetic analysis of the current function of the polymerase remain conserved and may be virus species indicated that this outbreak is the result of a diver- targets for the development of antiviral therapeutic agents. gent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA-dependent Introduction RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence Filoviruses are responsible for complex hemorrhagic fevers in analysis demonstrated that the L protein of all non-segmented humans and primates with case fatality rates of 50-90% (1). negative-sense (NNS) RNA viruses consists of six domains The filovirus family belongs to the order Mononegavirales containing conserved functional motifs. The aim of the and is divided into two genera: Ebolavirus comprising five present study was to analyze the presence of these motifs in species, Zaire ebolavirus (EBOV), Taï Forest ebolavirus, 2014 EBOV isolates, highlight their function and how they Reston ebolavirus, Bundibugyo ebolavirus and Sudan ebola- may contribute to the overall pathogenicity of the isolates. For virus; and Marburgvirus, which contains one species, this purpose, 81 2014 EBOV L protein sequences were aligned Marburg marburgvirus. A third genus, Cuevavirus has also with 475 other NNS RNA viruses, including Paramyxoviridae been recently proposed (2). Ebolaviruses have a non-segmented and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV negative-sense (NNS) single-stranded RNA genome, a char- outbreak L protein sequences was also performed. Analysis acteristic of the order, Mononegavirales, and most closely of the amino acid substitutions in the 2014 EBOV outbreak resembling the families Rhabdoviridae and Paramyxoviridae was conducted using sequence analysis. The alignment in its genetic assembly. The EBOV genome is ~19,000 base demonstrated the presence of previously conserved motifs in pairs in length and contains seven open reading frames that the 2014 EBOV isolates and novel residues. Notably, all the code for seven structural proteins in order, from 3' to 5', mutations identified in the 2014 EBOV isolates were tolerant, nucleoprotein (NP), VP35, VP40, glycoprotein (GP), VP30, they were pathogenic with certain examples occurring within VP24 and the L protein (3). previously determined functional conserved motifs, possibly The L protein is 2,212 amino acids in length and is the altering viral pathogenicity, replication and virulence. The largest protein encoded by the EBOV virus. It does not func- phylogenetic analysis demonstrated that all sequences with tion independently but is part of an RNA-dependent RNA the exception of the 2014 EBOV sequences were clustered polymerase (RdRp) complex with the NP and viral transcrip- together. The 2014 EBOV outbreak has acquired a great tion factors, VP30 and VP35. This complex is important in the replication and transcription of the viral genome. The L protein is hypothesized to serve as the major catalytic subunit of this complex (4). The majority of the data available on the struc- ture and function of the EBOV L protein is theoretical and Correspondence to: Dr Yasir Waheed, Multidisciplinary Labs, inadequate, relying on comparative analysis and data from the Foundation University Medical College, Foundation University Islamabad, Jinnah Avenue, Defence Housing Authority Phase I, L proteins of other well-studied NNS RNA viruses of the order Islamabad 46000, Pakistan Mononegavirales rather than on experimental observations. E-mail: [email protected] The current 2014 EBOV epidemic in West Africa is the largest ever reported Ebola epidemic across multiple West Key words: Ebola virus, L protein, phylogenetic analysis, sequence African countries with unprecedented transmissibility. There comparison are no marketed antiviral therapeutic agents or vaccines against EBOV and the continued search for effective and safe therapeutic agents is required (5). RdRp of other RNA 4822 AYUB and WAHEED: SEQUENCE ANALYSIS OF EBOLA POLYMERASE Figure 1. Amino acid sequence alignment of the L proteins of vesicular stomatitis virus, rabies virus, Newcastle disease virus, Sendai virus, measles virus and Zaire ebolavirus. Analysis was performed with the MAFFT program using the L-INS-i algorithm. The dashes represent gaps introduced to optimize the alignment. Motif/residues discussed in the text are highlighted in dark grey. Only a representative consensus sequence for each virus is presented. MOLECULAR MEDICINE REPORTS 13: 4821-4826, 2016 4823 Figure 2. Phylogenetic analysis of EBOV species in different outbreaks. The phylogenetic analysis was inferred using the Neighbor-Joining method. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in the units of the number of amino acid substitutions per site. The analysis involved 101 amino acid sequences. The EBOV 2014 sequences are presented as a single black triangle. There were a total of 2,172 positions in the final dataset. Evolutionary analyses were conducted in MEGAv.6.0.6. EBOV, Zaire ebolavirus. viruses, such as hepatitis C virus, have been successful Alignment Editor (v.2.8.1) (10) with 81 EBOV L protein amino targets for the development of antiviral therapeutic agents acid sequences from the 2014 outbreak, to draw a representative and vaccines due to the presence of various highly conserved consensus sequence for each of the six viruses. The resulting motifs and/or residues (6,7). A previous study identified single consensus sequences of all six viruses were then aligned using nucleotide polymorphisms that distinguish the 2014 West the built-in MAFFT multiple sequence alignment program African outbreak from previous outbreaks and demonstrated (v.7.205) (11) with the L-INS-i algorithm (12). how the virus spread from Guinea to Sierra Leone in a matter of few months (8). The authors have identified 16 novel Phylogenetic tree construction. A phylogenetic tree of all non-synonymous amino acid substitutions in the L protein that 101 EBOV sequences reported to date (including the 2014 are present in the 2014 outbreak isolates (8). EBOV sequences) was also constructed in MEGA.v.6.06 (13), In the present study, the 2014 EBOV L protein sequences employing the Neighbor-Joining algorithm with the Poisson were compared with other NNS RNA viruses, including correction method (14). All 81 2014 EBOV sequences were Paramyxoviridae and Rhabdoviridae. The majority of previ- used in constructing the tree; however, they are presented as a ously reported L protein specific linear were conserved (9). single group in the tree figure for clarity. Novel conserved regions that were not previously documented were also identified in the current study. Notably, all the Amino acid substitution analysis. Furthermore, a standard mutations identified in the 2014 EBOV isolates were tolerant, prediction tool, SIFT Blink was used to predict and analyze they were pathogenic with certain examples occurring within the effect of the 16 amino acid substitutions on the function of previously determined functional conserved motifs, possibly the L protein (15). attenuating viral pathogenicity, replication and virulence. Phylogenetic analysis of L protein sequences from past EBOV Results outbreaks was also conducted and was consistent with previous studies. Consensus sequencing. In the present study, 475 L protein sequences of vesicular stomatitis virus, rabies virus, Newcastle Materials and methods disease virus, Sendai virus, measles virus and EBOV were extracted. The sequences were fed into the Jalview Desktop Sequence alignment. In order to align the sequences, Multiple Alignment Editor (v.2.8.1) software to draw 475 amino acid sequences of the L proteins belonging to the consensus sequences of the L protein of each virus. The family Rhabdoviridae (vesicular stomatitis virus and rabies consensus sequences of the L proteins were aligned to analyze virus) and Paramyxoviridae (Sendai virus, Newcastle disease the sequence variation (Fig. 1). Each row demonstrates a repre- virus and measles virus) were randomly selected from the sentative consensus sequence for each virus respectively. The National Center for Biotechnology Information (NCBI) dashes represent gaps introduced to optimize the alignment. Protein Database (http://www.ncbi.nlm.nih.gov/protein). The The six-way alignment established that strong homologies sequences were analyzed in the Jalview Desktop Multiple exist over almost the entire length of the