Deep Phylogenetic Analysis of Orthocoronavirinae Genomes Traces the Origin, Evolution and Transmission Route of 2019 Novel Coronavirus
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.091199; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Deep phylogenetic analysis of Orthocoronavirinae genomes traces the origin, evolution and transmission route of 2019 novel coronavirus Amresh Kumar Sharma and Anup Som* Centre of Bioinformatics Institute of Interdisciplinary Studies University of Allahabad Prayagraj – 211002, India *Corresponding Author: Email: [email protected] Website: www.somlab.in bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.091199; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Wuhan city, China in December 2019 and thereafter its spillover across the world has created a global pandemic and public health crisis. Today, it appears as a threat to human civilization. Scientists and medical practitioners across the world are involved to trace out the origin and evolution of SARS-CoV-2 (also called 2019 novel coronavirus and referred as 2019-nCoV), its transmission route, cause of pathogenicity, and possible remedial action. In this work, we aim to find out the origin, evolutionary pattern that led to its pathogenicity and possible transmission pathway of 2019-nCoV. To achieve the aims we conducted a large-scale deep phylogenetic analysis on the 162 complete Orthocoronavirinae genomes consisting of four genera namely Alphacoronavirus, Betacoronavirus, Deltacoronavirus and Gammacoronavirus, their gene trees analysis and subsequently genome and gene recombination analyses. Our analyses revealed that i) bat, pangolin and anteater are the natural hosts of 2019-nCoV, ii) outbreak of 2019-nCoV took place via inter-intra species transmission mode, iii) host-specific adaptive mutation made 2019-nCoV more virulent, and iv) the presence of widespread recombination events led to the evolution of new 2019-nCoV strain and/or could be determinant of its pathogenicity. Keywords: Orthocoronavirinae; 2019-nCoV; Genome/Gene phylogeny; Adaptive mutation; Recombination; Transmission pathway, COVID-19 Highlights • Orthocoronavirinae genome phylogeny revealed that bat, pangolin and anteater are natural reservoir hosts of novel coronavirus (2019-nCoV/SARS-CoV-2). • Host-specific adaptive mutation occurred among the coronaviruses. • Transmission of 2019-nCoV to human took place by inter-intra species mode of transmission. • Presence of widespread recombination events led to the evolution of new 2019-nCoV strain and/or could be determinant of its pathogenicity. 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.091199; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Introduction Coronaviruses are single-stranded RNA virus of 26 to 32 kilobases (kb) nucleotide chain and consists of both structural and non-structural proteins. They have been known to cause lower and upper respiratory diseases, central nervous system infection and gastroenteritis in a number of avian and mammalian hosts, including humans (Gorbalenya et al 2020). The recent outbreak of novel coronavirus (2019-nCoV/SARS-COV-2) associated with acute respiratory disease called coronavirus disease 19 (commonly known as COVID-19) has caused a global pandemic and has spread over 212 countries (WHO, COVID-19 situation report). As of now, more than 4 million people have been infected and approximately 300 thousand people have died. Today, COVID-19 appears as a global threat to public health as well as to the human civilization (WHO, COVID-19 situation reports). As it was initially outbreak in Wuhan city, Hubei province, China in December 2019 but then rapidly spread to several European countries and subsequently almost the entire world (Wu et al., 2020; Zhu et al., 2019). Coronaviruses are placed within the family Coronaviridae, which has two subfamilies namely Orthocoronavirinae and Torovirinae. Orthocoronavirinae has four genera: Alphacoronavirus (average genome size 28kb), Betacoronavirus (average genome size 30kb), Gammacoronavirus (average genome size 28kb), and Deltacoronavirus (average genome size 26kb) (de Groot et al. 2011). Coronaviruses are typically harbored in mammals and birds. Particularly Alphacoronavirus and Betacoronavirus infect mammals, and Gammacoronavirus and Deltacoronavirus infect avian species (Woo et al., 2009; 2010; Fan et al., 2019). The previous important outbreaks of coronaviruses are severe acute respiratory syndrome coronavirus (SARS-CoV) outbreak in China in 2002/03, Middle East respiratory syndrome coronavirus (MERS-CoV) outbreak in 2012 that resulted severe epidemics in the respective geographical regions (Eickmann et al., 2003; Vijaykrishna et al., 2007; Zumla et al, 2015; Hayes et al., 2019). The present outbreak of 2019-nCoV is the third documented spillover of an animal coronavirus to humans in only two decades that has resulted in a major pandemic (Velavan and Meyer, 2020; Lai et al., 2020). Despite the deadly infection caused by 2019-nCoV, till are no specific vaccines or medicines for COVID-19. 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.091199; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Scientific communities across the world are trying to understand several fundamental and applied questions such as: What is the origin of 2019-nCoV? What are the possible transmission routes? Why 2019-nCoV is more deadly than other CoVs? What is its possible clinical diagnosis & treatment? etc. Consequently, a large number of research outcomes are being consistently published. In this paper, we aim to find out the origin and evolution of 2019-nCoV, and its possible transmission pathway through deep phylogenetic analysis. Materials and Methods Data selection 162 Orthocoronavirinae genomes were retrieved from NCBI (https://www.ncbi.nlm.nih.gov/) and Virus Pathogen Database and Analysis Resource (https://www.viprbrc.org/). We only considered complete genome sequences having no unidentified nucleotide characters. Our dataset included 23 Alphacoronavirus, 92 Betacoronavirus, 32 Deltacoronavirus and 15 Gammacoronavirus genomes from different subgenera, diverse host species and from wide geographical location (Cui et al., 2019). Further for rooting the tree, we used two genome sequences from Torovirus and two from Bafinivirus belonging to domestic cow and fish respectively. Three random sequences were generated by using the average genome length of the Orthocoronavirinae genomes and their average GC%. The random sequences were used to check the reliability of the topology. Overall, the phylogenetic analysis consists of 162 complete Orthocoronavirinae genomes; four outgroups sequences and three random sequences. Phylogenetic reconstruction The genome sequences were aligned using the MAFFT alignment tool (Katoh et al., 2002). Genome tree of the Orthocoronavirinae and Betacoronavirus were reconstructed using maximum likelihood (ML) method and GTR+G+I model of sequence evolution as revealed by the model test with 1000 bootstrap support. Trees were reconstructed using IQ-TREE software (Nguyen et al., 2015) and were visualized with iTOL software (Letunic et al., 2019). Five gene trees namely Orf1ab, Spike (S), Membrane (M), Envelope (E) and Nucleocapsid (N) were reconstructed using amino acid sequences. ML method of tree reconstruction and protein-specific amino acids 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.091199; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. evolution model as revealed by the model test was used for gene trees reconstruction. Bootstrap test with 1000 bootstrap replicates was carried out to check the reliability of the gene trees. Genome and gene recombination analysis Potential recombination events in the history of the Betacoronaviros were assessed using the RDP5 package (Martin et al., 2015). The RDP5 analysis was conducted based on the complete genome sequence using RDP, GENECONV, BootScan, MaxChi, Chimera, SiScan, and 3Scan methods. Putative recombination events were identified with a Bonferroni corrected P-value cut-off of 0.05 supported by more than four methods. Results and Discussion The genome phylogeny of Orthocoronavirinae depicts that Alpha, Beta, Delta and Gamma coronaviruses clustered according to their cladistic relations (Fig. 1). This result is consistent with the other results (Luk et al. 2019; Wu et al., 2020). Furthermore, Gammacoronavirus and Deltacoronavirus appeared in a single clade (i.e.