
bioRxiv preprint doi: https://doi.org/10.1101/2020.06.20.163188; this version posted July 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Genomic diversity and hotspot mutations in 30,983 SARS-CoV-2 genomes: moving toward a universal vaccine for the “confined virus”?

Tarek ALOUANE1†, Meriem LAAMARTI1†, Abdelomunim ESSABBAR1, Mohammed HAKMI1, EL Mehdi BOURICHA1, M.W. CHEMAO-ELFIHRI1, Souad KARTTI1, Nasma BOUMAJDI1, Houda BENDANI1, Rokia LAAMARTI2, Fatima GHRIFI1, Loubna ALLAM1, Tarik AANNIZ1, Mouna OUADGHIRI1, Naima EL HAFIDI1, Rachid EL JAOUDI1, Houda BENRAHMA3, Jalil ELATTAR4, Rachid MENTAG5, Laila SBABOU6, Chakib NEJJARI7, Saaid AMZAZI8, Lahcen BELYAMANI9 and Azeddine IBRAHIMI1*

1 Medical Biotechnology Laboratory (MedBiotech), Bioinova Research Center, Rabat Medical & Pharmacy School, Mohammed Vth University in Rabat, Morocco;
2 MASCIR, Rabat Design, Rabat, Morocco;
3 Faculty of Medicine, Mohammed VI University of Health Sciences (UM6SS), Casablanca, Morocco;
4 Laboratoire Riad, hay Riad, Rabat, Morocco;
5 Biotechnology Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco;
6 Microbiology and Molecular Biology Team, Center of Plant and Microbial Biotechnology, Biodiversity and Environment, Faculty of Sciences, Mohammed V University of Rabat, Morocco;
7 International School of Public Health, Mohammed VI University of Health Sciences (UM6SS), Casablanca, Morocco;
8 Laboratory of Human Pathologies Biology, Faculty of Sciences, Mohammed V University in Rabat, Morocco;
9 Emergency Department, Military Hospital Mohammed V, Rabat Medical & Pharmacy School, Mohammed Vth University in Rabat, Morocco.

* Corresponding author: [email protected]
† Authors had equal contribution.

Abstract

The COVID-19 pandemic has been ongoing since its onset in late November 2019 in Wuhan, China. Understanding and monitoring the genetic evolution of the virus, its geographical characteristics, and its stability are particularly important for controlling the spread of the disease and especially for the development of a universal vaccine covering all circulating strains. From this perspective, we analyzed 30,983 complete SARS-CoV-2 genomes from 79 countries across six continents, collected from December 24, 2019, to May 13, 2020, according to the GISAID database. Our analysis revealed the presence of 3,206 variant sites, with a uniform distribution of mutation types in different geographic areas. Remarkably, a low frequency of recurrent mutations was observed; only 169 mutations (5.27%) had a prevalence greater than 1% of genomes. Nevertheless, fourteen non-synonymous hotspot mutations (> 10%) were identified at different locations along the viral genome: eight in the ORF1ab polyprotein (in nsp2, nsp3, transmembrane domain, RdRp, helicase, exonuclease, and endoribonuclease), three in the nucleocapsid protein, and one in each of three proteins: spike, ORF3a, and ORF8. Moreover, 36 non-synonymous mutations were identified in the RBD of the spike protein with a low prevalence (<1%) across all genomes, of which only four could potentially enhance the binding of the SARS-CoV-2 spike protein to the human ACE2 receptor. These results, along with the mutational frequency dissimilarity and intra-genomic divergence of SARS-CoV-2, could indicate that SARS-CoV-2 is not yet adapted to its host.
Unlike influenza virus or HIV, SARS-CoV-2 has a low mutation rate, which makes the development of an effective global vaccine very likely.

Keywords: COVID-19, SARS-CoV-2, Genomic diversity, divergence, hotspot mutations, spike protein, vaccine.

Introduction

The year 2019 ended with the appearance of groups of patients with pneumonia of unknown cause. Initial evidence suggested that the outbreak was associated with a seafood market in Wuhan, China, as reported by local health authorities [1]. The results of the investigations led to the identification of a new coronavirus in affected patients [2]. Following its identification on the 7th of January 2020 by the Chinese Center for Disease Control and Prevention (CCDC), the new virus and the disease were officially named SARS-CoV-2 (for Severe Acute Respiratory Syndrome CoronaVirus-2) and COVID-19 (for Coronavirus Disease 2019), respectively, by the World Health Organization (WHO) [3]. On March 11, 2020, WHO publicly declared the SARS-CoV-2 epidemic a global pandemic.

This virus is likely to remain and continue to spread unless an effective vaccine is developed or a high enough percentage of the population is infected to achieve herd immunity. The development of a vaccine is a long process and is not guaranteed for all infectious diseases. Indeed, some viruses such as influenza and HIV have a high rate of genetic mutations, which makes them prone to antigenic escape [4,5]. It is therefore important to assess the genetic evolution of the virus and more specifically the regions responsible for its interaction and replication within the host cell.
Thus, identifying the conserved and variable regions of the virus could help guide the design and development of anti-SARS-CoV-2 vaccines.

SARS-CoV-2 is a single-stranded positive-sense RNA virus belonging to the genus Betacoronavirus. The genome size of SARS-CoV-2 is approximately 30 kb, and its genomic structure follows the characteristics of known coronavirus genes [6]. The ORF1ab polyprotein covers two-thirds of the viral genome and is cleaved into many nonstructural proteins (nsp1 to nsp16). The remaining third of the SARS-CoV-2 genome codes for the main structural proteins: spike (S), envelope (E), nucleocapsid (N) and membrane (M). In addition, six ORFs, namely ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10, are predicted as hypothetical proteins with no known function [7].

The S protein is the basis of most candidate vaccines; it binds to membrane receptors in host cells via its receptor-binding domain (RBD) and mediates viral fusion with the host cells [8]. Its main receptor is the angiotensin-converting enzyme 2 (ACE2), although another route via CD147 has also been described [9,10]. The glycans attached to the S protein assist the initial attachment of the virus to the host cells and act as a coat that helps the virus to evade the host's immune system. In fact, a previous study has shown that glycans cover about 40% of the surface of the spike protein. However, the ACE2-binding RBD was found to be the largest and most accessible epitope [11]. Thus, it may be possible to develop a vaccine that targets the spike RBD, provided it remains accessible and stable over time.
It is therefore important to monitor the introduction of any mutation that could compromise the potential effectiveness of a candidate vaccine.

This study aims to deepen our understanding of the intra-genomic diversity of SARS-CoV-2 by analyzing the mutational frequency and divergence rate in 30,983 genomes from six geographic areas (Africa, Asia, Europe, North and South America, and Oceania), collected during the first five months after the onset of the virus. These analyses generate new datasets that provide a repository of genetic variants from different geographic areas, with particular emphasis on recurrent mutations and their distribution along the viral genome. We also estimate the rate of intraspecific divergence and evaluate the adaptation of SARS-CoV-2 to its host and the possibility of developing a universal vaccine.

Results

Diversity of genetic variants of SARS-CoV-2 in six geographic areas

A total of 30,983 SARS-CoV-2 genomes from 79 countries in six geographic areas (Africa, Asia, Europe, North and South America, and Oceania) were included in this analysis. According to the GISAID database, the date of collection of the strains was within the first five months following the onset of SARS-CoV-2. A total of 3,206 variant sites were detected compared to the reference genome Wuhan-Hu-1/2019. Then, we analyzed the type of each mutation, highlighting the prevalence of these mutations both in all genomes (worldwide) and in each of the geographic areas studied (Figure 1). Worldwide, 67.96% of mutations had a non-synonymous effect (64.16% had missense effects, 3.77% produced a gain or loss of a stop codon and 0.33% produced a loss of the start codon), 28.60% were synonymous, while 3.43% of the mutations were localized in the intergenic regions,
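
The prevalence thresholds quoted in this study (more than 1% of genomes for recurrent mutations, more than 10% for hotspot mutations) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function name `classify_by_prevalence`, the tier labels, and the toy mutation labels are all hypothetical. The sketch assumes each genome has already been reduced to the set of mutations it carries relative to the reference. Tiers are made mutually exclusive here for readability, although a hotspot mutation also exceeds the 1% threshold.

```python
from collections import Counter


def classify_by_prevalence(genome_mutations, recurrent_cutoff=0.01, hotspot_cutoff=0.10):
    """Return {mutation: (prevalence, tier)} for a list of per-genome mutation sets.

    Prevalence is the fraction of genomes carrying the mutation. The cutoffs
    mirror the >1% and >10% thresholds quoted in the text; the tier names
    ("hotspot", "recurrent", "rare") are illustrative assumptions.
    """
    n_genomes = len(genome_mutations)
    if n_genomes == 0:
        return {}
    # Count each mutation at most once per genome.
    counts = Counter(m for muts in genome_mutations for m in set(muts))
    tiers = {}
    for mutation, count in counts.items():
        prevalence = count / n_genomes
        if prevalence > hotspot_cutoff:
            tier = "hotspot"
        elif prevalence > recurrent_cutoff:
            tier = "recurrent"
        else:
            tier = "rare"
        tiers[mutation] = (prevalence, tier)
    return tiers


# Toy data (entirely made up): 200 genomes and three placeholder mutations.
toy_genomes = [set() for _ in range(200)]
for i in range(50):
    toy_genomes[i].add("toy_A")  # carried by 50 of 200 genomes
for i in range(5):
    toy_genomes[i].add("toy_B")  # carried by 5 of 200 genomes
toy_genomes[0].add("toy_C")      # carried by 1 of 200 genomes

tiers = classify_by_prevalence(toy_genomes)
for name in sorted(tiers):
    prevalence, tier = tiers[name]
    print(f"{name}: {prevalence:.2%} ({tier})")

# Share of variant sites above 1% prevalence, using the counts quoted in the abstract.
recurrent_share = 169 / 3206
print(f"recurrent share of variant sites: {recurrent_share:.2%}")
```

The same function could be applied per geographic area by passing only the genomes from that area, which is one way to compare prevalences between regions under the stated assumptions.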