Genomic diversity and hotspot mutations in 30,983 SARS-CoV-2 genomes: moving toward a universal vaccine for the "confined virus"?
Tarek Alouane, Meriem Laamarti, Abdelomunim Essabbar, Mohammed Hakmi, El Mehdi Bouricha, M.W. Chemao-Elfihri, Souad Kartti, Nasma Boumajdi, Houda Bendani, Rokaia Laamarti, Fatima Ghrifi, Loubna Allam, Tarik Aanniz, Mouna Ouadghiri, Naima El Hafidi, Rachid El Jaoudi, Houda Benrahma, Jalil El Attar, Rachid Mentag, Laila Sbabou, Chakib Nejjari, Saaid Amzazi, Lahcen Belyamani, Azeddine Ibrahimi
doi: https://doi.org/10.1101/2020.06.20.163188
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has been ongoing since its onset in late November 2019 in Wuhan, China. To date, SARS-CoV-2 has infected more than 8 million people worldwide and killed over 5% of them. Efforts are under way worldwide to control the spread of the disease and, most importantly, to develop a vaccine. Understanding the genetic evolution of the virus, its geographic characteristics and its stability is particularly important for developing a universal vaccine that covers all circulating strains of SARS-CoV-2 and for predicting that vaccine's efficacy. To this end, we analyzed 30,983 complete genomes from 80 countries in six geographical zones (Africa, Asia, Europe, North America, South America, and Oceania), isolated from December 24, 2019 to May 13, 2020, and compared them to the reference genome. Our in-depth analysis revealed 3,206 variant sites relative to the reference Wuhan-Hu-1 genome, distributed largely uniformly across all continents. Remarkably, recurrent mutations were rare: only 182 mutations (5.67%) had a prevalence greater than 1%. Nevertheless, fourteen hotspot mutations (prevalence > 10%) were identified at different locations: seven in the ORF1ab gene (in regions coding for nsp2, nsp3, nsp6, nsp12, nsp13, nsp14 and nsp15), three in the nucleocapsid protein, one in the spike protein, one in orf3a, and one in orf8. Moreover, 35 non-synonymous mutations, each with a low prevalence (<1%) across all genomes, were identified in the receptor-binding domain (RBD) of the spike protein; only four of these could potentially enhance the binding of the SARS-CoV-2 spike protein to the human receptor ACE2. Together with the phylogenetic analysis, these results demonstrate that the virus shows no significant divergence at the protein level from the reference, either among or within geographical areas.
Unlike influenza virus or HIV, SARS-CoV-2 mutates slowly, which makes it very likely that an effective global vaccine can be developed.
Keywords: SARS-CoV-2, genetic evolution, divergence, hotspot mutations, spike protein.
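The abstract classifies mutations by prevalence, i.e. the fraction of analyzed genomes that carry a given mutation relative to the reference: above 1% counts as recurrent and above 10% as a hotspot. The sketch below illustrates only that arithmetic and is not the authors' pipeline. It assumes each genome has already been reduced to a set of mutation labels; the function names, the label format and the toy data (`mutA`, `mutB`, `mutC`) are hypothetical.

```python
from collections import Counter


def mutation_prevalence(genome_mutations):
    """Return {mutation: fraction of genomes carrying it}.

    genome_mutations: list of sets, one per genome, each holding
    mutation labels called against the reference (hypothetical format).
    """
    total = len(genome_mutations)
    counts = Counter()
    for muts in genome_mutations:
        counts.update(set(muts))  # count each mutation at most once per genome
    return {m: c / total for m, c in counts.items()}


def classify(prevalence, recurrent=0.01, hotspot=0.10):
    """Split mutations using the abstract's thresholds (> 1% and > 10%)."""
    recurrent_set = {m for m, p in prevalence.items() if p > recurrent}
    hotspot_set = {m for m, p in prevalence.items() if p > hotspot}
    return recurrent_set, hotspot_set


# Hypothetical toy cohort of 1,000 genomes (not data from the study).
genomes = (
    [{"mutA", "mutB"}] * 20   # carry both mutA and mutB
    + [{"mutA"}] * 130        # carry mutA only
    + [{"mutC"}] * 3          # carry mutC only
    + [set()] * 847           # identical to the reference
)

prev = mutation_prevalence(genomes)
recurrent, hotspots = classify(prev)
print(sorted(recurrent))  # ['mutA', 'mutB']
print(sorted(hotspots))   # ['mutA']
```

In this toy cohort, `mutA` (15%) is both recurrent and a hotspot, `mutB` (2%) is recurrent only, and `mutC` (0.3%) falls below both thresholds. Counting each mutation once per genome keeps prevalence a per-genome fraction, which matches how the abstract reports it.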