Abstract
The amino acid mutations in 28,345 poly-protein sequences encoded by the human SARS-CoV-2 orf1AB gene, representing six geographical locations (Africa, Asia, Europe, North America, Oceania and South America), were identified by comparison with the equivalent reference poly-protein sequence derived from the first human SARS-CoV-2 genome sequence, Wuhan-Hu-1, reported from Wuhan, China. The mutations were analysed in three datasets: i) 27,956 poly-proteins comprising 7,096 amino acid residues, ii) 373 poly-proteins comprising between 7,051 and 7,095 amino acid residues, and iii) 16 poly-proteins comprising between 7,097 and 7,099 amino acid residues. In all, 3,204 distinct mutation sites were observed among the poly-proteins comprising 7,096 amino acid residues; that is, ~45% of the positions in the SARS-CoV-2 orf1AB poly-protein sequence have undergone mutation since the outbreak of the COVID-19 pandemic in December 2019. Mutations were associated with fifteen proteins of the poly-protein, and the mutation propensities of the “leader protein”, nsp2, nsp3, nsp6, nsp7, nsp8 and endoRNAse proteins were higher (> 1) than those of the nsp4, nsp9, nsp10, 3C-like proteinase, RdRp, helicase, 3’-to-5’ exonuclease and 2’-O-ribose methyltransferase proteins. Higher mutation percentages were observed for the RdRp (35.32%), nsp2 (26.42%), nsp3 (11.73%) and helicase (7.88%) proteins, whereas the mutation percentages for the remaining proteins ranged from 0.16% for the nsp10 protein to 4.11% for the 3’-to-5’ exonuclease protein. Five mutations, T265I in nsp2, T1246I in nsp3, G3278S in the 3C-like proteinase, L3606F in nsp6 and P4715L in RdRp, were common to all six geographical locations. The P4715L RdRp mutation was predominant in all geographical locations except Africa, where the G5215S mutation was predominant. The nsp3 protein had the largest number of distinct mutation sites.
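For the 27,956 poly-proteins that have the same length as the reference (7,096 residues), mutations can in principle be found by comparing residues position by position. The Python sketch below illustrates that idea under stated assumptions: mutations are labelled in the reference-residue/position/variant-residue style used above (e.g. P4715L), and each site is assigned to a mature protein using the boundaries commonly annotated for the Wuhan-Hu-1 orf1AB poly-protein. The function names, the boundary table and the toy sequences are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter

# Mature-protein boundaries (1-based, inclusive) within the 7,096-residue
# orf1AB poly-protein, as commonly annotated for the Wuhan-Hu-1 reference.
# Assumption: check these against the reference annotation before real use.
PROTEIN_RANGES = [
    ("leader protein", 1, 180),
    ("nsp2", 181, 818),
    ("nsp3", 819, 2763),
    ("nsp4", 2764, 3263),
    ("3C-like proteinase", 3264, 3569),
    ("nsp6", 3570, 3859),
    ("nsp7", 3860, 3942),
    ("nsp8", 3943, 4140),
    ("nsp9", 4141, 4253),
    ("nsp10", 4254, 4392),
    ("RdRp", 4393, 5324),
    ("helicase", 5325, 5925),
    ("3'-to-5' exonuclease", 5926, 6452),
    ("endoRNAse", 6453, 6798),
    ("2'-O-ribose methyltransferase", 6799, 7096),
]


def call_substitutions(reference: str, query: str) -> list[str]:
    """Compare two equal-length sequences residue by residue.

    Each difference is labelled reference residue + 1-based position +
    query residue, e.g. 'P4715L'.
    """
    if len(reference) != len(query):
        raise ValueError("position-wise comparison needs equal lengths")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, query), start=1)
        if ref != alt
    ]


def site_of(mutation: str) -> int:
    """Extract the 1-based position from a label such as 'T265I'."""
    return int(mutation[1:-1])


def protein_for(position: int) -> str:
    """Return the mature protein that contains a poly-protein position."""
    for name, start, end in PROTEIN_RANGES:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the poly-protein")


def summarise(reference: str, samples: list[str]) -> tuple[int, Counter]:
    """Count distinct mutated positions and mutation occurrences per protein."""
    mutations = [m for s in samples for m in call_substitutions(reference, s)]
    distinct_sites = {site_of(m) for m in mutations}
    per_protein = Counter(protein_for(site_of(m)) for m in mutations)
    return len(distinct_sites), per_protein


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(call_substitutions("MESLVPGF", "MESLIPGY"))  # ['V5I', 'F8Y']
    print(protein_for(site_of("P4715L")))               # RdRp
    # Share of the 7,096 positions with at least one mutation (3,204 sites).
    print(f"{3204 / 7096:.1%}")                         # 45.2%
```

The boundary table is consistent with the examples in the abstract (T265I falls in nsp2, G3278S in the 3C-like proteinase, P4715L in RdRp). In real data, ambiguous residues such as X would need filtering first; this sketch treats every difference as a substitution.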
Among the 373 orf1AB poly-protein sequences comprising between 7,051 and 7,095 amino acid residues, deletion mutations were observed in the “leader protein” at positions 82-86 (GHVMV) and 141-143 (KSF). Among the 16 orf1AB poly-proteins comprising between 7,097 and 7,099 amino acid residues, insertion mutations were observed in the nsp2 (517K), nsp3 (938E, 1901Y), nsp6 (3610F, 3611L) and 2’-O-ribose methyltransferase (7046F) proteins. In this work, all mutations observed among the 28,345 orf1AB poly-proteins of human SARS-CoV-2 relative to the reference sequence are presented.
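Because the sequences in datasets ii) and iii) differ in length from the reference, deletions and insertions can only be named after the sequences are aligned. The abstract does not state which alignment method was used, so the Python sketch below swaps in a textbook Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2) and reports gap columns in reference coordinates. The scores, the function names and the ('del'/'ins', position, residue) labelling are hypothetical choices. The toy sequences are invented and only echo the GHVMV motif from the text.

```python
def global_align(ref: str, qry: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(ref), len(qry)

    def pair(i: int, j: int) -> int:
        return match if ref[i - 1] == qry[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with qry[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,   # gap in query
                              score[i][j - 1] + gap)   # gap in reference

    # Trace back from the bottom-right cell, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(ref[i - 1])
            bottom.append(qry[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(ref[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(qry[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def call_indels(aligned_ref: str, aligned_qry: str) -> list[tuple[str, int, str]]:
    """List gap columns as (kind, reference position, residue).

    'del' events carry the 1-based reference position of the deleted residue;
    'ins' events carry the reference position that the insertion follows.
    """
    events = []
    ref_pos = 0  # reference residues consumed so far
    for r, q in zip(aligned_ref, aligned_qry):
        if r == "-":
            events.append(("ins", ref_pos, q))
        else:
            ref_pos += 1
            if q == "-":
                events.append(("del", ref_pos, r))
    return events


def deletion_runs(events: list[tuple[str, int, str]]) -> list[tuple[int, int, str]]:
    """Merge consecutive deleted positions into (start, end, residues) runs."""
    runs: list[tuple[int, int, str]] = []
    for kind, pos, residue in events:
        if kind != "del":
            continue
        if runs and runs[-1][1] == pos - 1:
            start, _, residues = runs[-1]
            runs[-1] = (start, pos, residues + residue)
        else:
            runs.append((pos, pos, residue))
    return runs


if __name__ == "__main__":
    aligned_ref, aligned_qry = global_align("MKSFGHVMVQLW", "MKSFQLW")
    print(aligned_qry)                                            # MKSF-----QLW
    print(deletion_runs(call_indels(aligned_ref, aligned_qry)))  # [(5, 9, 'GHVMV')]
    print(call_indels(*global_align("MKSFQLW", "MKSFKQLW")))     # [('ins', 4, 'K')]
```

This dynamic-programming table needs time and memory proportional to the product of the two sequence lengths (about 50 million cells for two 7,096-residue poly-proteins), so a pure-Python version is only practical for illustration; a dedicated aligner would be the realistic choice. Where a deleted or inserted stretch sits inside a repeat, several gap placements score equally, and the reported position then depends on the traceback's tie-breaking.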
Supplementary materials
SUPPLEMENTARY-TABLE-1
Distinct mutation sites and mutation types observed in 27,956 human SARS-CoV-2 orf1AB poly-proteins comprising 7,096 amino acid residues and representing six geographical locations. The same mutation site can be associated with more than one mutation type.
SUPPLEMENTARY-TABLE-2
Mutation sites associated with more than one mutation type in human SARS-CoV-2 orf1AB poly-proteins from Asia, Europe, North America and Oceania.
SUPPLEMENTARY-TABLE-3
Mutations in human SARS-CoV-2 orf1AB poly-proteins comprising 7,051-7,095 amino acid residues, including deletion mutations (-).
SUPPLEMENTARY-TABLE-4
Mutations in human SARS-CoV-2 orf1AB poly-proteins comprising 7,097-7,099 amino acid residues, including insertion mutations (‘-’).
Supplementary weblinks
COVID-19 Coronavirus pandemic status
Worldwide COVID-19 coronavirus cases, deaths and recovery status
Phylogenetic analysis software for everyone
A web service for constructing and analyzing phylogenetic relationships between sequences
ABREAST™
A Bioinformatics, Research, Education, Services and Training Consultancy