Geographical Distribution of Amino Acid Mutations in Human SARS-CoV-2 Orf1ab Poly-Proteins Compared to the Equivalent Reference Proteins from China

21 July 2021, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Amino acid mutations were identified in 28,345 poly-protein sequences of the human SARS-CoV-2 orf1ab gene, representing six geographical locations (Africa, Asia, Europe, North America, Oceania and South America), by comparing them with the equivalent reference poly-protein sequences derived from the first human SARS-CoV-2 genome sequence (Wuhan-Hu-1), reported from China. The mutations were analysed in three datasets: i) 27,956 poly-proteins comprising 7,096 amino acid residues, ii) 373 poly-proteins comprising 7,051-7,095 amino acid residues and iii) 16 poly-proteins comprising 7,097-7,099 amino acid residues. In all, 3,204 distinct mutation sites were observed among the poly-proteins comprising 7,096 amino acid residues, meaning that ~45% of the positions in the orf1ab poly-protein sequence have undergone mutation since the outbreak of the COVID-19 pandemic in December 2019. Fifteen proteins of the poly-protein sequence were associated with mutations, and the mutation propensities for the “leader protein”, nsp2, nsp3, nsp6, nsp7, nsp8 and endoRNAse proteins were higher (> 1) than those for the nsp4, nsp9, nsp10, 3C-like proteinase, RdRp, helicase, 3’-to-5’ exonuclease and 2’-O-ribose methyltransferase proteins. Relatively higher mutation percentages were observed for the RdRp (35.32%), nsp2 (26.42%), nsp3 (11.73%) and helicase (7.88%) proteins, whereas the mutation percentages for the remaining proteins ranged from 0.16% for the nsp10 protein to 4.11% for the 3’-to-5’ exonuclease protein. Five mutations were common across all six geographical locations: T265I in nsp2, T1246I in nsp3, G3278S in the 3C-like proteinase, L3606F in nsp6 and P4715L in RdRp. The P4715L RdRp mutation was predominant in all geographical locations except Africa, where the G5215S mutation was predominant. The maximum number of distinct mutation sites was observed for the nsp3 protein.
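The comparison of equal-length poly-proteins against the reference can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `call_substitutions` and the ten-residue toy sequences are hypothetical, and the sketch assumes that sequences of the same length as the reference can be compared position by position without alignment. Mutations are written in the notation used in this abstract (reference residue, 1-based position, observed residue, as in P4715L). The final ratio mirrors the reported 3,204 distinct sites out of 7,096 positions (~45%).

```python
from collections import Counter


def call_substitutions(reference: str, query: str) -> list[str]:
    """List substitutions as <reference residue><1-based position><query residue>."""
    if len(reference) != len(query):
        # Length differences imply insertions or deletions, which need an alignment first.
        raise ValueError("sequences must have the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, query), start=1)
        if ref != alt
    ]


# Hypothetical toy data, not real orf1ab sequence.
reference = "MESLVPGFNE"
queries = {
    "seq_a": "MESLIPGFNE",
    "seq_b": "MESLIPGFNK",
    "seq_c": "MESLVPGFNE",
}

# Count how many sequences carry each mutation.
mutation_counts = Counter()
for sequence in queries.values():
    mutation_counts.update(call_substitutions(reference, sequence))

# A site is counted once, even if several mutation types occur at it.
distinct_sites = {int(mutation[1:-1]) for mutation in mutation_counts}
fraction_mutated = len(distinct_sites) / len(reference)

print(mutation_counts.most_common())
print(f"{len(distinct_sites)} distinct sites, {fraction_mutated:.0%} of positions")
```

Counting sites separately from mutation types matters because the same site can carry more than one mutation type.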
In the 373 orf1ab poly-protein sequences comprising 7,051-7,095 amino acid residues, deletion mutations were observed in the “leader protein” at positions 82-86 (GHVMV) and 141-143 (KSF). Among the 16 orf1ab poly-proteins comprising 7,097-7,099 amino acid residues, insertion mutations were observed in the nsp2 (517K), nsp3 (938E, 1901Y), 2’-O-ribose methyltransferase (7046F) and nsp6 (3610F, 3611L) proteins. This work presents all mutations observed among the 28,345 orf1ab poly-proteins of human SARS-CoV-2 relative to the reference sequences.
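For poly-proteins that are shorter or longer than the reference, deletions and insertions can only be located after alignment. The sketch below is an illustration under stated assumptions rather than the method used in this work: it assumes the two sequences have already been aligned by some external tool, with `-` marking gaps, and it uses a hypothetical function `call_indels` and toy sequences. The position convention (an insertion is reported at the last reference position preceding it) is also an assumption and may differ from the numbering used for the insertions listed above.

```python
def call_indels(aligned_ref: str, aligned_query: str):
    """Return (deletions, insertions) from a gapped pairwise alignment.

    Deletions are (reference position, deleted residue).
    Insertions are (last reference position before the insert, inserted residue).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")

    deletions = []
    insertions = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref, alt in zip(aligned_ref, aligned_query):
        if ref != "-":
            ref_pos += 1
        if ref != "-" and alt == "-":
            deletions.append((ref_pos, ref))
        elif ref == "-" and alt != "-":
            insertions.append((ref_pos, alt))
    return deletions, insertions


# Hypothetical toy alignment, not real orf1ab sequence.
aligned_ref = "MESLV-PGFNE"
aligned_query = "ME--VKPGFNE"

deletions, insertions = call_indels(aligned_ref, aligned_query)
print("deletions:", deletions)
print("insertions:", insertions)
```

Tracking the ungapped reference position keeps the reported coordinates comparable with those of substitutions in the equal-length dataset.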

Keywords

human SARS-CoV-2
orf1ab poly-proteins
mutations
geographical locations
leader protein
nsp2
nsp3
nsp4
nsp6
nsp7
nsp8
nsp9
nsp10
3C-like proteinase
RNA dependent RNA polymerase
helicase
3’-to-5’ exonuclease
EndoRNAse
2’-O-ribose methyltransferase

Supplementary materials

SUPPLEMENTARY-TABLE-1: Distinct mutation sites and mutation types observed in 27,956 human SARS-CoV-2 orf1ab poly-proteins comprising 7,096 amino acid residues and representing six geographical locations. The same mutation site can be associated with more than one mutation type.
SUPPLEMENTARY-TABLE-2: Mutation sites associated with more than one mutation type in the human SARS-CoV-2 orf1ab poly-proteins from Asia, North America, Europe and Oceania.
SUPPLEMENTARY-TABLE-3: Mutations in human SARS-CoV-2 orf1ab poly-proteins comprising 7,051-7,095 amino acid residues, including deletion mutations (‘-’).
SUPPLEMENTARY-TABLE-4: Mutations in human SARS-CoV-2 orf1ab poly-proteins comprising 7,097-7,099 amino acid residues, including insertion mutations (‘-’).
