A Program for Automated Data Mining of PubChem to Screen a Billion Compounds and Generate by Machine Learning Based AutoQSAR Algorithm Anti-Corona Viral Drug Leads (Replicase Polyprotein 1ab Inhibitors) and in Silico Study of the Top Drug Lead Compounds

05 June 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Our work is composed of a python program for automatic data mining of PubChem database to collect data associated with the corona virus drug target replicase polyprotein 1ab (UniProt identifier : POC6X7 ) of data set involving active compounds, their activity value (IC50) and their chemical/molecular descriptors to run a machine learning based AutoQSAR algorithm on the data set to generate anti-corona viral drug leads. The machine learning based AutoQSAR algorithm involves feature selection, QSAR modelling, validation and prediction. The drug leads generated each time the program is run is reflective of the constantly growing PubChem database is an important dynamic feature of the program which facilitates fast and dynamic anti-corona viral drug lead generation reflective of the constantly growing PubChem database. The program prints out the top anti-corona viral drug leads after screening PubChem library which is over a billion compounds. The interaction of top drug lead compounds generated by the program and two corona viral drug target proteins, 3-Cystiene like Protease (3CLPro) and Papain like protease (PLpro) was studied and analysed using molecular docking tools. The compounds generated as drug leads by the program showed favourable interaction with the drug target proteins and thus we recommend the program for use in anti-corona viral compound drug lead generation as it helps reduce the complexity of virtual screening and ushers in an age of automatic ease in drug lead generation. The leads generated by the program can further be tested for drug potential through further In Silico, In Vitro and In Vivo testing

The program is hosted, maintained and supported at the GitHub repository link given below

https://github.com/bengeof/Drug-Discovery-P0C6X7

Keywords

QSAR predictions
Datamining
machine Learning Predictions
Drug Discovery Programmes
COVID_19

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.