Abstract
We present a Python program that automatically mines the PubChem database for data associated with the coronavirus drug target replicase polyprotein 1ab (UniProt identifier: P0C6X7). The collected data set comprises active compounds, their activity values (IC50) and their chemical/molecular descriptors, and a machine learning based AutoQSAR algorithm is run on it to generate anti-coronaviral drug leads. The AutoQSAR algorithm involves feature selection, QSAR modelling, validation and prediction. Because the data are retrieved each time the program is run, the drug leads it generates reflect the constantly growing PubChem database; this dynamic feature facilitates fast, up-to-date anti-coronaviral drug lead generation. The program prints out the top anti-coronaviral drug leads after screening the PubChem library, which contains over a billion compounds. The interaction of the top drug lead compounds generated by the program with two coronaviral drug target proteins, 3-chymotrypsin-like protease (3CLpro) and papain-like protease (PLpro), was studied and analysed using molecular docking tools. The compounds showed favourable interactions with the drug target proteins, and we therefore recommend the program for anti-coronaviral drug lead generation, as it reduces the complexity of virtual screening and automates much of the lead generation process. The leads generated by the program can be evaluated further for drug potential through additional in silico, in vitro and in vivo testing.
The program is hosted, maintained and supported at the following GitHub repository:
https://github.com/bengeof/Drug-Discovery-P0C6X7
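
The four stages named above (feature selection, QSAR modelling, validation and prediction) can be illustrated with a minimal sketch. This is not the AutoQSAR implementation in the repository: it substitutes a single-descriptor least-squares model for the machine learning models, uses leave-one-out cross-validation as one possible validation scheme, and runs on a small made-up data set. The descriptor values, IC50 values, candidate names and function names are all hypothetical and serve only to show how the stages fit together.

```python
import math
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_feature(rows, target):
    """Feature selection: index of the descriptor most correlated with the target."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], target)) for j in range(n_cols)]
    return max(range(n_cols), key=scores.__getitem__)

def fit_line(xs, ys):
    """QSAR modelling (simplified): ordinary least squares for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def loo_q2(xs, ys):
    """Validation: leave-one-out Q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    my = mean(ys)
    return 1 - press / sum((y - my) ** 2 for y in ys)

# Hypothetical training set: two descriptors per compound and an IC50 in uM.
descriptors = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0], [5.0, 4.0]]
ic50_um = [250.0, 125.0, 32.0, 8.0, 4.0]
log_ic50 = [math.log10(v) for v in ic50_um]  # model activity on a log scale

best = select_feature(descriptors, log_ic50)
xs = [row[best] for row in descriptors]
slope, intercept = fit_line(xs, log_ic50)
q2 = loo_q2(xs, log_ic50)

# Prediction: rank hypothetical unseen compounds by predicted log10(IC50);
# a lower predicted IC50 means a more potent candidate lead.
candidates = {"cand_A": [6.0, 1.0], "cand_B": [0.5, 2.0], "cand_C": [3.5, 5.0]}
ranked = sorted(
    ((name, slope * desc[best] + intercept) for name, desc in candidates.items()),
    key=lambda item: item[1],
)

print(f"selected descriptor index: {best}, leave-one-out Q^2: {q2:.3f}")
for name, predicted in ranked:
    print(f"{name}: predicted log10(IC50) = {predicted:.2f}")
```

In a realistic setting the descriptor matrix would contain many more columns and compounds, and the single-descriptor line would be replaced by a multivariate or non-linear model; the sequence of stages stays the same.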