Abstract
The volume of available chemical and bioactivity data used by AI and machine learning for drug discovery is growing rapidly. The community needs a computationally efficient, openly available pipeline to harmonize and register chemical structures from disparate sources so that the associated bioactivity information can be integrated. Our previous pipeline, canSARchem, addressed these needs but was computationally intensive and required commercial software. Here we describe OpencanSARchem, an open-source and computationally efficient standardization and registration pipeline that addresses the limitations of the original pipeline while still generating chemically valid tautomers. Using DFT and ab initio methods, we calculated the free energy differences of tautomer pairs generated by the two pipelines to understand the energetic consequences of our utility selection. Statistically significant free energy differences were observed between the tautomers selected by each utility, with a median difference of approximately 2 kcal/mol. We consider this energetic cost an acceptable compromise to democratize the method and meet the needs of the broader community.
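To put a free energy difference of about 2 kcal/mol in perspective, the short sketch below converts it into relative populations of two tautomers. It is an illustration only, not part of the OpencanSARchem pipeline: it assumes a simple two-state equilibrium at 298.15 K, and the function name `tautomer_populations` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def tautomer_populations(delta_g_kcal, temperature_k=298.15):
    """Two-state Boltzmann estimate (an illustrative assumption).

    delta_g_kcal: free energy of the less stable tautomer minus that of
    the more stable one, in kcal/mol (non-negative).
    Returns (ratio, minor_fraction), where ratio is the population of the
    more stable tautomer divided by that of the less stable one.
    """
    ratio = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    minor_fraction = 1.0 / (1.0 + ratio)
    return ratio, minor_fraction


ratio, minor = tautomer_populations(2.0)
print(f"major:minor ratio = {ratio:.1f}, minor fraction = {minor:.3f}")
```

Under these assumptions, a 2 kcal/mol gap corresponds to a ratio of roughly 29 to 1, so the less stable tautomer would make up about 3% of the equilibrium mixture at room temperature.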
Supplementary weblinks
Title
GitHub Repository for OpencanSARchem
Description
The GitHub repository for OpencanSARchem contains all required code and a sample set of test molecules for running the code and comparing outputs.