Abstract
Structure-based virtual screening is a promising in silico technique that integrates computational methods into drug discovery, and molecular docking is its most widely used method. However, docking cannot be both computationally efficient and accurate, because its scoring functions are based on classical mechanics and can only approximate quantum-mechanical precision. Deep learning non-docking methods, which use either the 3D structure or the 1D sequence of the protein target, can reduce the computational cost of protein-ligand scoring and use data-driven learning to improve scoring accuracy. Such methods avoid both the errors inherited from molecular docking and its high computational cost. We integrate two such methods into an easy-to-use framework, CarbonAI, that offers researchers both options: the 3D version employs a graph neural network (GNN), and the sequence version employs a bidirectional long short-term memory network (BiLSTM). To verify our approaches, we performed experiments on two datasets: the open Directory of Useful Decoys: Enhanced (DUD.E) dataset and an in-house proprietary dataset without computer-generated artificial decoys (NoDecoy). CarbonAI achieved a state-of-the-art AUC of 0.981 on DUD.E and an AUC of 0.974 on NoDecoy, whereas a conventional docking program scored an AUC below 0.8 on both datasets. The CarbonAI engine also reaches a state-of-the-art enrichment factor of 36.2-fold at the top 2 percent. We have also retrospectively validated the CarbonAI models against various wet-lab experimental data, and the results demonstrated consistently accurate performance.
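For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUC and the enrichment factor at a top fraction are commonly computed from ranked screening scores. It is an illustration under stated assumptions, not the CarbonAI evaluation code: the function names `enrichment_factor` and `roc_auc` and the toy data are hypothetical, and labels are assumed to be 1 for actives and 0 for decoys or inactives.

```python
def enrichment_factor(scores, labels, fraction=0.02):
    """Hit rate among the top-scoring fraction divided by the overall hit rate."""
    n_top = max(1, round(fraction * len(scores)))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hit_rate_top = sum(label for _, label in ranked[:n_top]) / n_top
    hit_rate_all = sum(labels) / len(labels)
    return hit_rate_top / hit_rate_all


def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy library (hypothetical): 100 compounds with strictly decreasing scores,
# actives placed at ranks 1, 3, 10 and 50.
toy_scores = [float(100 - i) for i in range(100)]
toy_labels = [1 if i in (0, 2, 9, 49) else 0 for i in range(100)]

if __name__ == "__main__":
    print("EF at top 2%:", enrichment_factor(toy_scores, toy_labels, fraction=0.02))
    print("AUC:", roc_auc(toy_scores, toy_labels))
```

An enrichment factor of 36.2 at the top 2 percent therefore means that the top-ranked 2 percent of compounds contains actives at 36.2 times the rate of the library as a whole; the maximum attainable value at that fraction is 50.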
Furthermore, we benchmarked the inference speed of the engine on the openly available 2021 Enamine REAL Database (RDB), which comprises over 1.36 billion molecules: our CarbonAI non-docking method (CarbonAI-ND) screened the entire database in 4050 core-hours. This corresponds to about 336,000 molecules per core-hour, compared with about 20 molecules per core-hour for typical docking methods, making CarbonAI-ND about 16,000 times faster than conventional docking. Overall, the experiments indicate that CarbonAI is accurate and computationally efficient and generalizes well to different molecular targets for virtual screening.
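As a rough cross-check of the throughput claim, the rate and the speedup over docking can be derived from the reported database size, core-hour total, and docking rate. The sketch below assumes the database size is exactly 1.36 billion (the text says "over"), so the results are lower bounds; the function and constant names are illustrative only.

```python
DATABASE_SIZE = 1.36e9  # molecules in the 2021 Enamine REAL Database (lower bound from the text)
CORE_HOURS = 4050       # core-hours reported for screening the whole database
DOCKING_RATE = 20       # molecules per core-hour quoted for typical docking


def throughput(n_molecules, core_hours):
    """Molecules processed per core-hour."""
    return n_molecules / core_hours


def speedup(rate, baseline_rate):
    """How many times faster `rate` is than `baseline_rate`."""
    return rate / baseline_rate


nd_rate = throughput(DATABASE_SIZE, CORE_HOURS)
nd_speedup = speedup(nd_rate, DOCKING_RATE)
# Core-hours a docking run at the quoted rate would need for the same library.
docking_core_hours = DATABASE_SIZE / DOCKING_RATE

if __name__ == "__main__":
    print(f"Non-docking throughput: {nd_rate:,.0f} molecules per core-hour")
    print(f"Speedup over docking:   {nd_speedup:,.0f}x")
    print(f"Docking equivalent:     {docking_core_hours:,.0f} core-hours")
```

Under these assumptions the implied throughput is on the order of a few hundred thousand molecules per core-hour, which matches a speedup of roughly 16,000 over a docking rate of 20 molecules per core-hour.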