Abstract
Virtual screening tests compounds computationally first in order to reduce the number that must be screened experimentally, thereby reducing the time and cost of physical experiments. Molecular docking is the most popular virtual screening technique: it predicts the binding of candidate compounds to a protein target by modeling the interactions at the binding pocket. Despite its wide use, docking accuracy is often low because inherently complex biological systems are difficult to model. State-of-the-art deep neural networks such as Graph Convolutional Networks (GCNs), on the other hand, can capture the complex non-linear relationships between structural and biological data, but they lack the interpretability of structure-based modeling. In this work, we took advantage of the activity data from a quantitative High Throughput Screen (HTS) of ~200K compounds against Cruzain (Cz) to retrospectively evaluate the ability of a docking algorithm and a GCN to prioritize the active compounds in the dataset. We then propose strategies for combining both techniques in a single virtual screening pipeline in order to exploit their complementary strengths. Plugging the atomic embeddings learned by the GCN into the docking algorithm by means of pharmacophoric restraints enhanced the ability of docking to retrieve the active ligands. Moreover, applying the GCN as a pre-docking filter enriched the compound library in active molecules, and subsequent docking of the filtered library achieved significantly higher hit rates. This work aims to be a proof of concept of the usefulness of strategies that combine deep learning with classical molecular docking in the context of drug discovery.
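As a rough illustration of the pre-docking filter idea and of how library enrichment can be quantified, the following is a minimal sketch and not the pipeline used in this work. The function names, the `keep_fraction` parameter, and the toy scores and activity labels are all hypothetical; the scores merely stand in for the output of any model that ranks compounds by predicted activity.

```python
from typing import List, Sequence


def hit_rate(is_active: Sequence[bool]) -> float:
    """Fraction of compounds in a list that are experimentally active."""
    return sum(is_active) / len(is_active) if is_active else 0.0


def prefilter(scores: Sequence[float], keep_fraction: float) -> List[int]:
    """Return indices of the top-scoring fraction of the library.

    Assumption: a higher score means the model considers the compound
    more likely to be active.
    """
    n_keep = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def enrichment_factor(is_active: Sequence[bool], kept: Sequence[int]) -> float:
    """Hit rate of the filtered subset divided by the hit rate of the full library."""
    base = hit_rate(is_active)
    kept_rate = hit_rate([is_active[i] for i in kept])
    return kept_rate / base if base > 0 else float("nan")


# Toy library of 10 compounds (made-up values, for illustration only).
model_scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.15]
actives = [True, False, True, False, False, False, False, False, True, False]

# Keep the top 40% of the library; only these would be passed on to docking.
kept_indices = prefilter(model_scores, keep_fraction=0.4)
filtered_hit_rate = hit_rate([actives[i] for i in kept_indices])
ef = enrichment_factor(actives, kept_indices)
print(kept_indices, filtered_hit_rate, ef)
```

In this sketch the filter only reorders and truncates the library. Any docking step would then run on `kept_indices` alone, which is where the savings in docking time would come from.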