Abstract
Machine learning offers a promising approach for fast and accurate binding affinity predictions. However, current models often fail to generalise beyond their training data and are not robustly evaluated on a diverse range of benchmarks, limiting their application in drug discovery projects. In this work, we address these issues by introducing a novel graph neural network model called AEV-PLIG (Atomic Environment Vector - Protein Ligand Interaction Graph), which encodes protein-ligand interactions via atomic environment vectors to improve generalisation. We evaluate our model on improved benchmarks, including our new out-of-distribution test set we call OOD Test, and two alternative benchmark systems used for free energy perturbation (FEP) calculations, and highlight competitive performance of AEV-PLIG across the board. Moreover, we demonstrate how augmented data can be leveraged to enhance prediction accuracy, and how enriching the training data with three complexes from a congeneric series of ligands binding to a target of interest improves performance further. Altogether, we show that these strategies improve the applicability of machine learning scoring functions and enable state-of-the-art performance nearing the accuracy of physics-based simulation methods, but at a fraction of their computational cost. This practical approach extends the predictive capabilities of machine learning for molecular discovery, paving the way for its broader use in computer-aided drug design.
Supplementary materials
Supporting Information: Additional supporting figures and tables.