Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, in this work a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach showed higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
Supplementary materials
Title
Supporting information
Description
Supporting plots and tables to the main manuscript
Actions