Interpretable deep-learning pKa prediction for small molecule drugs via atomic sensitivity analysis

12 June 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning (ML) models play a crucial role in predicting properties essential to drug development, such as a drug's acid-dissociation constant expressed on a negative log scale (pKa). Despite recent architectural advances, these models often generalize poorly to novel compounds because ground-truth data are scarce. Furthermore, these models lack interpretability, in part because they depend on explicit encodings of the input molecule's substructures. Atomic-resolution information about a chemical structure can be accessed by observing how a model responds to atomic perturbations of an input molecule; however, no existing methods systematically use this information for model and molecular analysis. Here, we present BCL-XpKa, a substructure-independent, deep neural network (DNN)-based pKa predictor that generalizes well to novel small molecules. BCL-XpKa recasts pKa prediction from a regression problem as a multitask-classification problem, which accumulates data for prediction at biologically relevant pH values and records the model's uncertainty as a discrete distribution for each pKa prediction. BCL-XpKa outperforms modern ML pKa predictors and accurately models the effects of common molecular modifications on a molecule's ionizability. We then leverage BCL-XpKa's substructure independence to introduce atomic sensitivity analysis (ASA), which quickly decomposes a molecule's predicted pKa value into its atomic contributions without model retraining. Applied to BCL-XpKa, ASA reveals that the model has implicitly learned high-resolution information about molecular substructures. We further demonstrate ASA's utility in structure preparation for protein-ligand docking: it identifies ionization sites in 97.8% and 83.4% of complex small-molecule acids and bases, respectively. Finally, we apply ASA with BCL-XpKa to understand the physicochemical liabilities of a recently published KRAS-degrading PROTAC and to guide its optimization.
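The abstract describes turning pKa regression into multitask classification at fixed pH values, with the outputs read back as a discrete distribution. The sketch below shows one common way to do this (an ordinal, threshold-based encoding); it is an illustration under stated assumptions, not BCL-XpKa's actual scheme. The threshold grid `THRESHOLDS`, the labeling rule "1 if pKa < t", and the helper names `encode_pka` and `decode_probabilities` are all hypothetical.

```python
import numpy as np

# Hypothetical pH thresholds (0, 1, ..., 14); the grid used by BCL-XpKa is not given here.
THRESHOLDS = np.arange(0.0, 15.0, 1.0)


def encode_pka(pka: float, thresholds: np.ndarray = THRESHOLDS) -> np.ndarray:
    """Assumed labeling: one binary task per threshold t, label = 1 if pKa < t
    (i.e. an acidic site would be mostly deprotonated at pH t), else 0."""
    return (pka < thresholds).astype(np.int8)


def decode_probabilities(probs, thresholds: np.ndarray = THRESHOLDS):
    """Turn per-threshold probabilities P(pKa < t) into a discrete distribution
    over the bins between thresholds, plus an expected pKa from bin midpoints."""
    probs = np.clip(np.asarray(probs, dtype=float), 0.0, 1.0)
    probs = np.maximum.accumulate(probs)  # enforce a non-decreasing CDF across thresholds
    cdf = np.concatenate(([0.0], probs, [1.0]))
    bin_mass = np.diff(cdf)  # probability mass in each of len(thresholds) + 1 bins
    # Outer bins are open-ended; give them a nominal width of 1 pH unit for the midpoint.
    edges = np.concatenate(([thresholds[0] - 1.0], thresholds, [thresholds[-1] + 1.0]))
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return bin_mass, float(np.dot(bin_mass, midpoints))


labels = encode_pka(4.76)
mass, estimate = decode_probabilities(labels)
print(labels.tolist())
print(estimate)
```

With hard labels all mass falls in one bin (here the [4, 5) bin, so the estimate is its midpoint, 4.5); with a trained classifier's soft probabilities, the spread of `bin_mass` across neighboring bins serves as the per-prediction uncertainty record the abstract mentions.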
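Atomic sensitivity analysis is described as decomposing a predicted pKa into per-atom contributions by perturbing atoms of the input, with no retraining. A minimal sketch of that idea follows, assuming the perturbation is "remove atom i's features from the input" and the contribution is the resulting change in prediction. `toy_pka_model` is a hypothetical stand-in that pools per-atom feature vectors; it is not BCL-XpKa, and the perturbation used in the paper may differ.

```python
import numpy as np


def toy_pka_model(atom_features: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical substructure-independent predictor: sums the features of the
    unmasked atoms and maps the pooled vector to a pKa-like value. Not BCL-XpKa."""
    weights = np.array([1.5, -0.8, 0.3])  # made-up weights for illustration
    pooled = (atom_features * mask[:, None]).sum(axis=0)
    return float(7.0 + 5.0 * np.tanh(pooled @ weights))


def atomic_sensitivity(model, atom_features: np.ndarray):
    """Perturb one atom at a time and record how much the prediction changes.

    contribution[i] = f(full molecule) - f(molecule with atom i masked out).
    The trained model is only queried, never retrained.
    """
    n_atoms = atom_features.shape[0]
    full_mask = np.ones(n_atoms)
    baseline = model(atom_features, full_mask)
    contributions = np.empty(n_atoms)
    for i in range(n_atoms):
        mask = full_mask.copy()
        mask[i] = 0.0  # assumed perturbation: drop atom i's features from the input
        contributions[i] = baseline - model(atom_features, mask)
    return baseline, contributions


# Three atoms with 3-dimensional descriptors; the last atom carries no signal.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
base, contrib = atomic_sensitivity(toy_pka_model, X)
print(base, contrib)
```

Because the model is nonlinear, the per-atom contributions need not sum to the baseline; ranking atoms by the magnitude of their contributions is one way such a map could point to likely ionization sites.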

Keywords

pKa prediction
model explainability
model interpretability
QSPR
QSAR
neural networks

Supplementary materials

Supplementary Tables
Hyperparameter Optimization for BCL-XpKa
