Uni-pKa: An Accurate and Physically Consistent pKa Prediction through Protonation Ensemble Modeling

28 August 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Predicting the pKa values of small molecules has key applications in drug discovery and molecular simulation. However, current methods struggle to interpret experimental data rigorously and to ensure thermodynamic consistency between successive pKa values. This study proposes a protonation ensemble framework that addresses these limitations by modeling the full space of possible protonation microstates. Within this framework, we derive rigorous definitions connecting experimental macro-pKa values to the underlying micro-pKa equilibria. Building on the framework, we develop Uni-pKa, an accurate and reliable pKa predictor. Uni-pKa is first pretrained on over 1 million predicted pKa values from ChEMBL to learn expressive molecular representations. It is then fine-tuned on experimental datasets in a way that enforces consistency with the protonation ensemble definitions. The high-quality experimental pKa datasets are fitted to the framework by recovering the underlying microstates from the macro-pKa values. Modeling the complete ensemble enables rigorous interpretation of macro-pKa data and inherently preserves thermodynamic consistency, which improves the prediction accuracy of Uni-pKa. Experiments demonstrate that Uni-pKa achieves state-of-the-art performance, outperforming previous methods. This protonation ensemble approach advances machine learning for pKa prediction and molecular property modeling, and Uni-pKa illustrates how chemical knowledge can be combined with machine learning methods. Users can apply Uni-pKa to predict and rank the protonation states of molecules under various pH conditions at https://app.bohrium.dp.tech/uni-pka.
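
To make the macro/micro distinction concrete, the following is a minimal sketch of how a macro-pKa can be computed from an ensemble of protonation microstates, using standard equilibrium thermodynamics rather than the Uni-pKa model itself. It assumes each microstate carries a free energy at pH 0 expressed in pK units, g = G / (RT ln 10). The microstate names, the g values, and the helper functions `macro_pka` and `populations` are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
import math
from collections import defaultdict

# Hypothetical microstates of a two-site acid: (name, bound protons, g).
# g is an assumed free energy at pH 0 in pK units, G / (RT ln 10).
# The numbers are illustrative only, not experimental or predicted values.
MICROSTATES = [
    ("H2A", 2, 0.0),
    ("HA_site1", 1, 4.0),   # micro-pKa from H2A: 4.0
    ("HA_site2", 1, 5.0),   # micro-pKa from H2A: 5.0
    ("A", 0, 13.0),
]

def macro_pka(microstates, n_protons):
    """Macro-pKa for losing one proton from the macrostate with n_protons.

    Each macrostate is the set of microstates sharing a proton count, so the
    macro equilibrium constant is a ratio of summed microstate weights.
    """
    sums = defaultdict(float)
    for _, n, g in microstates:
        sums[n] += 10.0 ** (-g)
    return math.log10(sums[n_protons]) - math.log10(sums[n_protons - 1])

def populations(microstates, ph):
    """Normalized microstate populations at a given pH."""
    weights = {name: 10.0 ** (-(g + n * ph)) for name, n, g in microstates}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

pk1 = macro_pka(MICROSTATES, 2)   # H2A -> HA (both site isomers)
pk2 = macro_pka(MICROSTATES, 1)   # HA (both site isomers) -> A
ranked = sorted(populations(MICROSTATES, 7.0).items(),
                key=lambda kv: kv[1], reverse=True)

print(f"macro pKa1 = {pk1:.3f}, macro pKa2 = {pk2:.3f}")
for name, frac in ranked:
    print(f"{name}: {frac:.4f}")
```

Because every micro- and macro-pKa in this sketch is derived from a single set of microstate free energies, the thermodynamic cycle closes by construction: the two macro-pKa values sum to the free-energy difference between the fully protonated and fully deprotonated states, whichever intermediate microstate is taken. Ranking the populations at a chosen pH gives the dominant protonation state under those conditions.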

Keywords

pKa
protonation
molecular pretraining
