MF-SuP-pKa: multi-fidelity modeling with subgraph pooling mechanism for pKa prediction

Jialu Wu; Yue Wan; Zhenxing Wu; Shengyu Zhang; Dongsheng Cao; Chang-Yu Hsieh; Tingjun Hou

doi:10.26434/chemrxiv-2022-t6q61

Theoretical and Computational Chemistry


11 July 2022, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pKa (Multi-Fidelity modeling with Subgraph Pooling for pKa prediction), a novel pKa prediction model that utilizes subgraph pooling, multi-fidelity learning, and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) was used to fit the high-fidelity data (experimental pKa) through transfer learning. Moreover, we implemented knowledge-guided data augmentation on the pre-training data according to the consistency between acidic pKa and basic pKa. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. The ablation results show that MF-SuP-pKa benefits substantially from subgraph pooling, multi-fidelity learning, and data augmentation. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa achieves performance superior to that of state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa achieves improvements of 23.83% and 20.12% in mean absolute error (MAE) on the acidic and basic sets, respectively.
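The idea of combining a local and a global view around an ionization site can be illustrated with a minimal sketch. It assumes, purely for illustration, that the local environment is the set of atoms within k bonds of the site and that pooling is a plain mean. The function names, the toy graph, and the embedding values are hypothetical and are not taken from the MF-SuP-pKa implementation, whose knowledge-aware pooling may differ.

```python
from collections import deque


def k_hop_atoms(adjacency, site, k):
    """Return the set of atom indices within k bonds of `site` (breadth-first search)."""
    distance = {site: 0}
    queue = deque([site])
    while queue:
        atom = queue.popleft()
        if distance[atom] == k:
            continue  # do not expand beyond k bonds
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return set(distance)


def mean_pool(embeddings, atoms):
    """Average the embedding vectors of the selected atoms, dimension by dimension."""
    atoms = sorted(atoms)
    dim = len(embeddings[atoms[0]])
    return [sum(embeddings[a][d] for a in atoms) / len(atoms) for d in range(dim)]


def site_representation(adjacency, embeddings, site, k):
    """Concatenate a local (k-hop) pooled vector with a global (all-atom) pooled vector."""
    local = mean_pool(embeddings, k_hop_atoms(adjacency, site, k))
    global_ = mean_pool(embeddings, range(len(embeddings)))
    return local + global_


# Toy graph (assumed): heavy atoms of acetic acid, C0-C1(=O2)-O3, with O3 as the acidic site.
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
# Made-up 2-dimensional atom embeddings; a real model would learn these with a GNN.
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]

rep = site_representation(adjacency, embeddings, site=3, k=1)
print(rep)  # [2.0, 0.5, 1.75, 0.75]
```

With k=1, the local part averages only the hydroxyl oxygen and the carbonyl carbon, while the global part averages all four atoms. A site-specific vector of this kind is what would let one molecule yield different micro-pKa predictions for different ionization sites.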

Keywords

dissociation constant

machine learning

graph neural network

subgraph pooling

multi-fidelity learning

Supplementary materials

Supplementary materials of MF-SuP-pKa

Table S1. The initial atom and bond features for graph-based methods.
Table S2. The performance of different machine learning algorithms on the DataWarrior data set.
Table S3. The hyperparameters of each model.
Table S4. The performance of MF-SuP-pKa on the external test set.
Figure S1. Visualization of the ionizable atom labels on 20 representative amphoteric molecules.
Figure S2. Distribution of the pairwise Tanimoto similarities between the DataWarrior data set and the external test set.
Figure S3. Results of micro-pKa prediction on the SAMPL6 data set using MF-SuP-pKa.
Figure S4. Distribution of molecular size for the DataWarrior data set.


Version History

Jul 11, 2022 Version 1

Metrics

Views: 865

Downloads: 599

License

The content is available under CC BY NC 4.0

DOI

10.26434/chemrxiv-2022-t6q61

Funding

National Key Research and Development Program of China: 2021YFF1201400

Natural Science Foundation of Zhejiang Province: LZ19H300001, LD22H300001

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
