Intermediate Knowledge Enhanced the Performance of N-Acylation Yield Prediction Model

Chonghuan Zhang; Qianghua Lin; Hao Deng; Yaxian Kong; Zhunzhun Yu; Kuangbiao Liao

doi:10.26434/chemrxiv-2024-tzsnq

Organic Chemistry


16 August 2024, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Acylation is an important reaction widely applied in medicinal chemistry. However, yield optimization remains challenging because of the broad condition space. Recently, accurate condition recommendation via machine learning has emerged as a novel and efficient way to achieve desired transformations without trial and error. Nonetheless, accurately predicting yields remains difficult because of the complex relationships between reaction components and yield. Herein, we present our strategy to address this problem. Two steps were taken to ensure the quality of the dataset. First, we carefully selected substrates to ensure diversity and representativeness. Second, experiments were conducted on our in-house high-throughput experimentation (HTE) platform to minimize the influence of human factors. Additionally, we proposed an intermediate knowledge-embedded strategy to enhance the model's robustness. The model's performance was first evaluated at three levels: random split, partial substrate novelty, and full substrate novelty. All model metrics improved markedly in these cases, reaching an R² of 0.89, an MAE of 6.1%, and an RMSE of 8.0%. Moreover, the generalizability of our strategy was assessed on external datasets from the literature. For nine of 30 reactions, the prediction error was less than 5%, and for a reaction pair exhibiting a reactivity cliff, the model correctly identified the higher-yielding reaction. In summary, our research demonstrates that accurate yield prediction is feasible by combining HTE with embedding intermediate knowledge into the model. This approach also has the potential to facilitate other related machine learning tasks.
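The three evaluation levels and the reported metrics can be illustrated with a minimal Python sketch. It assumes each reaction is described by one amine and one acyl donor, and it assumes that "partial substrate novelty" means exactly one substrate of a test reaction is absent from training, while "full substrate novelty" means both are; the preprint's actual split protocol may differ. `Reaction`, `substrate_novelty_split`, `pair_order_correct`, and the other names are hypothetical, and no yield model is included.

```python
import math
import random
from typing import NamedTuple


class Reaction(NamedTuple):
    # Hypothetical record for one N-acylation experiment (amine + acyl donor).
    amine: str
    acyl_donor: str
    yield_pct: float  # observed yield in percent (0-100)


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE); MAE and RMSE share the units of the yields."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


def random_split(reactions, test_frac=0.2, seed=0):
    """Level 1: shuffle all reactions and hold out a random fraction."""
    shuffled = list(reactions)
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * test_frac)
    return shuffled[k:], shuffled[:k]  # (train, test)


def substrate_novelty_split(reactions, new_amines, new_acyl_donors, full=False):
    """Levels 2 and 3 under an ASSUMED definition of substrate novelty.

    partial (full=False): exactly one substrate of a test reaction is unseen.
    full (full=True): both the amine and the acyl donor are unseen.
    Training keeps only reactions whose substrates are all seen; reactions that
    fit neither set are dropped so held-out substrates never leak into training.
    """
    train, test = [], []
    for r in reactions:
        n_new = int(r.amine in new_amines) + int(r.acyl_donor in new_acyl_donors)
        if n_new == 0:
            train.append(r)
        elif n_new == (2 if full else 1):
            test.append(r)
    return train, test


def pair_order_correct(true_pair, pred_pair):
    """Reactivity-cliff check: do predictions order the two yields as observed?"""
    return (true_pair[0] > true_pair[1]) == (pred_pair[0] > pred_pair[1])


if __name__ == "__main__":
    # Toy yields for illustration only; these are not data from the study.
    data = [
        Reaction("amine_A", "acid_X", 82.0),
        Reaction("amine_A", "acid_Y", 35.0),
        Reaction("amine_B", "acid_X", 64.0),
        Reaction("amine_B", "acid_Y", 12.0),
    ]
    train, test = substrate_novelty_split(data, {"amine_B"}, {"acid_Y"}, full=True)
    print(len(train), len(test))  # prints: 1 1
```

In this sketch MAE and RMSE are in yield percentage points, which is presumably how the abstract's 6.1% and 8.0% figures should be read. The reactivity-cliff check compares only the ordering of two predictions, so a model can pass it even when both absolute predictions are off.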

Keywords

reaction informatics

yield prediction

amide coupling reaction

Supplementary materials

Intermediate Knowledge Enhanced the Performance of N-Acylation Yield Prediction Model Supporting information

Version History

Mar 14, 2025 Version 3

Sep 27, 2024 Version 2

Aug 16, 2024 Version 1

Metrics

Views: 1,014

Downloads: 577

License

The content is available under CC BY 4.0

Funding

National Natural Science Foundation of China

No. 22393892 and No. 22071249

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
