Prediction of the chemical context for Buchwald-Hartwig coupling reactions

26 October 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present machine learning models for predicting the chemical context of Buchwald-Hartwig coupling reactions. Using reaction data from in-house electronic lab notebooks, we train two models: one on single-label data and one on multi-label data. Both models reach a top-3 accuracy of around 90%, which suggests strong predictive power. Including multi-label data appears to be advantageous: the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models perform well, we also show that they need to be re-trained periodically, because the usage of different contexts has a strong temporal character. A model trained on historical data will therefore become less useful over time as newer and better contexts emerge and replace older ones. We hypothesize that such significant transitions in context use will likely affect any model that predicts chemical contexts from historical data. Consequently, training such models warrants careful planning of which data are used for training and how often the model is re-trained.

Keywords

condition prediction
reaction informatics
Buchwald-Hartwig coupling
