The Effect of Chemical Representation on Active Machine Learning Towards Closed-Loop Optimization

Alexander Pomberger; Antonio Pedrina McCarthy; Ahmad Khan; Simon Sung; Connor Taylor; Matthew Gaunt; Lucy Colwell; David Walz; Alexei Lapkin

doi:10.26434/chemrxiv-2022-htmn0-v2

Organic Chemistry


17 January 2022, Version 2

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Multivariate chemical reaction optimization involving catalytic systems is a non-trivial task due to the high number of tuneable parameters and discrete choices. Closed-loop optimization featuring active Machine Learning (ML) represents a powerful strategy for automating reaction optimization. However, translating chemical reaction conditions into a machine-readable format comes with the challenge of finding highly informative features that accurately capture the factors for reaction success and allow the model to learn efficiently. Herein, we compare the efficacy of different calculated chemical descriptors for a dataset generated by high-throughput experimentation to determine their impact on a supervised ML model when predicting reaction yield. Then, the effect of featurization and of the size of the initial dataset within a closed-loop reaction optimization was examined. Finally, the balance between descriptor complexity and dataset size was considered. Ultimately, tailored descriptors did not outperform simple generic representations; however, a larger initial dataset accelerated reaction optimization.
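The supplementary table of contents lists an Expected Improvement acquisition function for the closed-loop optimization. As a rough illustration only, the following sketch shows the textbook form of Expected Improvement for yield maximization, given a model's predictive mean and standard deviation for each untested condition. The function names, the `xi` exploration parameter, and the example numbers are assumptions made for illustration; the authors' actual implementation is not described on this page.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Textbook Expected Improvement for maximization.

    mu, sigma: predictive mean and standard deviation of the yield
    best:      highest yield observed so far
    xi:        optional exploration margin (hypothetical default of 0)
    """
    gain = mu - best - xi
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(gain, 0.0)
    z = gain / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return gain * cdf + sigma * pdf


def pick_next(candidates, best):
    """Return the candidate key with the highest Expected Improvement.

    candidates: dict mapping a condition label to (mu, sigma)
    """
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best),
    )


# Illustrative (made-up) predictions for two untested conditions, yields in %.
candidates = {"A": (60.0, 1.0), "B": (55.0, 20.0)}
next_condition = pick_next(candidates, best=62.0)
```

In this made-up example the uncertain condition "B" is selected over the confidently mediocre "A", which is the exploration behaviour that makes such acquisition functions useful in a closed loop.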

Keywords

reaction optimization

machine learning

high-throughput experimentation

molecular parameterization

closed-loop optimization

Supplementary materials

Title: The Effect of Chemical Representation on Supervised and Active Machine Learning Towards Yield Prediction

Description: Table of Contents: General Considerations; Analytical Methods; High Throughput Experimentation; Reaction Scheme and Ligand Structures; Synthesis of Materials; Preparation of the Dataset; Generation of Morgan Fingerprints; Density Functional Theory (DFT)-based Geometry Optimization; Sterimol Parameters; Percentage Buried Volume; Natural Bond Orbital (NBO) Analysis; CHarges from ELectrostatic Potentials Using a Grid-Based Method (ChELPG) Analysis; Summary of DFT Descriptor Values; Machine Learning; Linear Model; Random Forest; Gaussian Process; Artificial Neural Network; Adaptive Boosting Model; Support Vector Regression; Leave-one-group-out (LOGO) Cross Validation (CV); Feature Importance Assessment of the Random Forest; Closed-loop Optimization; Expected Improvement Acquisition Function; De-full Factorization of the Chemical Space Study; Batch-Sequential Active Learning; The Impact of Initialization of the Active Learning; The Impact of Initialization: Dataset Size vs. Complexity of Parameterization; Active Learning Trajectories – Insights

Now Published

The effect of chemical representation on active machine learning towards closed-loop optimization

A. Pomberger, A. A. Pedrina McCarthy, A. Khan, S. Sung, C. J. Taylor, M. J. Gaunt, L. Colwell, D. Walz, A. A. Lapkin (journal article)

Reaction Chemistry & Engineering, Volume 7, Issue 6

Online publication date: 2022

Version History

Jan 17, 2022 Version 2

Jan 10, 2022 Version 1

Version Notes

Added missing co-author in the list.

Metrics

Views: 1,739

Downloads: 750

License

The content is available under CC BY 4.0

Funding

Engineering and Physical Sciences Research Council

EP/S024220

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
