Machine learning of reaction properties via learned representations of the condensed graph of reaction

11 August 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The estimation of chemical reaction properties such as activation energies, rates or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction, and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark datasets for future work. All datasets and the developed reaction GCNN model are available online, free of charge and open-source.
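
To illustrate the superposition idea behind the CGR, here is a minimal Python sketch. It is not the authors' implementation: the names `bond`, `condensed_graph` and `changed_bonds`, the toy reaction and the bond-only representation are assumptions made for illustration. Atom features, which a GCNN would also need, are left out. Each bond is keyed by a pair of atom-map numbers and carries its bond order on the reactant side and on the product side.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: a bond is keyed by the pair of atom-map
# numbers it connects and stores its bond order.
Bonds = Dict[FrozenSet[int], float]


def bond(a: int, b: int) -> FrozenSet[int]:
    """Order-independent key for the bond between mapped atoms a and b."""
    return frozenset((a, b))


def condensed_graph(reactants: Bonds, products: Bonds) -> Dict[FrozenSet[int], Tuple[float, float]]:
    """Superimpose both graphs: every bond gets (reactant order, product order).

    An order of 0.0 means the bond is absent on that side of the reaction.
    """
    return {
        pair: (reactants.get(pair, 0.0), products.get(pair, 0.0))
        for pair in set(reactants) | set(products)
    }


def changed_bonds(cgr: Dict[FrozenSet[int], Tuple[float, float]]) -> Dict[FrozenSet[int], Tuple[float, float]]:
    """Keep only bonds whose order differs between reactants and products."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}


# Toy example (heavy atoms only): C:1-C:2-Br:3 + Cl:4 -> C:1-C:2-Cl:4 + Br:3
reactant_bonds = {bond(1, 2): 1.0, bond(2, 3): 1.0}
product_bonds = {bond(1, 2): 1.0, bond(2, 4): 1.0}

cgr = condensed_graph(reactant_bonds, product_bonds)
reaction_center = changed_bonds(cgr)
```

In this toy case the C-C bond is labeled as unchanged, the C-Br bond as broken and the C-Cl bond as formed, so a single graph encodes the whole transformation.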

Keywords

Machine learning
Reaction prediction
Graph-convolutional neural net
Atom-mapped reaction database
Condensed graph of reaction

Supplementary materials

Supporting Information
Model performances on the Rad-6-RE database and detailed discussion of the influence of data leakage in this system. Details on hyperparameter searches, full list of test set errors for all models with and without hyperparameter optimization.
