Data-Driven Chemical Reaction Classification with Attention-Based Neural Networks

Philippe Schwaller; Alain C. Vaucher; Vishnu H. Nair; Teodoro Laino

doi:10.26434/chemrxiv.9897365.v1

Organic reactions are usually clustered in classes that collect entities undergoing similar structural rearrangements. The classification process is tedious: it requires first an accurate mapping of the rearrangement (atom mapping) and then the identification of the corresponding reaction class template. In this work, we present a transformer-based model that infers reaction classes from the SMILES representation of chemical reactions. The model reaches an accuracy of 93.8 % for a multi-class classification task involving several hundred different classes. The attention weights provided by the model give insight into which parts of the SMILES strings are taken into account for classification, based solely on data. We study the incorrect predictions of our model and show that it uncovers different biases and mistakes in the underlying data set.
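To illustrate what "inferring reaction classes from the SMILES representation" takes as input, the following minimal sketch splits a reaction SMILES string into tokens of the kind a sequence model could consume. It is not the authors' pipeline: the abstract does not specify a tokenization scheme, so the regular expression, the function name `tokenize_reaction_smiles`, and the esterification example are all assumptions chosen for illustration.

```python
import re
from typing import List

# Assumed token pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, bonds, branches, ring closures, and the ">" reaction separator.
# The abstract does not state which tokenization the model uses.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_reaction_smiles(rxn_smiles: str) -> List[str]:
    """Split a reaction SMILES string into a list of tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that nothing is dropped silently.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(rxn_smiles)
    if "".join(tokens) != rxn_smiles:
        raise ValueError(f"Untokenizable characters in: {rxn_smiles!r}")
    return tokens


# Hypothetical example: acetic acid + ethanol >> ethyl acetate.
example = "CC(=O)O.OCC>>CC(=O)OCC"
tokens = tokenize_reaction_smiles(example)
print(tokens)
```

In such a setup, each token would correspond to one input position, and attention weights could be inspected per position to see which parts of the reaction string influence the predicted class.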
