Abstract
Transformer-based models have been successful at learning the language of chemical reactions. Here we show that natural language processing can also be applied to learn and extract the rules that describe chemical reactions. Going beyond previous applications of chemical language modelling, we show that the model can learn generalised patterns composed of atom types, properties, logical operators, and an internally consistent mapping between these components. Furthermore, we show that, without prior knowledge of the reaction's atom mapping, the models achieve up to 76 % accuracy on the rule-extraction task. This paves the way for using chemical language models to write the rules of chemistry for tasks such as synthesis planning.