Abstract
Accurate prediction of reaction rates is an integral step in elucidating reaction mechanisms and designing synthetic pathways. Traditionally, kinetic parameters have been derived from activation energies obtained with quantum mechanical (QM) methods and, more recently, with machine learning (ML) approaches. Among ML methods, Bidirectional Encoder Representations from Transformers (BERT), a transformer-based model, is the state-of-the-art method for both reaction classification and yield prediction. Despite this success, BERT has yet to be applied to kinetic prediction. In this work, we train a BERT model to predict experimental logk values of SN2 reactions and compare it with the top-performing Random Forest (RF) model from the literature in terms of accuracy, training time, and ability to reproduce known reactivity rules. Both the BERT and RF models exhibit near-experimental accuracy (RMSE = 1.1 logk units) on similarity-split test data. Interpretation of the predictions reveals that both models identify key reaction centers, as well as known electronic and steric effects. However, the RF model shows limitations in logk extrapolation, and the BERT model shows limitations in recognizing aromatic effects.
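To make the reported error metric concrete, the following minimal sketch shows how an RMSE in logk units could be computed and what an error of 1.1 logk units corresponds to as a multiplicative factor in the rate constant k. The function names and the example values are hypothetical illustrations, not data or code from this work.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of logk values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fold_error(log_error):
    """Convert an error in log10(k) units into a multiplicative factor in k."""
    return 10.0 ** log_error


# Hypothetical experimental and predicted logk values (illustration only).
logk_experimental = [-3.0, -1.5, 0.2, -4.1]
logk_predicted = [-2.0, -2.5, 1.2, -5.1]

print(f"RMSE: {rmse(logk_experimental, logk_predicted):.2f} logk units")
print(f"1.1 logk units is a factor of about {fold_error(1.1):.1f} in k")
```

Because logk is a base-10 logarithm, an error of 1.1 units means the predicted rate constant typically differs from the experimental one by roughly an order of magnitude.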
Supplementary materials
Title
Supporting Information.
Description
Supporting information for the work titled Kinetic predictions for SN2 reactions using the BERT architecture: Comparison and interpretation.
Supplementary weblinks
Title
Data availability
Description
Data and code for the work titled Kinetic predictions for SN2 reactions using the BERT architecture: Comparison and interpretation.