Abstract
SynTemp is a framework designed to extract and hierarchically cluster reaction templates from large-scale reaction data repositories. Reaction templates are partial Imaginary Transition State (ITS) graphs that represent the reaction center together with its surrounding context. These graphs are equivalent to Double Pushout (DPO) graph rewriting rules and can therefore be applied directly to predict reaction outcomes at the level of structural formulas. Rule inference is based on a consensus of multiple atom-atom mapping (AAM) tools: predictions from RXNMapper, GraphormerMapper, and LocalMapper are integrated using a robust graph-theoretic methodology for comparing partial atom-atom mappings. On the Chemical Reaction Dataset, SynTemp achieves an accuracy of 99.5% and a success rate of 71.23% in obtaining AAMs. Reaction centers with their surrounding contexts are extracted and completed with mechanistically relevant hydrogen atoms to obtain complete reaction templates. These templates are then grouped by topological features using hierarchical clustering, yielding a library of 311 transformation rules that explains 86% of the reaction dataset. The other 14% remain unresolved because of non-equivalent AAMs and ambiguous hydrogen placements. Despite these challenges, the coverage of our templates remains high, at approximately 93.5% to 94.5%, surpassing that of RDChiral, which uses SMARTS templates.
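The abstract does not detail how partial mappings from different tools are compared. The following Python sketch illustrates one simple way such a consensus check could work, under stated assumptions: each mapper returns a complete atom map over heavy atoms, the reaction center is taken as the set of bonds whose order changes (the changed part of an ITS-like graph), and two mappings agree if their reaction centers are isomorphic as labelled graphs. The function names (`reaction_center`, `isomorphic`, `consensus`), the dictionary representation, and the toy reaction are hypothetical and are not SynTemp's API; the brute-force isomorphism test is a stand-in that is only practical for small reaction centers.

```python
from itertools import permutations

# Hypothetical representation (not SynTemp's data model):
#   atoms: {atom_index: element_symbol}
#   bonds: {frozenset({i, j}): bond_order}, where a missing key means "no bond" (order 0)
#   mapping: {reactant_atom_index: product_atom_index}, assumed complete and bijective


def reaction_center(r_atoms, r_bonds, p_atoms, p_bonds, mapping):
    """Return (nodes, edges) for the bonds whose order changes under `mapping`.

    Edges are labelled (order_before, order_after) and use reactant atom
    indices, i.e. the changed part of an ITS-like graph.
    """
    for r, p in mapping.items():
        if r_atoms[r] != p_atoms[p]:
            raise ValueError(f"element mismatch: reactant {r} -> product {p}")
    inverse = {p: r for r, p in mapping.items()}
    # Re-express product bonds with reactant atom indices.
    after = {frozenset(inverse[a] for a in bond): order for bond, order in p_bonds.items()}
    edges = {}
    for bond in set(r_bonds) | set(after):
        before_order, after_order = r_bonds.get(bond, 0), after.get(bond, 0)
        if before_order != after_order:
            edges[bond] = (before_order, after_order)
    nodes = {a: r_atoms[a] for bond in edges for a in bond}
    return nodes, edges


def isomorphic(rc1, rc2):
    """Brute-force labelled-graph isomorphism; only practical for small reaction centers."""
    (n1, e1), (n2, e2) = rc1, rc2
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    a1 = sorted(n1)
    for perm in permutations(sorted(n2)):
        f = dict(zip(a1, perm))
        if any(n1[a] != n2[f[a]] for a in a1):
            continue  # element labels must match
        if all(e2.get(frozenset(f[a] for a in bond)) == label for bond, label in e1.items()):
            return True
    return False


def consensus(centers):
    """Accept only if every mapper yields a reaction center isomorphic to the first one."""
    return all(isomorphic(centers[0], c) for c in centers[1:])


# Hypothetical toy reaction, heavy atoms only (hydrogens omitted):
#   HO-CH2-CH2-OH + CH3-Br -> HO-CH2-CH2-O-CH3 + Br-
R_ATOMS = {0: "O", 1: "C", 2: "C", 3: "O", 4: "C", 5: "Br"}
R_BONDS = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({4, 5}): 1}
P_ATOMS = dict(R_ATOMS)  # product atoms reuse the same index layout for readability
P_BONDS = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({3, 4}): 1}

mapper_a = {i: i for i in range(6)}              # reactant O3 gets methylated
mapper_b = {0: 3, 1: 2, 2: 1, 3: 0, 4: 4, 5: 5}  # symmetry-equivalent: reactant O0 gets methylated
mapper_c = {0: 0, 1: 4, 2: 2, 3: 3, 4: 1, 5: 5}  # swaps two carbons: chemically wrong

center_a, center_b, center_c = (
    reaction_center(R_ATOMS, R_BONDS, P_ATOMS, P_BONDS, m) for m in (mapper_a, mapper_b, mapper_c)
)

print(consensus([center_a, center_b]))            # True
print(consensus([center_a, center_b, center_c]))  # False
```

Mappers A and B choose different but symmetry-equivalent hydroxyl oxygens, so their reaction centers differ in atom indices yet are isomorphic and count as agreement; mapper C produces a larger, non-isomorphic center and breaks the consensus. Comparing reaction centers up to isomorphism, rather than by raw atom indices, is what lets equivalent mappings agree in this sketch.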
Supplementary materials
Supplementary 1: Additional Figures and Tables
Supplementary 2: Reactions with reaction step detected by topological descriptors
Supplementary 3: Reaction rule library