Abstract
Retrosynthetic analysis is a fundamental strategy in organic synthesis, and many computational methods have been developed to address it. A widely adopted approach treats retrosynthetic prediction as a sequence-to-sequence (seq2seq) translation task, in which the Simplified Molecular Input Line Entry System (SMILES) string of a product is translated into the SMILES strings of its reactants. However, sequence-based models that use SMILES face several issues, including limited performance and a lack of interpretability and controllability. In this work, we introduce E-SMILES, a novel chemical language that extends SMILES and is specifically designed for seq2seq retrosynthetic prediction. This language not only documents the static molecular structure but also encodes the editing operations applied to the molecule during retrosynthesis, enabling it to characterize retrosynthetic reactions more effectively. By using E-SMILES, seq2seq retrosynthetic models can simulate the stepwise retrosynthetic analysis strategy of chemists, ensuring that atoms match between the predicted reactants and the product, and yielding more interpretable and controllable predictions. Furthermore, E-SMILES is naturally aligned with the product's SMILES, reducing the edit distance between the model's input and output sequences. This frees the model from learning the complex SMILES syntax and allows it to focus on the retrosynthesis process itself. Leveraging E-SMILES, our retrosynthesis model achieves top-1 accuracies of 58.9% and 68.5% on the USPTO-50k dataset without and with a given reaction class, respectively, significantly surpassing previous state-of-the-art results. We envisage that E-SMILES can serve as a new foundational tool that promotes the development of sequence-based retrosynthetic prediction methods.
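To make the edit-distance argument in the abstract concrete, the following is a minimal sketch in Python. It is not the E-SMILES encoding, whose syntax is not given in the abstract. It only compares plain SMILES strings and shows that a reactant string written in alignment with the product lies closer to the product than an equally valid but differently ordered one. The example reaction (ethyl acetate disconnected into acetic acid and ethanol), the function name `levenshtein`, and the variable names are illustrative assumptions.

```python
def levenshtein(source: str, target: str) -> int:
    """Character-level edit distance (insert, delete, substitute; cost 1 each)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete s_char
                    current[j - 1] + 1,      # insert t_char
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Assumed toy example: ethyl acetate as the product.
product = "CC(=O)OCC"

# Two valid SMILES spellings of the same reactant set (acetic acid + ethanol).
aligned_reactants = "CC(=O)O.OCC"    # atom order follows the product string
unaligned_reactants = "CCO.CC(=O)O"  # same molecules, different order

print(levenshtein(product, aligned_reactants))    # 2: only ".O" is inserted
print(levenshtein(product, unaligned_reactants))  # larger than the aligned case
```

In this toy case the aligned spelling differs from the product by two inserted characters, while the reordered spelling requires more edits even though it denotes the same molecules. This is the kind of gap that an output representation aligned with the product is meant to close.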
Supplementary materials
Title: Supplementary Information
Description: Supplementary Information for Improve retrosynthesis planning with a molecular editing language