Abstract
Retrosynthesis, the strategy of devising laboratory pathways for small molecules by working backwards from the target compound, remains a rate-limiting step in the multi-step synthesis of complex molecules, particularly in drug discovery. Enhancing retrosynthetic efficacy requires overcoming the vast complexity of chemical space, the limited number of known interconversions between molecules, and the scarcity of experimental data. In this study, we introduce generative machine learning methods for retrosynthetic planning that generate reaction templates. Our approach features three key innovations. First, the models generate complete reactions, known as templates, instead of reactants or synthons. Through this abstraction, novel chemical transforms resembling those in the training dataset can be generated. Second, the approach optionally allows users to select the specific bond or bonds to be changed in the proposed reaction, enabling human input to steer the synthetic approach. Third, one of our models, based on the conditional kernel-elastic autoencoder (CKAE) architecture, employs a latent space to measure the similarity between generated and known reactions, providing insight into their chemical viability. Together, these features establish a coherent framework for retrosynthetic planning, as validated by our experimental work. We demonstrate the application of our machine learning methodology by designing a synthetic pathway for a simple yet challenging small molecule of pharmaceutical interest. The resulting 3-step pathway was proven viable experimentally and compares favorably with previous 5- to 9-step approaches. This improvement demonstrates the utility and robustness of the generative machine learning approaches described herein and highlights their potential to address a broad spectrum of challenges in chemical synthesis.
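As a rough illustration of the third point, the sketch below shows one way a latent space could be used to compare a generated reaction with known ones. It is not the CKAE implementation: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the 3-dimensional vectors, the reaction names, and the helper functions `cosine_similarity` and `nearest_known_reaction` are hypothetical placeholders for encoder outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def nearest_known_reaction(generated, known):
    """Return (name, similarity) for the known reaction closest to `generated`."""
    return max(
        ((name, cosine_similarity(generated, vec)) for name, vec in known.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical latent vectors; a trained encoder would supply the real ones.
known_reactions = {
    "amide_coupling": [0.9, 0.1, 0.0],
    "suzuki_coupling": [0.0, 1.0, 0.2],
}
generated_template = [0.8, 0.2, 0.1]

name, score = nearest_known_reaction(generated_template, known_reactions)
print(f"closest known reaction: {name} (similarity {score:.3f})")
```

Under these assumptions, a high similarity to a well-precedented reaction would suggest that the generated template lies close to known chemistry, while a low score would flag it for closer scrutiny.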