Investigating the Reliability and Interpretability of Machine Learning Frameworks for Chemical Retrosynthesis

11 January 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning models for chemical retrosynthesis have attracted substantial interest in recent years. Unaddressed challenges obscure model limitations and impede progress in the field, in particular the absence of robust evaluation metrics for performance comparison and the limited interpretability of black-box models. We present an automated benchmarking pipeline designed for effective model performance comparison. With an emphasis on user-friendly design, we aim to make the pipeline accessible and easy to use for the research community. Additionally, we propose and perform a new interpretability study to uncover the degree of chemical understanding acquired by retrosynthesis models. Our results reveal that frameworks based on chemical reaction rules yield the most diverse, chemically valid, and feasible reactions, whereas purely data-driven frameworks suffer from unfeasible and invalid predictions. The interpretability study emphasises that incorporating reaction rules not only enhances model performance but also improves interpretability. For simple molecules, we demonstrate that Graph Neural Networks identify relevant functional groups within the product molecule that provide thermodynamic stabilisation over the reactant precursors. In contrast, the popular Transformer fails to identify such crucial stabilisation. As the molecule and reaction mechanism grow more complex, both data-driven models propose unfeasible disconnections without offering a chemical rationale. We stress the importance of incorporating chemically meaningful descriptors within deep-learning models. Our study provides valuable guidance for the future development of retrosynthesis frameworks.

Keywords

Chemical Retrosynthesis
Interpretable AI
Benchmarking
Graph Neural Network
Transformer

Comments

Friedrich Hastedt, Jan 11, 2024, 13:59:

The GitHub repository will be made public within 1 to 2 days.

Response, Friedrich Hastedt, Jan 11, 2024, 16:15:

The repository is now accessible.