Abstract
We expand our recent work on clustering of synthesis routes and train a deep learning model to predict the distances between arbitrary routes. The model is based on a long short-term memory (LSTM) representation of a synthesis route and is trained as a twin network to reproduce the tree edit distance (TED) between two routes. The machine learning (ML) approach is approximately two orders of magnitude faster than the TED approach and enables clustering of many more routes from a retrosynthesis route prediction. The clusters have a high degree of similarity to the clusters given by the TED-based approach and are accordingly intuitive and explainable. We provide the developed model as open source (https://github.com/MolecularAI/route-distances).