Abstract
Despite growing interest and success in automated in-silico molecular design, doubts remain regarding the ability of goal-directed generation algorithms to perform unbiased exploration of novel chemical spaces. A specific phenomenon has recently been highlighted: goal-directed generation guided by machine learning models produces molecules with high scores according to the optimization model but low scores according to control models, even when the control models are trained on the same data distribution and the same target. In this work, we show that this worrisome behavior is actually due to issues with the predictive models and not the goal-directed generation algorithms. We show that with appropriate predictive models, this issue can be resolved, and the generated molecules have high scores according to both the optimization and the control models.
Supplementary materials
Title
Supplementary Information: Explaining and avoiding failure modes in goal-directed generation
Description
Supporting information, tables and figures.
Title
Code to reproduce the figures
Description
Zipped version of the code that reproduces the experiments and figures presented in the manuscript.
Title
Results
Description
Zipped version of the results of the experiments. Unzip the file at the root of the directory provided in Additional file 2 to be able to fully reproduce the results.