Abstract
Molecular generative artificial intelligence is drawing significant attention in the drug design community, with several experimentally validated proofs of concept already published. Nevertheless, generative models are known to sometimes generate unrealistic, unsynthesizable, or unstable structures. This calls for methods that constrain these algorithms to generate structures in reasonable portions of chemical space. While the concept of applicability domains (AD) is well studied for predictive models, its counterpart for generative models is not yet defined. In this work, we empirically examine various possibilities and propose applicability domains suited to generative models. Using both public and internal datasets, we apply state-of-the-art generative methods to generate novel structures that a corresponding QSAR model predicts to be active, while constraining the generative model to stay within a given applicability domain. We consider several applicability domain definitions that combine criteria such as structural similarity to the training set, similarity of physicochemical properties, unwanted substructures, and the Quantitative Estimate of Drug-Likeness (QED). We assess the generated structures both qualitatively and quantitatively, and find that the applicability domain definition has a strong influence on the chemical beauty of the generated molecules. An extensive analysis of our results allows us to identify the applicability domain definitions best suited to generating drug-like molecules with generative models. We anticipate that this work will help foster the adoption of generative models in an industrial context.
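As an illustration of how the criteria named in the abstract could be combined into a single membership test, the following is a minimal sketch and not the authors' implementation. It assumes that fingerprints, physicochemical properties, the unwanted-substructure flag, and QED values have already been computed by an external cheminformatics toolkit. The `Molecule` container, the function names, and the threshold values (`min_similarity`, `min_qed`) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Molecule:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    fingerprint: FrozenSet[int]      # on-bits of a structural fingerprint
    properties: Dict[str, float]     # e.g. molecular weight, logP
    has_unwanted_substructure: bool  # result of an external substructure filter
    qed: float                       # precomputed QED value in [0, 1]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def property_ranges(training_set: List[Molecule]) -> Dict[str, Tuple[float, float]]:
    """Min/max of each property over the training set."""
    names = training_set[0].properties.keys()
    return {
        name: (
            min(m.properties[name] for m in training_set),
            max(m.properties[name] for m in training_set),
        )
        for name in names
    }


def in_applicability_domain(
    mol: Molecule,
    training_set: List[Molecule],
    min_similarity: float = 0.4,  # assumed threshold, not taken from the paper
    min_qed: float = 0.5,         # assumed threshold, not taken from the paper
) -> bool:
    """Return True only if the molecule satisfies all four criteria."""
    # Structural similarity: nearest training-set neighbour must be close enough.
    max_sim = max(tanimoto(mol.fingerprint, t.fingerprint) for t in training_set)
    # Physicochemical similarity: every property must lie in the training range.
    ranges = property_ranges(training_set)
    props_ok = all(lo <= mol.properties[n] <= hi for n, (lo, hi) in ranges.items())
    return (
        max_sim >= min_similarity
        and props_ok
        and not mol.has_unwanted_substructure
        and mol.qed >= min_qed
    )
```

In a generative setting, a check of this kind could be used to gate or penalize the score of each proposed structure. Whether the criteria are combined as a hard conjunction, as here, or in some softer way is a design choice that this sketch does not settle.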
Supplementary materials
Title
Supporting Information
Description
- Code availability
- Visualisation of generated molecule sets
- QED distributions
- SAS distributions
- Enrichment in actives/inactives
- Score distributions
- Similarities between generated sets
- Typical problematic structures generated
Supplementary weblinks
Title
GitHub repository
Description
This code reproduces the results found in our paper.