Impact of applicability domains to generative artificial intelligence

20 April 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Molecular generative artificial intelligence is drawing significant attention in the drug design community, with several experimentally validated proofs of concept already published. Nevertheless, generative models are known to sometimes generate unrealistic, unsynthesizable, or unstable structures. This calls for methods that constrain those algorithms to generate structures in reasonable portions of chemical space. While the concept of applicability domains (AD) for predictive models is well studied, its counterpart for generative models is not yet defined. In this work, we empirically examine various possibilities and propose applicability domains suited to generative models. Using both public and internal datasets, we apply state-of-the-art generative methods to generate novel structures that are predicted to be active by a corresponding QSAR model, while constraining the generative model to stay within a given applicability domain. Our work looks at several applicability domain definitions, combining various criteria, such as structural similarity to the training set, similarity of physico-chemical properties, unwanted substructures, and Quantitative Estimate of Drug-Likeness (QED). We assess the generated structures both qualitatively and quantitatively, and find that the applicability domain definitions have a strong influence on the chemical beauty of generated molecules. An extensive analysis of our results allows us to identify the applicability domain definitions that are best suited for generating drug-like molecules with generative models. We anticipate that this work will help foster the adoption of generative models in an industrial context.
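
To make the idea of a combined applicability domain concrete, the following is a minimal sketch of how the criteria named in the abstract (structural similarity to the training set, physico-chemical property ranges, unwanted substructures, and QED) might be joined into a single membership test. It is an illustration only, not the authors' implementation: the `Molecule` container, the function names, and the thresholds `min_similarity=0.4` and `min_qed=0.5` are all assumptions, and the fingerprints, properties, substructure flags, and QED values are taken as precomputed by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical container for precomputed features of one molecule."""
    fingerprint: frozenset           # on-bits of a structural fingerprint
    properties: dict                 # e.g. {"mol_weight": 310.0, "logp": 2.1}
    has_unwanted_substructure: bool  # result of a substructure filter
    qed: float                       # precomputed QED, in [0, 1]


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def property_ranges(training_set):
    """Min and max of each property over the training set."""
    names = training_set[0].properties.keys()
    return {
        name: (
            min(m.properties[name] for m in training_set),
            max(m.properties[name] for m in training_set),
        )
        for name in names
    }


def in_applicability_domain(mol, training_set, min_similarity=0.4, min_qed=0.5):
    """Return True if `mol` passes all four illustrative criteria.

    The thresholds are arbitrary placeholders, not values from the paper.
    """
    # 1. Structural similarity: nearest training-set neighbour is close enough.
    nearest = max(tanimoto(mol.fingerprint, t.fingerprint) for t in training_set)
    similar = nearest >= min_similarity

    # 2. Physico-chemical properties: each lies within the training-set range.
    ranges = property_ranges(training_set)
    in_range = all(lo <= mol.properties[n] <= hi for n, (lo, hi) in ranges.items())

    # 3. No unwanted substructures, and 4. sufficient drug-likeness (QED).
    return similar and in_range and not mol.has_unwanted_substructure and mol.qed >= min_qed
```

In a generative setting, a test of this kind could act as a hard filter on generated structures or be folded into the scoring function alongside the QSAR prediction. Which criteria to combine, and how strictly, is the question the work examines empirically.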

Keywords

Applicability Domain
Generative Artificial Intelligence

Supplementary materials

Supporting Information
- Code availability
- Visualisation of generated molecule sets
- QED distributions
- SAS distributions
- Enrichment in actives/inactives
- Score distributions
- Similarities between generated sets
- Typical problematic structures generated
