Human-in-the-loop active learning for goal-oriented molecule generation

Yasmine Nahal; Janosch Menke; Julien Martinelli; Markus Heinonen; Mikhail Kabeshov; Jon Paul Janet; Eva Nittinger; Ola Engkvist; Samuel Kaski

doi:10.26434/chemrxiv-2024-623lx

Theoretical and Computational Chemistry

08 August 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning (ML) systems have enabled the modelling of quantitative structure-property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential for accelerating drug discovery by guiding generative artificial intelligence (AI) agents towards desired regions of chemical space. However, they often generalize poorly because of the limited scope of their training data. When generative agents optimize against such predictors, this limitation can produce molecules with artificially high predicted probabilities of satisfying the target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method uses the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle, choosing those whose labels are expected to yield the greatest reduction in predictive uncertainty and thus enable more accurate evaluation of subsequently generated molecules. Because immediate wet-lab or physics-based experiments are impractical due to time and logistical constraints, we propose leveraging human experts, who are cost-effective and bring domain knowledge, to augment property predictors and bridge gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. We also observe improved accuracy of predicted properties and improved drug-likeness among the top-ranking generated molecules.
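To make the EPIG criterion concrete, here is a minimal sketch of how it can be estimated for a binary property classifier. It follows the general definition EPIG(x) = E_{x* ~ p*(x*)}[ I(y; y* | x, x*) ], i.e. the mutual information between the label of a candidate molecule x and the label of a target molecule x*, averaged over target molecules. It assumes the predictor's posterior is approximated by K samples (for example ensemble members or MC-dropout passes) and that a set of molecules, such as recently generated ones, stands in for p*(x*). The function names `epig_binary` and `select_top_k` are hypothetical; this is an illustration of the technique, not the authors' implementation.

```python
import numpy as np


def epig_binary(probs_pool, probs_target, eps=1e-12):
    """Monte Carlo estimate of EPIG for a binary classifier (hypothetical helper).

    probs_pool:   (K, N) array of p(y=1 | x_n, theta_k) for K posterior samples
                  and N candidate molecules that could be sent to the oracle.
    probs_target: (K, M) array of p(y*=1 | x*_m, theta_k) for M molecules assumed
                  to be drawn from the target distribution p*(x*).
    Returns an (N,) array of EPIG scores in nats.
    Note: memory scales as N * M * 4, so large pools may need batching.
    """
    K = probs_pool.shape[0]
    # Class probabilities for y in {0, 1}: shapes (K, N, 2) and (K, M, 2).
    p = np.stack([1.0 - probs_pool, probs_pool], axis=-1)
    q = np.stack([1.0 - probs_target, probs_target], axis=-1)
    # Joint predictive p(y, y* | x, x*) = mean_k p(y|x,theta_k) p(y*|x*,theta_k).
    joint = np.einsum("kna,kmb->nmab", p, q) / K  # (N, M, 2, 2)
    # Marginal predictives p(y|x) and p(y*|x*), averaged over posterior samples.
    marg_pool = p.mean(axis=0)    # (N, 2)
    marg_target = q.mean(axis=0)  # (M, 2)
    indep = marg_pool[:, None, :, None] * marg_target[None, :, None, :]
    # I(y; y* | x, x*) = KL(joint || product of marginals), per (n, m) pair.
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(-2, -1))
    # Average over the target molecules to approximate E_{x*}[...].
    return mi.mean(axis=1)


def select_top_k(scores, k):
    """Indices of the k highest-scoring candidates (greedy, not batch-aware)."""
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical ensemble of 5 models scoring 100 candidates and 50 targets.
    probs_pool = rng.uniform(size=(5, 100))
    probs_target = rng.uniform(size=(5, 50))
    scores = epig_binary(probs_pool, probs_target)
    print("Molecules to query:", select_top_k(scores, 10))
```

A candidate on which all posterior samples agree receives an EPIG of zero, since its label cannot change predictions on the target molecules; candidates where the samples disagree in a way that is correlated with disagreement on the targets score highest. Taking the top-k individual scores is a simplification chosen for brevity: it ignores redundancy between molecules selected in the same batch.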

Keywords

Goal-oriented molecule generation

Human-in-the-loop machine learning

Active learning

Interactive algorithms

Supplementary materials

Additional file for Human-in-the-loop active learning for goal-oriented molecule generation

Supplementary material for the manuscript "Human-in-the-loop active learning for goal-oriented molecule generation".

Supplementary weblinks

Source code: GitHub repository containing the source code and datasets used to produce the results in this article.

Version History

Aug 08, 2024 Version 1

Metrics

Views: 1,044

Downloads: 681

License

The content is available under CC BY 4.0

Funding

Horizon 2020: 956832

Academy of Finland, Flagship program: the Finnish Center for Artificial Intelligence FCAI

UKRI Turing AI World-Leading Researcher Fellowship: EP/W002973/1

Wallenberg AI, Autonomous Systems and Software Program

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
