Denoising Drug Discovery Data for Improved ADMET Property Prediction

Matthew Adrian; Yunsie Chung; Alan Cheng

doi:10.26434/chemrxiv-2024-v4jvc

Biological and Medicinal Chemistry

22 April 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Predicting ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of small molecules is a key task in drug discovery. A major challenge in building better ADMET models is the experimental error inherent in the data. Furthermore, ADMET prediction is typically a regression task because the data are continuous, which makes it difficult to apply existing denoising methods, most of which focus on classification tasks. Here, we develop denoising schemes based on deep learning to address this. We find that the training error can be used to identify the noise in regression tasks, while ensemble-based and forgotten event-based metrics fail to detect the noise. The most significant performance increase occurs when the original model is finetuned with the denoised data using training error as the noise detection metric. Our method can improve models with medium noise and does not degrade the performance of models with noise outside this range. To our knowledge, our denoising scheme is the first to improve model performance for ADMET data, and it has implications for improving models for experimental assay data in general.
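
The workflow the abstract outlines (train a model, rank training samples by training error, drop the highest-error samples as suspected noise, then finetune the original model on what remains) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: a one-variable linear model fitted by gradient descent stands in for the deep learning model, the fixed `drop_fraction` replaces whatever threshold the paper uses, and the function names and synthetic data are hypothetical.

```python
import random

TRUE_W, TRUE_B = 2.0, 1.0  # hypothetical ground truth for the toy data


def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.3, epochs=500):
    """Gradient descent on mean squared error for y ~ w*x + b.

    Stand-in for training a deep model; passing w and b lets us
    "finetune" from previously learned parameters.
    """
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def flag_by_training_error(xs, ys, w, b, drop_fraction):
    """Return indices of the samples with the largest absolute training error."""
    errors = [abs(w * x + b - y) for x, y in zip(xs, ys)]
    n_drop = round(len(xs) * drop_fraction)
    ranked = sorted(range(len(xs)), key=lambda i: errors[i], reverse=True)
    return set(ranked[:n_drop])


def denoise_and_finetune(xs, ys, drop_fraction=0.1):
    """Train, flag high-error samples, then finetune on the remaining data."""
    w, b = fit_linear(xs, ys)
    flagged = flag_by_training_error(xs, ys, w, b, drop_fraction)
    keep = [i for i in range(len(xs)) if i not in flagged]
    # Finetuning: continue from the original parameters on the denoised subset.
    w_ft, b_ft = fit_linear(
        [xs[i] for i in keep], [ys[i] for i in keep], w=w, b=b, epochs=200
    )
    return (w, b), (w_ft, b_ft), flagged


def make_toy_data(n=100, n_noisy=10, seed=0):
    """Synthetic regression data; the first n_noisy labels get a large offset."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [TRUE_W * x + TRUE_B + rng.gauss(0.0, 0.05) for x in xs]
    for i in range(n_noisy):
        ys[i] += 3.0  # simulated gross experimental error
    return xs, ys


xs, ys = make_toy_data()
original, finetuned, flagged = denoise_and_finetune(xs, ys)
print("flagged indices:", sorted(flagged))
print("original (w, b):", original)
print("finetuned (w, b):", finetuned)
```

In this toy setting the corrupted labels are easy to separate because their offset is far larger than the background noise. Real assay noise is subtler, so the choice of threshold matters much more than the fixed fraction used here suggests.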

Keywords

Denoising for Regression

Supplementary materials

Supporting Information: Additional noise detection, adaptive threshold determination, QM9 result, sample imbalance, dataset size effects, and noise effects on multitask models.

Comments


Hi, interesting paper and nice results. I noticed a minor error: on page 16 of the PDF you write "Walter et el. defined the structure activity landscape index (SALI) as shown in Equation 3" and ref 42 is cited. In fact, SALI was originally defined by Guha & Van Drie in https://pubs.acs.org/doi/10.1021/ci7004093

Version History

Apr 22, 2024 Version 1

Metrics

Views: 1,167

Downloads: 669

License

The content is available under CC BY NC ND 4.0

Funding

Merck & Co., Inc.

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
