Practically significant method comparison protocols for machine learning in small molecule drug discovery.

Jeremy R. Ash; Cas Wognum; Raquel Rodríguez-Pérez; Matteo Aldeghi; Alan C. Cheng; Djork-Arné Clevert; Ola Engkvist; Cheng Fang; Daniel J. Price; Jacqueline M. Hughes-Oliver; W. Patrick Walters

doi:10.26434/chemrxiv-2024-6dbwv

Biological and Medicinal Chemistry

04 November 2024, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine Learning (ML) methods that relate molecular structure to properties are frequently proposed as in-silico surrogates for expensive or time-consuming experiments. In small molecule drug discovery, such methods inform high-stakes decisions like compound synthesis and in-vivo studies. This application lies at the intersection of multiple scientific disciplines. When comparing new ML methods to baseline or state-of-the-art approaches, statistically rigorous method comparison protocols and domain-appropriate performance metrics are essential to ensure replicability and ultimately the adoption of ML in small molecule drug discovery. This paper proposes a set of guidelines to incentivize rigorous and domain-appropriate techniques for method comparison tailored to small molecule property modeling. These guidelines, accompanied by annotated examples and open-source software tools, lay a foundation for robust ML benchmarking and thus the development of more impactful methods.

Keywords

Method Comparison

Practical Significance

Supplementary weblinks

Code: All code associated with this paper has been made available in this GitHub repository.

Version History

Nov 07, 2024 Version 2

Nov 04, 2024 Version 1

Metrics

Views: 8,342

Downloads: 4,774

License

The content is available under CC BY NC 4.0

Author’s competing interest statement

All authors were employed by for-profit companies during the writing of this Comment. While there may be financial or non-financial interests related to their employer, the authors affirm their commitment to scientific integrity. The article is presented objectively, and steps were taken to minimize any potential influence from their employment. The corresponding author is available for further inquiries.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
