Enabling Open Machine Learning of DNA Encoded Library Selections to Accelerate the Discovery of Small Molecule Protein Binders

James Wellnitz; Shabbir Ahmad; Nabin Begale; Jermiah Joseph; Hong Zeng; Albina Bolotokova; Aiping Dong; Shaghayegh Reza; Pegah Ghiabi; Gibson Elisa; Xuemin Cheng; Guiping Tu; Xianyang Li; Jian Liu; Dengfeng Dou; Jin Li; Rachel J. Harding; Aled M. Edwards; Benjamin Haibe-Kains; Levon Halabelian; Alexander Tropsha; Rafael Couñago

doi:10.26434/chemrxiv-2024-xd385

Biological and Medicinal Chemistry


18 October 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Recent advances in DNA-encoded library (DEL) screening have created bioactivity datasets containing billions of molecules, unlocking new opportunities for machine learning (ML) in drug discovery. However, most ultra-large DELs are proprietary, which limits the development of ML tools for big chemical data analytics and hinders the democratization of DEL-ML technology. We address this gap by developing an open, end-to-end DEL-ML framework using public datasets, in which enriched binders are represented by common chemical fingerprints to protect proprietary data. We demonstrate that ML models can be built and validated on fingerprinted DEL data and then applied to virtual screening (VS) of billion-scale, publicly accessible chemical libraries. As a proof of concept, we screened the HitGen OpenDEL library (3 billion molecules) against the human protein WDR91 and trained ML models on the results, which we then used to screen the Enamine REAL Space library (37 billion molecules). Fifty potential binders were identified; 48 of these were tested, and seven were confirmed as novel binders with dissociation constants (KD) from 2.7 to 21 μM and were successfully co-crystallized with WDR91. This fully automated, open-source workflow demonstrates the potential of DEL-ML models for discovering novel binders and promotes the use of open chemical bioactivity datasets and ML to accelerate drug discovery.
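The general pattern described above (represent DEL-enriched binders as chemical fingerprints, then use them to rank a much larger virtual library) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes fingerprints are given as hypothetical sets of "on" bit indices (in practice they would be computed with a cheminformatics toolkit), and it replaces the trained ML models with a simple nearest-neighbour Tanimoto score. The names `tanimoto`, `score` and `screen` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a binary fingerprint stored as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0  # define similarity of two empty fingerprints as 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def score(candidate: Fingerprint, binders: List[Fingerprint]) -> float:
    """Nearest-neighbour stand-in for an ML model: max similarity to any known binder."""
    return max(tanimoto(candidate, b) for b in binders)


def screen(
    library: Dict[str, Fingerprint], binders: List[Fingerprint], top_k: int
) -> List[Tuple[str, float]]:
    """Score every virtual-library compound and return the top_k (id, score) pairs."""
    ranked = sorted(
        ((cid, score(fp, binders)) for cid, fp in library.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]
```

For example, with binders `frozenset({1, 4, 7, 9})` and `frozenset({2, 4, 8})`, a candidate `frozenset({2, 4, 8})` scores 1.0 and `frozenset({1, 4, 7})` scores 0.75, so they would be nominated first. A real DEL-ML workflow would swap `score` for a model trained on enriched and non-enriched DEL members and would need to stream billions of compounds rather than sort an in-memory dictionary. For scale, the reported outcome of seven confirmed binders out of 48 tested corresponds to a hit rate of about 14.6%.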

Keywords

machine learning

DNA-encoded libraries

Supplementary materials

Supplementary Information: Table S1, Table S2, Table S3, Table S4, Table S5, Table S6, Figure S1, Figure S2, Supplementary References

Supplementary weblinks

Modeling and nomination pipeline code

WDR91 DEL dataset: Full WDR91 DEL dataset.


Version History

Oct 18, 2024 Version 1


License

The content is available under CC BY 4.0


Funding

Innovative Medicines Initiative (875510)

National Institute of Health Sciences (GM140154)

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
