Abstract
Recent advances in DNA-encoded library (DEL) screening have created bioactivity datasets containing billions of molecules, unlocking new opportunities for machine learning (ML) in drug discovery. However, most ultra-large DELs are proprietary, which limits the advancement of ML tools for big chemical data analytics and hinders the democratization of DEL-ML technology. We address this gap by developing an open, end-to-end DEL-ML framework using public datasets, in which enriched binders are represented by common chemical fingerprints, thereby protecting proprietary data. We demonstrate that ML models can be built and validated on fingerprinted DEL data and then applied to virtual screening (VS) of billion-scale, publicly accessible chemical libraries. As a proof of concept, we screened the human protein WDR91 against the HitGen OpenDEL library (3 billion molecules) and trained ML models on the results; these models were then used to screen the Enamine REAL Space library (37 billion molecules). Of the 50 potential binders identified, 48 were tested, and seven were confirmed as novel binders that had dissociation constants (KD) from 2.7 to 21 μM and were successfully co-crystallized with WDR91. This fully automated, open-source workflow demonstrates the potential of DEL-ML models for discovering novel binders and promotes the use of open chemical bioactivity datasets and ML to accelerate drug discovery.
Supplementary materials
Supplementary Information: Tables S1 to S6, Figures S1 and S2, Supplementary References