Machine Learning-Boosted Docking Enables the Efficient Structure-Based Virtual Screening of Giga-Scale Enumerated Chemical Libraries

Toni Sivula; Laxman Yetukuri; Tuomo Kalliokoski; Heikki Käsnänen; Antti Poso; Ina Pöhner

doi:10.26434/chemrxiv-2023-g34tx

Theoretical and Computational Chemistry


10 February 2023, Version 1

This is not the most recent version. A newer version of this content is available.

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The emergence of ultra-large screening libraries, filled to the brim with billions of readily available compounds, poses a growing challenge for docking-based virtual screening. Machine Learning (ML)-boosted strategies like the tool HASTEN combine rapid ML prediction with the brute-force docking of small fractions of such libraries to increase screening throughput and take on giga-scale libraries. In our case study of an anti-bacterial chaperone and an anti-viral kinase, we first generated a brute-force docking baseline for 1.56 billion compounds in the Enamine REAL lead-like library with the fast Glide HTVS protocol. With HASTEN, we observed robust recall of 90% of the true 1000 top-scoring virtual hits in both targets when docking only 1% of the entire library. This reduction of the required docking experiments by 99% significantly shortens the screening time. In the kinase target, the employment of a hydrogen bonding constraint resulted in a major proportion of unsuccessful docking attempts and hampered ML predictions. We demonstrate the optimization potential in the treatment of failed compounds when performing ML-boosted screening and showcase HASTEN as a fast and robust tool in a growing arsenal of approaches to unlock the chemical space covered by giga-scale screening libraries for everyday drug discovery campaigns.
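The recall figure quoted in the abstract can be illustrated with a minimal sketch. This is not the HASTEN implementation: the function name `top_k_recall`, the compound identifiers, and the scores are hypothetical toy values, and the sketch assumes that lower (more negative) docking scores are better. It shows how recall of the true top-k hits could be computed against a brute-force baseline, and how many compounds 1% of a 1.56 billion compound library amounts to.

```python
def top_k_recall(baseline_scores, docked_ids, k):
    """Fraction of the k best baseline compounds that appear in docked_ids.

    baseline_scores: dict mapping compound id -> brute-force docking score
                     (assumption: lower score = better pose).
    docked_ids:      ids of the compounds actually docked in the ML-boosted run.
    """
    # Sort ids by ascending score so the best-scoring compounds come first.
    true_top_k = sorted(baseline_scores, key=baseline_scores.get)[:k]
    docked = set(docked_ids)
    found = sum(1 for cid in true_top_k if cid in docked)
    return found / len(true_top_k)


# Hypothetical toy baseline: six compounds with made-up docking scores.
baseline = {"c1": -9.1, "c2": -8.7, "c3": -8.2, "c4": -6.0, "c5": -5.5, "c6": -4.9}
# Hypothetical subset selected and docked by the ML-boosted workflow.
docked_subset = ["c1", "c3", "c5"]

recall = top_k_recall(baseline, docked_subset, k=3)

# Docking 1% of the 1.56 billion compound library from the abstract.
library_size = 1_560_000_000
docked_count = library_size // 100

print(f"top-3 recall: {recall:.2f}")
print(f"compounds docked at 1%: {docked_count:,}")
```

In the study itself, the same quantity would be evaluated with k = 1000 against the full Glide HTVS baseline; the toy numbers here only demonstrate the calculation.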

Keywords

Machine Learning

Virtual screening

ultra-large scale docking

Supplementary materials

Title: Supporting Information

Description: Supporting Figures S1-S5. Supporting Tables S1-S6. Summary of utilized Chemprop parameters.

Supplementary weblinks

Title: Schrodinger Phase databases for Enamine REAL lead-like library of 1.56 billion compounds (March 2021)

Description: A collection of Phase databases created from the Enamine REAL lead-like library as downloaded in March 2021 (1.56 billion compounds).

Title: Glide HTVS docking results of Enamine REAL lead-like library (1.56 billion compounds) for two targets

Description: Docking results for 1.56 billion compounds of the Enamine REAL lead-like library (obtained March 2021) for the targets SurA and GAK. The intended use of the data is to serve as a giga-scale benchmarking dataset, e.g. for machine learning approaches.


Version History

Aug 07, 2023 Version 2

Feb 10, 2023 Version 1

Metrics

Views: 1,477

Downloads: 564

License

The content is available under CC BY 4.0


Funding

Academy of Finland, grant 336473

Academy of Finland, grant 333191

Jane and Aatos Erkko Foundation

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
