Endless Data for Drug Discovery Pipeline Validation for Free – Computational Chemistry’s Gift

Stefan Ivanov

doi:10.26434/chemrxiv-2024-54m5x

Theoretical and Computational Chemistry



23 October 2024, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Modern virtual high-throughput screening (vHTS) pipelines tend to be overmarketed and undervalidated: no rigorous study has conclusively demonstrated that each of their steps reliably adds further enrichment over the baseline random hit rate. Moreover, the few available benchmarking studies focus primarily on the docking stage, which usually sits at or near the beginning of a pipeline, and even there, authors tend to use flawed data sets that artificially inflate performance metrics. Herein, we present an alternative approach to pipeline validation and data set generation that requires no additional experimental work or expenditure, yet yields negative data vastly superior in both quality and quantity to any data set used in vHTS pipeline validation to date. By randomizing ligands across published experimental structures and generating structural isomers of known binders, we can generate practically unlimited amounts of negative data. The resulting positive and negative data points closely match in molecular properties, which makes such sets much more suitable for pipeline validation and gives them far greater evidentiary value than any current set. Once generated, these sets should be run through any proposed pipeline, with performance assessed at every step. We stress the importance of using negative data of adequate quality and quantity in validation studies to definitively and verifiably demonstrate the utility of a given tool or workflow. Our goal is to help distinguish tools and pipelines that truly accelerate hit discovery and lead optimization from those that merely promise to, so that academia and industry can begin to tackle the many unmet medical needs of the 21st century.
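The abstract asks that every pipeline step be shown to add enrichment over the baseline random hit rate. A minimal Python sketch of such a stepwise check follows, assuming the common definition of the enrichment factor as the hit rate among retained compounds divided by the hit rate of the full input set. The function names, the (name, filter) representation of a step, and the toy filters are hypothetical illustrations, not the author's implementation.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical step representation: a name plus a function that takes the
# currently retained compound IDs and returns the subset it keeps.
Step = Tuple[str, Callable[[List[Hashable]], List[Hashable]]]


def hit_rate(ids: Sequence[Hashable], is_binder: Dict[Hashable, bool]) -> float:
    """Fraction of the given compounds that are labeled as true binders."""
    if not ids:
        return 0.0
    return sum(is_binder[i] for i in ids) / len(ids)


def stepwise_enrichment(
    ids: Sequence[Hashable],
    is_binder: Dict[Hashable, bool],
    steps: Sequence[Step],
) -> List[Tuple[str, int, float, float]]:
    """Return (step name, compounds kept, hit rate, enrichment factor) per step.

    Enrichment is always measured against the hit rate of the full, unfiltered
    input set, so a value of 1.0 means "no better than random selection".
    """
    baseline = hit_rate(ids, is_binder)
    if baseline == 0.0:
        raise ValueError("input set contains no binders; enrichment is undefined")
    report = []
    current = list(ids)
    for name, step in steps:
        current = step(current)  # each step filters the survivors of the previous one
        rate = hit_rate(current, is_binder)
        report.append((name, len(current), rate, rate / baseline))
    return report


if __name__ == "__main__":
    # Toy data: 10 compounds, of which only compounds 0 and 1 are binders.
    compounds = list(range(10))
    labels = {c: c in (0, 1) for c in compounds}
    toy_steps: List[Step] = [
        ("docking", lambda xs: xs[:5]),    # hypothetical: keep the top 5
        ("rescoring", lambda xs: xs[:2]),  # hypothetical: keep the top 2
    ]
    for row in stepwise_enrichment(compounds, labels, toy_steps):
        print(row)
```

Reporting the enrichment factor after every step, rather than only at the end, is what exposes a step that leaves enrichment flat or lowers it relative to the step before.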

Keywords

virtual high-throughput screening

Supplementary materials


Protein–ligand pairs used in the present study.


PDBbind sheet: the 754 protein–ligand pairs and their binding status (binding/nonbinding). The first row is the 1FCX ligand docked to the protein from the 1FCX crystal structure (redocking), the second is the 1FCX ligand docked to the protein from 1G74 (crossdocking), the third is the 1FCX ligand docked to the protein from 2P3I (crossdocking), and so on. MAYGEN sheet: the 4QSW, 5A5N, and 6EPU ligands and the structural isomers of each used in the present study, given in SMILES format.
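The PDBbind sheet pairs each ligand first with the protein from its own crystal structure (redocking) and then with proteins from other structures (crossdocking). The Python sketch below enumerates pairs in that same ligand-by-ligand order. Its labeling rule is an assumption made for illustration, not taken from the paper: a pair is labeled binding when ligand and receptor structures share a target annotation, and presumed nonbinding otherwise. The function name, structure IDs, and target labels in the example are hypothetical placeholders. The sketch builds a complete n × n grid; since 754 is not a perfect square, the published sheet cannot be a single grid of this kind.

```python
from typing import Dict, List, Tuple

# (ligand source structure, receptor source structure, docking mode, label)
Row = Tuple[str, str, str, str]


def enumerate_pairs(structures: List[str], target_of: Dict[str, str]) -> List[Row]:
    """Pair every ligand with every receptor, redocking first for each ligand.

    Assumes `structures` holds unique IDs and `target_of` maps each ID to a
    (hypothetical) target annotation used for the binding label.
    """
    rows: List[Row] = []
    for ligand_src in structures:
        # Own structure first (redocking), then every other structure (crossdocking).
        receptors = [ligand_src] + [s for s in structures if s != ligand_src]
        for receptor_src in receptors:
            mode = "redocking" if receptor_src == ligand_src else "crossdocking"
            # Assumption: same target -> binding; different target -> presumed nonbinding.
            same_target = target_of[receptor_src] == target_of[ligand_src]
            label = "binding" if same_target else "nonbinding"
            rows.append((ligand_src, receptor_src, mode, label))
    return rows


if __name__ == "__main__":
    # Hypothetical structures: A and B share a target, C belongs to another one.
    ids = ["A", "B", "C"]
    targets = {"A": "target_1", "B": "target_1", "C": "target_2"}
    for row in enumerate_pairs(ids, targets):
        print(row)
```

In practice, the binding labels would come from curated annotations rather than a simple target-equality rule, because crossdocking into a related protein does not by itself guarantee that a ligand fails to bind.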


Version History

Oct 28, 2024 Version 2

Oct 23, 2024 Version 1

Metrics

Views: 764

Downloads: 232

License

The content is available under CC BY-NC 4.0.


Author’s competing interest statement

The author(s) have declared that they have no conflict of interest with regard to this content.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
