Combined physics- and machine-learning-based method to identify druggable binding sites using SILCS-Hotspots

Erik Nordquist; Mingtian Zhao; Anmol Kumar; Alex MacKerell

doi:10.26434/chemrxiv-2024-hrqq9

Theoretical and Computational Chemistry

25 April 2024, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying druggable binding sites on proteins is an important and challenging problem, particularly for cryptic, allosteric binding sites that may not be obvious from X-ray, cryo-EM, or predicted structures. The Site-Identification by Ligand Competitive Saturation (SILCS) method accounts for the flexibility of the target protein using all-atom molecular simulations that include various small-molecule solutes in aqueous solution. During the simulations, the combination of protein flexibility and comprehensive sampling of the water and solute spatial distributions can identify buried binding pockets absent in experimentally determined structures. Previously, we reported a method for leveraging the information in the SILCS sampling to identify binding sites (termed Hotspots) of small mono- or bicyclic compounds, a subset of which coincide with known binding sites of drug-like molecules. Here we build on that physics-based approach and present a machine learning model for ranking the Hotspots according to the likelihood that they can accommodate drug-like molecules (e.g., molecular weight > 200 daltons). In the independent validation set, which includes various enzymes and receptors, our model recalls 65% and 88% of experimentally validated ligand binding sites in the top 10 and 20 ranked Hotspots, respectively. Furthermore, we show that the model’s output Decision Function is a useful metric to predict binding sites and their potential druggability in new targets. Given the utility of the SILCS method for ligand discovery and optimization, the tools presented represent an important advancement in the identification of orthosteric and allosteric binding sites and the discovery of drug-like molecules targeting those sites.
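
To make the top-k recall figures concrete, the following is a minimal Python sketch of how such a metric could be computed from ranked Hotspots. It is not the authors' code: the function name `recall_at_k`, the scores, and the site labels are all hypothetical, and it assumes that a known site counts as recalled when at least one Hotspot assigned to it appears among the k highest-scoring Hotspots.

```python
from typing import Optional, Sequence


def recall_at_k(scores: Sequence[float],
                site_ids: Sequence[Optional[str]],
                k: int) -> float:
    """Fraction of known sites matched by at least one of the k top-scoring Hotspots.

    scores   -- one ranking score per Hotspot (higher means ranked earlier)
    site_ids -- label of the known site each Hotspot coincides with, or None
    """
    known = {s for s in site_ids if s is not None}
    if not known:
        raise ValueError("no known binding sites to recall")
    # Rank Hotspot indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Collect the known sites hit by the top-k Hotspots.
    found = {site_ids[i] for i in ranked[:k] if site_ids[i] is not None}
    return len(found) / len(known)


# Made-up example: six Hotspots on one protein, two known ligand sites (A and B).
scores = [1.8, -0.4, 0.9, 0.2, -1.1, 0.5]
site_ids = ["A", None, None, "B", None, "A"]
print(recall_at_k(scores, site_ids, k=2))
print(recall_at_k(scores, site_ids, k=4))
```

In this toy case only site A is hit by the two best-scoring Hotspots, while both sites are hit once the top four are considered. The actual matching criterion between Hotspots and ligand sites is defined in the full paper and may differ from this assumption.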

Keywords

Site identification by ligand competitive saturation

protein-ligand interaction

orthosteric

allosteric

computer-aided drug design

CADD

binding site prediction

Version History

Aug 20, 2024 Version 2

Apr 25, 2024 Version 1

License

The content is available under CC BY NC ND 4.0

Funding

NIH Office of the Director

GM131710

NIH Office of the Director

T32CA154274

Author’s competing interest statement

A.D.M. Jr. is co-founder and Chief Scientific Officer of SilcsBio, LLC.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
