Abstract
Make-on-demand chemical libraries have drastically increased the reach of molecular docking, with the enumerated ready-to-dock ZINC library approaching 5 billion molecules. While ever-growing libraries result in better-scoring molecules, the computational resources required to dock all of ZINC make this endeavor infeasible for most. Here, we organize and traverse chemical space with hierarchical navigable small world graphs, a method we term retrieval augmented docking (RAD). RAD recovers most virtual actives despite docking only a fraction of the library. Furthermore, RAD is protein-agnostic, supporting screens against many targets without additional computational overhead. In depth, we assess RAD on published large-scale docking campaigns against D4 and AmpC spanning 99.5 million and 138 million molecules, respectively. RAD recovers 95% of DOCK virtual actives for both targets after evaluating only 10% of the libraries. In breadth, RAD shows widespread applicability against 43 DUDE-Z proteins, evaluating 50.3 million associations. On average, RAD recovers 87% of virtual actives while docking 10% of the library without sacrificing chemical diversity.
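As a rough illustration of the idea of traversing a molecule graph guided by docking scores, the following minimal sketch runs a best-first search over a single-layer neighbor graph under a fixed scoring budget. It is not the RAD implementation and does not use the hnswlib API: the function `best_first_traversal`, the toy ring graph, and the toy score are all hypothetical, and a real HNSW adds a hierarchy of layers that this sketch omits. Lower scores are treated as better, as with docking energies.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def best_first_traversal(
    neighbors: Dict[int, List[int]],
    score: Callable[[int], float],
    start: int,
    budget: int,
) -> List[Tuple[int, float]]:
    """Score at most `budget` nodes, always expanding the best-scoring node next.

    `neighbors` stands in for one layer of a similarity graph; `score` stands in
    for an expensive docking call. Both are illustrative assumptions.
    """
    scored = {start: score(start)}
    frontier = [(scored[start], start)]  # min-heap: lowest (best) score first
    while frontier and len(scored) < budget:
        _, node = heapq.heappop(frontier)
        for nb in neighbors[node]:
            if nb in scored:
                continue  # each molecule is scored only once
            scored[nb] = score(nb)
            heapq.heappush(frontier, (scored[nb], nb))
            if len(scored) >= budget:
                break
    # Return scored molecules ranked from best to worst
    return sorted(scored.items(), key=lambda kv: kv[1])


# Toy library: 100 "molecules" on a ring, each linked to its 4 nearest neighbors.
n = 100
ring = {i: [(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n] for i in range(n)}


def toy_score(i: int) -> float:
    # Hypothetical score landscape with a single optimum at molecule 70
    return float(abs(i - 70))


ranked = best_first_traversal(ring, toy_score, start=0, budget=40)
print(ranked[0])
```

In this toy setting the search reaches the optimum after scoring 40 of the 100 nodes, because neighbors of well-scoring nodes are expanded first. The recovery rates quoted in the abstract come from the authors' HNSW-based method on real docking data, not from a sketch like this one.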
Supplementary materials
Title
Supplementary Information
Description
Supplementary Figures 1-10, Supplementary Tables 1-2
Title
Supplementary Data 1
Description
Supplementary Tables 3-5
Supplementary weblinks
Title
RAD GitHub Repository
Description
Open-source code repository for the Retrieval Augmented Docking (RAD) package developed in this study. It contains the complete source code, the modified hnswlib library, and code for constructing and traversing HNSWs with user-implemented scoring functions.
Title
RAD Paper Zenodo Data Repository
Description
DOCK scores for the DUDE-Z "goldilocks" molecules docked to each of the 43 DUDE-Z proteins in the paper.