Abstract
Software to more rapidly and accurately predict protein--ligand binding affinities is of high interest for early-stage drug discovery, and physics-based methods are among the most widely used technologies for this purpose. The accuracy of these methods depends critically on the accuracy of the potential functions they use. Potential functions are typically trained against a combination of quantum chemical and experimental data. However, although binding affinities are among the most important quantities to predict, experimental binding affinities have not to date been integrated into the experimental dataset used to train potential functions. In recent years, host--guest complexes have gained popularity as tractable models of binding thermodynamics because they are small and simple relative to protein--ligand systems. Host--guest complexes can also avoid ambiguities that arise in protein--ligand systems, such as uncertain protonation states. Thus, experimental host--guest binding data are an appealing additional data type to integrate into the experimental dataset used to optimize potential functions. Here, we report the extension of the Open Force Field Evaluator framework to enable the systematic calculation of host--guest binding free energies and their gradients with respect to force field parameters, coupled with the curation of 126 host--guest complexes with available experimental binding free energies. As an initial application of this novel infrastructure, we optimized generalized Born (GB) cavity radii for the OBC2 GB implicit solvent model against experimental data for 36 host--guest systems. This refitting led to a dramatic improvement in accuracy for both the training set and a separate test set with 90 additional host--guest systems. The optimized radii also showed encouraging transferability from host--guest systems to 59 protein--ligand systems.
However, the new radii are significantly smaller than the baseline radii and lead to excessively favorable hydration free energies (HFEs). Thus, users of the OBC2 GB model may currently choose between GB cavity radii that yield more accurate binding affinities and GB cavity radii that yield more accurate HFEs. We suspect that achieving good accuracy on both will require more far-reaching adjustments to the GB model. We note that binding free energy calculations using the OBC2 model in OpenMM run about 10x faster than the corresponding explicit solvent calculations, suggesting a future role for implicit solvent absolute binding free energy (ABFE) calculations in virtual compound screening. This study proves the principle of using host--guest systems to train potential functions that are transferable to protein--ligand systems, and provides an infrastructure that enables a range of applications.
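To illustrate how gradients of binding free energies with respect to force field parameters can drive a fit to experiment, the following is a minimal sketch of a least-squares objective and its gradient obtained by the chain rule. It is not the code used in this work: the function name `objective_and_gradient`, the uniform uncertainty `sigma`, and all numbers in the usage lines are assumptions made for illustration, and the objective actually used with ForceBalance may include weighting and regularization terms not shown here.

```python
def objective_and_gradient(dg_calc, dg_grad, dg_exp, sigma=1.0):
    """Least-squares misfit between computed and experimental binding
    free energies, plus its gradient with respect to the parameters.

    dg_calc: computed binding free energies, one per host--guest system
    dg_grad: per-system lists of d(dG_calc)/d(theta_k), one entry per parameter
    dg_exp:  experimental binding free energies, same order as dg_calc
    sigma:   assumed uniform uncertainty used to scale the residuals
    """
    # Scaled residuals (computed minus experimental).
    residuals = [(c - e) / sigma for c, e in zip(dg_calc, dg_exp)]
    # chi^2 = sum_i r_i^2
    chi2 = sum(r * r for r in residuals)
    # Chain rule: d(chi^2)/d(theta_k) = sum_i 2 * r_i / sigma * d(dG_i)/d(theta_k)
    n_params = len(dg_grad[0]) if dg_grad else 0
    grad = [
        sum(2.0 * r / sigma * row[k] for r, row in zip(residuals, dg_grad))
        for k in range(n_params)
    ]
    return chi2, grad


# Placeholder values for two systems and two parameters (not data from the paper).
chi2, grad = objective_and_gradient(
    dg_calc=[-5.0, -3.0],
    dg_grad=[[1.0, 0.0], [0.0, 2.0]],
    dg_exp=[-4.0, -4.0],
)
print(chi2, grad)  # 2.0 [-2.0, 4.0]
```

An optimizer would then step the parameters (for example, the GB cavity radii) against this gradient and recompute the binding free energies and their gradients at the new parameter values.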
Supplementary materials
Title
Supplementary Information
Description
Supplementary information, including tables summarizing the host-guest and protein-ligand binding free energies and the hydration free energies, and figures for the test data set.
Supplementary weblinks
Title
Host-guest ABFE calculation with gradients with respect to FF parameters
Description
This fork of OpenFF Evaluator was created for the present work.
Title
Python tool for ABFE calculations
Description
Implements the Attach-Pull-Release method. Additional documentation here: https://readthedocs.org/projects/paprika/
Title
Repository for running force field optimization fitted to host-guest binding data using OpenFF-Evaluator and ForceBalance
Description
This repository contains the files for running force field optimization fitted to host-guest binding data using OpenFF-Evaluator and ForceBalance. This is part of the work of optimizing generalized Born surface area (GBSA) parameters to host-guest systems, detailed in the paper titled "Tuning Potential Functions to Host-Guest Binding Data".